This year's DevDay was all about enhancing the developer experience
At this year's DevDay conference, OpenAI introduced several API enhancements, including a new Realtime API for speech-to-speech conversations, vision fine-tuning capabilities, prompt caching, and model distillation, which allows smaller models to be fine-tuned on larger models' outputs.
Compared to last year's inaugural edition of DevDay, OpenAI's developer conference, this year's iteration was notably subdued. The conference, which took place earlier this week, was undoubtedly overshadowed by the recent departures of chief technology officer Mira Murati and chief research officer Bob McGrew, and by speculation surrounding the company's reported restructuring into a for-profit entity.
This year's DevDay also lacked splashy model announcements and was conspicuously silent about the GPT Store, introduced at last year's conference. The GPT Store was conceived as a platform where anyone could share their custom-made GPTs, potentially opening up a revenue stream for the creators of the most popular ones. Although the company has reportedly tested revenue sharing with a few popular GPT creators, it has said little on the subject since.
OpenAI's announcements at the conference revolved exclusively around enhancing its developer API, introducing model distillation, prompt caching, vision fine-tuning, and the Realtime API. The latter was perhaps the biggest announcement of the event: the Realtime API enables developers to build swift, responsive (nearly real-time) speech-to-speech conversations into their applications using one of six preset voices, which are distinct from those available in ChatGPT. For less demanding applications, the Chat Completions API will now support audio inputs and outputs, so developers can have their applications accept audio or text as input and respond with audio, text, or both.
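While the Realtime API itself runs over a persistent WebSocket connection, the audio support in the Chat Completions API follows the familiar request/response pattern. Below is a minimal sketch, using OpenAI's Python SDK, of requesting an audio reply; the model name, voice, and response fields mirror the audio preview as announced, but treat the details as illustrative rather than definitive.

```python
# Minimal sketch: requesting an audio reply from the Chat Completions API.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# model name and parameters mirror the audio preview as announced.
import base64

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",              # audio-capable chat model
    modalities=["text", "audio"],              # ask for text and audio output
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {"role": "user", "content": "In one sentence, what is prompt caching?"}
    ],
)

# The audio arrives base64-encoded alongside a text transcript.
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("reply.wav", "wb") as f:
    f.write(wav_bytes)
```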
The added support for vision fine-tuning lets developers fine-tune models on images to improve their performance on vision-based tasks, such as visual search and object detection. Prompt caching allows developers to reduce costs and latency by reusing frequently seen input tokens, as happens when sharing a codebase with a coding assistant or holding multiturn conversations with a chatbot. In both cases, successive prompts will likely refer to the same sections of a codebase, or to content shared in an earlier turn of the conversation. Storing these tokens in a cache makes them faster to retrieve, while OpenAI's 50% discount on cached tokens drives down the cost of the application.
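OpenAI applies caching automatically to the shared prefix of a prompt, so the practical work for developers is structuring requests so the static content comes first. The sketch below, with a hypothetical file path and helper, illustrates that layout using the Python SDK.

```python
# Minimal sketch of a cache-friendly prompt layout. The file path and
# helper are hypothetical; the point is keeping the large, static context
# in an identical prefix across requests so cached tokens can be reused.
from openai import OpenAI

client = OpenAI()

# Large, rarely-changing context: identical on every request.
SHARED_CODEBASE = open("src/core.py").read()
SYSTEM_PROMPT = "You are a coding assistant for this project."

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            # Static prefix: eligible for caching across requests.
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Project source:\n{SHARED_CODEBASE}"},
            # Variable suffix: only this part changes between calls.
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```

Because the cache matches on prompt prefixes, placing the per-turn question before the shared codebase would break the match and forfeit the discount.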
Finally, the model distillation functionality enables fine-tuning smaller models, like GPT-4o mini, on the outputs of larger ones, such as o1-preview and GPT-4o. As a result, developers should be able to lower their costs by using a smaller model while obtaining a performance boost from the distillation. The offering includes evaluation tools (currently in beta) that let developers measure the performance of their fine-tuned models.
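The flow builds on a new stored-completions capability: responses from the larger model are persisted server-side, then filtered and reused as a fine-tuning dataset for the smaller model. Below is a minimal sketch of that first step in the Python SDK; the `store` flag and metadata tag follow the feature as announced, though the prompt and tag here are illustrative.

```python
# Minimal sketch of the first step in the distillation flow: storing the
# larger model's completions so they can later be exported as fine-tuning
# data for a smaller model. The metadata tag and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",                          # the larger "teacher" model
    store=True,                              # persist this completion
    metadata={"purpose": "distillation"},    # tag it for later filtering
    messages=[
        {"role": "user", "content": "Classify this support ticket: ..."}
    ],
)
```

Stored completions can then be filtered by metadata in the dashboard, exported as a training set for a smaller model such as GPT-4o mini, and scored with the beta evaluation tools.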