Today at DevDay SF, OpenAI is launching several new capabilities for its developer platform.
Realtime API
The new Realtime API, now in public beta, allows paid developers to create low-latency, speech-to-speech experiences in apps, similar to ChatGPT’s Advanced Voice Mode. It supports real-time streaming of audio inputs and outputs, offering more natural and responsive conversations. Alongside this, an update to the Chat Completions API introduces audio input and output, supporting multimodal interactions with text or audio responses. These updates simplify the process for developers by consolidating speech recognition, text processing, and speech synthesis into a single API call, enhancing use cases like customer support and language learning.
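The consolidated audio flow in Chat Completions can be sketched as a single request that asks for both text and speech in one call. This is a minimal sketch: it only builds the request payload locally, and the model name (`gpt-4o-audio-preview`) and the `modalities`/`audio` parameters should be checked against the current API reference before use.

```python
# Sketch: one Chat Completions request that handles speech in and speech out,
# replacing a separate transcription -> text model -> TTS pipeline.
# Model name and audio parameters are assumptions from the beta docs.
import json

request = {
    "model": "gpt-4o-audio-preview",
    # Ask for both a text transcript and synthesized speech in one call.
    "modalities": ["text", "audio"],
    "audio": {"voice": "alloy", "format": "wav"},
    "messages": [
        {"role": "user", "content": "Explain prompt caching in one sentence."}
    ],
}

# With the official SDK this payload would be passed as:
#   client.chat.completions.create(**request)
print(json.dumps(request, indent=2))
```

Because recognition, reasoning, and synthesis happen in one round trip, the app code stays the same shape whether the user typed or spoke.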
Prompt Caching
Prompt Caching, introduced today, lets developers reuse recently seen input tokens across multiple API calls, cutting both costs and latency. Cached input tokens are billed at a 50% discount and processed faster than uncached ones. Prompt Caching is applied automatically to the latest versions of GPT-4o, GPT-4o mini, o1-preview, and o1-mini, as well as fine-tuned versions of those models.
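The savings compound when many requests share a long, static prefix (system prompt, tool definitions, few-shot examples). A back-of-the-envelope sketch, assuming the documented behavior that a matching prompt prefix is billed at a 50% discount on subsequent calls; the per-token price below is illustrative, not a real rate:

```python
# Estimate input-token cost for `calls` requests that share one cached prefix.
# Assumptions: 50% discount on the cached prefix after the first call;
# price_per_token is a placeholder, not actual pricing.

def input_cost(prefix_tokens: int, suffix_tokens: int, calls: int,
               price_per_token: float = 2.5e-6) -> float:
    """Total input-token cost across `calls` requests sharing one prefix."""
    first = (prefix_tokens + suffix_tokens) * price_per_token
    # Later calls pay half price on the cached prefix, full price on the rest.
    later = (prefix_tokens * 0.5 + suffix_tokens) * price_per_token * (calls - 1)
    return first + later

with_cache = input_cost(prefix_tokens=4000, suffix_tokens=200, calls=100)
no_cache = (4000 + 200) * 2.5e-6 * 100
print(f"cached: ${with_cache:.4f}  uncached: ${no_cache:.4f}")
```

The practical takeaway: put stable content (instructions, examples) at the front of the prompt and variable content at the end, so the cacheable prefix is as long as possible.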
Model Distillation
OpenAI is introducing a new Model Distillation offering, providing developers with an integrated workflow to manage the entire distillation pipeline directly within the platform. Model distillation fine-tunes smaller, cost-efficient models using outputs from more capable models, improving performance at a lower cost. This suite simplifies the previously complex, multi-step process with three key features: Stored Completions, which captures input-output pairs to build datasets for fine-tuning; Evals, a tool for custom performance evaluations; and seamless integration with OpenAI’s fine-tuning services. This offering reduces manual effort and streamlines model optimization.
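The data-collection step of that pipeline can be pictured as turning captured input-output pairs (what Stored Completions gathers server-side) into a chat-format JSONL file for fine-tuning a smaller model. This is a local sketch with made-up stand-ins for logged traffic, not the platform's own export format:

```python
# Sketch of the distillation data step: larger-model outputs become the
# training targets for a smaller model. The pairs below are illustrative
# placeholders for real captured completions.
import json

captured_pairs = [
    ("Summarize: The cat sat on the mat.", "A cat sat on a mat."),
    ("Summarize: It rained all day in Paris.", "It rained in Paris."),
]

lines = []
for prompt, teacher_output in captured_pairs:
    lines.append(json.dumps({
        "messages": [
            {"role": "user", "content": prompt},
            # The capable model's answer is the label the small model learns.
            {"role": "assistant", "content": teacher_output},
        ]
    }))

jsonl = "\n".join(lines)  # one JSON object per line, ready for a fine-tune job
print(jsonl.splitlines()[0])
```

On the platform itself, Stored Completions removes the need to hand-roll this collection step, and Evals closes the loop by scoring the distilled model against the original.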
Vision Fine-Tuning
OpenAI has introduced vision fine-tuning on GPT-4o, allowing developers to fine-tune the model using images in addition to text. This enhances the model’s image understanding capabilities, enabling applications such as improved visual search, better object detection for autonomous systems, and more accurate medical image analysis. While many developers have used text-only fine-tuning to improve task-specific performance, the addition of image fine-tuning addresses the limitations of text-based models for more complex, visual tasks.
For more news like this: thenextaitool.com/news
