GPT-4o ("omni") is OpenAI's first natively multimodal model. It processes text, audio, image, and video inputs and generates text and audio outputs within a single end-to-end architecture. Context window: 128K tokens. Parameter count undisclosed.

GPT-4o matched GPT-4 Turbo on text intelligence while running 2x faster and costing 50% less in the API. Its native audio capabilities enabled real-time voice conversation with emotional expression and multilingual support. GPT-4o mini, a cost-optimized variant, followed in July 2024. AA Intelligence Index: 17. Proprietary.

Model Details

Context window: 128,000 tokens
Tags: frontier, multimodal, speech
