Speech-to-speech foundation model that unifies speech and text processing in a single architecture. Designed streaming-first for low-latency conversational applications, with natural turn-taking and barge-in handling. Adapts its generated speech to the acoustic context (tone, style) and the spoken content of the user's input.

Evaluated on multilingual ASR (FLEURS across 102 languages, MLS across 8 languages) in addition to standard speech-generation benchmarks. The successor, Nova 2 Sonic (December 2025), adds polyglot voices supporting English, French, Spanish, German, Italian, Portuguese, and Hindi within a single conversation, along with asynchronous tool calling. Initial report published April 8, 2025; updated June 12, 2025. Available via Amazon Bedrock.
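For readers unfamiliar with how ASR benchmarks such as FLEURS and MLS are usually scored, here is a minimal sketch of word error rate (WER). It assumes WER is the metric in question, which this summary does not state. The function name `word_error_rate` and the whitespace tokenization are illustrative choices, not taken from the report. Real evaluations typically also normalize case and punctuation, and use character error rate for languages without whitespace word boundaries.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (single-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A multilingual score is then typically an average of per-language WER values, though the aggregation used for the figures in the report is not given here.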

speech, multimodal, frontier

Related