LongCat-Flash-Omni
Native omni-modal model supporting streaming audio-visual interaction. 560B-parameter MoE (27B active), 128K-token context, millisecond-level end-to-end latency, and 8+ minutes of real-time audio-visual interaction. Benchmarks: 61.4 on OmniBench, 78.2 on VideoMME, 88.7 on VoiceBench.