InteractiveOmni | Lab Index

Unified open-source omni-modal large language model for audio-visual multi-turn interaction. Available in 4B and 8B parameter variants, it integrates a vision encoder, an audio encoder, an LLM, and a speech decoder into a single model for comprehensive understanding and generation tasks. Presented as a leading model among lightweight omni-modal models.

Paper (arXiv) | GitHub

Outputs 2

InteractiveOmni Model

model

Variants

Name	Parameters	Notes
InteractiveOmni-4B	4B	—
InteractiveOmni-8B	8B	—

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

paper

arXiv | HTML

multimodal, audio, open-source
