Baichuan-Omni
First open-source 7B omni-modal large language model capable of concurrently processing and analyzing image, video, audio, and text modalities. Trained with a two-stage schema: multimodal alignment followed by multitask fine-tuning. Developed in collaboration with Westlake University and Zhejiang University.
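The two-stage schema can be pictured with a minimal PyTorch sketch, written under stated assumptions rather than from the released code: stage one trains only per-modality projectors so image/video/audio features land in the language model's embedding space, and stage two fine-tunes the full model on mixed multitask data. Names such as OmniModel, visual_projector, audio_projector, run_stage, the layer sizes, and the learning rates are illustrative placeholders.

```python
import torch
import torch.nn as nn

class OmniModel(nn.Module):
    """Toy stand-in: an LLM backbone plus per-modality projectors (illustrative only)."""
    def __init__(self, dim=512, vocab=32000):
        super().__init__()
        self.visual_projector = nn.Linear(1024, dim)  # vision features -> LLM embedding space
        self.audio_projector = nn.Linear(768, dim)    # audio features  -> LLM embedding space
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, vocab)

    def forward(self, vision, audio, text_emb, labels):
        # Project non-text modalities into the text embedding space, run the
        # shared backbone, and score only the text positions.
        seq = torch.cat([self.visual_projector(vision),
                         self.audio_projector(audio),
                         text_emb], dim=1)
        logits = self.head(self.llm(seq))[:, -labels.size(1):]
        return nn.functional.cross_entropy(logits.flatten(0, 1), labels.flatten())

def run_stage(model, trainable_params, lr, steps, batch_fn):
    # One training stage: optimize only the parameters passed in.
    opt = torch.optim.AdamW([p for p in trainable_params if p.requires_grad], lr=lr)
    for _ in range(steps):
        loss = model(*batch_fn())
        loss.backward()
        opt.step()
        model.zero_grad()

def fake_batch(dim=512, vocab=32000):
    # Random tensors standing in for real image/video, audio, and text batches.
    return (torch.randn(2, 4, 1024), torch.randn(2, 4, 768),
            torch.randn(2, 8, dim), torch.randint(0, vocab, (2, 8)))

model = OmniModel()

# Stage 1 (multimodal alignment): freeze the LLM backbone, train the projectors.
for p in model.llm.parameters():
    p.requires_grad_(False)
run_stage(model,
          list(model.visual_projector.parameters()) + list(model.audio_projector.parameters()),
          lr=1e-3, steps=3, batch_fn=fake_batch)

# Stage 2 (multitask fine-tuning): unfreeze everything, fine-tune end to end.
for p in model.parameters():
    p.requires_grad_(True)
run_stage(model, list(model.parameters()), lr=2e-5, steps=3, batch_fn=fake_batch)
```

The point of the frozen first stage is to let cheap projector training bridge each modality encoder into the LLM's embedding space before the costlier end-to-end multitask fine-tuning.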
Outputs (2)
Baichuan-Omni Technical Report
Technical report describing the omni-modal architecture, the two-stage multimodal training schema, and the evaluation results.
arXiv: 2410.08565
Baichuan-Omni (model)
Open-source 7B multimodal model for image, video, audio, and text understanding.
Architecture DENSE
Parameters 7B