The first open-source 7B omni-modal large language model capable of concurrently processing and analyzing image, video, audio, and text. Trained with a two-stage schema: multimodal alignment followed by multitask fine-tuning. Developed by Baichuan in collaboration with Westlake University and Zhejiang University.
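The two-stage schema is only named above; the sketch below illustrates the general pattern such a schema typically follows (stage 1: align modality features to a frozen LLM by training lightweight projectors; stage 2: multitask fine-tuning with more components unfrozen). All module and function names here are hypothetical placeholders, not Baichuan-Omni's actual implementation, which is detailed in the technical report.

```python
import torch
import torch.nn as nn

# Hypothetical two-stage training skeleton (PyTorch). The real recipe,
# losses, and freezing strategy are described in the technical report;
# this only illustrates the alignment -> multitask fine-tuning pattern.

def train_stage(model: nn.Module, dataloader, trainable_modules, lr: float, num_steps: int):
    """Run one training stage, updating only the listed submodules."""
    for p in model.parameters():          # freeze everything first
        p.requires_grad_(False)
    params = []
    for module in trainable_modules:      # then unfreeze this stage's trainable parts
        for p in module.parameters():
            p.requires_grad_(True)
            params.append(p)
    optimizer = torch.optim.AdamW(params, lr=lr)
    for _, batch in zip(range(num_steps), dataloader):
        loss = model(**batch)             # assume the model returns a scalar LM loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Stage 1 -- multimodal alignment: train only the projectors that map image /
# video / audio encoder features into the LLM's embedding space.
#   train_stage(omni, alignment_loader, [omni.projectors], lr=1e-3, num_steps=...)
#
# Stage 2 -- multitask fine-tuning: unfreeze the LLM (and optionally the encoders)
# and train jointly on mixed image, video, audio, and text tasks.
#   train_stage(omni, multitask_loader, [omni.llm, omni.projectors], lr=2e-5, num_steps=...)
```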

Outputs (2)

Baichuan-Omni Technical Report (paper)

Technical report describing the omni-modal architecture, two-stage multimodal training schema, and evaluation results.

arXiv: 2410.08565

Baichuan-Omni (model)

Open-source 7B multimodal model for image, video, audio, and text understanding.

Architecture: Dense
Parameters: 7B
Tags: open-weight, multimodal, audio, vision