Medical-enhanced reasoning model built on Qwen2.5-32B, featuring a Large Verifier System that comprises a Patient Simulator and a Clinical Rubrics Generator. Trained through multi-stage reinforcement learning with an improved variant of GRPO (Group Relative Policy Optimization). According to its technical report, it outperforms all other open-source models and most closed-source counterparts on HealthBench. Licensed under Apache 2.0.
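To illustrate how rubric-based verification can feed a group-relative RL objective, here is a minimal sketch under stated assumptions. It is not Baichuan-M2's training code. The `Criterion` class, the example criteria, and the scoring rule are all hypothetical. The scoring rule follows a HealthBench-style convention: points earned over the maximum possible positive points, clipped to [0, 1]. The advantage function is the standard GRPO normalization (reward minus group mean, divided by group standard deviation), not the improved variant the model card mentions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Criterion:
    """Hypothetical rubric item; negative points penalize undesired behavior."""
    description: str
    points: float


def rubric_reward(criteria: list[Criterion], met: list[bool]) -> float:
    """Assumed HealthBench-style score: earned points / max positive points, clipped to [0, 1]."""
    earned = sum(c.points for c, m in zip(criteria, met) if m)
    possible = sum(c.points for c in criteria if c.points > 0)
    if possible == 0:
        return 0.0
    return min(1.0, max(0.0, earned / possible))


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO advantage: normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric for one patient query, and which criteria each of
# four sampled responses satisfied (as judged by some verifier).
criteria = [
    Criterion("Advises urgent care for chest pain", 5),
    Criterion("Asks about symptom duration", 3),
    Criterion("Recommends an unproven remedy", -4),
]
judgments = [
    [True, True, False],   # ideal response
    [True, False, False],  # partially complete
    [False, False, True],  # harmful advice only (score clipped to 0)
    [True, True, True],    # complete but includes harmful advice
]

rewards = [rubric_reward(criteria, met) for met in judgments]
advantages = group_relative_advantages(rewards)
print("rewards:", rewards)
print("advantages:", [round(a, 3) for a in advantages])
```

Because advantages are computed relative to the group, a response is reinforced only if it scores above its sibling samples for the same prompt. This removes the need for a separate value network.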

Outputs 2

Baichuan-M2: Scaling Medical Capability with Large Verifier System

paper

Technical report on the Large Verifier System, multi-stage RL training, and HealthBench evaluation results.

arXiv: 2509.02208

Baichuan-M2-32B

model

32-billion-parameter medical reasoning model, with a quantized GPTQ-Int4 variant available.
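The practical benefit of the GPTQ-Int4 variant is smaller weight storage. The following back-of-the-envelope sketch uses the nominal 32B figure from this card, not an exact parameter count. It counts weights only. It ignores GPTQ's per-group scales and zero points, activations, and the KV cache, so real memory use will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 32e9  # nominal size from the model card; the exact count differs slightly

bf16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)
print(f"BF16 weights:      ~{bf16_gb:.0f} GB")
print(f"GPTQ-Int4 weights: ~{int4_gb:.0f} GB")
```

Under these assumptions, the 4-bit variant needs roughly a quarter of the BF16 weight memory (about 16 GB versus 64 GB). This is what makes single-GPU deployment of a 32B model more feasible.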

Architecture: Dense
Parameters: 32B
open-weight, biology, reasoning, training