A 4.7B-parameter Vision-Language-Action (VLA) model for real-time robotic execution, scaling VLA models for consumer GPUs.

Outputs 2

Xiaomi-Robotics-0

model
Architecture DENSE
Parameters 4.7B

Xiaomi-Robotics-0: Open-Sourced VLA Model

paper

Report on scaling Vision-Language-Action models for consumer GPUs.

arXiv: 2602.12684

embodied, multimodal, open-weight