Multi-modal human behavior understanding that leverages LLMs for both video and motion (SMPL) modalities. Introduces the MoVid dataset and MoVid-Bench for human behavior evaluation. Achieves superior performance in captioning, spatial-temporal comprehension, and reasoning.

Outputs (2)

MotionLLM

model

An LLM-based motion understanding model for human behavior captioning, comprehension, and reasoning from video and motion sequences.

GitHub Repository

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

paper

A framework for joint video-motion human behavior understanding using LLMs, with a unified training strategy and the MoVid benchmark.

arXiv: 2405.20340

Tags: vision, nlp, multimodal, motion, open-source