Multi-modal human behavior understanding that leverages LLMs for both video and motion (SMPL) modalities. Introduces the MoVid dataset and MoVid-Bench for human behavior evaluation. Achieves superior performance in captioning, spatial-temporal comprehension, and reasoning.

Outputs (2)

MotionLLM

model

An LLM-based motion understanding model for human behavior captioning, comprehension, and reasoning from video and motion sequences.

GitHub Repository

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

paper

A framework for joint video-motion human behavior understanding using LLMs, with a unified training strategy and the MoVid benchmark.

arXiv: 2405.20340

Tags: vision, nlp, multimodal, motion, open-source