Metis: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Addresses the question of when agentic multimodal models should call external tools versus rely on internal knowledge. Proposes HDPO (Hybrid Decoupled Policy Optimization), which optimizes accuracy and tool efficiency in independent channels instead of folding them into a single weighted objective, a tradeoff that tends to make agents either tool-dependent or tool-avoidant.
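
To make the contrast concrete, here is a minimal sketch of a coupled reward next to a decoupled one. It is illustrative only and is not the paper's formulation: the function names, the `penalty` weight, the group-relative normalization, and the choice to score efficiency only among correct rollouts are all assumptions made for this example.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Group-relative normalization: subtract the mean, divide by the std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def coupled_advantages(correct, tool_calls, penalty=0.1):
    """Baseline: one weighted reward mixes accuracy and tool cost.

    `penalty` is a hypothetical weight; tuning it trades accuracy
    against tool usage inside a single signal.
    """
    rewards = [c - penalty * t for c, t in zip(correct, tool_calls)]
    return normalize(rewards)


def decoupled_advantages(correct, tool_calls):
    """Hypothetical decoupled variant: two separately normalized channels.

    Assumption for this sketch: the efficiency channel is computed only
    among correct rollouts, so a wrong answer is never rewarded merely
    for skipping tools.
    """
    acc_adv = normalize(correct)
    eff_adv = [0.0] * len(correct)
    idx = [i for i, c in enumerate(correct) if c == 1.0]
    if len(idx) > 1:
        # Fewer tool calls means a higher efficiency score.
        eff = normalize([-float(tool_calls[i]) for i in idx])
        for i, a in zip(idx, eff):
            eff_adv[i] = a
    return acc_adv, eff_adv


# Four rollouts for one prompt: 1.0 = correct answer, 0.0 = wrong answer.
correct = [1.0, 1.0, 0.0, 0.0]
tool_calls = [0, 4, 0, 4]

coupled = coupled_advantages(correct, tool_calls)
acc_adv, eff_adv = decoupled_advantages(correct, tool_calls)
```

In the coupled version, the wrong rollout that skipped tools scores higher than the wrong rollout that used them, so tool avoidance is rewarded even when the answer is wrong. In the decoupled sketch, both wrong rollouts get zero efficiency advantage, and efficiency only separates the two correct rollouts.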

Training follows a curriculum: agents first master task resolution, then develop self-reliance. The resulting Metis model reduces tool invocations by orders of magnitude while simultaneously improving reasoning accuracy. The practical payoff is cheaper, faster agentic systems, because unnecessary tool calls are eliminated. By the Accio Team at Alibaba Group and HUST.

Paper (arXiv) | Project Page


agentic, multimodal, efficiency, alignment