AI Lab Tracker
ComplexFuncBench
dataset
2025-01-17
Z.ai
Benchmark for evaluating multi-step and constrained function calling in long-context scenarios. Tests LLM tool-use capabilities on complex, real-world API interactions.
Paper (arXiv)
GitHub
Dataset
benchmark
agentic
evaluation