AI Lab Tracker
GEBench
dataset
2026-02-12
StepFun
5-dimensional evaluation benchmark for GUI and web agents.
Paper (arXiv)
GitHub
benchmark
agentic
Notes
arXiv paper 2602.09007.