5-dimensional evaluation benchmark for GUI and web agents.
benchmark, agentic

Notes

arXiv paper 2602.09007.