A new benchmark backed by Mercor shows AI agents struggle with real workplace tasks
A new benchmark called APEX-Agents shows that leading AI models still cannot reliably perform complex tasks that require context switching and multi-domain reasoning. The benchmark tests models on tasks that simulate real-world scenarios, and no tested model surpassed 24% accuracy.