IBM Research and UC Berkeley have developed IT-Bench, a benchmark for evaluating the performance of enterprise AI agents, and MAST, a framework for diagnosing the root causes of agent failures. The work aims to make AI agents more reliable and effective in business environments by identifying the specific areas where they struggle.
Summary written by gemini-2.5-flash-lite from 1 source.
RANK_REASON The release of a new benchmark and diagnostic framework for AI agents constitutes a research contribution.