RESEARCH · arXiv cs.AI English(EN) · 1w · [2 sources]

BigFinanceBench: A Workflow-Grounded Benchmark for Financial-Research Agents

Researchers have introduced BigFinanceBench, a benchmark that evaluates whether financial-research answers are derived in an auditable way. It comprises 928 expert-authored tasks, each with a detailed rubric that scores the full workflow rather than only the final output. In initial evaluations of ten leading AI agents, the best performer earned only 58.8% of the rubric score, which leaves significant room for improvement in financial-research capabilities.

IMPACT This benchmark could encourage the development of more transparent and auditable AI agents for financial research.
