Researchers have introduced ReceiptBench, a new benchmark for evaluating Multimodal Large Language Models (MLLMs) on real-world documents such as receipts. The benchmark includes 10,000 diverse receipts and is organized into four hierarchical tasks, ranging from basic text spotting to complex structure parsing and semantic reasoning. To improve MLLM performance on these tasks, the researchers also developed a two-stage training framework called Metric-Aware Group Relative Policy Optimization (GRPO), which uses the evaluation metrics themselves as reinforcement learning signals to improve the structural consistency of model outputs.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT This benchmark and training method could lead to more robust MLLMs for business automation tasks involving document understanding.
RANK_REASON The cluster contains a research paper introducing a new benchmark and method for evaluating MLLMs.
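The idea of using an evaluation metric as a reinforcement learning signal can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not say which metrics or reward shaping the authors use, so the reward here is an assumed stand-in (string similarity from Python's `difflib`), and the function names `metric_reward` and `group_relative_advantages` and the example receipt strings are hypothetical. The only part taken from GRPO as generally described is that each sampled output is scored relative to the other outputs in its group.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev


def metric_reward(prediction: str, reference: str) -> float:
    """Assumed reward: string similarity in [0, 1] between output and reference.

    A stand-in for whatever evaluation metric the benchmark actually uses.
    """
    return SequenceMatcher(None, prediction, reference).ratio()


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled outputs.

    Outputs that score above the group mean get a positive advantage,
    outputs below it get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of model outputs sampled for the same receipt field.
reference = "TOTAL 12.50"
group = ["TOTAL 12.50", "TOTAL 12.30", "TOTL 1250", "SUBTOTAL 9.99"]

rewards = [metric_reward(p, reference) for p in group]
advantages = group_relative_advantages(rewards)

for text, r, a in zip(group, rewards, advantages):
    print(f"{text!r}: reward={r:.3f}, advantage={a:+.3f}")
```

In this sketch the exact match receives the highest advantage, so a policy update weighted by these values would favor outputs that score well on the metric. A real training loop would also need the policy-gradient update itself, which is omitted here.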