PulseAugur
research · [2 sources]

New benchmark ReceiptBench evaluates MLLMs on document understanding

Researchers have introduced ReceiptBench, a benchmark for evaluating Multimodal Large Language Models (MLLMs) on real-world receipt documents. It contains 10,000 diverse receipts organized into four hierarchical tasks that range from basic text spotting to complex structure parsing and semantic reasoning. To improve MLLM performance on these tasks, the authors also developed a two-stage training framework called Metric-Aware Group Relative Policy Optimization (GRPO), which uses the evaluation metrics themselves as reinforcement learning reward signals to improve structural consistency.
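The idea of using an evaluation metric as a reinforcement learning signal can be illustrated with a small sketch. It is not the paper's implementation: the reward used here (exact-match F1 over extracted key/value fields), the function names `field_f1` and `group_relative_advantages`, and the sample receipt fields are all assumptions made for illustration. It shows only the group-relative step that gives GRPO its name: several candidate outputs for the same receipt are scored with the metric, and each score is normalized against the mean and standard deviation of its group.

```python
from statistics import mean, pstdev


def field_f1(pred: dict, gold: dict) -> float:
    """Exact-match F1 over (key, value) pairs.

    Hypothetical stand-in for an evaluation metric; the metrics actually
    used by the paper are not given in the summary.
    """
    pred_items = set(pred.items())
    gold_items = set(gold.items())
    if not pred_items and not gold_items:
        return 1.0
    true_positives = len(pred_items & gold_items)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_items)
    recall = true_positives / len(gold_items)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up reference annotation and four sampled model outputs for one receipt.
gold = {"merchant": "ACME Mart", "date": "2024-01-05", "total": "12.50"}
group = [
    {"merchant": "ACME Mart", "date": "2024-01-05", "total": "12.50"},  # all fields right
    {"merchant": "ACME Mart", "date": "2024-01-05", "total": "15.20"},  # wrong total
    {"merchant": "ACME", "total": "12.50"},                             # missing date, wrong merchant
    {},                                                                 # nothing extracted
]

rewards = [field_f1(pred, gold) for pred in group]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.3f}  advantage={advantage:+.3f}")
```

In a full training loop these advantages would weight the policy-gradient update for each sampled output, so outputs that score above their group's average on the metric are reinforced and those below are discouraged.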

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT This benchmark and training method could lead to more robust MLLMs for business automation tasks involving document understanding.

RANK_REASON The cluster contains a research paper introducing a new benchmark for evaluating MLLMs and a training method for improving them.


COVERAGE [2]

  1. arXiv cs.CV TIER_1 · Yandi Wang, Libin Zhan, Ziwei Huang, Tiancheng Luo, Yuxuan Jiang, Wang Dong, Leilei Gan, Jun Chen ·

    From Recognition to Reasoning: Benchmarking and Enhancing MLLMs on Real-World Receipt Document Understanding

    arXiv:2605.22413v1. Abstract: Extracting structured information from visual documents (Visual Information Extraction, VIE) is a cornerstone of business automation. While recent Multimodal Large Language Models (MLLMs) have shown promising capabilities, existing …

  2. arXiv cs.CV TIER_1 · Jun Chen ·

    From Recognition to Reasoning: Benchmarking and Enhancing MLLMs on Real-World Receipt Document Understanding

    Extracting structured information from visual documents (Visual Information Extraction, VIE) is a cornerstone of business automation. While recent Multimodal Large Language Models (MLLMs) have shown promising capabilities, existing benchmarks suffer from critical limitations in s…