Brief · PulseAugur

RESEARCH · arXiv cs.CV English(EN) · 5d · [2 sources]

From Recognition to Reasoning: Benchmarking and Enhancing MLLMs on Real-World Receipt Document Understanding

Researchers have introduced ReceiptBench, a benchmark for evaluating Multimodal Large Language Models (MLLMs) on real-world receipt documents. It contains 10,000 diverse receipts and is organized into four hierarchical tasks, ranging from basic text spotting to structure parsing and semantic reasoning. To improve MLLM performance on these tasks, the researchers also propose a two-stage training framework, Metric-Aware Group Relative Policy Optimization (Metric-Aware GRPO), which uses the evaluation metrics themselves as reinforcement learning signals to improve structural consistency.
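
To make "evaluation metrics as reinforcement learning signals" concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical reward, `field_f1`, that scores a parsed receipt against a reference by exact-match key/value pairs, and then applies the group-relative normalization that GRPO is generally described as using: each sampled output's reward is standardized against the other samples for the same input. The field names and the choice of F1 are illustrative assumptions; the brief does not say which metrics the paper uses.

```python
from statistics import mean, pstdev


def field_f1(predicted: dict, reference: dict) -> float:
    """Hypothetical metric reward: F1 over exact-match (key, value) pairs."""
    pred_items = set(predicted.items())
    ref_items = set(reference.items())
    true_positives = len(pred_items & ref_items)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_items)
    recall = true_positives / len(ref_items)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative data only: one reference parse and three sampled model outputs.
reference = {"total": "12.50", "date": "2024-01-05"}
samples = [
    {"total": "12.50", "date": "2024-01-05"},  # fully correct
    {"total": "12.50", "date": "2024-01-06"},  # one field wrong
    {"total": "9.99"},                         # nothing matches
]

rewards = [field_f1(sample, reference) for sample in samples]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

In this sketch, outputs that score above the group average get a positive advantage and would be reinforced, while below-average outputs get a negative one. A policy-gradient update would then weight each sample's log-probability by its advantage; that step is omitted here.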

IMPACT The benchmark and training method could lead to MLLMs that handle document understanding more reliably in business automation tasks such as receipt processing.

Multimodal Large Language Models
ReceiptBench
Metric-Aware Group Relative Policy Optimization