Brief · PulseAugur

TOOL · arXiv cs.CL · English (EN) · 12h

When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents

A new benchmark, Scribe Finance, has been introduced to evaluate how well multimodal models understand complex French financial documents. The benchmark includes questions on text extraction, table comprehension, and chart interpretation. Its results show that current vision-language models (VLMs) perform well on text and table tasks but struggle significantly with chart analysis. The study also highlights a critical failure mode in multi-turn dialogues: an error made in an early turn can propagate to later turns, causing a substantial drop in accuracy regardless of model size.

IMPACT: Highlights the brittleness of current VLMs in complex financial analysis and points to a need for better chart interpretation and better handling of error propagation across dialogue turns.

vision-language models
multimodal models
Théo Lasnier
French financial documents
Scribe Finance