PulseAugur / Brief

Brief

last 24h
224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
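The one-line description above names four scoring inputs but not how they are combined. Here is a minimal sketch of one plausible way to fold them into a 0–100 score. Everything specific in it is a hypothetical assumption, not PulseAugur's published method: the weights, the 12-hour half-life, the `Cluster` type, the `score` function, and the 0–1 normalization of inputs. It treats authority, cluster strength, and headline signal as a weighted sum and applies exponential time decay as a multiplier. Clustering and deduplication are not shown.

```python
from dataclasses import dataclass

# Hypothetical weights: the brief does not publish PulseAugur's real weighting.
WEIGHTS = {"authority": 0.3, "cluster_strength": 0.3, "headline_signal": 0.4}
# Assumed half-life for time decay, in hours.
HALF_LIFE_HOURS = 12.0


@dataclass
class Cluster:
    """A deduplicated story cluster; all signals assumed normalized to 0..1."""
    authority: float
    cluster_strength: float
    headline_signal: float
    age_hours: float


def score(c: Cluster) -> int:
    """Combine the signals into a 0-100 score (illustrative sketch only)."""
    # Weighted sum of the three content signals.
    base = (
        WEIGHTS["authority"] * c.authority
        + WEIGHTS["cluster_strength"] * c.cluster_strength
        + WEIGHTS["headline_signal"] * c.headline_signal
    )
    # Exponential decay: the score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)
    # Clamp to [0, 1] before scaling to 0-100.
    return round(100 * min(max(base * decay, 0.0), 1.0))
```

Under these assumptions, a cluster that maxes out every signal scores 100 when fresh and 50 after 12 hours. A multiplicative decay like this lets older stories drop down the list without ever changing their underlying signal values.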

  1. PorTEXTO: A European Portuguese Benchmark for Visual Text Extraction

    Researchers have introduced PorTEXTO, a new benchmark for evaluating visual text extraction in European Portuguese (pt-PT). It addresses the scarcity of pt-PT resources in existing optical character recognition (OCR) benchmarks, which tend to focus on high-resource languages or historical texts. PorTEXTO is built with a pipeline that pairs transcriptions from a large language model with review by native speakers, ensuring quality and relevance for contemporary applications. The study found that specialized multilingual data does more for pt-PT OCR performance than model size or resolution, underscoring the need for open pt-PT OCR resources.

    IMPACT This benchmark could improve AI model performance for European Portuguese text extraction, enabling better applications in regions where this language is spoken.