optical character recognition
PulseAugur coverage of optical character recognition — every cluster mentioning optical character recognition across labs, papers, and developer communities, ranked by signal.
3 天有情绪数据
-
Large multimodal models show mixed results for medical image PHI detection
Researchers evaluated large multimodal models (LMMs) like GPT-4o and Gemini 2.5 Flash for detecting protected health information (PHI) in medical images. While LMMs showed improved text recognition (lower Word Error Rat…
-
Vision-Language Models enhance Italian parliamentary speech analysis
Researchers have developed a new pipeline using Vision-Language Models to improve the transcription and analysis of historical Italian parliamentary speeches. This approach leverages OCR for initial text extraction and …
-
AI automates healthcare data to improve clinical decision support
Modern healthcare faces a data liquidity problem, where a significant portion of patient information remains trapped in unstructured formats like scanned documents and free-text notes. This necessitates manual data entr…
-
AI logging gaps trigger $1.5M HIPAA fine for hospital
Healthcare organizations are facing significant HIPAA violations due to inadequate logging of AI system activity, leading to substantial fines. A recent case involved a hospital settling for $1.5 million because its AI …
-
Consensus Entropy improves VLM OCR accuracy by measuring inter-model agreement
Researchers have developed a new metric called Consensus Entropy (CE) to assess the reliability of Optical Character Recognition (OCR) outputs from Vision-Language Models (VLMs). CE measures the agreement between multip…
-
New CC-OCR V2 benchmark reveals LMMs fall short in real-world document processing
A new benchmark, CC-OCR V2, has been released to evaluate Large Multimodal Models (LMMs) on real-world document processing tasks. The benchmark includes 7,093 challenging samples across five OCR-centric tracks, addressi…
-
AI classifies historical document pages for tailored content processing
Researchers have developed an AI-powered image classification system to automatically categorize pages from historical documents. This system aims to streamline the processing of digitized archives by identifying differ…
-
New OCR benchmark reveals accuracy doesn't guarantee RAG performance
A new benchmark has been developed to evaluate the robustness of Optical Character Recognition (OCR) systems specifically for Retrieval-Augmented Generation (RAG) applications. Current OCR benchmarks using character-lev…
-
Sun Finance boosts ID verification accuracy with generative AI on AWS
Sun Finance, a Latvian fintech company, has successfully automated its identity document extraction and fraud detection processes using generative AI on Amazon Web Services (AWS). The new system, developed in partnershi…
-
Researchers release dataset of AI-generated images from GPT-Image-2's first week
Researchers have released a dataset of over 10,000 images generated by OpenAI's GPT-Image-2, collected in the first week following its April 21, 2026 release. The dataset, sourced from Twitter/X, was curated using a mul…
-
iWatchRoad system uses YOLO to detect and map potholes for smart cities
Researchers have developed iWatchRoad, an end-to-end system designed for the scalable detection and geospatial visualization of potholes. The system utilizes a fine-tuned YOLO model for real-time pothole identification …
-
New dataset and methods tackle low-light scene text recognition challenges
Researchers have introduced LSTR, a large-scale dataset for low-light scene text recognition, and ESTR, a smaller evaluation set of real nighttime street scenes. They explored two approaches: fine-tuning existing OCR mo…
-
HalalBench benchmark tackles OCR challenges for multilingual food packaging ingredient extraction
Researchers have introduced HalalBench, a new multilingual benchmark designed to evaluate Optical Character Recognition (OCR) performance specifically on food packaging ingredient labels. The benchmark addresses the uni…
-
Older, cheaper LLMs often match premium OCR accuracy at lower cost
Researchers have open-sourced a new benchmark and framework for evaluating Optical Character Recognition (OCR) performance across 18 different large language models (LLMs). Their analysis, involving over 7,500 calls, re…