Brief · PulseAugur

TOOL · arXiv cs.AI · 3d

Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models

Researchers have developed a pipeline that uses Vision-Language Models to improve the transcription and analysis of historical Italian parliamentary speeches. OCR produces an initial text extraction; a large-scale Vision-Language Model then refines the transcription, classifies document elements, and identifies speakers by analyzing both the visual layout and the text. The system also links the identified speakers to a knowledge base. The researchers report significant improvements in transcription quality and speaker tagging over traditional methods.
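The stages described above can be sketched roughly as follows. This is a minimal illustration of the data flow only, not the authors' implementation: the `ocr` and `refine` callables stand in for the OCR engine and the Vision-Language Model, the `Segment` type and the knowledge-base entries are hypothetical, and fuzzy string matching is swapped in as a simple placeholder for whatever speaker-linking method the paper actually uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical knowledge base (id -> canonical name); entries are made up.
KNOWLEDGE_BASE = {
    "dep_001": "Mario Rossi",
    "dep_002": "Luigi Bianchi",
}


@dataclass
class Segment:
    text: str                          # refined transcription of one page element
    kind: str                          # assumed labels: "speaker_label" or "speech"
    speaker_id: Optional[str] = None   # knowledge-base id, filled in by linking


def link_speaker(name, kb, threshold=0.8):
    """Return the id of the most similar knowledge-base name, or None.

    Placeholder technique: character-level similarity via difflib.
    """
    best_id, best_score = None, 0.0
    for entity_id, canonical in kb.items():
        score = SequenceMatcher(None, name.lower(), canonical.lower()).ratio()
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None


def run_pipeline(page_image, ocr, refine, kb):
    """Chain the stages: OCR draft -> model refinement -> speaker linking.

    `ocr(page_image)` is assumed to return a draft string, and
    `refine(page_image, draft)` a list of Segment objects in reading order.
    """
    draft = ocr(page_image)
    segments = refine(page_image, draft)
    current_speaker = None
    for seg in segments:
        if seg.kind == "speaker_label":
            current_speaker = link_speaker(seg.text, kb)
        # Each element inherits the most recently seen speaker label.
        seg.speaker_id = current_speaker
    return segments
```

With stub callables in place of the real models, a misspelled label such as "Mario Rosi" still resolves to the hypothetical entry `dep_001`, while a name with no close match stays unlinked rather than being forced onto the nearest entry.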

IMPACT This research applies Vision-Language Models to historical document analysis, which could make similar archives more accessible and easier to study.

Vision-Language Models
Chamber of Deputies
OCR
Italian parliamentary speeches
Sergio Picascia