Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models
Researchers have developed a pipeline that uses Vision-Language Models to improve the transcription and analysis of historical Italian parliamentary speeches. The approach runs OCR for initial text extraction, then applies a large-scale Vision-Language Model that uses both the visual layout and the text of each page to refine the transcription, classify document elements, and identify speakers. Identified speakers are then linked to entries in a knowledge base. The authors report significant improvements in transcription quality and speaker tagging over traditional methods.
AI IMPACT: This research applies Vision-Language Models to historical document analysis, which could improve accessibility and research capabilities for similar archives.
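The speaker-linking step described above can be illustrated with a minimal sketch. This is not the authors' method: it assumes a simple fuzzy string match from Python's standard `difflib`, and the knowledge-base identifiers, the `link_speaker` function, and the 0.75 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base: identifier -> canonical name.
# The identifiers are invented for this sketch.
KNOWLEDGE_BASE = {
    "kb:001": "Giovanni Giolitti",
    "kb:002": "Francesco Crispi",
    "kb:003": "Agostino Depretis",
}


def normalize(name):
    """Lowercase, drop periods, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def link_speaker(tag, threshold=0.75):
    """Return the id of the best-matching entry, or None if nothing is close.

    Transcripts often print only a surname, so each tag is compared
    against both the full canonical name and its last word.
    """
    query = normalize(tag)
    best_id, best_score = None, 0.0
    for kb_id, canonical in KNOWLEDGE_BASE.items():
        full = normalize(canonical)
        surname = full.split()[-1]
        score = max(
            SequenceMatcher(None, query, full).ratio(),
            SequenceMatcher(None, query, surname).ratio(),
        )
        if score > best_score:
            best_id, best_score = kb_id, score
    # Assumed cut-off: below it the tag is left unlinked.
    return best_id if best_score >= threshold else None


if __name__ == "__main__":
    for tag in ["GIOLITTI", "Crispl", "Presidente"]:
        print(tag, "->", link_speaker(tag))
```

Matching on the surname as well as the full name tolerates both abbreviated tags and small OCR errors, while the threshold keeps role labels such as "Presidente" from being forced onto a person. A real system would also need to disambiguate members who share a surname, which this sketch does not attempt.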