Researchers have developed a reproducible pipeline for building a Universal Dependencies-style parsing resource for Katharevousa Greek parliamentary text. The workflow addresses the lack of NLP tools for this historical form of Greek, which is important for understanding legal and administrative archives. The pipeline integrates OCR reconstruction, LLM-assisted annotation, and automated validation to produce a high-quality dataset, which is released openly along with the methodology and benchmark results.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Enables new NLP applications for historical Greek parliamentary archives, potentially unlocking insights from previously inaccessible texts.
RANK_REASON The cluster contains an academic paper detailing a new methodology and dataset for processing historical text, including benchmark results.