This paper introduces EPIC-EuroParl-UdS, an updated corpus of European Parliament speeches and their translations and interpretations. The resource has been refined with corrected metadata, improved linguistic annotations, and new layers such as word alignment and surprisal indices. It supports research on information-theoretic approaches to language variation, comparisons of written and spoken modes, and analyses of translationese. A new study in the paper validates the spoken data and evaluates GPT-2 and machine translation models on predicting filler particles in interpreting.
IMPACT Provides a refined dataset for research on information-theoretic approaches to language, which could help improve machine translation and interpreting models.
RANK_REASON The item is a research paper detailing a new corpus and its application.
- alphaXiv
- arXiv
- CatalyzeX
- DagsHub
- EPIC-EuroParl-UdS
- European Parliament
- Gotit.pub
- GPT-2
- Hugging Face
- Maria Kunilovskaya
- ScienceCast
AI-generated summary · Google Gemini · from 1 source.