This paper examines how quantitative methods in historical linguistics are influenced by dataset characteristics. It uses two case studies: one on Early Modern English discourse from the EEBO-TCP corpus (765 million words) and another on scientific writing from the Royal Society Corpus (78.6 million tokens). The analysis highlights the limitations of frequency-based approaches and emphasizes how dataset structure affects the detection of semantic change.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Explores limitations of current quantitative methods for detecting semantic change, potentially influencing future NLP research.
RANK_REASON This is a research paper published on arXiv discussing quantitative methods in historical linguistics.