PulseAugur
research · 2 sources

Researchers reflect on quantitative linguistics methods and dataset properties

This discussion paper examines how quantitative methods in historical linguistics interact with dataset properties. It draws on two case studies: one on Early Modern English discourse in the EEBO-TCP corpus (765 million words) and one on scientific writing in the Royal Society Corpus (78.6 million tokens). The analysis highlights the limitations of frequency-based approaches and shows how dataset structure affects the detection of semantic change.

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Explores limitations of current quantitative methods for detecting semantic change, potentially influencing future NLP research.

RANK_REASON This is a research paper published on arXiv discussing quantitative methods in historical linguistics.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 · Catherine Wong, Bach Phan-Tat, Susan Fitzmaurice

    Methods, Data, and Conceptual Change: Reflections from Two Quantitative Diachronic Case Studies

    arXiv:2605.02052v1. Abstract: This discussion paper reflects on how quantitative approaches to historical linguistics interact with dataset properties. Drawing on two worked examples, we examine English data using quad-based concept modelling of Early Modern Eng…

  2. arXiv cs.CL TIER_1 · Susan Fitzmaurice

    Methods, Data, and Conceptual Change: Reflections from Two Quantitative Diachronic Case Studies

    This discussion paper reflects on how quantitative approaches to historical linguistics interact with dataset properties. Drawing on two worked examples, we examine English data using quad-based concept modelling of Early Modern English discourse in EEBO-TCP (c. 1470s-1690s; 765M…