A new paper proposes a unified framework for analyzing surprisal in language processing, aiming to disentangle the definition of linguistic units from the regions over which surprisal is evaluated. The authors argue that current empirical work often conflates these two aspects, leading to an implicit reliance on ad hoc procedures. They suggest that tokenization should be treated as an implementation detail rather than as a fundamental scientific primitive in surprisal-based analyses.
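To make the unit/region distinction concrete, here is a minimal sketch (not the paper's method, and using made-up probabilities) of how surprisal over a multi-token region can be computed independently of the tokenizer: by the chain rule, a region's surprisal is the sum of the surprisals of whatever tokens happen to span it.

```python
import math

# Hypothetical conditional probabilities from a toy language model
# (illustrative values only, not taken from the paper).
p = {"new": 0.5, "york|new": 0.25}

def surprisal_bits(prob):
    # Surprisal in bits: -log2 p(token | context)
    return -math.log2(prob)

# Token-level surprisals under this toy model
s_new = surprisal_bits(p["new"])        # 1.0 bit
s_york = surprisal_bits(p["york|new"])  # 2.0 bits

# By the chain rule, the surprisal of the region "new york" is the sum
# of its tokens' surprisals -- the evaluation region is defined by the
# analyst, while the tokenization is an implementation detail.
region = s_new + s_york
print(region)  # 3.0
```

The same region surprisal would result from any tokenization whose token probabilities multiply to the same region probability, which is the sense in which tokenization can be treated as an implementation detail.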
Summary written by gemini-2.5-flash-lite from 3 sources.
IMPACT Clarifies the theoretical underpinnings of surprisal-based evaluations of language model predictability, with potential consequences for future research on human-AI language interaction.
RANK_REASON This is a research paper published on arXiv concerning theoretical aspects of language processing.