PulseAugur

Researchers propose unified framework for surprisal theory in language processing

A new paper proposes a unified framework for surprisal theory in language processing that separates the definition of linguistic units from the choice of evaluation regions. The authors argue that current empirical work often conflates the two and, as a result, relies implicitly on ad hoc procedures. They suggest treating tokenization as an implementation detail rather than as a scientific primitive in surprisal-based analyses.

Summary written by gemini-2.5-flash-lite from 3 sources.
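
For readers unfamiliar with the quantity under discussion, surprisal is the negative log probability of a unit given its context. The sketch below is a minimal illustration, not the paper's framework: it shows the common practice of obtaining a word-level surprisal by summing the surprisals of the tokens that make up the word, which follows from the chain rule of probability. The function names and the token probabilities are hypothetical and chosen only for illustration.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal, in bits, of an event with probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must lie in (0, 1]")
    return -math.log2(prob)


def unit_surprisal_bits(token_probs: list[float]) -> float:
    """Surprisal of a unit spanning several tokens.

    Each entry is the conditional probability of one token given everything
    before it. By the chain rule, the sum of token surprisals equals the
    surprisal of the product of those probabilities.
    """
    return sum(surprisal_bits(p) for p in token_probs)


# Hypothetical example: one word that a tokenizer splits into two tokens,
# with made-up conditional probabilities 0.25 and 0.5.
token_probs = [0.25, 0.5]
word_surprisal = unit_surprisal_bits(token_probs)
print(word_surprisal)  # 3.0
```

In this toy setting, the same word would receive the same surprisal under any segmentation whose token probabilities multiply to 0.125, which is one way to read the suggestion that tokenization is an implementation detail.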

IMPACT Clarifies the theoretical underpinnings of how predictability is measured with language models, which may affect future research on human-AI language interaction.

RANK_REASON This is a research paper published on arXiv concerning theoretical aspects of language processing.


COVERAGE [3]

  1. arXiv cs.CL TIER_1 · Samuel Kiegeland, Vésteinn Snæbjarnarson, Tim Vieira, Ryan Cotterell ·

    On the Proper Treatment of Units in Surprisal Theory

    arXiv:2604.28147v1. Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stimuli are segmented into linguisti…

  2. arXiv cs.CL TIER_1 · Ryan Cotterell ·

    On the Proper Treatment of Units in Surprisal Theory

    Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stimuli are segmented into linguistically motivated units (e.g., words), while pretr…

  3. Hugging Face Daily Papers TIER_1 ·

    On the Proper Treatment of Units in Surprisal Theory

    Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stimuli are segmented into linguistically motivated units (e.g., words), while pretr…