Authorship signal emergence in encoder language models studied

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have investigated where authorship attribution signals emerge in encoder-based language models. They found that models sharing the same pretrained encoder, data, and loss can differ up to four-fold in performance depending only on their scoring mechanism. Using mechanistic interpretability tools, the study found that a scorer's pooling and interaction strategy dictates when and where the model consolidates authorship signals: mean pooling forces early consolidation, while late interaction defers it.
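To make the contrast between the two scorer families concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes cosine similarity over averaged token vectors for the mean-pooling scorer and a ColBERT-style MaxSim for the late-interaction scorer, and the function names and toy vectors are hypothetical.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _normalize(v):
    # Scale a vector to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)

def mean_pool_score(tokens_a, tokens_b):
    """Assumed mean-pooling scorer: cosine similarity of averaged token vectors."""
    def mean(tokens):
        n = len(tokens)
        return [sum(col) / n for col in zip(*tokens)]
    return _dot(_normalize(mean(tokens_a)), _normalize(mean(tokens_b)))

def late_interaction_score(tokens_a, tokens_b):
    """Assumed late-interaction scorer (ColBERT-style MaxSim): match each token
    in A to its most similar token in B, then average those best matches."""
    a = [_normalize(t) for t in tokens_a]
    b = [_normalize(t) for t in tokens_b]
    return sum(max(_dot(x, y) for y in b) for x in a) / len(a)

# Toy "token embeddings" (hypothetical): doc_b reorders doc_a's tokens,
# while doc_c has different tokens that happen to share doc_a's average direction.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[0.0, 1.0], [1.0, 0.0]]
doc_c = [[1.0, 1.0], [1.0, 1.0]]

for name, other in (("doc_b", doc_b), ("doc_c", doc_c)):
    print(name,
          round(mean_pool_score(doc_a, other), 4),
          round(late_interaction_score(doc_a, other), 4))
```

In this toy setup the mean-pooling scorer rates doc_b and doc_c as equally similar to doc_a, because only the averaged vector survives pooling. The late-interaction scorer still sees individual tokens and separates the two. That is one intuition for why a pooled scorer would need the encoder to consolidate authorship information earlier, while a token-level scorer can defer it.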

IMPACT This research offers insights into the internal workings of language models, potentially improving the interpretability and effectiveness of authorship attribution systems.

RANK_REASON The cluster contains an academic paper detailing research into language models.

COVERAGE [1]

arXiv cs.CL TIER_1 · Florian Cafiero · 2026-05-19 14:37

Where Does Authorship Signal Emerge in Encoder-Based Language Models?

Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuat…
