Researchers have investigated how authorship attribution signals emerge in encoder-based language models. They found that the scoring mechanism, rather than the representation quality, significantly impacts performance, leading to up to a four-fold difference. Using mechanistic interpretability, the study revealed that different pooling and interaction strategies in scorers dictate when and where the model consolidates authorship signals, with mean pooling forcing early consolidation and late interaction deferring it. AI
Summary written by gemini-2.5-flash-lite from 1 source. How we write summaries →
IMPACT This research offers insights into the internal workings of language models, potentially improving the interpretability and effectiveness of authorship attribution systems.
RANK_REASON The cluster contains an academic paper detailing research into language models. [lever_c_demoted from research: ic=1 ai=1.0]