Self-training restructures language models, research finds

By PulseAugur Editorial Team · [3 sources] · 2026-05-20 01:44

A new research paper challenges the common understanding of self-training in language models, arguing that it restructures language rather than flattening it. The study found that surface-level linguistic features such as discourse markers increase, while deeper syntactic structures such as questions and passives decline. The paper's "Structural Depth Hypothesis" posits that the decay rate of a linguistic feature is determined primarily by its structural complexity, not just by its frequency in the model's output.

Impact: Shows that self-training alters language model outputs in complex ways, with implications for data curation and LLM text detection.

Ranking rationale: The cluster contains a research paper detailing novel findings about language model behavior.

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.AI TIER_1 English(EN) · Ming Liu · 2026-05-22 04:00

Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

arXiv:2605.20602v1 Announce Type: cross Abstract: Successive self-training on a language model's own outputs is widely characterized as a process of flattening: diversity drops, distributions narrow, and the text becomes "more like itself." We provide evidence that this character…

arXiv cs.CL TIER_1 English(EN) · Ming Liu · 2026-05-20 01:44

Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

Successive self-training on a language model's own outputs is widely characterized as a process of flattening: diversity drops, distributions narrow, and the text becomes "more like itself." We provide evidence that this characterization is incomplete. Across eleven generations o…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-20 01:44

Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

Successive self-training on a language model's own outputs is widely characterized as a process of flattening: diversity drops, distributions narrow, and the text becomes "more like itself." We provide evidence that this characterization is incomplete. Across eleven generations o…
