Who and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variation

Study finds that linguistic cues and annotator attitudes influence harmful language detection.

By the PulseAugur editorial team · [2 sources] · 2026-05-07 14:18

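A new paper analyzes annotation variation in NLP datasets, focusing on harmful language detection. The study combines annotator characteristics with linguistic properties of the data to explain label disagreement. The findings indicate that interactions between annotator characteristics and item characteristics, particularly lexical cues and annotator attitudes, are crucial. However, the patterns differ considerably across datasets, so the authors caution against overgeneralization.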

Impact: Underscores the importance of considering both annotator and data characteristics for reliable NLP model training.

Ranking rationale: This cluster contains an academic paper published on arXiv.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Maximilian Maurer, Maximilian Linde, Gabriella Lapesa · 2026-05-08 04:00

Who and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variation

arXiv:2605.06318v1 Announce Type: new Abstract: Human label variation has been established as a central phenomenon in NLP: the perspectives different annotators have on the same item need to be embraced. Data collection practices thus shifted towards increasing the annotator numb…
arXiv cs.CL TIER_1 English(EN) · Gabriella Lapesa · 2026-05-07 14:18

Who and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variation

Human label variation has been established as a central phenomenon in NLP: the perspectives different annotators have on the same item need to be embraced. Data collection practices thus shifted towards increasing the annotator numbers and releasing disaggregated datasets, harmfu…