PulseAugur
It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

Study finds geopolitical bias in LLMs stems from post-training, not data

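A study newly posted to arXiv finds that geopolitical bias in large language models (LLMs) originates mainly in the post-training alignment stage rather than in the initial training data. The researchers tested seven open-weight LLM pairs, each consisting of a base model (pre-training only) and its chat model (pre-training and post-training), and found that six of the pairs showed a bias favoring their developer's home region after post-training. The effect was especially pronounced in Alibaba's Qwen 2.5, whose odds of favoring China increased 18-fold after post-training. The study also notes that the language of the prompt amplifies these biases: the French-made Mistral model, for example, showed a pro-French tilt only when prompted in French.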
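To make the 18-fold figure concrete, here is a minimal Python sketch of how a multiplicative change in odds translates into probabilities. It assumes the figure is an odds ratio, which the summary does not state explicitly. The baseline rates and all function names are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def probability_to_odds(p: float) -> float:
    """Convert a probability in [0, 1) to odds, p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert non-negative odds back to a probability."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def apply_odds_ratio(base_probability: float, odds_ratio: float) -> float:
    """Return the probability after multiplying the baseline odds by odds_ratio."""
    return odds_to_probability(probability_to_odds(base_probability) * odds_ratio)


# Hypothetical baseline rates for a base model (NOT figures from the paper).
for base in (0.05, 0.10, 0.50):
    after = apply_odds_ratio(base, 18.0)
    print(f"baseline {base:.2f} -> after an 18x increase in odds: {after:.3f}")
```

Under these assumptions, a hypothetical 10% baseline rate would rise to roughly 67%, not to 180%: multiplying odds is not the same as multiplying probabilities.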

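Impact: Underscores that the LLM alignment process, not just the raw data, shapes geopolitical bias, which calls for greater transparency in model development.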

Ranking rationale: An academic paper detailing new findings about LLM behavior.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Stuart Bladon, Brinnae Bent

    It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

    arXiv:2605.23825v1 · Announce type: cross · Abstract: It has generally been assumed that geopolitical bias in language models originates from the training data used during the pre-training phase. We tested seven open-weight LLM pairs consisting of the base model (pre-training only) a…

  2. arXiv cs.AI TIER_1 English(EN) · Brinnae Bent

    It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

    It has generally been assumed that geopolitical bias in language models originates from the training data used during the pre-training phase. We tested seven open-weight LLM pairs consisting of the base model (pre-training only) and the chat model (pre-training and post-training)…