PulseAugur
English(EN) LLM Bias Evaluation: Gender, Racial, and Age Disparities in Occupational and Crime Scenarios

LLMs show widespread gender, racial, and age bias; debiasing efforts widen the disparities

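Two new research papers highlight significant gender, racial, and age bias in leading large language models. The first evaluates Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o and finds that debiasing efforts may have backfired, widening disparities. The second audits models including Claude, GPT, Gemini, DeepSeek, Syn-Pro, and HyperCLOVA X across multiple languages, showing that the stereotypes LLMs exhibit extend far beyond human baselines and that translation can mask a complex reordering of biases.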

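Impact: These studies highlight critical fairness problems in LLMs, indicating that current debiasing methods are insufficient and that complex cross-lingual biases call for more nuanced solutions.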

Ranking rationale: Two academic papers published on arXiv present findings on LLM bias.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Vishal Mirza, Rahul Kulkarni, Aakanksha Jadhav

    LLM Bias Evaluation: Gender, Racial, and Age Disparities in Occupational and Crime Scenarios

    arXiv:2409.14583v4 Announce Type: replace Abstract: LLM bias evaluation is critical as large language models (LLMs) increasingly influence high-stakes decisions. This paper provides a comprehensive assessment of gender, racial, and age disparities in leading LLMs, revealing that …

  2. arXiv cs.CL TIER_1 English(EN) · Jiwoo Choi, Seonwoo Ahn, Tongxin Zhang, Seohyon Jung

    Anchoring LLM Gender Bias to Human Baselines: A Cross-Lingual Audit

    arXiv:2605.30804v1 Announce Type: new Abstract: We audit six large language models (LLMs) for gender stereotyping across English, Korean, Chinese, and Japanese. Three were developed primarily for English-language use (Claude, GPT, Gemini) and three for East Asian use (DeepSeek, S…