New research highlights English bias in LLMs, calls for per-language investment

By PulseAugur Editorial Team · [1 source] · 2026-05-15 04:51

A new paper reports that large language models (LLMs) are heavily biased toward English, even when fine-tuned for other languages. The researchers found that continual pre-training, a common way to adapt a model to a target language, is not more cost-effective than training from scratch for improving cultural understanding in that language. This suggests that future LLM development may require dedicated investment in per-language resources rather than solely expanding English-centric ones.

Impact: Suggests a shift toward dedicated per-language LLM development, potentially increasing costs and complexity for non-English applications.

Ranking rationale: Academic paper analyzing LLM language bias.

Read on arXiv cs.CL →

English
LLMs

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Ukyo Honda · 2026-05-15 04:51

Toward LLMs Beyond English-Centric Development

Through an analysis of sequences generated by open-weight large language models (LLMs), we demonstrate that LLMs are heavily biased toward English. While continual pre-training is commonly used to adapt LLMs to a target language, we show that it does not offer a cost advantage ov…
