A new study published on arXiv finds that geopolitical biases in large language models stem primarily from the post-training alignment phase rather than from the initial training data. Researchers tested seven LLM pairs and found that six exhibited biases favoring their developer's region after post-training. The effect was particularly pronounced in Alibaba's Qwen 2.5, which showed an 18-fold increase in China-favorability odds post-training. The study also noted that the language used in prompts can amplify these biases: the French-made Mistral model became pro-France only when prompted in French.
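To make the "18-fold increase in odds" figure concrete, here is a minimal sketch of how an odds multiplier translates into probabilities. The function name `apply_odds_ratio` and the baseline probabilities are hypothetical and chosen only for illustration; they are not taken from the study, which this summary does not quote in detail.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained by multiplying the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    # Odds are p / (1 - p); an "18-fold increase in odds" multiplies this quantity.
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    # Convert the scaled odds back to a probability.
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Hypothetical baseline probabilities (assumptions, not values from the study).
    for p in (0.05, 0.20, 0.50):
        print(f"baseline {p:.2f} -> after 18x odds: {apply_odds_ratio(p, 18.0):.3f}")
```

The point of the sketch is that an 18-fold change in odds is not an 18-fold change in probability: under a hypothetical 50% baseline, odds of 1 become odds of 18, which is a probability of 18/19, or roughly 95%.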
IMPACT Highlights that LLM alignment processes, not just raw data, shape geopolitical biases, necessitating greater transparency in model development.
RANK_REASON Academic paper detailing novel findings about LLM behavior.
AI-generated summary · Google Gemini · from 2 sources.