A new research paper analyzes how large language models such as Claude Opus 4, GPT-4.1, and Gemini 2.5 Pro translate math word problems across languages and cultures. The study found that although the models agree on the type of transformation to apply to entities such as names and foods, they often compress cultural diversity rather than preserve it. The models also exhibit regional misattributions and cross-cultural contamination, such as adapting Western holidays to local contexts. These deeper failures become apparent only through corpus-level analysis.
AI IMPACT Reveals LLM limitations in nuanced cultural adaptation, highlighting risks for educational tools and personalized learning.
RANK_REASON The cluster contains an academic paper detailing research findings on LLM behavior.
AI-generated summary · Google Gemini · from 1 source.