Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 6h · [2 sources]

LLM Bias Evaluation: Gender, Racial, and Age Disparities in Occupational and Crime Scenarios

Two new research papers report significant gender, racial, and age biases in leading large language models. The first paper, which evaluates Gemini 1.5 Pro, Llama 3 70B, Claude 3 Opus, and GPT-4o, found that debiasing efforts can paradoxically exacerbate disparities. The second paper, which audits models including Claude, GPT, Gemini, DeepSeek, Syn-Pro, and HyperCLOVA X across multiple languages, found that LLMs exhibit stereotyping ranges far wider than human baselines and that translation can obscure complex rearrangements of bias.

IMPACT These studies point to critical fairness issues in LLMs, suggesting that current debiasing methods are insufficient and that cross-lingual biases require more nuanced solutions.

GPT-4o
Claude
Gemini
DeepSeek
Gemini 1.5 Pro
GPT
Llama 3 70B
Claude 3 Opus
Vishal Mirza
HyperCLOVA X