A new study finds that large language models exhibit a pro-female gender bias in hiring decisions, even in a Japanese corporate context using rirekisho-format resumes. Researchers tested five state-of-the-art LLMs (Claude Sonnet 4.6, GPT-4o, DeepSeek-V3, Gemini 2.5-Flash, and Llama 3.3-70B) across 43,200 API calls. A prompt-level gender-neutrality instruction did not significantly reduce the bias, but removing candidate names from the prompt nearly eliminated the pro-female effect, indicating that names are the primary channel through which gender cues reach the models. The study also notes a practical deployment challenge: GPT-4o's content safety filter produced a high refusal rate during name anonymization attempts.
AI IMPACT Highlights the need for careful LLM deployment in recruitment to avoid perpetuating gender bias, particularly in how candidate names are exposed to the model.
RANK_REASON The cluster contains an academic paper detailing research findings on LLM bias.
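The name-removal step described in the summary can be illustrated with a minimal Python sketch. This is not the study's code: the function `anonymize_name`, the placeholder `[CANDIDATE]`, and the sample resume text are all hypothetical, and the sketch assumes the candidate's name is known in advance and appears verbatim in the resume.

```python
import re


def anonymize_name(resume_text: str, candidate_name: str,
                   placeholder: str = "[CANDIDATE]") -> str:
    """Replace every occurrence of a known candidate name with a placeholder.

    Hypothetical helper for illustration; the study's actual anonymization
    procedure is not described in this summary.
    """
    # Escape the name so characters such as "." are matched literally.
    pattern = re.compile(re.escape(candidate_name), flags=re.IGNORECASE)
    # A function replacement avoids backslash interpretation in the placeholder.
    return pattern.sub(lambda _match: placeholder, resume_text)


# Hypothetical sample data, not taken from the study.
sample_resume = (
    "Name: Hanako Sato\n"
    "Experience: 5 years in sales\n"
    "Reference note: Hanako Sato led a team of four."
)

anonymized = anonymize_name(sample_resume, "Hanako Sato")
print(anonymized)
```

Plain string substitution only covers the name itself; other gender cues in a resume, such as honorifics or school names, would need separate handling.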
- arXiv
- Claude Sonnet 4.6
- DeepSeek-V3
- Gemini 2.5-Flash
- GPT-4o
- Hugging Face
- Japan
- Llama 3.3-70B
- rirekisho
AI-generated summary · Google Gemini · from 2 sources.