Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 22h · [2 sources]

AgentFairBench: Do LLM Agents Discriminate When They Act?

Researchers have introduced AgentFairBench, a benchmark that evaluates demographic disparities in the actions taken by large language model (LLM) agents. Grounded in the Bias Conduction Framework, it covers three domains: hiring, lending, and medical triage. It uses synthetic profiles and a methodology inspired by Bertrand and Mullainathan's résumé audit study to measure two quantities, action-rate disparity and tool-invocation disparity. It also emphasizes statistical rigor so that differences are not overstated. Initial testing with Claude Haiku 4.5 found no demographic effects distinguishable from sampling noise.
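The brief does not give AgentFairBench's exact metric definitions, so the sketch below is only an illustration under stated assumptions. It assumes that action-rate disparity is the difference between two demographic groups in the fraction of otherwise-identical synthetic profiles for which the agent takes the favourable action (hire, approve, prioritise), and that tool-invocation disparity is the same difference for a particular tool call. To check whether a gap exceeds sampling noise, it uses a permutation test, which is one standard option and not necessarily the benchmark's own procedure. The names `Trial`, `make_trials`, `disparity`, and `permutation_p_value` are hypothetical and are not part of any AgentFairBench API.

```python
"""Hypothetical sketch: action-rate and tool-invocation disparity with a permutation test."""
import random
from dataclasses import dataclass


@dataclass
class Trial:
    group: str         # demographic signal injected into an otherwise identical synthetic profile
    acted: bool        # did the agent take the favourable action? (assumed outcome)
    tool_called: bool  # did the agent invoke a given tool? (assumed outcome)


def make_trials(group, n, n_acted, n_tool):
    """Build n hypothetical trials for one group with fixed outcome counts."""
    return [Trial(group, i < n_acted, i < n_tool) for i in range(n)]


def _gap(labels, values, a, b):
    """Rate of True values in group a minus the rate in group b."""
    def rate(g):
        xs = [v for lab, v in zip(labels, values) if lab == g]
        return sum(xs) / len(xs)
    return rate(a) - rate(b)


def disparity(trials, a, b, field):
    """Observed disparity for `field` ("acted" or "tool_called") between groups a and b."""
    return _gap([t.group for t in trials], [getattr(t, field) for t in trials], a, b)


def permutation_p_value(trials, a, b, field, n_perm=2000, seed=0):
    """Two-sided p-value: share of label shuffles whose |gap| is at least the observed |gap|."""
    rng = random.Random(seed)
    values = [getattr(t, field) for t in trials]
    labels = [t.group for t in trials]
    observed = abs(_gap(labels, values, a, b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between demographic group and outcome
        if abs(_gap(labels, values, a, b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive


if __name__ == "__main__":
    trials = make_trials("A", 100, 60, 20) + make_trials("B", 100, 58, 21)
    print("action-rate disparity:", disparity(trials, "A", "B", "acted"))
    print("p-value:", permutation_p_value(trials, "A", "B", "acted"))
```

In this toy run, a 2-point gap in action rates (60% vs. 58%, 100 profiles per group) yields a large p-value, which is the kind of "no effect beyond sampling noise" outcome the brief describes. Reporting the p-value next to the raw gap, rather than the gap alone, is what keeps small differences from being overstated.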

IMPACT: Gives researchers and developers a new tool for assessing and mitigating bias in LLM agents that take real-world actions.

Hugging Face
arXiv
NumPy
Claude Haiku 4.5
AgentFairBench
Bias Conduction Framework
Bertrand and Mullainathan