A new cross-lingual audit framework has been developed to evaluate demographic bias in large language models used for emergency police dispatch. The study tested 11 frontier models across 15 scenarios in English and Mandarin Chinese, using minimal-pair designs to isolate the impact of demographic cues such as religious appearance, gender, and race. Results indicate that bias is most pronounced when incident severity is ambiguous, with significant cross-lingual differences observed, notably an amplification of gender bias in Mandarin. The framework offers a scalable method for agencies to assess LLM fairness before deployment.
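The minimal-pair design mentioned above can be illustrated with a small sketch. Everything here is a hypothetical assumption for illustration: the scenario template, the cue list, and the `severity_score` stub are not the paper's actual code, and a real audit would replace the stub with a call to each model under test.

```python
# Hypothetical minimal-pair audit sketch (assumed names and template,
# not the framework's real implementation).
TEMPLATE = "Caller reports a dispute involving a {cue} person outside a store."
CUES = ["young", "elderly", "male", "female"]  # demographic cue variants

def severity_score(prompt: str) -> float:
    # Stand-in for an LLM call returning a dispatch-severity rating (0-10).
    # An unbiased model returns the same score for every cue variant.
    return 5.0

def audit(template: str, cues: list[str]) -> dict[str, float]:
    """Score every minimal-pair variant; only the cue differs between prompts."""
    return {cue: severity_score(template.format(cue=cue)) for cue in cues}

def max_gap(scores: dict[str, float]) -> float:
    """Simple bias signal: largest severity spread across cue variants."""
    vals = list(scores.values())
    return max(vals) - min(vals)

scores = audit(TEMPLATE, CUES)
print(scores, max_gap(scores))  # stub scores are identical, so the gap is 0.0
```

Because each prompt pair differs only in the demographic cue, any score gap can be attributed to that cue rather than to the scenario wording.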
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Highlights potential risks of deploying LLMs in public safety and the need for rigorous, cross-lingual bias auditing.
RANK_REASON Academic paper evaluating bias in LLMs for a specific application.