Brief · PulseAugur

RESEARCH · arXiv cs.CL English(EN) · 3d · [2 sources]

Epistemic Injustice in Language Models: An Audit of Pretraining Filters and Guardrails

A new paper on arXiv audits pretraining filters and guardrails in language models and argues that they can produce epistemic injustice. The audit found that these systems disproportionately flag content related to marginalized groups, such as transgender people, women, and Central Americans, while often failing to detect explicit hate speech or private information. Human annotators would have retained a significant majority of the content that the automated systems flagged, which points to a gap in the systems' ability to capture nuanced representational harms.
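
To make the two quantities in the summary concrete, here is a minimal sketch of how a per-group flag rate and the share of flagged content that humans would have kept could be computed. It is not the paper's method: the function name `audit_filter`, the record layout, the group labels, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def audit_filter(records):
    """Summarize filter behavior per group.

    records: iterable of (group, flagged_by_filter, human_would_keep) tuples.
    This layout is an assumption for illustration, not the paper's schema.
    """
    total = defaultdict(int)
    flagged = defaultdict(int)
    flagged_but_kept = defaultdict(int)

    for group, is_flagged, human_keep in records:
        total[group] += 1
        if is_flagged:
            flagged[group] += 1
            if human_keep:
                # The filter removed something a human would have retained.
                flagged_but_kept[group] += 1

    report = {}
    for group in total:
        flag_rate = flagged[group] / total[group]
        # Share of flagged items that annotators would have kept.
        over = flagged_but_kept[group] / flagged[group] if flagged[group] else 0.0
        report[group] = {"flag_rate": flag_rate, "overblocked_share": over}
    return report


# Made-up toy records, not data from the audit.
toy = [
    ("group_a", True, True),
    ("group_a", True, True),
    ("group_a", True, False),
    ("group_a", False, True),
    ("group_b", True, False),
    ("group_b", False, True),
    ("group_b", False, True),
    ("group_b", False, True),
]

report = audit_filter(toy)
for group, stats in sorted(report.items()):
    print(group, stats)
```

A large gap in `flag_rate` between groups, combined with a high `overblocked_share`, is the kind of pattern the summary describes: content about some groups is flagged more often, and much of it would have been kept by human annotators.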

IMPACT Shows how current content moderation systems in LLMs can inadvertently silence marginalized voices, pointing to a need for more nuanced approaches to AI safety.

Common Crawl
Language models
Marco Antonio Stranisci
Central Americans
Epistemic injustice
Pretraining filters