PulseAugur

Language model filters cause epistemic injustice, study finds

A new research paper posted to arXiv audits the pretraining filters and guardrails used in language models and argues that they can produce epistemic injustice. The audit found that these systems disproportionately flag content related to marginalized groups, such as transgender people, women, and Central Americans, while often failing to detect explicit hate speech or private information. Human annotators would have retained a significant majority of the content that the automated systems flagged, which points to a gap in the systems' ability to capture nuanced representational harms.

IMPACT: Shows that content filters and guardrails in LLMs can inadvertently silence marginalized voices, which calls for more nuanced approaches to AI safety.

RANK_REASON The cluster contains an academic paper detailing research findings on language model safety and bias.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Marco Antonio Stranisci, A Pranav, Rossana Damiano, Christian Hardmeier, Anne Lauscher

    Epistemic Injustice in Language Models: An Audit of Pretraining Filters and Guardrails

    arXiv:2606.05936v1. Abstract: Modern language models rely on pretraining filters to remove undesirable content from training corpora and inference-time guardrails to suppress undesirable outputs during deployment. In this paper, we examine how these filtering an…

  2. arXiv cs.CL TIER_1 English(EN) · Anne Lauscher

    Epistemic Injustice in Language Models: An Audit of Pretraining Filters and Guardrails

    Modern language models rely on pretraining filters to remove undesirable content from training corpora and inference-time guardrails to suppress undesirable outputs during deployment. In this paper, we examine how these filtering and moderation decisions produce forms of epistemi…