PulseAugur

New KnowBias framework enhances LLM neurons to reduce social bias

Researchers have introduced KnowBias, a framework for mitigating social biases in large language models (LLMs). Whereas traditional methods suppress biased parameters, KnowBias selectively enhances neurons that encode knowledge about bias. It identifies these bias-encoding neurons using a small set of bias-knowledge questions and attribution-based analysis, then strengthens them at inference time. Experiments show that the method achieves state-of-the-art debiasing performance across multiple benchmarks and LLMs while preserving general capabilities, requiring minimal data, and needing no retraining.
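The summary does not say which attribution method, layers, or enhancement strength KnowBias uses, so the following is only a toy sketch of the general "identify, then enhance" idea, not the authors' implementation. It assumes gradient-times-activation as the attribution score, a single ReLU layer with a linear readout standing in for an LLM, top-k neuron selection, and a fixed multiplicative factor `alpha`. All function and variable names are hypothetical.

```python
from typing import List, Sequence


def hidden_activations(x: Sequence[float], W: List[List[float]]) -> List[float]:
    """Toy ReLU hidden layer: h_j = max(0, sum_i W[j][i] * x[i])."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def attribution_scores(questions, W, readout) -> List[float]:
    """Mean gradient-x-activation attribution per hidden neuron (assumed method).

    With a linear readout y = sum_j readout[j] * h_j, dy/dh_j = readout[j],
    so neuron j's attribution on one input is h_j * readout[j].
    """
    n = len(W)
    totals = [0.0] * n
    for x in questions:
        h = hidden_activations(x, W)
        for j in range(n):
            totals[j] += h[j] * readout[j]
    return [t / len(questions) for t in totals]


def select_neurons(scores: List[float], k: int) -> List[int]:
    """Indices of the k neurons with the largest absolute attribution."""
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return sorted(ranked[:k])


def enhanced_forward(x, W, readout, neurons, alpha: float) -> float:
    """Forward pass that scales the selected neurons' activations by alpha.

    Weights are left untouched; only the inference-time activations change.
    """
    h = hidden_activations(x, W)
    for j in neurons:
        h[j] *= alpha
    return sum(r * hj for r, hj in zip(readout, h))


# Toy example: 3 hidden neurons, 2 input features.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
readout = [0.5, 2.0, 0.1]
questions = [[1.0, 1.0], [2.0, 0.0]]  # hypothetical stand-ins for bias-knowledge questions

scores = attribution_scores(questions, W, readout)  # [0.75, 1.0, 0.1]
neurons = select_neurons(scores, k=2)                # [0, 1]
print(enhanced_forward([1.0, 1.0], W, readout, neurons, alpha=1.5))  # 3.75
```

Because the enhancement only rescales activations during the forward pass, the sketch mirrors the summary's "no retraining" claim: the weight matrix `W` and the `readout` vector are never updated.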

IMPACT Offers a more effective and data-efficient method for reducing harmful stereotypes in LLMs, potentially improving their safety and generalizability.

RANK_REASON The cluster contains an academic paper detailing a new method for mitigating bias in LLMs.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English (EN) · Jinhao Pan, Chahat Raj, Anjishnu Mukherjee, Sina Mansouri, Bowen Wei, Shloka Yada, Ziwei Zhu

    Knowing Bias, Doing Better: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement

    arXiv:2601.21864v2 Announce Type: replace Abstract: Large language models (LLMs) exhibit social biases that reinforce harmful stereotypes, limiting their safe deployment. Most existing debiasing methods adopt a suppressive paradigm by modifying parameters, prompts, or neurons ass…