Researchers have developed a method that mitigates bias in large language models at decoding time, without altering the model's weights. The approach uses a separate Process Reward Model (PRM) to score token candidates for fairness and fluency. Of the schemes tested, sequential critique-and-revise proved most effective, improving bias scores by up to 0.40 while maintaining fluency. The framework was evaluated on models including GPT-4o-mini, Llama 3.2 3B, Gemma 3 4B, and Qwen 2.5 3B.
AI impact: Offers a technique for reducing LLM bias without costly retraining, potentially making safer models more accessible.
Ranking rationale: The cluster contains an academic paper detailing a new method for LLM bias mitigation.
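The decoding-time idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: `toy_prm_score` and `toy_revise` are hypothetical stand-ins for a learned Process Reward Model and an LLM-based reviser, the `FLAGGED` word list is invented for illustration, and the sketch works on whole text segments rather than individual tokens. It shows only the general shape of two ideas: choosing among candidates by reward score, and revising a draft in sequence until the score clears a threshold.

```python
from typing import Callable, List, Tuple

# Hypothetical word list, used only so the toy scorer has something to penalize.
FLAGGED = {"always", "never", "all"}


def toy_prm_score(text: str) -> float:
    """Hypothetical stand-in for a PRM: fraction of words that are not flagged."""
    words = text.lower().split()
    if not words:
        return 0.0
    flagged = sum(1 for w in words if w.strip(".,") in FLAGGED)
    return 1.0 - flagged / len(words)


def toy_revise(text: str) -> str:
    """Hypothetical reviser: drops flagged words (a real one would call an LLM)."""
    kept = [w for w in text.split() if w.lower().strip(".,") not in FLAGGED]
    return " ".join(kept)


def pick_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Reward-guided selection: keep the candidate the scorer rates highest."""
    return max(candidates, key=score)


def critique_and_revise(
    draft: str,
    score: Callable[[str], float],
    revise: Callable[[str], str],
    threshold: float = 0.9,
    max_rounds: int = 3,
) -> Tuple[str, float]:
    """Score the draft, then revise in sequence while the score keeps improving."""
    best, best_score = draft, score(draft)
    for _ in range(max_rounds):
        if best_score >= threshold:
            break  # critique passed, no revision needed
        candidate = revise(best)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break  # revision did not help, keep the earlier text
        best, best_score = candidate, candidate_score
    return best, best_score


if __name__ == "__main__":
    print(critique_and_revise("Engineers always prefer all tools", toy_prm_score, toy_revise))
```

Because the scorer and reviser are passed in as functions, the base model's weights are never touched, which is the property the summary highlights.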
AI-generated summary · Google Gemini · from 2 sources.