Reducing Political Manipulation with Consistency Training
Researchers have developed a new training method called Political Consistency Training (PCT) to address systematic political bias in large language models. This method uses two metrics, Sentiment Consistency and Helpfulness Consistency, to measure and reduce asymmetric rhetoric and engagement across opposing political prompts. Experiments show that PCT significantly reduces covert political bias while maintaining overall model helpfulness and generalizing to new benchmarks.
AI IMPACT: Introduces a novel training technique to mitigate political bias in LLMs, potentially improving their fairness and reliability.