DeFrame: Debiasing Large Language Models Against Framing Effects
Researchers have introduced DeFrame, a method for reducing framing effects in large language models (LLMs). Framing disparity, a measure of how much an LLM's responses differ in bias across semantically equivalent prompts, was identified as a significant source of hidden bias. Existing debiasing techniques often fail to reduce these framing-induced disparities, even when they improve overall fairness scores. DeFrame aims to make LLM responses more consistent across different prompt framings, thereby reducing overall bias and improving robustness.
AI IMPACT Enhances LLM fairness and consistency, potentially improving user trust in, and the reliability of, deployed applications.
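
To make the idea of framing disparity concrete, here is a minimal Python sketch. It is an illustration, not the paper's definition: the summary above does not give the formula. The sketch assumes each response already has a numeric bias score, and it takes disparity to be the gap between the highest and lowest mean score across framings. The function names, framing labels, and scores are all hypothetical.

```python
from statistics import mean


def overall_bias(scores_by_framing):
    """Mean bias score pooled over every framing (hypothetical aggregate)."""
    pooled = [s for scores in scores_by_framing.values() for s in scores]
    return mean(pooled)


def framing_disparity(scores_by_framing):
    """Assumed definition: gap between the highest and lowest per-framing mean.

    This is an illustrative stand-in, not the metric used by DeFrame.
    """
    per_framing = [mean(scores) for scores in scores_by_framing.values()]
    return max(per_framing) - min(per_framing)


# Made-up bias scores for the same questions asked under two framings.
consistent_model = {
    "framing_a": [0.30, 0.30, 0.30],
    "framing_b": [0.30, 0.30, 0.30],
}
framing_sensitive_model = {
    "framing_a": [0.10, 0.10, 0.10],
    "framing_b": [0.50, 0.50, 0.50],
}

for name, scores in [("consistent", consistent_model),
                     ("framing-sensitive", framing_sensitive_model)]:
    print(name, round(overall_bias(scores), 3), round(framing_disparity(scores), 3))
```

In this toy data both models have the same pooled bias score, but only the second shows a gap between framings. This is the kind of hidden bias that an overall fairness score alone would not reveal.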