Two Wrongs, No Right: Auditing Social-Desirability Bias in LLM Annotators for Computational Social Science
A new paper investigates social-desirability bias in LLM annotators used for computational social science (CSS). The researchers found that three open-source models (Zephyr, Mistral-Instruct, and Qwen2.5-Instruct) exhibit different types of bias, such as leniency or overcorrection when labeling harmful content. The study also found that common prompting techniques do not effectively mitigate these biases and can sometimes exacerbate them, highlighting the need for more robust validation methods in CSS research.