PulseAugur

Researchers quantify and mitigate socially desirable responding in LLMs

Researchers have developed a framework to identify and reduce socially desirable responding (SDR) in large language models (LLMs) that are evaluated with self-report questionnaires. SDR, in which a model gives the socially preferred answer rather than an honest one, can skew assessments of persona consistency, safety, and bias. The proposed method quantifies SDR by comparing responses under honest versus fake-good instructions and mitigates it with a desirability-matched graded forced-choice inventory, which reportedly reduces SDR significantly while preserving persona recovery.
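
The honest versus fake-good comparison can be illustrated with a standardized mean difference between questionnaire scores collected under the two instructions. The sketch below is only an assumption about how such a quantity might be computed; the function name `sdr_effect_size`, the pooled-standard-deviation formula, and the sample scores are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def sdr_effect_size(honest, fake_good, desirable_high=True):
    """Standardized mean difference (Cohen's d style) between two score lists.

    Hypothetical helper, not the paper's estimator. The sign is flipped when
    low scores are the desirable ones, so a positive value always means the
    fake-good instruction shifted scores toward the desirable pole.
    """
    n1, n2 = len(honest), len(fake_good)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two scores per condition")

    # Pooled variance across the two instruction conditions.
    pooled_var = (
        (n1 - 1) * stdev(honest) ** 2 + (n2 - 1) * stdev(fake_good) ** 2
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("scores have zero variance; effect size is undefined")

    d = (mean(fake_good) - mean(honest)) / pooled_var ** 0.5
    return d if desirable_high else -d


# Made-up Likert scores for one trait under each instruction.
honest_scores = [3, 4, 3, 4]
fake_good_scores = [5, 5, 4, 4]
print(round(sdr_effect_size(honest_scores, fake_good_scores), 3))
```

A value near zero would suggest the instrument is hard to fake in the desirable direction, while a large positive value would indicate strong SDR under this illustrative measure.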

Impact: Introduces a method to improve the reliability of LLM evaluations, particularly for safety and bias assessments.

Ranking rationale: Academic paper introducing a new framework for evaluating LLMs.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.CL TIER_1 English(EN) · Kensuke Okada, Yui Furukawa, Kyosuke Bunji ·

    Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

    arXiv:2602.17262v2. Abstract: Human self-report questionnaires are increasingly used in NLP to benchmark and audit large language models (LLMs), from persona consistency to safety and bias assessments. Yet these instruments presume honest responding; in eval…