Human Psychometric Questionnaires Mischaracterize LLM Behavior
Research indicates that traditional psychometric self-report questionnaires, such as those based on the Big-5 personality framework, are not reliable predictors of Large Language Model (LLM) behavior. Studies suggest that more specific, behavior-oriented frameworks, such as the Theory of Planned Behavior, can achieve human-level coherence with LLM responses, but only under certain conditions, such as a shared conversational context. An LLM-native psychometric instrument derived from behavioral affordances also failed to predict LLM behavior, which points to potential confounds in LLM self-reporting and to the limitations of current evaluation methods.
AI IMPACT Current psychometric evaluation methods for LLMs are insufficient, so safe deployment will require more robust, behavior-specific assessment tools.