Brief · PulseAugur

TOOL · LessWrong (AI tag) English(EN) · 8h

A preliminary experiment regarding consistency as a measure of conceptual abilities in language models

Caspar Oesterheld ran a preliminary experiment on using consistency across different questions as a measure of philosophical competence in language models. The hope is that consistency can serve as a reliable, scalable reward signal for training models in conceptual domains where direct evaluation is difficult. The experiment created simple rewrites of critiques from the LMCA dataset and correlated the models' responses across these variations.
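To make the idea of "correlating responses across variations" concrete, here is a minimal Python sketch. It is not the method used in the experiment: the brief does not say how responses were scored or which correlation was computed. The sketch assumes each critique and its rewrite receive a numeric rating from the model, and that a meaning-preserving rewrite should receive a similar rating. The function names and the rating values are hypothetical and purely illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def consistency_score(original_ratings, rewrite_ratings):
    """Assumed consistency measure: correlation between a model's ratings of
    the original critiques and its ratings of the rewritten critiques.
    Item i in both lists must refer to the same underlying critique."""
    return pearson(original_ratings, rewrite_ratings)


if __name__ == "__main__":
    # Hypothetical ratings (made up for illustration, not real results).
    originals = [4.0, 2.0, 5.0, 1.0, 3.0]
    rewrites = [4.5, 2.5, 4.0, 1.0, 3.5]
    print(f"consistency: {consistency_score(originals, rewrites):.3f}")
```

A score near 1 would mean the model ranks the rewritten critiques much as it ranks the originals; whether that tracks conceptual competence is the open question the experiment probes.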

IMPACT If consistency tracks conceptual competence, it could give a scalable signal for evaluating, and potentially training, LLMs in conceptual domains where answers are hard to grade directly.

Less Wrong
Caspar Oesterheld