PulseAugur

OpenAI's TruthfulQA benchmark reveals larger models are less truthful

OpenAI has published TruthfulQA, a benchmark that measures whether language models give truthful answers to questions. It comprises 817 questions spanning 38 categories, written so that some humans would answer them falsely because of a common misconception. In the reported tests, the best-performing model was truthful on only 58% of questions, compared with 94% for humans. Larger models tended to be less truthful, which suggests that scaling up models alone may not improve truthfulness.

RANK_REASON OpenAI published a research paper introducing a new benchmark for evaluating model truthfulness.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. OpenAI News TIER_1 English (EN)

    TruthfulQA: Measuring how models mimic human falsehoods