OpenAI's TruthfulQA benchmark reveals larger models are less truthful

By PulseAugur Editorial · [1 source] · 2021-09-08 07:00

OpenAI has introduced TruthfulQA, a benchmark that measures whether language models avoid generating false answers. It comprises 817 questions across 38 categories, written so that common human misconceptions would lead some respondents to answer falsely. In early tests, the best-performing model was truthful on only 58% of questions, well below the 94% achieved by humans. Larger models tended to be less truthful, which suggests that scaling up models alone may not improve truthfulness.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

OpenAI News TIER_1 English(EN) · 2021-09-08 07:00

TruthfulQA: Measuring how models mimic human falsehoods
