EleutherAI surveys methods for normalizing LM evaluation scores

By PulseAugur Editorial · [1 source] · 2021-10-11 15:00

EleutherAI's blog post lays out four ways of scoring multiple-choice tasks on autoregressive language models: unnormalized, token-length normalized, byte-length normalized, and unconditional-likelihood normalized scores. The normalized variants address the problem of comparing continuations of different lengths. The post discusses the trade-offs of each approach, particularly tokenization dependence and computational cost, and presents byte-length normalization as a tokenization-agnostic option.
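To make the four scores concrete, here is a minimal sketch in Python. It assumes the per-token log-probabilities of each answer choice have already been obtained from a model. The `Choice` container, the `score` and `pick` helpers, and all numbers are hypothetical and chosen for illustration only; they are not taken from the post or from any evaluation library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Choice:
    """One answer option (hypothetical container, for illustration only)."""
    text: str                     # the continuation string
    cond_logprobs: List[float]    # log P(token | context, earlier continuation tokens)
    uncond_logprobs: List[float]  # log P(token | earlier continuation tokens only)


def score(choice: Choice, method: str) -> float:
    """Score one choice under one of the four methods named in the summary."""
    total = sum(choice.cond_logprobs)
    if method == "unnormalized":
        return total
    if method == "token_length":
        # Depends on the tokenizer: the same text may split into more or fewer tokens.
        return total / len(choice.cond_logprobs)
    if method == "byte_length":
        # Byte count of the text is independent of the tokenizer.
        return total / len(choice.text.encode("utf-8"))
    if method == "unconditional":
        # Log of P(continuation | context) / P(continuation); the unconditional
        # log-probabilities need an additional model evaluation per choice.
        return total - sum(choice.uncond_logprobs)
    raise ValueError(f"unknown method: {method}")


def pick(choices: List[Choice], method: str) -> int:
    """Return the index of the highest-scoring choice."""
    return max(range(len(choices)), key=lambda i: score(choices[i], method))


# Made-up log-probabilities: a short answer versus a longer one.
choices = [
    Choice("yes", [-2.0], [-1.0]),
    Choice("absolutely not", [-1.0, -1.0, -0.7], [-3.0, -1.0, -1.0]),
]

for method in ("unnormalized", "token_length", "byte_length", "unconditional"):
    print(method, pick(choices, method))
```

With these made-up numbers the unnormalized score prefers the shorter answer simply because it sums fewer negative terms, while the three normalized scores prefer the longer one, which is the length effect the normalizations are meant to correct.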

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

EleutherAI Blog TIER_1 English(EN) · 2021-10-11 15:00

Multiple Choice Normalization in LM Evaluation

There are multiple ways of evaluating multiple choice tasks on autoregressive LMs like GPT-3/Neo/J. This post lays out the current prevalent normalization methods.
