Curiosity-Critic reward improves world model training accuracy

By PulseAugur Editorial · [1 source] · 2026-04-30 04:00

Researchers have introduced Curiosity-Critic, an intrinsic reward for training world models. The reward is grounded in the improvement of the world model's cumulative prediction error across all visited transitions, and it is computed through a tractable per-step surrogate. A critic, learned online, estimates the error baseline; this steers exploration toward learnable transitions and separates reducible from irreducible prediction error. In the reported experiments, Curiosity-Critic outperformed existing methods in both training speed and world model accuracy.
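The summary does not give the reward formula, so the following Python sketch is only an illustration of the general idea, not the paper's formulation. It assumes a hypothetical tabular critic (`RunningBaselineCritic`) that keeps a running estimate of the prediction error for each transition, and it assumes the reward is that estimate minus the currently observed error. All names and the learning rate are invented for the example.

```python
from typing import Dict, Hashable, List, Optional


class RunningBaselineCritic:
    """Hypothetical stand-in for a learned critic.

    Keeps one running estimate of the prediction error per transition.
    The paper's critic is learned online; this tabular moving average
    is an assumption made purely for illustration.
    """

    def __init__(self, lr: float = 0.5) -> None:
        self.lr = lr
        self.baseline: Dict[Hashable, float] = {}

    def predict(self, key: Hashable) -> Optional[float]:
        # None means the transition has not been visited yet.
        return self.baseline.get(key)

    def update(self, key: Hashable, error: float) -> None:
        # Move the stored estimate toward the newly observed error.
        old = self.baseline.get(key, error)
        self.baseline[key] = old + self.lr * (error - old)


def intrinsic_reward(critic: RunningBaselineCritic, key: Hashable, error: float) -> float:
    """Assumed reward: how far the error fell below the critic's estimate."""
    expected = critic.predict(key)
    reward = 0.0 if expected is None else expected - error
    critic.update(key, error)
    return reward


def rewards_for(errors: List[float], key: Hashable = "t") -> List[float]:
    """Replay a sequence of world-model errors for one transition."""
    critic = RunningBaselineCritic(lr=0.5)
    return [intrinsic_reward(critic, key, e) for e in errors]


# A learnable transition: the world model's error shrinks with each visit.
learnable = rewards_for([1.0, 0.5, 0.25])
# An irreducibly noisy transition: the error never improves.
noisy = rewards_for([1.0, 1.0, 1.0])
print(learnable, noisy)
```

Under these assumptions, the transition whose error keeps shrinking earns positive reward on every revisit, while the transition whose error stays flat earns none, which mirrors the reducible versus irreducible distinction described above.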

IMPACT Introduces a new intrinsic reward mechanism for world model training that improves learning speed and accuracy.

RANK_REASON This is a research paper detailing a new method for training world models.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv stat.ML TIER_1 English(EN) · Vin Bhaskara, Haicheng Wang · 2026-04-30 04:00

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

arXiv:2604.18701v2 · Abstract: Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds …

