Researchers have introduced a novel intrinsic reward mechanism called Curiosity-Critic for training world models. This method grounds its reward in the improvement of the world model's cumulative prediction error, offering a tractable per-step surrogate. A learned critic estimates the error baseline online, guiding exploration towards learnable transitions and distinguishing between reducible and irreducible prediction errors. Experiments demonstrated that Curiosity-Critic surpasses existing methods in training speed and world model accuracy. AI
IMPACT Introduces a new intrinsic reward mechanism for world model training that improves learning speed and accuracy.
RANK_REASON This is a research paper detailing a new method for training world models.
AI-generated summary · Google Gemini · from 1 sources. How we write summaries →