TOOL · arXiv cs.AI English(EN) · 1w

Reinforcement Learning from Cross-domain Videos with Video Prediction Model

Researchers have developed XIPER, a reward model that enables reinforcement learning from expert videos recorded in a domain visually distinct from the agent's. To handle the domain gap and the absence of explicit reward signals, XIPER trains a cross-domain video prediction model that maps agent observations into the expert domain and uses the prediction likelihood as the reward signal. In experiments, XIPER outperformed baseline methods on tasks with significant visual differences, including sim-to-real transfer scenarios.
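The idea of using prediction likelihood as a reward can be illustrated with a toy sketch. This is not XIPER's implementation, and the brief does not specify the likelihood model. The sketch assumes an isotropic Gaussian likelihood over the next frame, and `predict_next`, `persistence_predictor`, and `sigma` are hypothetical names introduced only for illustration. The stand-in predictor simply repeats the last frame, where the real method would use the trained cross-domain video prediction model.

```python
import numpy as np


def gaussian_log_likelihood(target, mean, sigma=1.0):
    """Log-density of `target` under an isotropic Gaussian centred on `mean`."""
    diff = np.asarray(target, dtype=float) - np.asarray(mean, dtype=float)
    n = diff.size
    return float(
        -0.5 * np.sum(diff ** 2) / sigma ** 2
        - 0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
    )


def likelihood_reward(predict_next, history, next_obs, sigma=1.0):
    """Reward = log-likelihood of the observed next frame under the predictor.

    `predict_next` is a hypothetical stand-in for a trained video
    prediction model: it takes past frames and returns the expected next frame.
    """
    predicted = predict_next(history)
    return gaussian_log_likelihood(next_obs, predicted, sigma)


def persistence_predictor(history):
    """Toy predictor (assumption): expects the most recent frame to repeat."""
    return history[-1]


# Three past "frames" of four pixels each, all zeros.
history = np.zeros((3, 4))

# A next frame that matches the prediction earns a higher reward
# than one that deviates from it.
reward_match = likelihood_reward(persistence_predictor, history, np.zeros(4))
reward_mismatch = likelihood_reward(persistence_predictor, history, np.full(4, 2.0))
print(reward_match > reward_mismatch)
```

Under this assumption the reward is highest when the agent's next observation is exactly what the predictor expects and falls off quadratically with the deviation, so an agent maximizing it is pushed toward behavior the prediction model finds likely.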

IMPACT This method could improve the efficiency and applicability of reinforcement learning agents in real-world scenarios with visual domain shifts.

Reinforcement Learning
DMC Color Suite
DMC Body Suite