Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

arXiv:2604.18701v3. Abstract: Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it admits a tractable per-step surrogate: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this error baseline online with a learned critic co-trained alo
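The per-step surrogate described in the abstract (current prediction error minus an estimated asymptotic error baseline) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract is truncated before it says how the critic is trained, so the tabular critic below, its moving-average update, and the names `intrinsic_reward`, `TabularBaselineCritic`, and `rewards_for_errors` are all hypothetical choices made for illustration only.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List


def intrinsic_reward(pred_error: float, baseline: float) -> float:
    """Per-step surrogate: current prediction error minus the baseline estimate."""
    return pred_error - baseline


class TabularBaselineCritic:
    """Hypothetical stand-in for the learned critic (an assumption, not the paper's model).

    It keeps one scalar per transition key and moves it toward each observed
    prediction error, so it settles wherever that error settles.
    """

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.values = defaultdict(float)  # baseline estimates start at 0.0

    def predict(self, key: Hashable) -> float:
        return self.values[key]

    def update(self, key: Hashable, pred_error: float) -> None:
        # Exponential moving average toward the latest observed error.
        self.values[key] += self.lr * (pred_error - self.values[key])


def rewards_for_errors(
    critic: TabularBaselineCritic, key: Hashable, errors: Iterable[float]
) -> List[float]:
    """Compute the reward for each observed error, then update the critic."""
    rewards = []
    for err in errors:
        rewards.append(intrinsic_reward(err, critic.predict(key)))
        critic.update(key, err)
    return rewards
```

Under these assumptions, a transition whose prediction error stays at a constant floor (for example, irreducible noise) yields a reward that shrinks toward zero as the baseline estimate catches up, whereas a raw prediction-error reward would stay at that floor indefinitely.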
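The paper introduces an intrinsic reward for world model training that is grounded in the improvement of cumulative prediction error across visited transitions rather than in the error of the current transition alone, with the aim of supporting more capable and autonomous AI systems.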
This research contributes to improving the efficiency and effectiveness of training AI agents, which is crucial for advancing autonomous systems across various applications.
Incorporating cumulative prediction error improvement as an intrinsic reward could make world model training more robust and scalable.
- AI research institutions
- Developers of autonomous agents
- Robotics companies
- AI development relying solely on less efficient reward mechanisms
More efficient and capable AI world models will emerge.
Advanced autonomous AI agents, including general-purpose ones, will become more feasible.
The development of highly autonomous systems could accelerate the adoption of AI agents in complex environments.
Read at arXiv cs.AI