Abstract
Deep reinforcement learning has proven effective at solving many intricate tasks, yet it still struggles with data efficiency and with generalization to novel scenarios, both of which are required in settings such as robotics. Recent approaches to these problems include (1) unsupervised pretraining of the agent in an environment without reward signals, and (2) training the agent on offline data from various possible sources. In this letter, we propose to consider both approaches together, resulting in a setting where different types of data streams are available and fast online adaptation to new tasks is required. Towards this goal, we consider the Unsupervised Reinforcement Learning Benchmark and show that the primary value of unsupervised training lies in its use as a source of exploration trajectories, beyond its role in pretraining a policy. Following this observation, we develop a method that uses a world model as a generative model of offline exploration data, combined with model predictive control (MPC) planning. We show that this approach outperforms previous methods and demonstrates task adaptation that is 10 times faster than previously shown. We then propose a setup that includes access to both unsupervised exploratory data and offline expert demonstrations when testing the agents’ online performance on adaptation to novel tasks in the environment.
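To make the idea of planning with MPC over a world model concrete, the following is a minimal sketch and not the method of the letter. It assumes a random-shooting planner, which is a generic MPC technique chosen here for brevity, and it replaces the learned world model with a hypothetical hand-written one-dimensional dynamics function. The names `toy_world_model`, `toy_reward`, `plan_random_shooting`, and `run_episode` are illustrative assumptions.

```python
import random


def toy_world_model(state, action):
    """Hypothetical stand-in for a learned world model: a 1-D point that moves by 0.1 * action."""
    return state + 0.1 * action


def toy_reward(state, goal=1.0):
    """Hypothetical task reward: negative distance to the goal position."""
    return -abs(state - goal)


def plan_random_shooting(state, model, reward_fn, horizon=10, n_candidates=256, rng=random):
    """Random-shooting MPC: sample action sequences, roll each out in the model,
    and return the first action of the sequence with the highest predicted return."""
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(n_candidates):
        # Sample one candidate action sequence uniformly in [-1, 1].
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = model(s, a)          # imagined next state
            total += reward_fn(s)    # accumulate predicted reward
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action


def run_episode(steps=50, seed=0):
    """Replan at every step and apply only the first planned action (receding horizon)."""
    rng = random.Random(seed)
    state = 0.0
    for _ in range(steps):
        action = plan_random_shooting(state, toy_world_model, toy_reward, rng=rng)
        # In this toy setup the "real" environment is the same function as the model.
        state = toy_world_model(state, action)
    return state
```

Calling `run_episode()` drives the toy state from 0 towards the goal at 1. In a setting like the one described in the abstract, the dynamics function would instead be a model trained on offline exploration data, and only the reward would change when adapting to a new task.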
| Original language | English |
|---|---|
| Pages (from-to) | 5693-5700 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 11 |
| Issue number | 5 |
| DOIs | |
| State | Published - 1 May 2026 |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
Keywords
- Reinforcement learning
- model predictive control (MPC)
- unsupervised learning
ASJC Scopus subject areas
- Control and Systems Engineering
- Biomedical Engineering
- Human-Computer Interaction
- Mechanical Engineering
- Computer Vision and Pattern Recognition
- Computer Science Applications
- Control and Optimization
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Unifying Unsupervised and Offline RL for Fast Adaptation Using World Models'. Together they form a unique fingerprint.