This is a step-by-step tutorial from DQN to Rainbow. Every chapter contains both theoretical background and an object-oriented implementation. Just pick any topic you are interested in and start learning! You can run the notebooks directly in the cloud with molab — no local setup needed.
Built with marimo — a reactive Python notebook that runs as a pure .py file with better reproducibility, git diffs, and interactive UI.
Please feel free to open an issue or a pull request if you have any ideas to make it better. :)
If you want a tutorial for policy gradient methods, please see PG is All You Need.
- DQN [GitHub] [Preview]
- DoubleDQN [GitHub] [Preview]
- PrioritizedExperienceReplay [GitHub] [Preview]
- DuelingNet [GitHub] [Preview]
- NoisyNet [GitHub] [Preview]
- CategoricalDQN [GitHub] [Preview]
- N-stepLearning [GitHub] [Preview]
- Rainbow [GitHub] [Preview]
- Rainbow IQN [GitHub] [Preview]
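All of the topics above build on the same core ingredient: an experience replay buffer that stores transitions and samples mini-batches for training. As a rough, illustrative sketch (not the notebooks' actual implementation), a minimal uniform-sampling buffer might look like this:

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal uniform-sampling experience replay (illustrative sketch only).

    Stores (state, action, reward, next_state, done) transitions in a
    fixed-size ring buffer; old transitions are evicted automatically.
    """

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling; PrioritizedExperienceReplay replaces this
        # with sampling proportional to TD-error priorities.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


# Toy usage: store a few dummy transitions and draw a batch.
buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.store(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buf.sample(4)
print(len(buf), len(batch))
```

Later chapters extend this idea: Prioritized Experience Replay changes how `sample` draws transitions, and N-step Learning changes what `store` accumulates before a transition is written.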
Click "Run in molab" on the preview page to open an interactive session where you can edit and run the notebook.
# Install mise
curl https://mise.run | sh
# Clone the project
git clone https://github.com/Curt-Park/rainbow-is-all-you-need.git
cd rainbow-is-all-you-need
# Install Python + Create venv + Install Python packages
make init
make setup

Run and experiment with any notebook:
make run notebook=01_dqn.py
make format # run the formatter
make lint # run the linter
- V. Mnih et al., "Human-level control through deep reinforcement learning." Nature, 518 (7540):529–533, 2015.
- H. van Hasselt et al., "Deep Reinforcement Learning with Double Q-learning." arXiv preprint arXiv:1509.06461, 2015.
- T. Schaul et al., "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952, 2015.
- Z. Wang et al., "Dueling Network Architectures for Deep Reinforcement Learning." arXiv preprint arXiv:1511.06581, 2015.
- M. Fortunato et al., "Noisy Networks for Exploration." arXiv preprint arXiv:1706.10295, 2017.
- M. G. Bellemare et al., "A Distributional Perspective on Reinforcement Learning." arXiv preprint arXiv:1707.06887, 2017.
- R. S. Sutton, "Learning to predict by the methods of temporal differences." Machine learning, 3(1):9–44, 1988.
- M. Hessel et al., "Rainbow: Combining Improvements in Deep Reinforcement Learning." arXiv preprint arXiv:1710.02298, 2017.
- W. Dabney et al., "Implicit Quantile Networks for Distributional Reinforcement Learning." arXiv preprint arXiv:1806.06923, 2018.
Thanks goes to these wonderful people (emoji key):
Jinwoo Park (Curt) 💻 📖 |
Kyunghwan Kim 💻 |
Wei Chen 🚧 |
WANG Lei 🚧 |
leeyaf 💻 |
ahmadF 📖 |
Roberto Schiavone 💻 |
David Yuan 💻 |
dhanushka2001 💻 |
Pierre Couy 💻 |
Claude 💻 |
This project follows the all-contributors specification. Contributions of any kind are welcome!