Implementations of large network designs for reinforcement learning, with easy switching between toy tasks and challenging games. The code mainly follows three recent papers:
- Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? (ICML 2020)
- D2RL: Deep Dense Architectures in Reinforcement Learning (NeurIPS 2020 Workshop)
- Training Larger Networks for Deep Reinforcement Learning (arXiv 2021)
In the code, we denote the method from Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? as `ofe`, the method from D2RL: Deep Dense Architectures in Reinforcement Learning as `d2rl`, and the method from Training Larger Networks for Deep Reinforcement Learning as `ofe_dense`. Note that we only implement the single-machine approach for `ofe_dense`, and we observe overfitting with it; we speculate that this is because the single-machine version is not as stable as the distributed approach. A minimal sketch of the `d2rl`-style architecture is given below the table.
| algorithm | continuous control | on-policy / off-policy |
|---|---|---|
| Proximal Policy Optimization (PPO) coupled with d2rl | ✅ | on-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with d2rl | ✅ | off-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with ofe | ✅ | off-policy |
| Deep Deterministic Policy Gradients (DDPG) coupled with ofe_dense | ✅ | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with d2rl | ✅ | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe | ✅ | off-policy |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe_dense | ✅ | off-policy |
| Soft Actor-Critic (SAC) coupled with d2rl | ✅ | off-policy |
| Soft Actor-Critic (SAC) coupled with ofe | ✅ | off-policy |
| Soft Actor-Critic (SAC) coupled with ofe_dense | ✅ | off-policy |
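For orientation: `d2rl` concatenates the raw state (and the action, in critics) to the input of every hidden layer, while `ofe` (OFENet) learns a feature extractor with an auxiliary next-observation prediction task and feeds the enlarged features to the agent. Below is a minimal, hypothetical PyTorch sketch of a `d2rl`-style actor; the `D2RLPolicy` name and layer sizes are illustrative, not the exact code in this repository.

```python
import torch
import torch.nn as nn


class D2RLPolicy(nn.Module):
    """Minimal d2rl-style actor: the raw state is concatenated to the
    input of every hidden layer (dense connections). Sizes are illustrative."""

    def __init__(self, state_dim, action_dim, hidden_dim=256, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = state_dim
        for _ in range(num_layers):
            self.layers.append(nn.Linear(in_dim, hidden_dim))
            # Every layer after the first sees its features plus the raw state.
            in_dim = hidden_dim + state_dim
        self.out = nn.Linear(hidden_dim, action_dim)

    def forward(self, state):
        x = state
        for i, layer in enumerate(self.layers):
            x = torch.relu(layer(x))
            if i < len(self.layers) - 1:
                x = torch.cat([x, state], dim=-1)  # dense skip connection
        return torch.tanh(self.out(x))  # squash actions to [-1, 1]


# e.g. a batch of 32 states for a hypothetical 17-dim observation, 6-dim action task
policy = D2RLPolicy(state_dim=17, action_dim=6)
actions = policy(torch.randn(32, 17))  # -> shape (32, 6)
```

A `d2rl` critic follows the same pattern, except that the concatenation of state and action serves as the input and is re-concatenated at every hidden layer.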
# python 3.6 (apt)
# pytorch 1.4.0 (pip)
# tensorflow 1.14.0 (pip)
# DMC Control Suite and MuJoCo
cd dockerfiles
docker build . -t rl-docker

For other dockerfiles, see RL Dockerfiles.
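After building, an interactive container can be started roughly as follows (the mount path is an assumption, and `--gpus all` requires the NVIDIA container toolkit):

# e.g.
docker run -it --gpus all -v $(pwd):/workspace rl-docker /bin/bash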
Run training with one of the scripts `batch_run_main_d2rl_4seed_cuda.sh` / `batch_run_main_ofe_4seed_cuda.sh` / `batch_run_main_ofe_dense_4seed_cuda.sh` / `batch_run_ppo_d2rl_4seed_cuda.sh`:
# e.g.
bash batch_run_main_ofe_4seed_cuda.sh Ant-v2 TD3_ofe 0 True # env_name: Ant-v2, algorithm: TD3_ofe, CUDA_Num: 0, layer_norm: True
bash batch_run_ppo_d2rl_4seed_cuda.sh Ant-v2 PPO_d2rl 0 # env_name: Ant-v2, algorithm: PPO_d2rl, CUDA_Num: 0

Plot the training curves with `spinupUtils/plot.py`:

# e.g. Note: `-l` denotes the plot label, `data/DDPG-Hopper-v2/` is the directory of the collected data,
# and `-s` denotes the smoothing window.
python spinupUtils/plot.py \
data/DDPG_ofe-Hopper-v2/ \
-l DDPG_ofe -s 10

Benchmark environments include Ant-v2, HalfCheetah-v2, Hopper-v2, Humanoid-v2, and Walker2d-v2.
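For reference, the `-s` option above applies moving-average smoothing to each curve before plotting. Here is a minimal standalone sketch of that kind of smoothing; `smooth` is an illustrative helper, not the exact spinupUtils code.

```python
import numpy as np


def smooth(curve, s=10):
    """Moving-average smoothing over a window of size s
    (illustrative; assumed to mirror the spirit of `-s`)."""
    curve = np.asarray(curve, dtype=float)
    if s <= 1:
        return curve
    kernel = np.ones(s)
    # Divide by the number of points actually covered at each position
    # so the ends of the curve are not biased toward zero.
    return (np.convolve(curve, kernel, mode="same")
            / np.convolve(np.ones_like(curve), kernel, mode="same"))


# e.g. smooth a noisy learning curve before plotting
returns = np.random.randn(1000).cumsum()
print(smooth(returns, s=10)[:5])
```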
@misc{QingLi2021larger,
author = {Qing Li},
title = {Deeper and Larger Network Design for Continuous Control in RL},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/LQNew/Deeper_Larger_Actor-Critic_RL}}
}


