# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge
<p align="center">
<img src="image/competition.png" alt="PARL" width="800"/>
</p>
This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow a random target velocity. The second part covers curriculum learning, which learns a natural and efficient gait for low-speed walking. The last part trains the final agent in the random-velocity environment used for the round 2 evaluation.
For more technical details about our solution, we provide:
1. [[Link]](https://youtu.be/RT4JdMsZaTE) An interesting video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A PowerPoint Presentation briefly introducing our solution in NeurIPS2018 competition workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution in NeurIPS2018 competition workshop.
4. (Coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.
**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to guarantee that our code is reproducible, testing each training sub-task three times. However, some factors still prevent you from achieving exactly the same performance. One problem is deciding at which point of convergence to take a model during curriculum learning: visually choosing a sensible and natural gait is crucial for subsequent training, but what counts as a good gait varies from person to person.
<p align="center">
<img src="image/demo.gif" alt="PARL" width="500"/>
</p>
## Dependencies
- python3.6
- [parl==1.0](https://github.com/PaddlePaddle/PARL)
- [paddlepaddle==1.5.1](https://github.com/PaddlePaddle/Paddle)
- [osim-rl](https://github.com/stanfordnmbl/osim-rl)
- [grpcio==1.12.1](https://grpc.io/docs/quickstart/python.html)
- tqdm
- tensorflow (to use tensorboard)
## Part1: Final submitted model
### Result
For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.
| Avg reward of all episodes | Avg reward of complete episodes | Falldown rate | Evaluated episodes |
|----------------------------|---------------------------------|------------|-------------------|
| 9968.5404 | 9980.3952 | 0.0026 | 5000 |
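The aggregate metrics in the table can be computed from per-episode results as in this minimal sketch (function and field names are ours, not the repo's; we assume an episode counts as complete when it reaches the 1000-frame limit rather than ending early in a fall):

```python
# Hypothetical aggregation of distributed evaluation results.
# `results` holds (episode_reward, episode_length) pairs collected from
# all workers; an episode is "complete" if it reached the frame limit.
def aggregate(results, max_frames=1000):
    rewards = [r for r, _ in results]
    complete = [r for r, n in results if n >= max_frames]
    return {
        "avg_reward_all": sum(rewards) / len(rewards),
        "avg_reward_complete": sum(complete) / len(complete),
        "falldown_rate": 1.0 - len(complete) / len(results),
    }
```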
### Test
- How to Run
1. Enter the sub-folder `final_submit`
2. Download the model file from online storage service, [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3)
3. Unpack the file by using:
`tar zxvf saved_model.tar.gz`
4. Launch the test script:
`python test.py`
## Part2: Curriculum learning
<p align="center">
<img src="image/curriculum-learning.png" alt="PARL" width="500"/>
</p>
#### 1. Target: Run as fast as possible
<p align="center">
<img src="image/fastest.png" alt="PARL" width="800"/>
</p>
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1
# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```
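The `RunFastest` shaping can be thought of as rewarding forward pelvis velocity directly. A minimal sketch, assuming the dict-style osim-rl state description (the repo's actual reward lives in the client code and may differ in detail):

```python
# Illustrative RunFastest-style reward: the faster the pelvis moves
# forward along the x axis, the larger the reward.
def run_fastest_reward(state_desc):
    return state_desc["body_vel"]["pelvis"][0]
```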
#### 2. Target: Run at 3.0 m/s
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [RunFastest model]
# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
--act_penalty_lowerbound 1.5
```
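One plausible reading of the `FixedTargetSpeed` shaping, using `--target_v` and `--act_penalty_lowerbound` as in the commands above (names and the exact functional form are illustrative; the client code is authoritative):

```python
# Illustrative FixedTargetSpeed-style reward: track the target velocity
# and penalize muscle activation, but only the part of the activation
# cost above a lower bound, so a reasonable amount of effort is free.
def fixed_target_speed_reward(vel_x, target_v, activations,
                              act_penalty_lowerbound, act_coeff=1.0):
    vel_term = -(vel_x - target_v) ** 2
    act_cost = sum(a * a for a in activations)
    act_term = -act_coeff * max(act_cost - act_penalty_lowerbound, 0.0)
    return vel_term + act_term
```

Lowering `--act_penalty_lowerbound` across the curriculum (1.5 at 3.0 m/s, 0.75 at 2.0 m/s, 0.6 at 1.25 m/s) tightens the effort budget as the target speed drops.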
#### 3. Target: Walk at 2.0 m/s
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 3.0m/s model]
# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
--act_penalty_lowerbound 0.75
```
#### 4. Target: Walk slowly at 1.25 m/s
<p align="center">
<img src="image/last course.png" alt="PARL" width="800"/>
</p>
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 2.0m/s model]
# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
--act_penalty_lowerbound 0.6
```
## Part3: Training in random velocity environment for round2 evaluation
As mentioned before, the selection of the model used for fine-tuning influences later training. For those who cannot reach the expected performance with the former steps, we provide a pre-trained model that walks naturally at 1.25 m/s. ([Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7))
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head
# client (suggested: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
--act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```
### Test trained model
```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```
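With `--ensemble_num` greater than 1, the trained heads must be combined at test time. A minimal sketch, under the assumption that the final action is the element-wise mean of the heads' actions (illustrative; see `test.py` for the real combination):

```python
# Illustrative ensemble inference: average the actions proposed by all
# policy heads, element-wise.
def ensemble_action(heads, obs):
    actions = [head(obs) for head in heads]
    dim = len(actions[0])
    return [sum(a[i] for a in actions) / len(actions) for i in range(dim)]
```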
### Other implementation details
<p align="center">
<img src="image/velocity_distribution.png" alt="PARL" width="800"/>
</p>
If you follow the above steps correctly, you can get an agent that scores around 9960, slightly below our final submitted model. The gap results from the lack of a multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode, which degrades the performance of a single model, since it is hard to fit one model to several different data distributions. We therefore trained four models, each aiming to perform well under a different velocity distribution. These four models were trained successively; that is, we first train a model that specializes in the start stage (the first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it reduces the readability of the code base. Feel free to post an issue if you have any problems :)
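The stage-wise scheme described above can be sketched as a frame-indexed dispatch over per-stage policies (a hypothetical illustration, since we did not release this code):

```python
# Illustrative multi-stage inference: use the dedicated "start" policy
# for the first 60 frames, then hand control to the next-stage policy.
def select_action(stage_models, frame_idx, obs, boundaries=(60,)):
    # stage_models: one policy per stage, e.g. [start_model, rest_model];
    # the stage index is the number of boundaries already passed.
    stage = sum(frame_idx >= b for b in boundaries)
    return stage_models[stage](obs)
```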
## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, and Kai Zeng for providing stable computation resources, and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng, and others for creating a vivid and popular demonstration video.