# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge
<p align="center">
<img src="image/competition.png" alt="PARL" width="800"/>
</p>
This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow a random target velocity. The second part covers curriculum learning, which learns a natural and efficient gait for low-speed walking. The last part trains the final agent in the random-velocity environment used for the round 2 evaluation.
For more technical details about our solution, we provide:
1. [[Link]](https://youtu.be/RT4JdMsZaTE) An interesting video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A PowerPoint Presentation briefly introducing our solution in NeurIPS2018 competition workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution in NeurIPS2018 competition workshop.
4. (Coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.
**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to guarantee that our code is reproducible, testing each training sub-task three times. However, some factors still prevent you from achieving exactly the same performance. One problem is deciding at which point of convergence to take a model during curriculum learning: visually choosing a sensible and natural gait is crucial for subsequent training, but what counts as a good gait varies from person to person.
<p align="center">
<img src="image/demo.gif" alt="PARL" width="500"/>
</p>
## Dependencies
- python3.6
- [parl==1.0](https://github.com/PaddlePaddle/PARL)
- [paddlepaddle==1.5.1](https://github.com/PaddlePaddle/Paddle)
- [osim-rl](https://github.com/stanfordnmbl/osim-rl)
- [grpcio==1.12.1](https://grpc.io/docs/quickstart/python.html)
- tqdm
- tensorflow (to use tensorboard)
## Part1: Final submitted model
### Result
For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.
| Avg reward of all episodes | Avg reward of complete episodes | Falldown rate | Evaluated episodes |
|----------------------------|---------------------------------|------------|-------------------|
| 9968.5404 | 9980.3952 | 0.0026 | 5000 |
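The aggregate metrics in the table can be computed from per-episode results as in this minimal sketch (function and field names are ours, not the repo's; we assume an episode counts as complete when it reaches the 1000-frame limit rather than ending early in a fall):

```python
# Hypothetical aggregation of distributed evaluation results.
# `results` holds (episode_reward, episode_length) pairs collected from
# all workers; an episode is "complete" if it reached the frame limit.
def aggregate(results, max_frames=1000):
    rewards = [r for r, _ in results]
    complete = [r for r, n in results if n >= max_frames]
    return {
        "avg_reward_all": sum(rewards) / len(rewards),
        "avg_reward_complete": sum(complete) / len(complete),
        "falldown_rate": 1.0 - len(complete) / len(results),
    }
```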
### Test
- How to Run
1. Enter the sub-folder `final_submit`
2. Download the model file from online storage service, [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3)
3. Unpack the file by using:
`tar zxvf saved_model.tar.gz`
4. Launch the test script:
`python test.py`
## Part2: Curriculum learning
<p align="center">
<img src="image/curriculum-learning.png" alt="PARL" width="500"/>
</p>
#### 1. Target: Run as fast as possible
<p align="center">
<img src="image/fastest.png" alt="PARL" width="800"/>
</p>
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1
# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```
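The `RunFastest` shaping can be thought of as rewarding forward pelvis velocity directly. A minimal sketch, assuming the dict-style osim-rl state description (the repo's actual reward lives in the client code and may differ in detail):

```python
# Illustrative RunFastest-style reward: the faster the pelvis moves
# forward along the x axis, the larger the reward.
def run_fastest_reward(state_desc):
    return state_desc["body_vel"]["pelvis"][0]
```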
#### 2. Target: Run at 3.0 m/s
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [RunFastest model]
# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
--act_penalty_lowerbound 1.5
```
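One plausible reading of the `FixedTargetSpeed` shaping, using `--target_v` and `--act_penalty_lowerbound` as in the commands above (names and the exact functional form are illustrative; the client code is authoritative):

```python
# Illustrative FixedTargetSpeed-style reward: track the target velocity
# and penalize muscle activation, but only the part of the activation
# cost above a lower bound, so a reasonable amount of effort is free.
def fixed_target_speed_reward(vel_x, target_v, activations,
                              act_penalty_lowerbound, act_coeff=1.0):
    vel_term = -(vel_x - target_v) ** 2
    act_cost = sum(a * a for a in activations)
    act_term = -act_coeff * max(act_cost - act_penalty_lowerbound, 0.0)
    return vel_term + act_term
```

Lowering `--act_penalty_lowerbound` across the curriculum (1.5 at 3.0 m/s, 0.75 at 2.0 m/s, 0.6 at 1.25 m/s) tightens the effort budget as the target speed drops.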
#### 3. Target: Walk at 2.0 m/s
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 3.0m/s model]
# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
--act_penalty_lowerbound 0.75
```
#### 4. Target: Walk slowly at 1.25 m/s
<p align="center">
<img src="image/last course.png" alt="PARL" width="800"/>
</p>
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 2.0m/s model]
# client (suggested: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
--act_penalty_lowerbound 0.6
```
## Part3: Training in random velocity environment for round2 evaluation
As mentioned before, the selection of the model used for fine-tuning influences later training. For those who cannot reach the expected performance with the former steps, we provide a pre-trained model that walks naturally at 1.25 m/s. ([Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7))
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head
# client (suggested: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
--act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```
### Test trained model
```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```
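With `--ensemble_num` greater than 1, the trained heads must be combined at test time. A minimal sketch, under the assumption that the final action is the element-wise mean of the heads' actions (illustrative; see `test.py` for the real combination):

```python
# Illustrative ensemble inference: average the actions proposed by all
# policy heads, element-wise.
def ensemble_action(heads, obs):
    actions = [head(obs) for head in heads]
    dim = len(actions[0])
    return [sum(a[i] for a in actions) / len(actions) for i in range(dim)]
```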
### Other implementation details
<p align="center">
<img src="image/velocity_distribution.png" alt="PARL" width="800"/>
</p>
If you follow the above steps correctly, you can get an agent that scores around 9960, slightly below our final submitted model. The gap results from the lack of a multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode, which degrades the performance of a single model, since it is hard to fit one model to several different data distributions. We therefore trained four models, each aiming to perform well under a different velocity distribution. These four models were trained successively; that is, we first train a model that specializes in the start stage (the first 60 frames), then fix this start model for the first 60 frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it reduces the readability of the code base. Feel free to post an issue if you have any problems :)
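The stage-wise scheme described above can be sketched as a frame-indexed dispatch over per-stage policies (a hypothetical illustration, since we did not release this code):

```python
# Illustrative multi-stage inference: use the dedicated "start" policy
# for the first 60 frames, then hand control to the next-stage policy.
def select_action(stage_models, frame_idx, obs, boundaries=(60,)):
    # stage_models: one policy per stage, e.g. [start_model, rest_model];
    # the stage index is the number of boundaries already passed.
    stage = sum(frame_idx >= b for b in boundaries)
    return stage_models[stage](obs)
```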
## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, and Kai Zeng for providing stable computation resources, and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng, and others for creating a vivid and popular demonstration video.