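Summary: This paper proposes a new deep reinforcement learning (DRL) framework for multi-robot cooperative exploration under bandwidth constraints. The framework uses communication learning and privileged learning to keep exploration efficiency high while substantially reducing communication volume. Specifically, robots learn to extract the most relevant information from their individual partial maps and embed it into fixed-size messages that are shared with other robots. Each robot then forms an implicit global representation of the explored environment from its own belief and the messages it receives. To train the model, the authors introduce a critic network that can access ground-truth map knowledge during training to guide the policy network, so that trained robots can make efficient decisions under low bandwidth. Compared with baseline methods, the model reduces communication volume by up to two orders of magnitude at a cost of only 2.4% in total travel distance. Experiments show that the framework suits low-bandwidth settings and also improves performance in high-bandwidth scenarios. Intended audience: researchers and technical specialists familiar with robotics and deep learning. Use cases and goals: the work targets robot systems that face severe communication bottlenecks when operating in environments such as underwater or underground caves, and aims to improve team cooperation by reducing communication requirements. Other notes: the code is open source, and data were collected across many test environments to validate performance. The paper gives a detailed description of the algorithm architecture, together with the experimental setup and comparative result figures.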
Privileged Reinforcement and Communication
Learning for Distributed, Bandwidth-limited
Multi-robot Exploration
Yixiao Ma¹,², Jingsong Liang¹,², Yuhong Cao², Derek Ming Siang Tan², and
Guillaume Sartoretti²
¹ School of Computing, National University of Singapore, SG 117417.
yixiaoma@u.nus.edu, jingsongliang@u.nus.edu
² Mechanical Engineering Dept., National University of Singapore, SG 117575.
caoyuhong@nus.edu.sg, derektan@u.nus.edu, mpegas@nus.edu.sg
WWW home page: http://www.marmotlab.org
Abstract. Communication bandwidth is an important consideration in
multi-robot exploration, where information exchange among robots is
critical. While existing methods typically aim to reduce communication
throughput, they either require significant computation or significantly
compromise exploration efficiency. In this work, we propose a deep
reinforcement learning framework based on communication and privileged
reinforcement learning to achieve a significant reduction in bandwidth
consumption, while minimally sacrificing exploration efficiency.
Specifically, our approach allows robots to learn to embed the most salient
information from their individual belief (partial map) over the environment
into fixed-sized messages. Robots then reason about their own belief as
well as received messages to explore the environment in a distributed
manner while avoiding redundant work. In doing so, we employ privileged
learning and learned attention mechanisms to endow the critic (i.e., teacher)
network with ground truth map knowledge to effectively guide the policy
(i.e., student) network during training. Compared to relevant baselines, our
model allows the team to reduce communication by up to two orders
of magnitude, while only sacrificing a marginal 2.4% in total travel
distance, paving the way for efficient, distributed multi-robot exploration
in bandwidth-limited scenarios. We open-sourced our full code (see
footnote 3).
Keywords: Deep Reinforcement Learning, Communication Learning,
Multi-robot Exploration, Distributed Path Planning
1 Introduction
Information sharing in multi-robot exploration is critical to generating
high-quality, distributed exploration paths [25]: robots must communicate
with each other to obtain information beyond their own partial knowledge, and
to make cooperative decisions that can speed up task completion and avoid
redundant work. Under normal circumstances, it is usually viable for most
exploration planners to share vast amounts of important information among
team members (such as their entire partial maps, or robot trajectories)
continuously during the mission. However, these approaches become impractical
in bandwidth-constrained settings such as underwater and underground
environments [11,22], where communication range is often inversely linked to
communication bandwidth. Recent research shows that the transmission of an
occupancy grid map usually requires a bandwidth of ~2 Mbps [23], but that
underwater communication bandwidth rarely exceeds 100 kbps [11]. This problem
of bandwidth constraints is further exacerbated in larger teams, where (even
modern) communication channels may not be able to withstand high
communication throughput between numerous robots [14].
Fig. 1. Example application of our approach to a multi-robot exploration task in a
bandwidth-constrained environment (here, underwater).
3 https://github.com/marmotlab/Bandwidth-Limited-Multi-Robot-Exploration
arXiv:2407.20203v1 [cs.RO] 29 Jul 2024
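As a back-of-the-envelope check on the bandwidth gap cited in the opening paragraph of this section (~2 Mbps to transmit an occupancy grid map [23] versus an underwater link that rarely exceeds 100 kbps [11]), the two rates can be compared directly. This is a sketch only: the constant and function names are illustrative, and treating both numbers as sustained rates is our simplifying assumption, not a measurement from the paper.

```python
# Rates quoted in the text; names below are illustrative, not from the paper.
MAP_SHARING_KBPS = 2_000      # ~2 Mbps for occupancy grid map transmission [23]
UNDERWATER_LINK_KBPS = 100    # underwater bandwidth rarely exceeds this [11]


def oversubscription(required_kbps: float, available_kbps: float) -> float:
    """Return how many times the required rate exceeds the available rate."""
    return required_kbps / available_kbps


ratio = oversubscription(MAP_SHARING_KBPS, UNDERWATER_LINK_KBPS)
print(f"Map sharing needs about {ratio:.0f}x the available underwater bandwidth")
```

Under these assumptions, conventional map sharing would need roughly twenty times the capacity of the link, before accounting for several robots sharing the same channel.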
There are currently two main strategies to reduce communication throughput
among robots. The first strategy is to simply decrease the frequency of
communications [17]. However, this strategy often leads to poor cooperation,
mainly due to the lack of up-to-date information from other robots. The second
strategy is to reduce the size of messages, e.g., by only sharing
low-dimensional representations of each other’s partial maps, which recipients
can then process to reconstruct/estimate the full map [10,30]. However, these
approaches usually come at a significant computational cost and may result in
lower exploration efficiency when essential details from the original map are
lost during map exchange.
To address these problems, we propose a novel DRL-based multi-robot
exploration framework based on communication learning and privileged learning,
tailored for bandwidth-limited scenarios. Our framework relies on learned
messages as an alternative to conventional map sharing. Our approach primarily
relies on communication learning to allow robots to learn to encode their own
belief map into a small, fixed-sized message that is shared with other robots. In
doing so, our communication layer allows robots to learn to identify, encode, and
share the most salient portion of their individual belief with each other, within
given constraints over message length (i.e., maximum bandwidth within the
system). Robots then learn to reason about their own knowledge/state as well as
received messages to form an implicit representation of the overall explored
environment. This enables the generation of high-quality, distributed exploration
paths. Following our recent work in single-robot exploration [5], we rely on
privileged learning to boost the performance of our final model. Specifically, we
let our critic network access ground truth information during training only,
allowing it to provide more accurate action evaluation for the training of the
robots’ policy network. This training approach significantly enhances our final
model’s long-term planning capabilities, by allowing robots to reason about their
knowledge, as well as received messages, at different spatial and temporal scales.
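The two ideas in this paragraph can be illustrated with a minimal sketch. It is not the authors' network: it assumes a single attention-pooling step with one query vector as a stand-in for the learned communication layer, uses random numbers where trained parameters and real belief features would be, and every name in it (DIM, encode_message, policy_inputs, critic_inputs) is hypothetical. It shows only that a belief of any size can be compressed into a message of fixed length, and that the critic receives ground-truth information that the policy never sees.

```python
import math
import random

DIM = 8  # hypothetical message length, standing in for the bandwidth budget


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_message(node_features, query):
    """Attention-pool any number of DIM-sized node features into one DIM-sized message."""
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale
              for feat in node_features]
    weights = softmax(scores)
    return [sum(w * feat[i] for w, feat in zip(weights, node_features))
            for i in range(len(query))]


def policy_inputs(own_belief, received_messages):
    """Student (policy): sees only its own belief and the received messages."""
    return {"belief": own_belief, "messages": received_messages}


def critic_inputs(own_belief, received_messages, ground_truth_map):
    """Teacher (critic): same inputs plus the ground-truth map, training only."""
    inputs = policy_inputs(own_belief, received_messages)
    inputs["ground_truth"] = ground_truth_map
    return inputs


rng = random.Random(0)
query = [rng.gauss(0, 1) for _ in range(DIM)]  # stands in for a learned parameter
small_belief = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
large_belief = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(500)]

msg_small = encode_message(small_belief, query)
msg_large = encode_message(large_belief, query)
print(len(msg_small), len(msg_large))  # both equal DIM, regardless of belief size
```

In an actual DRL setup the query (and any surrounding layers) would be trained end-to-end, so that the pooled vector retains whatever information most helps the receiving robots; the sketch fixes only the interface, not what is learned.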
We compare our model to a conventional multi-robot exploration planner in
a set of 100 m × 100 m indoor maps and investigate the impact of using learned
messages over traditional partial map sharing. Our results show that our model
can reduce the volume of communications by up to 99.2%, at the cost of a
marginal 2.4% performance loss in total travel distance. These results highlight
the capability of our robots to understand and reason about the current global
state of the exploration task, without explicitly relying on other robots’
detailed maps. We finally train a variant of our DRL-based model, where robots
are allowed to share both learned messages and explicit partial maps. This model
outperforms our map-sharing-free approach and the conventional baseline by
11.4% and 9.2% respectively in terms of exploration distance, highlighting the
power of our general framework in high-bandwidth scenarios where full maps may
be reliably shared among robots.
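As a quick consistency check, the 99.2% reduction reported here can be converted into a reduction factor and compared with the "up to two orders of magnitude" stated in the abstract. The snippet below is plain arithmetic on the quoted percentage; the variable names are ours.

```python
from fractions import Fraction
import math

reduction = Fraction(992, 1000)   # 99.2% less communication than the baseline
remaining = 1 - reduction         # fraction of baseline traffic still sent
factor = 1 / remaining            # how many times smaller the traffic is

print(f"remaining traffic: {float(remaining):.1%}")
print(f"reduction factor: {float(factor):.0f}x")
print(f"orders of magnitude: {math.log10(factor):.2f}")
```

Exact fractions are used so that the factor comes out as a whole number rather than a floating-point approximation; a factor above 100 is what "two orders of magnitude" requires.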
2 Prior Works
2.1 Multi-robot Exploration
Approaches to conventional multi-robot exploration are methodologically
classified into two main categories: frontier-based and sampling-based. For
example, Yu et al. [29] applied artificial potential fields to attract robots
towards diverse frontiers while ensuring mutual repulsion so that robots keep
their distance from one another. Most recently, Cao et al. [1] proposed an mTSP
(Multiple Traveling Salesman Problem) based global planner in conjunction with
a sampling-based local planner to explore large-scale environments. The
centralized global planner segments exploration areas into several vital nodes
and then assigns them to robots. However, these approaches remain greedy and
prioritize short-term efficiency, often resulting in shortsighted path planning
and suboptimal performance.
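For readers unfamiliar with the term, a frontier is commonly defined as a known-free cell that borders unexplored space. The sketch below detects frontiers on a small occupancy grid under that common definition; it is a generic illustration, not the method of any planner cited above, and the cell labels and the 4-connected neighborhood are our assumptions.

```python
FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # illustrative cell labels


def find_frontiers(grid):
    """Return (row, col) of free cells with at least one 4-connected unknown neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbor is enough
    return frontiers


belief = [
    [FREE, FREE,     UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE,     FREE],
]
print(find_frontiers(belief))
```

A greedy frontier-based planner would then send each robot to, for example, its nearest frontier cell, which is the kind of short-term choice the paragraph above describes as shortsighted.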
Given the rapid advancement of neural networks, many works have looked
to deep learning to enhance autonomous exploration. Niroui et al. [19]
pioneered the integration of frontier-based methods with deep reinforcement
learning to resolve short-sightedness problems in single-robot exploration. Yu
et al. [28] employed asynchronous multi-robot proximal policy optimization as
a training method to address multi-robot exploration. Concurrently, Luo et