
Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration

Yixiao Ma¹,², Jingsong Liang¹,², Yuhong Cao, Derek Ming Siang Tan, and Guillaume Sartoretti

¹ School of Computing, National University of Singapore, SG 117417.
yixiaoma@u.nus.edu, jingsongliang@u.nus.edu

² Mechanical Engineering Dept., National University of Singapore, SG 117575.
caoyuhong@nus.edu.sg, derektan@u.nus.edu, mpegas@nus.edu.sg

WWW home page: http://www.marmotlab.org

Abstract. Communication bandwidth is an important consideration in multi-robot exploration, where information exchange among robots is critical. While existing methods typically aim to reduce communication throughput, they either require significant computation or significantly compromise exploration efficiency. In this work, we propose a deep reinforcement learning framework based on communication and privileged reinforcement learning to achieve a significant reduction in bandwidth consumption, while minimally sacrificing exploration efficiency. Specifically, our approach allows robots to learn to embed the most salient information from their individual belief (partial map) over the environment into fixed-sized messages. Robots then reason about their own belief as well as received messages to distributedly explore the environment while avoiding redundant work. In doing so, we employ privileged learning and learned attention mechanisms to endow the critic (i.e., teacher) network with ground truth map knowledge to effectively guide the policy (i.e., student) network during training. Compared to relevant baselines, our model allows the team to reduce communication by up to two orders of magnitude, while only sacrificing a marginal 2.4% in total travel distance, paving the way for efficient, distributed multi-robot exploration in bandwidth-limited scenarios. We open-sourced our full code.

Keywords: Deep Reinforcement Learning, Communication Learning, Multi-robot exploration, Distributed Path Planning

1 Introduction

Information sharing in multi-robot exploration is critical to generating high-quality, distributed exploration paths [25]: robots must communicate with each other to obtain information beyond their own partial knowledge, so that they can make cooperative decisions that speed up task completion and avoid redundant work. Under normal circumstances, it is usually viable for most exploration planners to share vast amounts of important information among team members (such as entire partial maps, or robot trajectories) continuously during the mission. However, these approaches become impractical in bandwidth-constrained settings such as underwater and underground environments [11,22], where communication range is often inversely linked to communication bandwidth. Recent research shows that the transmission of an occupancy grid map usually requires a bandwidth of ~2 Mbps [23], but that underwater communication bandwidth rarely exceeds 100 kbps [11]. This problem of bandwidth constraints is further exacerbated in larger teams, where (even modern) communication channels may not be able to withstand high communication throughput between numerous robots [14].

Fig. 1. Example application of our approach to a multi-robot exploration task in a bandwidth-constrained environment (here, underwater).

Code: https://github.com/marmotlab/Bandwidth-Limited-Multi-Robot-Exploration

arXiv:2407.20203v1 [cs.RO] 29 Jul 2024
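The bandwidth gap described in the introduction (about 2 Mbps to transmit an occupancy grid map [23], versus underwater links that rarely exceed 100 kbps [11]) can be made concrete with a short calculation. The sketch below only does this arithmetic; the function name and the decimal convention 1 Mbps = 1000 kbps are assumptions made for illustration.

```python
def bandwidth_shortfall(required_kbps: float, available_kbps: float) -> float:
    """Return how many times the required rate exceeds the available rate."""
    if available_kbps <= 0:
        raise ValueError("available bandwidth must be positive")
    return required_kbps / available_kbps


# Figures quoted in the text; 1 Mbps = 1000 kbps is an assumed convention.
MAP_SHARING_KBPS = 2 * 1000  # ~2 Mbps for occupancy grid map transmission [23]
UNDERWATER_KBPS = 100        # underwater links rarely exceed 100 kbps [11]

shortfall = bandwidth_shortfall(MAP_SHARING_KBPS, UNDERWATER_KBPS)
usable_fraction = 1 / shortfall

print(f"required / available = {shortfall:.0f}x")
print(f"fraction of the map stream that fits = {usable_fraction:.0%}")
```

Under these figures, a single map stream already needs about 20 times the capacity of the best underwater link, before any additional robots share the channel.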

There are currently two main strategies to reduce communication throughput among robots. The first strategy is to simply decrease the frequency of communications [17]. However, this strategy often leads to poor cooperation, mainly due to the lack of up-to-date information from other robots. The second strategy is to reduce the size of messages, e.g., by only sharing low-dimensional representations of each other's partial map, which recipients can then process to reconstruct/estimate the full map [10,30]. However, these approaches usually come at a significant computational cost and may result in lower exploration efficiency when essential details from the original map are lost during map exchange.
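A toy example can illustrate how shrinking a map message may discard details that matter for planning. The sketch below coarsens a small occupancy grid by block-wise max pooling, a deliberately simple stand-in chosen for illustration and not the representation used in [10,30]. The grid values (0 = free, 1 = occupied), the function name, and the pooling factor are all assumptions.

```python
def downsample_max(grid, factor):
    """Coarsen an occupancy grid: a coarse cell is occupied (1) if any fine cell in it is."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid size must be divisible by factor")
    return [
        [
            max(grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


# A wall in column 2 with a one-cell doorway in row 2 (hypothetical map).
fine = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]

coarse = downsample_max(fine, 2)
fine_cells = sum(len(row) for row in fine)
coarse_cells = sum(len(row) for row in coarse)

print(coarse)
print(fine_cells, "cells ->", coarse_cells, "cells")
```

The message shrinks from 16 cells to 4, but the doorway disappears from the coarse map: a recipient would see a solid wall and could not plan a path through it.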

To address these problems, we propose a novel deep reinforcement learning (DRL) based multi-robot exploration framework built on communication learning and privileged learning, tailored for bandwidth-limited scenarios. Our framework relies on learned messages as an alternative to conventional map sharing. Our approach primarily relies on communication learning to allow robots to learn to encode their own belief map into a small, fixed-sized message that is shared with other robots. In doing so, our communication layer allows robots to learn to identify, encode, and share the most salient portion of their individual belief with each other, within given constraints over message length (i.e., maximum bandwidth within the system). Robots then learn to reason about their own knowledge/state as well as received messages to form an implicit representation of the overall explored environment. This enables the generation of high-quality, distributed exploration paths. Following our recent work in single robot exploration [5], we rely on privileged learning to boost the performance of our final model. Specifically, we let our critic network access ground truth information during training only, allowing it to provide more accurate action evaluation for the training of the robots' policy network. This training approach significantly enhances our final model's long-term planning capabilities, by allowing robots to reason about their knowledge, as well as received messages, at different spatial and temporal scales.
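One way to see how a belief of arbitrary size can be compressed into a fixed-sized message is attention pooling: a query vector scores every feature vector in the belief, and the softmax-weighted sum has the same length regardless of how many vectors went in. The sketch below is a generic illustration of that idea, not the network described in this paper. The feature dimension, the random stand-in for a learned query, and all names are assumptions.

```python
import math
import random


def attention_pool(node_features, query):
    """Compress any number of feature vectors into one vector of len(query)."""
    dim = len(query)
    # Scaled dot-product score of each feature vector against the query.
    scores = [
        sum(q * x for q, x in zip(query, feat)) / math.sqrt(dim)
        for feat in node_features
    ]
    # Numerically stable softmax over the feature vectors.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum: the output length does not depend on the number of inputs.
    message = [
        sum(w * feat[i] for w, feat in zip(weights, node_features))
        for i in range(dim)
    ]
    return message, weights


random.seed(0)
DIM = 4  # assumed message length
query = [random.uniform(-1, 1) for _ in range(DIM)]  # stand-in for a learned query

small_belief = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
large_belief = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(500)]

msg_small, w_small = attention_pool(small_belief, query)
msg_large, w_large = attention_pool(large_belief, query)

print(len(msg_small), len(msg_large))
```

Both messages have the same length even though one belief holds 100 times more entries, which is the property that bounds the per-message bandwidth. In a trained model the query and the features would be learned, so the weights would concentrate on the entries most useful to teammates.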

We compare our model to a conventional multi-robot exploration planner in a set of 100 m × 100 m indoor maps and investigate the impact of using learned messages over traditional partial map sharing. Our results show that our model can reduce the volume of communications by up to 99.2%, at the cost of a marginal 2.4% performance loss in total travel distance. These results highlight the capability of our robots to understand and reason about the current global state of the exploration task, without explicitly relying on other robots' detailed maps. Finally, we train a variant of our DRL-based model, where robots are allowed to share both learned messages and explicit partial maps. This model outperforms our map-sharing-free approach and the conventional baseline by 11.4% and 9.2% respectively in terms of exploration distance, highlighting the power of our general framework in high-bandwidth scenarios where full maps may be reliably shared among robots.
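The "up to 99.2%" reduction reported here and the "up to two orders of magnitude" stated in the abstract describe the same quantity in two ways. The short sketch below converts one into the other; the helper name is an assumption introduced for illustration.

```python
import math


def reduction_factor(percent_reduction: float) -> float:
    """Convert a percentage reduction into a 'times smaller' factor."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("percent_reduction must be in [0, 100)")
    return 100.0 / (100.0 - percent_reduction)


factor = reduction_factor(99.2)  # reduction quoted in the text
orders = math.log10(factor)

print(f"{factor:.0f}x smaller, about {orders:.1f} orders of magnitude")
```

A 99.2% reduction leaves 0.8% of the original volume, i.e. roughly a 125-fold decrease, which is slightly more than two orders of magnitude.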

2 Prior Works

2.1 Multi-robot Exploration

Conventional approaches to multi-robot exploration fall into two main methodological categories: frontier-based and sampling-based. For example, Yu et al. [29] applied artificial potential fields to attract robots towards diverse frontiers while ensuring mutual repulsion to maintain distance between each other. Most recently, Cao et al. [1] proposed an mTSP (Multiple Traveling Salesman Problem) based global planner in conjunction with a sampling-based local planner to explore large-scale environments. The centralized global planner segments exploration areas into several vital nodes and then assigns them to robots. However, these approaches remain greedy and prioritize short-term efficiency, often resulting in shortsighted path planning and suboptimal performance.
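The potential-field idea mentioned above (attraction towards frontiers, repulsion between robots) can be sketched in its generic textbook form. This is not the formulation of Yu et al. [29]: the gains, the influence radius, the linear attractive term, and the function name are all assumptions chosen for illustration.

```python
import math


def apf_force(robot, frontiers, teammates, k_att=1.0, k_rep=1.0, influence=5.0):
    """Net 2D force on `robot`: pulled toward frontiers, pushed away from nearby teammates."""
    fx = fy = 0.0
    for gx, gy in frontiers:
        # Attractive term: linear in the offset to each frontier.
        fx += k_att * (gx - robot[0])
        fy += k_att * (gy - robot[1])
    for tx, ty in teammates:
        dx, dy = robot[0] - tx, robot[1] - ty
        dist = math.hypot(dx, dy)
        if 0.0 < dist < influence:
            # Repulsive term: grows as the teammate gets closer, zero beyond `influence`.
            gain = k_rep * (1.0 / dist - 1.0 / influence) / dist**2
            fx += gain * dx / dist
            fy += gain * dy / dist
    return fx, fy


robot = (0.0, 0.0)
frontiers = [(10.0, 0.0)]

pull_only = apf_force(robot, frontiers, teammates=[])
with_teammate = apf_force(robot, frontiers, teammates=[(1.0, 0.0)])

print(pull_only, with_teammate)
```

With a teammate standing between the robot and the frontier, the net pull towards that frontier weakens. The force depends only on the current positions, which is one way to see why such planners can be greedy.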

Given the rapid advancement of neural networks, many works have looked to deep learning to enhance autonomous exploration. Niroui et al. [19] pioneered the integration of frontier-based methods with deep reinforcement learning to resolve short-sightedness problems in single-robot exploration. Yu et al. [28] employed asynchronous multi-robot proximal policy optimization as a training method to address multi-robot exploration. Concurrently, Luo et
