Accepted Manuscript
Multi-camera visual SLAM for autonomous navigation of micro aerial
vehicles
Shaowu Yang, Sebastian A. Scherer, Xiaodong Yi, Andreas Zell
PII: S0921-8890(15)30217-7
DOI: http://dx.doi.org/10.1016/j.robot.2017.03.018
Reference: ROBOT 2817
To appear in: Robotics and Autonomous Systems
Please cite this article as: S. Yang, et al., Multi-camera visual SLAM for autonomous navigation of
micro aerial vehicles, Robotics and Autonomous Systems (2017),
http://dx.doi.org/10.1016/j.robot.2017.03.018
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form.
Please note that during the production process errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.

Multi-Camera Visual SLAM for Autonomous Navigation of Micro Aerial Vehicles

Shaowu Yang 1,2 ∗†, Sebastian A. Scherer 3, Xiaodong Yi 1,2 and Andreas Zell 3

1 State Key Laboratory of High Performance Computing (HPCL), National University of Defense Technology.
2 School of Computer, National University of Defense Technology, Changsha, China.
3 Department of Computer Science, University of Tübingen, Tübingen, Germany.
Abstract
In this paper, we present a visual simultaneous localization and mapping (SLAM) system which in-
tegrates measurements from multiple cameras to achieve robust pose tracking for autonomous navi-
gation of micro aerial vehicles (MAVs) in unknown complex environments. We analyze the iterative
optimizations for pose tracking and map refinement of visual SLAM in multi-camera cases. The
analysis ensures the soundness and accuracy of each optimization update. A well-known monocular
visual SLAM system is extended to utilize two cameras with non-overlapping fields of view (FOVs)
in the final implementation. The resulting visual SLAM system enables autonomous navigation
of an MAV in complex scenarios. The theory behind this system can easily be extended to multi-
camera configurations, when the onboard computational capability allows this. For operations in
large-scale environments, we modify the resulting visual SLAM system to be a constant-time robust
visual odometry. To form a full visual SLAM system, we further implement an efficient back-end
for loop closing. The back-end maintains a keyframe-based global map, which is also used for
loop-closure detection. An adaptive-window pose-graph optimization method is proposed to refine
keyframe poses of the global map and thus correct pose drift that is inherent in the visual odome-
try. We demonstrate the efficiency of the proposed visual SLAM algorithm for applications onboard MAVs in
experiments with both autonomous and manual flights. The pose tracking results are
compared with ground truth data provided by an external tracking system.
1 Introduction
In the last decade, we have seen a growing interest in micro aerial vehicles (MAVs) from the robotics community.
One of the reasons for this trend is that MAVs are potentially able to efficiently navigate in complex 3D environ-
ments with different types of terrains, which might be inaccessible to ground vehicles or large-scale unmanned aerial
vehicles (UAVs), e.g. in an earthquake-damaged building (Michael et al., 2012). A basic requirement for MAVs to
autonomously operate in such environments is robust pose tracking, which is still a challenging task
when the environment is previously unknown and external signals providing global position data are unreliable.
Meanwhile, if a map of the environment can be built, it can support path planning for autonomous
navigation of the MAV (Schauwecker and Zell, 2014). Recently, more focus has been placed on using onboard visual
solutions to address these issues, especially visual simultaneous localization and mapping (SLAM) systems.
Although we have seen successful applications of visual SLAM on ground vehicles (Cummins and Newman, 2008;
Strasdat et al., 2011), there are more challenges in using visual SLAM to enable autonomous navigation of MAVs.
∗ Correspondence address: shaowu.yang@nudt.edu.cn
† This work was mainly carried out at the Department of Computer Science, University of Tübingen. Project 61403409 supported by NSFC.

Figure 1: Our MAV platform, with two cameras facing two different directions: downward (in green ellipse) and
forward (in red ellipse).
First, the payload of an MAV is rather limited. This usually leads to limited onboard computational capability, which
requires the visual SLAM system to be very efficient, especially in large-scale operations. Second, due to the 3D
maneuverability of an MAV, robustness of pose tracking is highly desired. When pose tracking fails, unlike a ground
vehicle, which may be able to maintain stability by simply stopping, an MAV is likely to run into catastrophic
situations once its position control is lost.
In order to achieve more robust pose tracking of visual SLAM for MAVs, previous work has made much effort in
fusing data from multi-modal sensors, e.g. fusing inertial measurements (Shen et al., 2013b; Weiss et al., 2013; Li
and Mourikis, 2013; Hesch et al., 2014). Recently, the robotics community has shown a growing interest in improving
the performance of visual SLAM by utilizing multiple cameras (Lee et al., 2013a; Yang et al., 2014b; Heng et al.,
2014b). Pose tracking of a monocular visual SLAM system may easily fail in complex environments when only a very
limited number of visual features can be observed, due to its limited field of view. This also applies to typical stereo
visual SLAM, which employs cameras looking in one specific direction. The FOV can be enlarged by using a lens
with a wider viewing angle (even a fish-eye lens), but at the cost of larger lens distortion and a loss of
environmental detail, due to a smaller angular camera resolution. This also applies to catadioptric omnidirectional
vision systems (Lu et al., 2011). Another type of omnidirectional vision system combines multiple cameras into one
vision system while maintaining a single-viewpoint projection model. However, the cameras need to be very precisely
configured within relatively heavy mechanical systems in order to preserve this model, and such systems are thus not flexible enough
for MAV applications. Obviously, a larger effective FOV can be obtained by integrating multiple
cameras into one vision system. This implies that better pose tracking robustness could be achieved by extending monocular visual
SLAM to utilize measurements from multiple cameras. The challenge is how to utilize those cameras in SLAM
efficiently and flexibly.
In this paper, we investigate multi-camera visual SLAM for robust pose tracking and environmental mapping. In the
proposed visual SLAM system, multiple cameras can be mounted pointing to different directions, so that more reliable
visual features can be observed. Our method allows a SLAM system to integrate images captured from various useful
perspectives without requiring the cameras to be mounted in a specific way in order to keep a single-viewpoint model.
This makes the configuration of cameras very flexible. On the other hand, since multiple cameras no longer preserve
this model, using features from multiple cameras in SLAM is not a trivial issue: The question of how features from
additional cameras can be used in the iterative optimization of a SLAM system needs to be carefully analyzed. Based
on this analysis, we are able to integrate those image features into a single visual SLAM system. This enables our MAV
to achieve more robust pose tracking and to build a map covering more of the interesting regions of the environment.
In the final implementation, we expand the FOV of the vision system by using two cameras mounted looking in two
different directions (forward and downward) to capture more critical views, as shown in Fig. 1. The choice of the number
of cameras is a compromise between tracking robustness and onboard computational capability. We further modify
the above multi-camera visual SLAM system to operate as a robust visual odometry with constant-time cost for
large-scale exploration. Moreover, we propose an efficient back-end for loop-closure detection and for correcting the pose
drift that is inherent in the visual odometry by using pose-graph optimization (PGO).
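As a rough illustration of the role of PGO (written here in a generic form; our adaptive-window formulation is presented in Sec. 4), the keyframe poses T_i of the global map are refined by minimizing the residuals of the relative-pose constraints Z_{ij} attached to the graph edges,

\[ \{T_i^{*}\} = \operatorname*{arg\,min}_{\{T_i\}} \sum_{(i,j)\in\mathcal{E}} \left\| \log\!\left( Z_{ij}^{-1}\, T_i^{-1} T_j \right)^{\vee} \right\|_{\Sigma_{ij}}^{2} , \]

where \mathcal{E} contains both odometry and loop-closure edges and \Sigma_{ij} is the covariance of the corresponding constraint.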
The multi-camera visual SLAM front-end and the back-end were first proposed in our previous conference presen-
tations in Yang et al. (2014b) and Yang et al. (2016), respectively. In this paper, we complete the theory of our
multi-camera visual SLAM system with a more detailed and systematic analysis, and provide further evaluations of
the system in the experimental results.
The remainder of this paper is organized as follows. Related work on visual SLAM for MAVs and multi-camera
visual SLAM is reviewed in Sec. 2. In Sec. 3, we present our analysis of the optimizations in multi-camera SLAM and
the implementation of the visual odometry. We then present our SLAM back-end for managing the global map and
loop closing in Sec. 4. The performance of the proposed SLAM system is evaluated in experiments in Sec. 5.
Finally, in the last section, we provide a summary and discussion of this work.
2 Related Work
In recent years, real-time visual SLAM has been achieved. Two methodologies have become predominant in visual
SLAM (Scaramuzza and Fraundorfer, 2011): filtering methods which fuse information from all past measurements in a
probability distribution (Davison et al., 2007; Eade and Drummond, 2007), and keyframe-based methods, like parallel
tracking and mapping (PTAM) (Klein and Murray, 2007) and Linear MonoSLAM (Zhao et al., 2014). PTAM organizes
its map in keyframes and uses nonlinear optimization for pose tracking and map refinement. Linear MonoSLAM
adopts an efficient sub-map scheme and utilizes bundle adjustment (BA) only in building initial sub-maps, resulting in
a linear approach which can achieve a performance very close to that obtained by using global BA. The advantages
and disadvantages of filtering and keyframe-based methods are analyzed in the work of Strasdat et al. (2010a, 2012),
suggesting that, in most modern applications, keyframe optimization gives the most accuracy per unit of computing
time. The progress in visual SLAM has facilitated its applications to aerial robots.
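As a point of reference (standard notation, not a formulation taken from any of the cited systems), the bundle adjustment underlying such keyframe-based methods jointly refines keyframe poses T_i and map points p_j by minimizing the image reprojection error

\[ \min_{\{T_i\},\{p_j\}} \sum_{(i,j)\in\mathcal{O}} \rho\!\left( \left\| u_{ij} - \pi\!\left( T_i\, p_j \right) \right\|^{2} \right) , \]

where u_{ij} is the observed image position of point j in keyframe i, \pi(\cdot) is the camera projection function, \mathcal{O} is the set of observations, and \rho is a robust cost function.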
Autonomous navigation of UAVs/MAVs relying on pose estimates from GPS sensors has been well studied in early
research. Related work usually fuses inertial navigation system (INS) data to aid GPS-sensor data, achieving
autonomous navigation in high-altitude and long-range operations. However, such systems are not suitable for GPS-denied
environments, such as indoors or in outdoor urban areas. Recently, much effort has been focused on developing visual SLAM
systems to enable autonomous flight of MAVs. Related work using stereo cameras, monocular cameras and RGB-D
cameras can be found in the literature.
Autonomous mapping and exploration for MAVs based on stereo cameras is presented in Fraundorfer et al. (2012).
The work in Schauwecker and Zell (2013) features a vision system for autonomous navigation of MAVs using two
pairs of stereo cameras, with stereo triangulation adding constraints to bundle adjustment in PTAM. A stereo setup
yields metric scale information of the environment. However, those systems have difficulties in using distant features
since they triangulate those feature points based on their short baselines. Stereo visual odometry and SLAM systems
may degenerate to the monocular case when the distance to the scene is much larger than the stereo baseline. In this
case, stereo vision becomes ineffective and monocular methods must be used (Scaramuzza and Fraundorfer, 2011).
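This degeneracy can be made explicit with the standard first-order depth-uncertainty relation for a rectified stereo pair (a textbook approximation, not a result from the cited works): with baseline b, focal length f in pixels, and disparity uncertainty \sigma_d, the depth uncertainty grows quadratically with depth,

\[ \sigma_Z \approx \frac{Z^{2}}{b\, f}\, \sigma_d , \]

so features at distances Z much larger than the baseline are triangulated with very large uncertainty and effectively contribute only bearing information, as in the monocular case.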
In Achtelik et al. (2011), PTAM is used to provide position estimates for an MAV, while fusing data from an air
pressure sensor and accelerometers to estimate the unknown metric scale factor of the monocular vision system.
The work in Weiss and Siegwart (2011) presents a visual-inertial data fusion method based on an extended Kalman filter (EKF). It is further
implemented in Weiss et al. (2012) for autonomous navigation of MAVs using inertial data and visual pose estimates
from a modified PTAM system. The scale drift of the monocular PTAM system has been considered in the EKF
framework.
A vision-based system combining advantages of both monocular vision and stereo vision is developed in Shen et al.
(2013b), which uses a low frame-rate secondary camera to extend a high frame-rate forward facing camera that is
equipped with a fisheye lens. It can provide robust state estimates for a quadrotor by fusing the onboard inertial data.

The resulting vision system mainly relies on monocular vision algorithms, while being able to track metric scale by
stereo triangulation. Since the two cameras are configured in a stereo setup, the field of view of the vision system is
not expanded. The improved work in Shen et al. (2013a) enables a quadrotor to autonomously travel at speeds of up to 4
m/s, and allows roll and pitch angles exceeding 20 degrees in 3D indoor environments.
In Huang et al. (2011), autonomous flight of an MAV is enabled by the proposed SLAM system using an RGB-
D camera. A visual odometry is used for real-time local state estimation, and integrated with the RGBD-Mapping
(described in Henry et al. (2014)) to form the SLAM system. An efficient RGB-D SLAM system is described in
Scherer and Zell (2013), which enables an MAV to autonomously fly in an unknown environment and create a map of
its surroundings. In this work, sparse optical flow is used for feature matching, which is advantageous when motion
blur may result in only a very limited number of local features being detected. The work in Valenti et al. (2014) performs
RGB-D visual odometry on an MAV to enable its autonomous flight. A 3D occupancy grid map is then built for path
planning of the MAV.
Previous work on developing multi-camera systems can mainly be found in applications of surveillance and object
tracking (Collins et al., 2000; Krumm et al., 2000). More relevant related work appears in the context of structure from
motion (SFM). The work in Pless (2003) presents a theoretical treatment of multi-camera systems in SFM deriving
the generalized epipolar constraint. The work in Frahm et al. (2004) proposes a virtual camera as a representation of a
multi-camera system for pose estimation. A structure-from-motion scheme is achieved using multiple cameras in this
work.
A number of multi-camera systems for pose estimation of mobile robots can be found in the literature. In Ragab
(2008), the pose of a mobile robot is estimated with an EKF, using two back-to-back stereo pairs placed on the
robot. The work in Lee et al. (2013a) adopts a generalized camera model described in
Pless (2003) for a multi-camera system, to estimate the ego-motion of a self-driving car using a 2-Point RANSAC
algorithm. This system allows point correspondences among different cameras. In Lee et al. (2013b), pose-graph
loop-closure constraints are computed. The relative pose between two loop-closing pose-graph vertices is obtained
from the epipolar geometry of the multi-camera system. Kaess and Dellaert (2006) presented a visual SLAM system
with a multi-camera rig using the Harris corner detector (Harris and Stephens, 1988). In their further work in Kaess and
Dellaert (2010), a Bayesian approach to data association is presented, taking into account moving features which can be
observed by cameras under robot motion. The work in Solà et al. (2008) provides solutions to two different problems
in multi-camera visual SLAM: automatic self-calibration of a stereo rig while performing SLAM and cooperative
monocular SLAM.
In Harmat et al. (2012), PTAM with multiple cameras mounted on a buoyant spherical airship is reported in a manual
flight experiment. It employs a ground-facing stereo camera pair which can provide metric scale, together with another
camera with a wide-angle lens mounted pointing in the opposite direction. In Tribou et al. (2015), multi-camera visual
SLAM is achieved based on a modified version of PTAM. This SLAM system allows convergence in pose tracking
and mapping in the absence of an accurate metric scale, taking advantage of the Taylor omnidirectional camera model
and a spherical coordinate update method.
In order to use multiple cameras for pose estimation, the extrinsic parameters of those cameras need to be calibrated.
Here we are interested in previous work solving this calibration problem for multiple cameras with non-overlapping
fields of view. Carrera et al. (2011) proposed a SLAM-based automatic calibration scheme for multiple cameras. The
scheme uses global bundle adjustment to optimize the alignment of maps built by different visual SLAM instances,
each processing images from one corresponding camera. The proposed solution computes the relative 3D poses among
the cameras up to scale. A more recent work can be found in Heng et al. (2014a), which uses a computationally expensive
SLAM system to accurately map an infrastructure prior to the extrinsic calibration of multi-camera systems. During a
calibration process, 2D-3D correspondences between visual features in the current scene and the previously known
map are used for tracking the camera poses based on a Perspective-n-Point (PnP) method. Then the camera extrinsics are
optimized via non-linear refinement. Self-calibration of multiple stereo cameras and an IMU is further achieved in Heng
et al. (2014b), which also achieves multi-camera visual SLAM based on the generalized camera model. At least one
stereo camera pair is required in this work. Real-time loop closing running onboard MAVs is achieved in both Yang
et al. (2016) and Heng et al. (2014b).
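To make such a PnP-based tracking step concrete, below is a minimal sketch of recovering a camera pose from 2D-3D correspondences against a known map, using OpenCV's RANSAC-based PnP solver. It is not the implementation of Heng et al. (2014a); the synthetic data, parameter values, and variable names are illustrative assumptions.

# Minimal PnP sketch (illustrative only): estimate a camera pose from 2D-3D
# correspondences using OpenCV's RANSAC-based solver. The "map" and its image
# observations are synthesized here; a real pipeline would obtain them from
# feature matching against a previously built map.
import numpy as np
import cv2

# Camera intrinsics (assumed known from a prior intrinsic calibration).
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume negligible lens distortion for this sketch

# Synthetic 3D map points in front of the camera, and a ground-truth pose used
# only to generate consistent 2D observations with pixel noise.
pts_3d = np.random.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], (60, 3))
rvec_gt = np.array([0.05, -0.10, 0.02])   # axis-angle rotation, map -> camera
tvec_gt = np.array([0.30, -0.20, 0.50])   # translation, map -> camera
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, dist)
pts_2d = pts_2d.reshape(-1, 2) + np.random.normal(0.0, 0.5, (60, 2))

# Robust pose estimation from the 2D-3D correspondences (RANSAC + PnP).
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts_3d, pts_2d, K, dist, reprojectionError=2.0)

if ok:
    R, _ = cv2.Rodrigues(rvec)          # rotation matrix, map -> camera
    cam_pos = (-R.T @ tvec).ravel()     # camera position in the map frame
    print("estimated camera position:", cam_pos)
    print("RANSAC inliers:", len(inliers))

In the calibration setting described above, such per-camera pose estimates tracked against the common map would subsequently feed the non-linear refinement of the camera extrinsics.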