Accepted Manuscript
Multi-camera visual SLAM for autonomous navigation of micro aerial
vehicles
Shaowu Yang, Sebastian A. Scherer, Xiaodong Yi, Andreas Zell
PII: S0921-8890(15)30217-7
DOI: http://dx.doi.org/10.1016/j.robot.2017.03.018
Reference: ROBOT 2817
To appear in: Robotics and Autonomous Systems
Please cite this article as: S. Yang, et al., Multi-camera visual SLAM for autonomous navigation of
micro aerial vehicles, Robotics and Autonomous Systems (2017),
http://dx.doi.org/10.1016/j.robot.2017.03.018
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form.
Please note that during the production process errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.

Multi-Camera Visual SLAM for Autonomous Navigation of Micro Aerial Vehicles

Shaowu Yang 1,2 ∗†, Sebastian A. Scherer 3, Xiaodong Yi 1,2 and Andreas Zell 3

1 State Key Laboratory of High Performance Computing (HPCL), National University of Defense Technology.
2 School of Computer, National University of Defense Technology, Changsha, China.
3 Department of Computer Science, University of Tübingen, Tübingen, Germany.
Abstract
In this paper, we present a visual simultaneous localization and mapping (SLAM) system which in-
tegrates measurements from multiple cameras to achieve robust pose tracking for autonomous navi-
gation of micro aerial vehicles (MAVs) in unknown complex environments. We analyze the iterative
optimizations for pose tracking and map refinement of visual SLAM in multi-camera cases. The
analysis ensures the soundness and accuracy of each optimization update. A well-known monocular
visual SLAM system is extended to utilize two cameras with non-overlapping fields of view (FOVs)
in the final implementation. The resulting visual SLAM system enables autonomous navigation
of an MAV in complex scenarios. The theory behind this system can easily be extended to multi-
camera configurations, when the onboard computational capability allows this. For operations in
large-scale environments, we modify the resulting visual SLAM system to be a constant-time robust
visual odometry. To form a full visual SLAM system, we further implement an efficient back-end
for loop closing. The back-end maintains a keyframe-based global map, which is also used for
loop-closure detection. An adaptive-window pose-graph optimization method is proposed to refine
keyframe poses of the global map and thus correct pose drift that is inherent in the visual odome-
try. We demonstrate the efficiency of the proposed visual SLAM algorithm for applications onboard MAVs in
experiments with both autonomous and manual flights. The pose tracking results are
compared with ground truth data provided by an external tracking system.
1 Introduction
In the last decade, we have seen a growing interest in micro aerial vehicles (MAVs) from the robotics community.
One of the reasons for this trend is that MAVs are potentially able to efficiently navigate in complex 3D environ-
ments with different types of terrains, which might be inaccessible to ground vehicles or large-scale unmanned aerial
vehicles (UAVs), e.g. in an earthquake-damaged building (Michael et al., 2012). A basic requirement for MAVs to
autonomously operate in such environments is robust pose tracking, which is still a challenging task
when the environment is previously unknown and external signals providing global position data are unreliable.
Meanwhile, if a map of the environment can be built, it can support path planning for autonomous
navigation of the MAV (Schauwecker and Zell, 2014). Recently, more focus has been placed on using onboard visual
solutions to address these issues, especially visual simultaneous localization and mapping (SLAM) systems.
Although we have seen successful applications of visual SLAM on ground vehicles (Cummins and Newman, 2008;
Strasdat et al., 2011), there are more challenges in using visual SLAM to enable autonomous navigation of MAVs.
∗ Correspondence address: shaowu.yang@nudt.edu.cn
† This work was mainly carried out at the Department of Computer Science, University of Tübingen. Project 61403409 supported by NSFC.

Figure 1: Our MAV platform, with two cameras facing two different directions: downward (in green ellipse) and
forward (in red ellipse).
First, the payload of an MAV is rather limited. This usually leads to limited onboard computational capability, which
requires the visual SLAM system to be very efficient, especially in large-scale operations. Second, due to the 3D
maneuverability of an MAV, robustness of pose tracking is highly desired. When pose tracking fails, unlike a ground
vehicle, which may be able to maintain stability by simply stopping, an MAV is likely to run into catastrophic
situations once its position control is lost.
In order to achieve more robust pose tracking of visual SLAM for MAVs, previous work has made much effort in
fusing data from multi-modal sensors, e.g. fusing inertial measurements (Shen et al., 2013b; Weiss et al., 2013; Li
and Mourikis, 2013; Hesch et al., 2014). Recently, the robotics community has shown a growing interest in improving
the performance of visual SLAM by utilizing multiple cameras (Lee et al., 2013a; Yang et al., 2014b; Heng et al.,
2014b). Pose tracking of a monocular visual SLAM system may easily fail in complex environments when only a very
limited number of visual features can be observed, due to its limited field of view. This also applies to typical stereo
visual SLAM, which employs cameras looking in one specific direction. The FOV can be enlarged by using a lens
with a wider viewing angle (even a fish-eye lens), but at the cost of larger lens distortion and a loss of
environmental detail, due to a smaller angular camera resolution. This also applies to catadioptric omnidirectional
vision systems (Lu et al., 2011). Another type of omnidirectional vision system combines multiple cameras into one
vision system while maintaining a single-viewpoint projection model. However, the cameras need to be very precisely
configured within relatively heavy mechanical systems in order to preserve this model, and such systems are thus not flexible enough
for MAV applications. Obviously, a larger effective FOV can be obtained by integrating multiple
cameras into one vision system. This implies that better pose tracking robustness could be achieved by extending monocular visual
SLAM to utilize measurements from multiple cameras. The challenge is how to utilize those cameras in SLAM
efficiently and flexibly.
In this paper, we investigate multi-camera visual SLAM for robust pose tracking and environmental mapping. In the
proposed visual SLAM system, multiple cameras can be mounted pointing to different directions, so that more reliable
visual features can be observed. Our method allows a SLAM system to integrate images captured from various useful
perspectives without requiring the cameras to be mounted in a specific way in order to keep a single-viewpoint model.
This makes the configuration of cameras very flexible. On the other hand, since multiple cameras no longer preserve
this model, using features from multiple cameras in SLAM is not a trivial issue: The question of how features from
additional cameras can be used in the iterative optimization of a SLAM system needs to be carefully analyzed. Based
on this analysis, we are able to integrate those image features into a single visual SLAM system. This enables our MAV
to achieve more robust pose tracking and to build a map covering more of the interesting regions of the environment.
In the final implementation, we expand the FOV of the vision system by using two cameras mounted looking in two
different directions (forward and downward) to capture more critical views, as shown in Fig. 1. The choice of the number
of cameras is a compromise between tracking robustness and onboard computational capability. We further modify
the above multi-camera visual SLAM system to operate as a robust visual odometry with constant-time cost for
large-scale exploration. Moreover, we propose an efficient back-end for loop-closure detection and for correcting the pose
drift that is inherent in the visual odometry by using pose-graph optimization (PGO).
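As a rough illustration of the role of PGO (written here in a generic form; our adaptive-window formulation is presented in Sec. 4), the keyframe poses T_i of the global map are refined by minimizing the residuals of the relative-pose constraints Z_{ij} attached to the graph edges,

\[ \{T_i^{*}\} = \operatorname*{arg\,min}_{\{T_i\}} \sum_{(i,j)\in\mathcal{E}} \left\| \log\!\left( Z_{ij}^{-1}\, T_i^{-1} T_j \right)^{\vee} \right\|_{\Sigma_{ij}}^{2} , \]

where \mathcal{E} contains both odometry and loop-closure edges and \Sigma_{ij} is the covariance of the corresponding constraint.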
The multi-camera visual SLAM front-end and the back-end were first proposed in our previous conference presen-
tations in Yang et al. (2014b) and Yang et al. (2016), respectively. In this paper, we complete the theory of our
multi-camera visual SLAM system with a more detailed and systematic analysis, and provide further evaluations of
the system in the experimental results.
The remainder of this paper is organized as follows. Related work on visual SLAM for MAVs and multi-camera
visual SLAM is reviewed in Sec. 2. In Sec. 3, we present our analysis of the optimizations in multi-camera SLAM and
the implementation of the visual odometry. We then present our SLAM back-end for managing the global map and
loop closing in Sec. 4. The performance of the proposed SLAM system is evaluated in experiments in Sec. 5.
Finally, in the last section, we provide a summary and discussion of this work.
2 Related Work
In recent years, real-time visual SLAM has been achieved. Two methodologies have become predominant in visual
SLAM (Scaramuzza and Fraundorfer, 2011): filtering methods which fuse information from all past measurements in a
probability distribution (Davison et al., 2007; Eade and Drummond, 2007), and keyframe-based methods, like parallel
tracking and mapping (PTAM) (Klein and Murray, 2007) and Linear MonoSLAM (Zhao et al., 2014). PTAM organizes
its map in keyframes and uses nonlinear optimization for pose tracking and map refinement. Linear MonoSLAM
adopts an efficient sub-map scheme and utilizes bundle adjustment (BA) only in building initial sub-maps, resulting in
a linear approach which can achieve a performance very close to that obtained by using global BA. The advantages
and disadvantages of filtering and keyframe-based methods are analyzed in the work of Strasdat et al. (2010a, 2012),
suggesting that, in most modern applications, keyframe optimization gives the most accuracy per unit of computing
time. The progress in visual SLAM has facilitated its applications to aerial robots.
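As a point of reference (standard notation, not a formulation taken from any of the cited systems), the bundle adjustment underlying such keyframe-based methods jointly refines keyframe poses T_i and map points p_j by minimizing the image reprojection error

\[ \min_{\{T_i\},\{p_j\}} \sum_{(i,j)\in\mathcal{O}} \rho\!\left( \left\| u_{ij} - \pi\!\left( T_i\, p_j \right) \right\|^{2} \right) , \]

where u_{ij} is the observed image position of point j in keyframe i, \pi(\cdot) is the camera projection function, \mathcal{O} is the set of observations, and \rho is a robust cost function.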
Autonomous navigation of UAVs/MAVs relying on pose estimates from GPS sensors has been well studied in early
research. Related work usually fuses inertial navigation system (INS) data to aid GPS-sensor data, achieving
autonomous navigation in high-altitude and long-range operations. However, such systems are not suitable for GPS-denied
environments, such as indoors or in outdoor urban areas. Recently, much effort has been focused on developing visual SLAM
systems to enable autonomous flight of MAVs. Related work using stereo cameras, monocular cameras and RGB-D
cameras can be found in the literature.
Autonomous mapping and exploration for MAVs based on stereo cameras is presented in Fraundorfer et al. (2012).
The work in Schauwecker and Zell (2013) features a vision system for autonomous navigation of MAVs using two
pairs of stereo cameras, with stereo triangulation adding constraints to bundle adjustment in PTAM. A stereo setup
yields metric scale information of the environment. However, those systems have difficulties in using distant features
since they triangulate those feature points based on their short baselines. Stereo visual odometry and SLAM systems
may degenerate to the monocular case when the distance to the scene is much larger than the stereo baseline. In this
case, stereo vision becomes ineffective and monocular methods must be used (Scaramuzza and Fraundorfer, 2011).
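This degeneracy can be made explicit with the standard first-order depth-uncertainty relation for a rectified stereo pair (a textbook approximation, not a result from the cited works): with baseline b, focal length f in pixels, and disparity uncertainty \sigma_d, the depth uncertainty grows quadratically with depth,

\[ \sigma_Z \approx \frac{Z^{2}}{b\, f}\, \sigma_d , \]

so features at distances Z much larger than the baseline are triangulated with very large uncertainty and effectively contribute only bearing information, as in the monocular case.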
In Achtelik et al. (2011), PTAM is used to provide position estimates for an MAV, while fusing data from an air
pressure sensor and accelerometers to estimate the unknown metric scale factor of the monocular vision system.
The work in Weiss and Siegwart (2011) presents a visual-inertial data fusion method based on an extended Kalman filter (EKF). It is further
implemented in Weiss et al. (2012) for autonomous navigation of MAVs using inertial data and visual pose estimates
from a modified PTAM system. The scale drift of the monocular PTAM system has been considered in the EKF
framework.
A vision-based system combining advantages of both monocular vision and stereo vision is developed in Shen et al.
(2013b), which uses a low frame-rate secondary camera to extend a high frame-rate forward facing camera that is
equipped with a fisheye lens. It can provide robust state estimates for a quadrotor by fusing the onboard inertial data.

The resulting vision system mainly relies on monocular vision algorithms, while being able to track metric scale by
stereo triangulation. Since the two cameras are configured in a stereo setup, the field of view of the vision system is
not expanded. The improved work in Shen et al. (2013a) enables a quadrotor to autonomously travel at speeds of up to 4
m/s, and allows roll and pitch angles exceeding 20 degrees in 3D indoor environments.
In Huang et al. (2011), autonomous flight of an MAV is enabled by the proposed SLAM system using an RGB-
D camera. A visual odometry is used for real-time local state estimation, and integrated with the RGBD-Mapping
(described in Henry et al. (2014)) to form the SLAM system. An efficient RGB-D SLAM system is described in
Scherer and Zell (2013), which enables an MAV to autonomously fly in an unknown environment and create a map of
its surroundings. In this work, sparse optical flow is used for feature matching, which is advantageous when motion
blur may result in only a very limited number of local features being detected. The work in Valenti et al. (2014) performs
RGB-D visual odometry on an MAV to enable its autonomous flight. A 3D occupancy grid map is then built for path
planning of the MAV.
Previous work on developing multi-camera systems can mainly be found in applications of surveillance and object
tracking (Collins et al., 2000; Krumm et al., 2000). More relevant related work appears in the context of structure from
motion (SFM). The work in Pless (2003) presents a theoretical treatment of multi-camera systems in SFM deriving
the generalized epipolar constraint. The work in Frahm et al. (2004) proposes a virtual camera as a representation of a
multi-camera system for pose estimation. A structure-from-motion scheme is achieved using multiple cameras in this
work.
A number of multi-camera systems for pose estimation of mobile robots can be found in the literature. In Ragab
(2008), the pose of a mobile robot is estimated with an EKF, using two back-to-back stereo pairs placed on the
robot. The work in Lee et al. (2013a) adopts a generalized camera model described in
Pless (2003) for a multi-camera system, to estimate the ego-motion of a self-driving car using a 2-Point RANSAC
algorithm. This system allows point correspondences among different cameras. In Lee et al. (2013b), pose-graph
loop-closure constraints are computed. The relative pose between two loop-closing pose-graph vertices is obtained
from the epipolar geometry of the multi-camera system. Kaess and Dellaert (2006) presented a visual SLAM system
with a multi-camera rig using the Harris corner detector (Harris and Stephens, 1988). In their further work in Kaess and
Dellaert (2010), a Bayesian approach to data association is presented, taking into account moving features which can be
observed by cameras under robot motion. The work in Solà et al. (2008) provides solutions to two different problems
in multi-camera visual SLAM: automatic self-calibration of a stereo rig while performing SLAM and cooperative
monocular SLAM.
In Harmat et al. (2012), PTAM with multiple cameras mounted on a buoyant spherical airship is reported in a manual
flight experiment. It employs a ground-facing stereo camera pair which can provide metric scale, together with another
camera with a wide-angle lens mounted pointing in the opposite direction. In Tribou et al. (2015), multi-camera visual
SLAM is achieved based on a modified version of PTAM. This SLAM system allows convergence in pose tracking
and mapping in the absence of an accurate metric scale, taking advantage of the Taylor omnidirectional camera model
and a spherical coordinate update method.
In order to use multiple cameras for pose estimation, the extrinsic parameters of those cameras need to be calibrated.
Here we are interested in previous work solving this calibration problem for multiple cameras with non-overlapping
fields of view. Carrera et al. (2011) proposed a SLAM-based automatic calibration scheme for multiple cameras. The
scheme uses global bundle adjustment to optimize the alignment of maps built by different visual SLAM instances,
each processing images from one corresponding camera. The proposed solution computes the relative 3D poses among
the cameras up to scale. A more recent work can be found in Heng et al. (2014a), which uses a computationally expensive
SLAM system to accurately map an infrastructure prior to the extrinsic calibration of multi-camera systems. During a
calibration process, 2D-3D correspondences between visual features in the current scene and the previously known
map are used for tracking the camera poses based on a Perspective-n-Point (PnP) method. Then the camera extrinsics are
optimized via non-linear refinement. Self-calibration of multiple stereo cameras and an IMU is further achieved in Heng
et al. (2014b), which also achieves multi-camera visual SLAM based on the generalized camera model. At least one
stereo camera pair is required in this work. Real-time loop closing running onboard MAVs is achieved in both Yang
et al. (2016) and Heng et al. (2014b).
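To make such a PnP-based tracking step concrete, below is a minimal sketch of recovering a camera pose from 2D-3D correspondences against a known map, using OpenCV's RANSAC-based PnP solver. It is not the implementation of Heng et al. (2014a); the synthetic data, parameter values, and variable names are illustrative assumptions.

# Minimal PnP sketch (illustrative only): estimate a camera pose from 2D-3D
# correspondences using OpenCV's RANSAC-based solver. The "map" and its image
# observations are synthesized here; a real pipeline would obtain them from
# feature matching against a previously built map.
import numpy as np
import cv2

# Camera intrinsics (assumed known from a prior intrinsic calibration).
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume negligible lens distortion for this sketch

# Synthetic 3D map points in front of the camera, and a ground-truth pose used
# only to generate consistent 2D observations with pixel noise.
pts_3d = np.random.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], (60, 3))
rvec_gt = np.array([0.05, -0.10, 0.02])   # axis-angle rotation, map -> camera
tvec_gt = np.array([0.30, -0.20, 0.50])   # translation, map -> camera
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, dist)
pts_2d = pts_2d.reshape(-1, 2) + np.random.normal(0.0, 0.5, (60, 2))

# Robust pose estimation from the 2D-3D correspondences (RANSAC + PnP).
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts_3d, pts_2d, K, dist, reprojectionError=2.0)

if ok:
    R, _ = cv2.Rodrigues(rvec)          # rotation matrix, map -> camera
    cam_pos = (-R.T @ tvec).ravel()     # camera position in the map frame
    print("estimated camera position:", cam_pos)
    print("RANSAC inliers:", len(inliers))

In the calibration setting described above, such per-camera pose estimates tracked against the common map would subsequently feed the non-linear refinement of the camera extrinsics.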