IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, IN PRESS. 1
Deep Learning in Remote Sensing: A Review
Xiao Xiang Zhu, Devis Tuia, Lichao Mou, Gui-Song Xia, Liangpei Zhang, Feng
Xu, Friedrich Fraundorfer
Abstract
This is the pre-acceptance version, to read the final version please go to IEEE Geoscience and
Remote Sensing Magazine on IEEE XPlore.
Standing at the paradigm shift towards data-intensive science, machine learning techniques are
becoming increasingly important. In particular, as a major breakthrough in the field, deep learning
has proven to be an extremely powerful tool in many fields. Shall we embrace deep learning as the key to
all? Or, should we resist a “black-box” solution? There are controversial opinions in the remote sensing
community. In this article, we analyze the challenges of using deep learning for remote sensing data
analysis, review the recent advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we advocate that remote sensing scientists bring their
expertise into deep learning and use it as an implicit general model to tackle unprecedented, large-scale,
influential challenges, such as climate change and urbanization.
X. Zhu and L. Mou are with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Germany
and with Signal Processing in Earth Observation (SiPEO), Technical University of Munich (TUM), Germany, E-mails:
xiao.zhu@dlr.de; lichao.mou@dlr.de.
D. Tuia was with the Department of Geography, University of Zurich, Switzerland. He is now with the Laboratory of
GeoInformation Science and Remote Sensing, Wageningen University & Research, the Netherlands. E-mail: devis.tuia@wur.nl.
G.-S Xia and L. Zhang are with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote
Sensing (LIESMARS), Wuhan University. E-mail:guisong.xia@whu.edu.cn; zlp62@whu.edu.cn.
F. Xu is with the Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University. E-mail:
fengxu@fudan.edu.cn.
F. Fraundorfer is with the Institute of Computer Graphics and Vision, TU Graz, Austria and with the Remote Sensing
Technology Institute (IMF), German Aerospace Center (DLR), Germany. E-mail: fraundorfer@icg.tugraz.at.
The work of X. Zhu and L. Mou is supported by the European Research Council (ERC) under the European Union's
Horizon 2020 research and innovation programme (grant agreement No. [ERC-2016-StG-714087], Acronym: So2Sat), by the Helmholtz
Association under the framework of the Young Investigators Group “SiPEO” (VH-NG-1018, www.sipeo.bgu.tum.de), and by the China
Scholarship Council. The work of D. Tuia is supported by the Swiss National Science Foundation (SNSF) under the project
No. PP0P2 150593. The work of G.-S. Xia and L. Zhang is supported by the National Natural Science Foundation of China
(NSFC) projects with grant No. 41501462 and No. 41431175. The work of F. Xu is supported by the National Natural Science
Foundation of China (NSFC) project with grant No. 61571134.
October 12, 2017 DRAFT
arXiv:1710.03959v1 [cs.CV] 11 Oct 2017
Index Terms
Deep learning, remote sensing, machine learning, big data, Earth observation
I. MOTIVATION
Deep learning is the fastest-growing trend in big data analysis and has been deemed one
of the 10 breakthrough technologies of 2013 [1]. It is characterized by neural networks (NNs)
usually involving more than two layers (for this reason, they are called deep). Like their shallow
counterparts, deep neural networks exploit feature representations learned exclusively from data,
instead of hand-crafted features that are mostly designed based on domain-specific knowledge.
Deep learning research has been extensively pushed by Internet companies, such as Google,
Baidu, Microsoft, and Facebook for several image analysis tasks, including image indexing,
segmentation, and object detection. Recent advances in the field have proven deep learning a
very successful set of tools, sometimes even able to surpass human ability to solve highly
computational tasks (see, for instance, the highly mediatized Go match between Google’s AlphaGo
AI and the world Go champion Lee Sedol). Motivated by those exciting advances, deep learning
is becoming the model of choice in many fields of application. For instance, convolutional neural
networks (CNNs) have proven to be good at extracting mid- and high-level abstract features from
raw images by interleaving convolutional and pooling layers (i.e., spatially shrinking the feature
maps layer by layer). Recent studies indicate that the feature representations learned by CNNs
are greatly effective in large-scale image recognition [2–4], object detection [5, 6], and semantic
segmentation [7, 8]. Furthermore, as an important branch of the deep learning family, recurrent
neural networks (RNNs) have been shown to be very successful on a variety of tasks involved
in sequential data analysis, such as action recognition [9, 10] and image captioning [11].
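To make the interleaving of convolutional and pooling layers concrete, the following minimal NumPy sketch (an illustration only; the image size, filter values, and pooling window are arbitrary choices, not taken from any network reviewed here) applies one hand-set filter followed by a ReLU and a 2×2 max pooling step, showing how the feature map shrinks spatially:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one filter."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: spatially shrinks the feature map."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.random.default_rng(0).random((28, 28))
edge = np.array([[1.0, 0.0, -1.0]] * 3)     # a hand-set 3x3 filter; in a CNN it is learned
fmap = np.maximum(conv2d(img, edge), 0.0)   # convolution followed by a ReLU activation
pooled = max_pool(fmap)
print(img.shape, fmap.shape, pooled.shape)  # (28, 28) -> (26, 26) -> (13, 13)
```

In a real CNN, many such filters are learned per layer and stacked, so the feature maps become smaller but deeper as one moves up the network.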
Following this wave of success, and thanks to the increased availability of data and computa-
tional resources, the use of deep learning is finally taking off in remote sensing as well. Remote
sensing data bring some new challenges for deep learning, since satellite image analysis raises
unique questions that translate into challenging new scientific problems:
• Remote sensing data are often multi-modal, e.g. from optical (multi- and hyperspectral)
and synthetic aperture radar (SAR) sensors, where both the imaging geometries and the
content are completely different. Data and information fusion uses these complementary
data sources in a synergistic way. Already prior to a joint information extraction, a crucial
step is to develop novel architectures for the matching of images taken from different
perspectives and even different imaging modalities, preferably without requiring an existing
3D model. Also, besides conventional decision fusion, an alternative is to investigate the
transferability of trained networks to other imaging modalities.
• Remote sensing data are geo-located, i.e., they are naturally located in the geographical
space. Each pixel corresponds to a spatial coordinate, which facilitates the fusion of pixel
information with other sources of data, such as GIS layers, geo-tagged images from social
media, or simply other sensors (as above). On the one hand, this fact allows tackling data
fusion with non-traditional data modalities while, on the other hand, it opens the field to new
applications, such as picture localization, location-based services, or augmented reality.
• Remote Sensing data are geodetic measurements with controlled quality. This enables us
to retrieve geo-parameters with confidence estimates. However, differently from purely
data-driven approaches, the role of prior knowledge about sensor adequacy and data
quality becomes even more crucial. For example, to retrieve topographic information, even
at the same spatial resolution, interferograms acquired using a single-pass SAR system are
considered to be more reliable than those acquired in repeat-pass mode.
The time variable is becoming increasingly important in the field. The Copernicus programme
guarantees continuous data acquisition for decades. For instance, Sentinel-1 images the entire Earth
every six days. This capability is triggering a shift from individual image analysis to time-
series processing. Novel network architectures must be developed for optimally exploiting
the temporal information jointly with the spatial and spectral information of these data.
• Remote sensing also faces the big data challenge. In the Copernicus era, we are dealing
with very large and ever-growing data volumes, often on a global scale. For example,
even though they were launched only in 2014, the Sentinel satellites have already acquired
about 25 petabytes of data. The Copernicus concept calls for global applications, i.e., algorithms must
be fast enough and sufficiently transferable to be applied to the whole Earth surface. On
the other hand, these data are well annotated and contain plenty of metadata. Hence, in
some cases, large training data sets might be generated (semi-)automatically.
• In many cases remote sensing aims at retrieving geo-physical or bio-chemical quantities
rather than detecting or classifying objects. These quantities include mass movement rates,
mineral composition of soils, water constituents, atmospheric trace gas concentrations, and
terrain elevation or biomass. Often, process models and expert knowledge exist that are
traditionally used as priors for the estimates. This particularity suggests that the so-far dogma
of expert-free, fully automated deep learning should be questioned for remote sensing, and
that physical models should be re-introduced into the concept, as, for example, in the concept
of emulators [12].
Remote sensing scientists have exploited the power of deep learning to tackle these different
challenges and started a new wave of promising research. In this paper, we review these advances.
After the introductory Section II detailing deep learning models (with emphasis put on convolu-
tional neural networks), we enter sections dedicated to advances in hyperspectral image analysis
(Section III-A), synthetic aperture radar (Section III-B), very high resolution data (Section III-C), data
fusion (Section III-D), and 3D reconstruction (Section III-E). Section IV then provides the tools
of the trade for scientists willing to explore deep learning in their research, including open codes
and data repositories. Section V concludes the paper by giving an overview of the challenges
ahead.
II. FROM PERCEPTRON TO DEEP LEARNING
The perceptron is the basis of the earliest NNs [13]. It is a bio-inspired model for binary
classification that aims to mathematically formalize how a biological neuron works. In contrast, deep
learning has provided more sophisticated methodologies to train deep NN architectures. In this
section, we recall the classic deep learning architectures used in visual data processing.
A. Autoencoder models
1) Autoencoder and Stacked Autoencoder (SAE): An autoencoder [14] takes an input $x \in \mathbb{R}^D$
and, first, maps it to a latent representation $h \in \mathbb{R}^M$ via a nonlinear mapping:

$$h = f(\Theta x + \beta)\,, \qquad (1)$$
where Θ is a weight matrix to be estimated during training, β is a bias vector, and f stands for
a nonlinear function, such as the logistic sigmoid function or a hyperbolic tangent function. The
encoded feature representation h is then used to reconstruct the input x by a reverse mapping
leading to the reconstructed input y:
$$y = f(\Theta' h + \beta')\,, \qquad (2)$$

where $\Theta'$ is usually constrained to be of the form $\Theta' = \Theta^{T}$, i.e., the same weight is used for
encoding the input and decoding the latent representation. The reconstruction error is defined
as the Euclidean distance between $x$ and $y$, which is constrained to approximate the input data $x$
(i.e., making $\|x - y\|_2^2 \to 0$). The parameters of the autoencoder are generally optimized by
stochastic gradient descent (SGD).
An SAE is a neural network consisting of multiple layers of autoencoders in which the outputs
of each layer are wired to the inputs of the following one.
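Equations (1)–(2) with the tied-weight constraint can be sketched in a few lines of NumPy. The toy below (a single sample, sigmoid nonlinearity, hand-derived gradients, and arbitrary dimensions; not the implementation of any work reviewed here) runs plain SGD on the squared reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 8, 3                           # input and latent dimensions (M < D: compression)
Theta = rng.normal(0.0, 0.1, (M, D))  # encoder weights; decoder is tied: Theta' = Theta^T
beta, beta_p = np.zeros(M), np.zeros(D)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(Theta @ x + beta)       # Eq. (1): h = f(Theta x + beta)
    y = sigmoid(Theta.T @ h + beta_p)   # Eq. (2) with Theta' = Theta^T
    return h, y

x = rng.random(D)
_, y = forward(x)
loss0 = np.sum((x - y) ** 2)            # initial reconstruction error ||x - y||^2
lr = 0.5
for _ in range(5000):                   # plain SGD on the reconstruction error
    h, y = forward(x)
    grad_y = (y - x) * y * (1 - y)            # gradient through the output sigmoid
    grad_h = (Theta @ grad_y) * h * (1 - h)   # back-propagated to the hidden layer
    # tied weights: Theta collects gradients from both the encoder and decoder paths
    Theta -= lr * (np.outer(grad_h, x) + np.outer(h, grad_y))
    beta -= lr * grad_h
    beta_p -= lr * grad_y

_, y = forward(x)
loss1 = np.sum((x - y) ** 2)
print(loss1 < loss0)                    # the reconstruction error decreases
```

Note the single weight update line: because the decoder reuses $\Theta^T$, the chain rule yields two gradient contributions to the same matrix.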
2) Sparse Autoencoder: The conventional autoencoder relies on the dimension of the latent
representation h being smaller than that of input x, i.e., M < D, which means that it tends
to learn a low-dimensional, compressed representation. However, when M > D, one can still
discover interesting structures by enforcing a sparsity constraint on the hidden units. Formally,
given a set of unlabeled data $X = \{x^1, x^2, \cdots, x^N\}$, training a sparse autoencoder [15] boils
down to finding the optimal parameters by minimizing the following loss function:

$$E = \frac{1}{N}\sum_{i=1}^{N}\left( J(x^i, y^i; \Theta, \beta) + \lambda \sum_{j=1}^{M} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \right)\,, \qquad (3)$$
where $J(x^i, y^i; \Theta, \beta)$ is an average sum-of-squares error term, which represents the reconstruc-
tion error between the input $x^i$ and its reconstruction $y^i$. $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ is the Kullback-Leibler (KL)
divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable
with mean $\hat{\rho}_j$. The KL divergence is a standard function for measuring how much two distributions
differ:

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}\,. \qquad (4)$$
In the sparse autoencoder model, the KL divergence is a sparsity penalty term, and $\lambda$ controls
its importance. $\rho$ is a free parameter corresponding to a desired average activation¹ value, and $\hat{\rho}_j$
indicates the average activation value of hidden neuron $h_j$ over the training samples. Similar to
the autoencoder, the optimization of a sparse autoencoder can be achieved via back-propagation
and SGD.
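The sparsity penalty of Eqs. (3)–(4) is straightforward to evaluate directly. The sketch below (using hypothetical random activation values, not data from any cited experiment) computes the KL term for a batch of hidden activations:

```python
import numpy as np

def kl_penalty(rho, rho_hat):
    """Sparsity penalty of Eq. (3): sum over hidden units of KL(rho || rho_hat_j), Eq. (4)."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)  # guard against log(0)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rho = 0.05                                     # desired average activation of each hidden unit
H = np.random.default_rng(1).random((100, 6))  # hypothetical activations: 100 samples, 6 units
rho_hat = H.mean(axis=0)                       # average activation of each hidden neuron

penalty = kl_penalty(rho, rho_hat)
print(penalty > 0)                             # uniform activations are far from rho = 0.05
print(kl_penalty(rho, np.full(6, rho)))        # -> 0.0 when activations match the target
```

The penalty is zero exactly when every hidden unit's average activation equals the target $\rho$, and grows as the units become more active, which is what pushes the learned representation toward sparsity.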
3) Restricted Boltzmann Machine (RBM) & Deep Belief Network (DBN): Unlike the deter-
ministic network architectures, such as autoencoders or sparse autoencoders, an RBM (cf. Fig. 1)
is a stochastic undirected graphical model consisting of a visible layer and a hidden layer, and

¹An activation corresponds to how much a region of the image reacts when convolved with a filter. In the first layer, for
example, each location in the image receives a value that corresponds to a linear combination of the original bands and the filter
applied. The higher this value, the more ‘activated’ this filter is on that region. When convolved over the whole image, a filter
produces an activation map, which is the activation at each location where the filter has been applied.