The AGPCNet Model for Infrared Small Target Detection: A Method Based on Attention Mechanisms and a Pyramid Context Network (CSDN Library resource)


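Summary: This paper presents a data-driven method called the attention-guided pyramid context network (AGPCNet), which targets high-precision detection of infrared small targets against complex backgrounds. Because conventional methods make poor use of the correlation between feature pixels and have limited representational power, the paper proposes three components: 1) the attention-guided context block (AGCB), which combines local semantic association (LSA) and global context attention (GCA) to estimate feature correlations within and between patches, respectively, highlighting small targets and suppressing cluttered backgrounds; 2) the context pyramid module (CPM), which applies AGCB at multiple scales and fuses the resulting contextual information to improve feature representation; 3) the asymmetric fusion module (AFM), which merges low and deep semantics in the upsampling stage to retain more target detail. The reported experiments show state-of-the-art performance on three public datasets.
Applications: AGPCNet is mainly suited to systems that need precise detection of small infrared objects, such as marine rescue and accurate guidance. The work raises the detection rate while reducing false alarms.
Other notes: The authors run ablation experiments on each module to show that every part is effective and well motivated, and comparison experiments show clear advantages over several classic baseline models. The paper also provides open-source code.
Intended audience: Researchers and practitioners with some background in deep learning, especially computer vision, and engineers who want to work on small target detection under complex backgrounds.
Use cases and goals: ① tasks that require accurately locating and extracting weak, small infrared signals under interference such as clouds and atmospheric turbulence; ② lowering the false alarm rate and improving the quality of true detections.
Reading advice: The article discusses the model design and the function of each component in depth. Readers should first be comfortable with basic neural network concepts, such as how convolutional neural networks work and the properties of common activation functions. Reproducing some of the provided experimental settings is also recommended to deepen understanding.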


Attention-Guided Pyramid Context Networks for Detecting Infrared Small Target Under Complex Background

TIANFANG ZHANG

University of Electronic Science and Technology of China, Chengdu,

China

LEI LI

University of Copenhagen, Kobenhavn, Denmark

SIYING CAO

TIAN PU

ZHENMING PENG

University of Electronic Science and Technology of China, Chengdu,

China

Infrared small target detection remains a challenging task due to complex backgrounds. To overcome this problem, this research explores context information and presents a data-driven approach called attention-guided pyramid context network (AGPCNet). Specifically, we design an attention-guided context block and perceive pixel correlations within and between patches at specific scales via local semantic association and global context attention, respectively. Then, the contextual information from multiple scales is fused by a context pyramid module to achieve better feature representation. In the upsampling stage, we fuse the low and deep semantics through an asymmetric fusion module to retain more information about small targets. The experimental results illustrate that AGPCNet has achieved state-of-the-art performance on three available infrared small target datasets.

Manuscript received 11 April 2022; revised 12 October 2022; accepted 14 January 2023. Date of publication 23 January 2023; date of current version 9 August 2023.

DOI. No. 10.1109/TAES.2023.3238703

Refereeing of this contribution was handled by K. Peter Judd.

This work was supported in part by the Natural Science Foundation of Sichuan Province of China under Grant 2022NSFSC40574, in part by the Sichuan Science and Technology Program under Grant 2022YFG0178, and in part by the National Natural Science Foundation of China under Grant 61775030 and Grant 61571096.

Authors' addresses: Tianfang Zhang, Siying Cao, Tian Pu, and Zhenming Peng are with the School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China, and also with the Laboratory of Imaging Detection and Intelligent Perception, University of Electronic Science and Technology of China, Chengdu 611731, China, E-mail: (sparkcarleton@gmail.com; caosiying3008@gmail.com; putian@uestc.edu.cn; zmpeng@uestc.edu.cn); Lei Li is with the Department of Computer Science, University of Copenhagen, Kobenhavn 1165, Denmark, E-mail: (lilei@di.ku.dk). (Corresponding author: Zhenming Peng.)

The source codes are available at https://github.com/Tianfang-Zhang/AGPCNet.

I. INTRODUCTION

Infrared small target detection techniques have been widely used in many applications, including early warning, marine rescue, and accurate guidance [1], [2]. These applications require accurate information about the target of interest. Due to the long imaging distances, infrared targets occupy only a few pixels. These imaging characteristics leave infrared images without color, shape, or texture information. Furthermore, complex backgrounds, structured clutter, and random noise also degrade target detection [3], [4]. Therefore, with these disturbing factors, infrared target detection remains a challenging problem [5], [6].

Infrared small target detection techniques can be divided into two categories: 1) model driven; 2) data driven. Model-driven approaches manually design algorithms based on hypotheses about the physical properties of infrared targets. These methods can be further split into three subcategories as follows: 1) background suppression-based approaches assume that the presence of a target breaks the continuity of infrared images [7], [8]; 2) human visual system-based approaches assume that the saliency of a target is only related to the local contrast of its surroundings [9], [10], [11], [12]; 3) optimization-based approaches transform the target detection task into a sparse low-rank tensor decomposition problem [13], [14], [15], [16], [17], [18]. All three rely heavily on handcrafted features and therefore struggle to detect targets robustly in complex scenes.

Data-driven approaches combining neural networks and public datasets [19], [20] can automatically extract features to detect targets. Miss detection vs. false alarm conditional generative adversarial network (MDvsFA cGan) [20] trains generators from both the miss detection and false alarm perspectives using a generative adversarial network. Asymmetric contextual modulation (ACM) [19] designs feature fusion modules in the encoder-decoder structure for both low and deep semantics, obtaining a more efficient feature representation. Attentional local contrast network (ALCNet) [21] simulates local contrast through shift operations on semantic tensors in order to extract local information about the target. Although these methods have achieved state-of-the-art performance, most of them ignore the context information and the correlation between feature pixels. This leaves the capabilities of the neural network underutilized and can lead to missed targets.

It is worth mentioning that feature pixel correlation has received more attention in the field of visible images [22], [23], [24]. However, there is a very significant difference between infrared small targets and visible targets. The characteristics of infrared data, such as the lack of color information and the small target area, mean that the long-range pixel dependence highlighted in previous

4250 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 59, NO. 4 AUGUST 2023

Authorized licensed use limited to: NANJING UNIVERSITY OF AERONAUTICS AND ASTRONAUTICS. Downloaded on September 25,2024 at 13:18:48 UTC from IEEE Xplore. Restrictions apply.

studies cannot be directly applied. Therefore, how to make sufficient use of context information for infrared small targets is an urgent problem to be addressed.

In our research, we propose a data-driven approach called attention-guided pyramid context network (AGPCNet). For the contextual information of feature pixels, we propose an attention-guided context block (AGCB), which perceives the target location within the patches via local semantic association (LSA) and suppresses structured clutter. The correlation between patches is estimated by global context attention (GCA), which further suppresses point-like highlighted noise by combining with context information. For the multiscale representation of deep semantics, we propose a context pyramid module (CPM), which collates and fuses multiple scales of AGCBs as feature pyramids with the original feature map. In the upsampling stage, we propose an asymmetric fusion module (AFM), which takes both low and deep semantics as input and fuses the extracted attention with the feature map, preserving as much information as possible about small targets.

The main contributions of this work can be summarized as follows:

1) We propose AGPCNet for infrared small target detection. Contextual information for infrared small targets is integrated and explored through CPM. AFM fuses low and deep semantics in the upsampling stage to retain more useful information.

2) We propose AGCB for perceiving contextual information of infrared small targets at a local scale. It uses LSA and GCA to estimate the correlation of pixels within and between patches, which highlights targets and suppresses the background.

3) Experiments on three datasets and six complex scenes demonstrate the effectiveness of each module. Compared with state-of-the-art methods, AGPCNet performs better and more robustly under complex backgrounds.

The rest of this article is organized as follows. Section II describes related work; Section III describes AGPCNet and its individual modules in detail; Section IV confirms the effectiveness of the proposed network with systematic experiments; finally, Section V concludes this article.

II. RELATED WORK

Context Modules: Since the introduction of the nonlocal (NL) network [22], context modules have been widely used for tasks such as semantic segmentation and target detection due to their excellent performance and ease of embedding. Subsequently, some work set out to improve its performance [25], [26]. Dual attention network (DANet) [23] enhances feature representation by simultaneously estimating the correlation between channels and pixels. Global context network (GCNet) [24] combines pixel correlation with channel attention [27] to achieve promising performance. Point-wise spatial attention network (PSANet) [28] divides the correlation between two pixels into “collection” and “distribution” and computes the two processes independently. Criss-cross network (CCNet) [29] addresses the high complexity of NL by limiting the computation from global to criss-cross and using recurrent operations to reach the global correlation, greatly improving computational efficiency.

Infrared Small Target Detection Networks: In visible image data, small target detection is gradually receiving more attention [30], [31], [32], [33], [34]. However, the uniqueness of the infrared imaging method leads to a great difference between infrared images and visible images in terms of background and target. Meanwhile, infrared small target detection technology has been studied for decades as a key component of modern information systems. In recent years, the release of public datasets has facilitated the development of neural networks. The miss detection false alarm (MDFA) dataset [20] contains 10 000 training images and 100 test images, and the single-frame infrared small target (SIRST) dataset [19] has 427 images in total. The infrared small target detection task has also been intensively studied as a target pixel segmentation task. Scholars have proposed solutions from a variety of perspectives, such as generative adversarial networks [20], cross-layer feature fusion, and feature tensor local contrast [21]. These data-driven approaches have achieved promising results, but they ignore the correlation between feature pixels in neural networks, which can be exploited to improve performance.

III. ATTENTION-GUIDED PYRAMID CONTEXT NETWORK

We illustrate the details of our general network architecture in this section. CPM, AGCB, and AFM are described individually. In particular, the AGCB contains a global context attention submodule and a local semantic association submodule.

A. Network Architecture and CPM

Fig. 1(a) shows the whole network pipeline. The input is an image, from which a deep convolutional neural network (e.g., ResNet [35]) generates a feature map X of size H × W × C. To reduce the feature loss caused by downsampling and preserve small target information, we remove the max pooling layer and set the stride of the first convolutional layer to one. The last three convolutional blocks perform the downsampling operation, so the feature map X has 1/8 of the spatial size of the input image.
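As a quick check of the 1/8 figure, the sketch below tracks the spatial size through the backbone. The stride list is an assumption chosen to match the text (first convolution with stride one, max pooling removed, three downsampling blocks with stride two each), and the function name `feature_map_size` is hypothetical.

```python
def feature_map_size(height, width, strides):
    """Return the spatial size after applying each stride in turn.

    Assumes 'same'-style padding, so each stage maps n to ceil(n / stride).
    """
    for s in strides:
        height = -(-height // s)  # ceiling division
        width = -(-width // s)
    return height, width


# Assumed layout: stem conv (stride 1), one block without downsampling,
# then three blocks that each downsample by a factor of 2.
strides = [1, 1, 2, 2, 2]
h, w = feature_map_size(256, 256, strides)
print(h, w)  # 32 32, i.e. 1/8 of 256 in each dimension
```

Keeping the stem at stride one and dropping the pooling layer means a target a few pixels wide still covers a fraction of a pixel or more at the deepest stage, rather than vanishing at 1/32 resolution.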

Then, we pass the feature map X through the CPM to obtain the integrated feature map C. CPM feeds X into AGCBs at multiple scales in parallel, where the scale is denoted as $S \in \{S_1, \ldots, S_n\}$. For each scale $S_i$, AGCB integrates contextual information through the operations in Fig. 1(b), retaining key information for small targets and obtaining the feature map $A_{S_i}$. AGCB and GCA are described in detail in the next section. In the next step, we concatenate the maps $\{A_{S_i}\}$ obtained by AGCB at multiple scales with the feature map X. Finally, we fuse the

ZHANG ET AL.: AGPCNETS FOR DETECTING INFRARED SMALL TARGET UNDER COMPLEX BACKGROUND 4251


Fig. 1. Illustration of the overall architecture of AGPCNet and its modules. (a) Overview of the proposed AGPCNet for infrared small target detection. (b) Illustration of AGCB and GCA.

information from multiple scales after 1 × 1 convolutional

layers to obtain the integrated feature map C.
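The CPM data flow (parallel branches at several scales, channel concatenation with X, then 1 × 1 fusion) can be sketched in plain Python on nested lists. This is only an illustration under stated assumptions: `patch_mean_stub` is a hypothetical stand-in that replaces each pixel with its patch mean and is not the AGCB of the paper, and all function names and the fusion weights are invented for the example.

```python
def patch_mean_stub(x, scale):
    """Stand-in for AGCB (NOT the paper's block): replace every pixel by the
    mean of its patch when the map is split into scale x scale patches."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    ph, pw = h // scale, w // scale  # assumes h and w are divisible by scale
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for pi in range(scale):
        for pj in range(scale):
            rows = range(pi * ph, (pi + 1) * ph)
            cols = range(pj * pw, (pj + 1) * pw)
            for ch in range(c):
                mean = sum(x[r][q][ch] for r in rows for q in cols) / (ph * pw)
                for r in rows:
                    for q in cols:
                        out[r][q][ch] = mean
    return out


def concat_channels(maps):
    """Concatenate feature maps of equal spatial size along the channel axis."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum((m[r][q] for m in maps), []) for q in range(w)] for r in range(h)]


def conv1x1(x, weight):
    """1 x 1 convolution: the same linear map over channels at every pixel.
    weight has shape [out_channels][in_channels]."""
    return [[[sum(wk * v for wk, v in zip(row, pix)) for row in weight]
             for pix in line] for line in x]


def cpm_sketch(x, scales, weight):
    """Run the stub at every scale, concatenate with x, fuse with a 1 x 1 conv."""
    branches = [patch_mean_stub(x, s) for s in scales]
    stacked = concat_channels(branches + [x])
    return conv1x1(stacked, weight)


# 4 x 4 map with one channel; two scales plus the original give 3 channels.
x = [[[float(r * 4 + q)] for q in range(4)] for r in range(4)]
fused = cpm_sketch(x, [1, 2], [[1.0, 1.0, 1.0]])
print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 4 1
```

Concatenating the original map alongside the multiscale branches lets the 1 × 1 fusion weigh unprocessed features against context at each scale, pixel by pixel.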

During upsampling, AFM fuses low and deep semantic information at 1/4 and 1/2 of the input spatial size, respectively. Each fusion is preceded by a bilinear interpolation operation. Finally, a segmentation network takes the fused features and predicts the final infrared small target detection result.
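For readers unfamiliar with the bilinear interpolation step that precedes each fusion, here is a minimal single-channel sketch. The text does not say which sampling convention is used, so the corner-aligned mapping below is an assumption, and `bilinear_resize` is a hypothetical helper name.

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize a 2-D grid of floats with bilinear interpolation.

    Assumption: corner-aligned sampling (input and output corners coincide);
    frameworks also offer a half-pixel convention that gives different values.
    """
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for r in range(out_h):
        # Map the output row index to a fractional input row.
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for q in range(out_w):
            x = q * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along the row direction, then between the two rows.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


small = [[0.0, 2.0], [4.0, 6.0]]
print(bilinear_resize(small, 3, 3))
# [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```

In the pipeline described here the deep feature map would be resized to match the shallower one (for example from 1/8 to 1/4 of the input size) before the two are fused.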

B. Attention-Guided Context Block

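Local Semantic Association: As shown in Fig. 1(b), AGCB consists of two branches. For a given feature map $X \in \mathbb{R}^{H \times W \times C}$ and scale $S$, the lower branch (i.e., LSA) divides the feature map into $S \times S$ patches of size $\frac{H}{S} \times \frac{W}{S} \times C$, denoted $X_i$, $i \in \{1, 2, \ldots, S^2\}$. For each patch $X_i$, an updated patch $P_i$ is obtained after an NL block as in (1) and (2), where $P_i^k$ is the $k$th element of $P_i$, $\beta$ is a learnable scalar, $\omega_i^{k,j}$ denotes the element in the $k$th row and $j$th column of the coefficient matrix $\omega_i$, and $\psi(\cdot)$, $\theta(\cdot)$, and $\phi(\cdot)$ denote 1 × 1 convolutional layers. Finally, the patches are reassembled into a feature map $P \in \mathbb{R}^{H \times W \times C}$ in their original order, which has a local field of view

$$P_i^k = \beta \sum_{j=1}^{HW/S^2} \omega_i^{k,j}\, \psi(X_i)^j + X_i^k \tag{1}$$

$$\omega_i^{k,j} = \frac{\exp\left(\theta(X_i)^k \cdot \phi(X_i)^j\right)}{\sum_{j=1}^{HW/S^2} \exp\left(\theta(X_i)^k \cdot \phi(X_i)^j\right)}. \tag{2}$$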
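The patch-wise update in (1) and (2) can be illustrated with a small pure-Python sketch. To keep it short it makes a simplifying assumption: θ, φ, and ψ are taken as identity maps, whereas the text describes them as learned 1 × 1 convolutional layers. The names `nl_block` and `lsa_sketch` are hypothetical.

```python
import math


def nl_block(pixels, beta):
    """Non-local update over a flat list of pixel feature vectors.

    Simplifying assumption: theta, phi and psi are identity maps here,
    whereas the text describes them as learned 1 x 1 convolutions.
    """
    out = []
    for xk in pixels:
        scores = [sum(a * b for a, b in zip(xk, xj)) for xj in pixels]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # row k of the coefficient matrix
        agg = [sum(wt * xj[ch] for wt, xj in zip(weights, pixels))
               for ch in range(len(xk))]
        out.append([beta * a + v for a, v in zip(agg, xk)])  # residual add
    return out


def lsa_sketch(x, scale, beta):
    """Split x (H x W x C nested lists) into scale x scale patches, apply the
    non-local update inside each patch, and put the results back in place."""
    h, w = len(x), len(x[0])
    ph, pw = h // scale, w // scale  # assumes h and w are divisible by scale
    out = [[None] * w for _ in range(h)]
    for pi in range(scale):
        for pj in range(scale):
            coords = [(r, q)
                      for r in range(pi * ph, (pi + 1) * ph)
                      for q in range(pj * pw, (pj + 1) * pw)]
            updated = nl_block([x[r][q] for r, q in coords], beta)
            for (r, q), vec in zip(coords, updated):
                out[r][q] = vec
    return out


# 4 x 4 map with 2 channels, split into 2 x 2 patches of 4 pixels each.
x = [[[float(r + q), 1.0] for q in range(4)] for r in range(4)]
p = lsa_sketch(x, 2, 1.0)
print(len(p), len(p[0]), len(p[0][0]))  # 4 4 2
```

Because each softmax runs only over the HW/S² pixels of one patch, pixels in different patches never interact in this branch. With `beta = 0` the block reduces to the identity through the residual term in (1).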

Global Context Attention: The upper branch of AGCB, shown in Fig. 1(b), estimates the dependencies between the patches $P_i$ and is called GCA. Given scale $S$, the feature map $X$ is adaptively pooled to obtain $D$ of spatial size $S \times S \times C$, where each point corresponds to the feature of one patch in LSA. Then, the correlation between patches is estimated by an NL block. Subsequently, to increase the pixel-level representational capability, the features are fed to the pixel attention (PA) module to integrate channel information at each pixel. Finally, as shown in (3), the guide map $G$ is obtained by the Sigmoid
