
CVPR 2021 - An Efficient Pyramid Split Attention Module (PSA)

2021-06-23 23:37:34 CV technical guide (official account)

Preface:

I previously shared an article, 《A New Attention Mechanism Following SE and CBAM: Coordinate Attention》. Its starting point was that SE introduces only channel attention, while CBAM's spatial attention considers only local-region information, so it proposed an attention mechanism that takes global spatial information into account.

This article introduces another attention module with the same motivation: Pyramid Split Attention (PSA). PSA is plug-and-play, lightweight, simple, and efficient. Combining the module with ResNet by replacing the 3x3 convolution in ResNet's bottleneck with PSA yields EPSANet.

For image recognition, EPSANet achieves 1.93% higher top-1 accuracy than SENet. With PSA used in Mask R-CNN, object detection improves by 2.7 box AP and instance segmentation by 1.7 mask AP.

 

Paper: https://arxiv.org/pdf/2105.14447v1.pdf

Code: https://github.com/murufeng/EPSANet

 

The starting points of this paper:

1. SE considers only channel attention and ignores spatial attention.

2. BAM and CBAM consider both channel and spatial attention, but have two important drawbacks: (1) they do not capture spatial information at different scales to enrich the feature space; (2) their spatial attention considers only local-region information and cannot model long-range dependencies.

3. Subsequent works such as PyConv, Res2Net, and HS-ResNet address these two shortcomings of CBAM, but at too high a computational cost.

 

Based on these three points, this paper proposes Pyramid Split Attention.

 

 

PSA

Main operations: split the input tensor into S groups along the channel dimension. Convolve each group with a different kernel size to obtain receptive fields at different scales and extract information at each scale. Then pass each group through an SE module to obtain its channel weights. Finally, normalize the S groups' weights with softmax and use them to reweight the features.

 

Specifically, the SPC module, which divides the input tensor into S groups and applies a different convolution to each group, is shown in the figure below.

[Figure: the SPC module]

SPC first splits the input tensor into S groups, with the kernel size increasing from group to group, e.g., k = 3, 5, 7, 9. Because larger kernels cost more computation, each group then uses grouped convolution, with group count G = 2^((k-1)/2), i.e., 2 raised to the power (k-1)/2. For k = 3, 5, 7, 9 this gives G = 2, 4, 8, 16.

After these convolutions at different kernel sizes, the outputs are concatenated along the channel dimension.
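
The following is a minimal PyTorch sketch of these SPC operations. The class name `SPCModule` and its defaults are illustrative assumptions, not the official implementation; the group counts follow the G = 2^((k-1)/2) formula above, and the released code may choose slightly different values.

```python
import torch
import torch.nn as nn

class SPCModule(nn.Module):
    """Sketch of SPC: split channels into S groups, convolve each group
    with a different kernel size, and concatenate the results."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        self.split_channels = channels // len(kernel_sizes)
        self.convs = nn.ModuleList([
            # Grouped convolution (G = 2^((k-1)/2)) keeps large kernels cheap.
            nn.Conv2d(self.split_channels, self.split_channels, kernel_size=k,
                      padding=k // 2, groups=2 ** ((k - 1) // 2), bias=False)
            for k in kernel_sizes
        ])

    def forward(self, x):
        # Split the input tensor into S equal groups along the channel dim.
        splits = torch.split(x, self.split_channels, dim=1)
        # Multi-scale convolution, then concatenation along channels.
        return torch.cat([conv(s) for conv, s in zip(self.convs, splits)], dim=1)
```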

 

After the SPC module, PSA passes the SPC output through the SE weight module to obtain channel attention; the purpose is to obtain attention weights for the feature maps at each scale.
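
The SE weight module here is the standard squeeze-and-excitation branch: global average pooling followed by two 1x1 convolutions and a sigmoid. A minimal sketch, building on the imports above (the name `SEWeight` and `reduction=16` are illustrative assumptions):

```python
class SEWeight(nn.Module):
    """Squeeze (global average pool) then excite (two 1x1 convolutions)
    to produce per-channel attention weights in (0, 1)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)  # guard small channel counts
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # (B, C, H, W) -> (B, C, 1, 1) channel-attention weights
        return self.fc(self.pool(x))
```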

In this way, PSA integrates contextual information at different scales and produces better pixel-level attention.

Finally, the attention weights of all groups are concatenated, normalized with softmax, and used to reweight the SPC module's output.
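
Putting the pieces together, here is a sketch of the complete PSA forward pass assembled from the two sketches above. Sharing one SE branch across the S groups is an assumption of this sketch; see the official repo for the exact implementation.

```python
class PSAModule(nn.Module):
    """Pyramid Split Attention, assembled from SPCModule and SEWeight above."""

    def __init__(self, channels, S=4):
        super().__init__()
        self.S = S
        self.spc = SPCModule(channels)
        self.se = SEWeight(channels // S)  # one SE branch applied to each group
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # 1) Multi-scale features from SPC, viewed as S scale groups.
        feats = self.spc(x).view(b, self.S, c // self.S, h, w)
        # 2) Channel-attention weights per scale group: (B, S, C/S, 1, 1).
        attn = torch.stack([self.se(feats[:, i]) for i in range(self.S)], dim=1)
        # 3) Softmax across the S groups recalibrates attention between scales.
        attn = self.softmax(attn)
        # 4) Reweight each group's features and restore the original shape.
        return (feats * attn).reshape(b, c, h, w)

# Shape check: PSA preserves the input shape, so it can stand in for a 3x3 conv.
x = torch.randn(2, 64, 32, 32)
print(PSAModule(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```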

The complete PSA module is shown in the figure below.

A note on the "pyramid" in Pyramid Split Attention: 《Feature Pyramid Technology Summary》 introduces two ways to construct a feature pyramid, one of which is via convolutions with kernels of different sizes. The pyramid in PSA is therefore the one built by the SPC module's groups of convolutions with different kernel sizes.

 

EPSANet

 

 

As shown in the figure above, PSA replaces the 3x3 convolution in ResNet's bottleneck; stacking several such blocks forms EPSANet, where the E stands for "efficient".
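
A sketch of this replacement under the same assumptions (the class name `EPSABlock` is hypothetical; stride and downsampling handling are omitted for brevity):

```python
class EPSABlock(nn.Module):
    """ResNet bottleneck with the 3x3 convolution replaced by a PSA module."""

    expansion = 4

    def __init__(self, in_channels, planes):
        super().__init__()
        out_channels = planes * self.expansion
        self.conv1 = nn.Conv2d(in_channels, planes, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.psa = PSAModule(planes)  # in place of the usual 3x3 convolution
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, out_channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut when the channel counts differ.
        self.shortcut = (nn.Identity() if in_channels == out_channels else
                         nn.Conv2d(in_channels, out_channels, 1, bias=False))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.psa(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.shortcut(x))
```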

 

The network design is shown in the figure below.

[Figure: the EPSANet network design]

 

 

Conclusion

To recap: for image recognition, EPSANet achieves 1.93% higher top-1 accuracy than SENet, and with PSA used in Mask R-CNN, object detection improves by 2.7 box AP and instance segmentation by 1.7 mask AP.

 

A comparison of image recognition results with various attention modules added, using ResNet-50 and ResNet-101 as backbones:

[Tables: image recognition comparison with ResNet-50 and ResNet-101 backbones]

 

 

Reply with the keyword "Technical Summary" in the official account to get a PDF compilation of the following articles.

 

Other articles

Summary of Computer Vision Terminology (1): Building a Knowledge System for Computer Vision

Summary of Underfitting and Overfitting Techniques

Summary of Normalization Methods

Summary of Common Ideas for Paper Innovation

Summary of Methods for Efficiently Reading English Literature in CV

A Survey of Few-Shot Learning in Computer Vision

A Brief Overview of Knowledge Distillation

Optimizing OpenCV Video Reading Speed

NMS Summary

Technical Summary of Loss Functions

Summary of Attention Mechanism Techniques

Feature Pyramid Technology Summary

Summary of Pooling Techniques

Summary of Data Augmentation Methods

CNN Structural Evolution Summary (1): Classic Models

CNN Structural Evolution Summary (2): Lightweight Models

CNN Structural Evolution Summary (3): Design Principles

On the Future Direction of Computer Vision

CNN Visualization Technology Summary (1): Visualizing Feature Maps

CNN Visualization Technology Summary (2): Visualizing Convolution Kernels

CNN Visualization Technology Summary (3): Class Visualization

CNN Visualization Technology Summary (4): Visualization Tools and Projects

Copyright notice
This article was created by CV technical guide (official account). Please include a link to the original when reposting.
https://chowdera.com/2021/06/20210623233648728u.html
