
[SNE-RoadSeg explained] A road segmentation network that incorporates surface normal vectors (ECCV 2020)

2021-09-15 04:07:16 AI bacteria


This paper is a joint work by the University of California, San Diego and the Robotics Laboratory of HKUST, and was accepted to ECCV 2020. It proposes a surface normal estimator (SNE) and applies it to a road segmentation network; the resulting SNE-RoadSeg achieves good detection performance on several datasets.

Paper: http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123750341.pdf
GitHub repository: https://github.com/hlwang1124/SNE-RoadSeg

I. A brief introduction to SNE-RoadSeg

Free-space detection is an essential part of visual perception for self-driving cars. Recent work on data-fusion convolutional neural networks (CNNs) has significantly improved semantic driving-scene segmentation. Free space can be hypothesized as a ground plane, and points on this plane have similar surface normals.

The paper therefore first introduces a new module, the surface normal estimator (SNE), which infers surface normal information from dense depth/disparity images accurately and efficiently. It also proposes a data-fusion CNN architecture, called RoadSeg, which extracts and fuses features from both the RGB image and the inferred surface normal information to achieve accurate free-space detection.

For research purposes, the authors also release a large-scale free-space detection dataset collected under different illumination and weather conditions, named the Ready-to-Drive (R2D) road dataset. The experimental results show that the proposed SNE module benefits all of the state-of-the-art CNNs evaluated for free-space detection, and that SNE-RoadSeg achieves the best overall performance across the different datasets.

II. Surface normal estimator (SNE)

The SNE converts the input depth map into surface normal vectors, which are then fed into the network:

[Figure: the SNE turns a depth map into surface normals that are fed into the network]
Accurately recovering the surface normal at each pixel is the key mathematical problem here. The derivation is walked through step by step below.

1. Relating the camera coordinate system to the image pixel coordinate system

Let p = [x, y] denote the pixel at coordinates (x, y), let P = [X, Y, Z] be the position of the corresponding 3D point in the camera coordinate system, and let K be the camera intrinsic matrix, obtained by camera calibration. They are related by:
[Equation (1)]
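The relation between p, P and K can be tried out numerically. The sketch below assumes the standard pinhole model, Z·[x, y, 1]ᵀ = K·P, which is the usual form of such an equation; the function names `backproject` and `project` and the intrinsic values are hypothetical and chosen only for illustration.

```python
import numpy as np

def backproject(x, y, Z, K):
    """Map pixel (x, y) with depth Z to the 3D point P = [X, Y, Z] in the camera frame."""
    p_h = np.array([x, y, 1.0])            # homogeneous pixel coordinates
    return Z * np.linalg.solve(K, p_h)     # P = Z * K^{-1} [x, y, 1]^T

def project(P, K):
    """Map a camera-frame point P back to its pixel coordinates (x, y)."""
    q = K @ P                              # equals Z * [x, y, 1]^T
    return q[:2] / q[2]

# Hypothetical intrinsics: focal length 700 px, principal point (320, 240).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

P = backproject(400.0, 300.0, 10.0, K)     # 3D point seen at pixel (400, 300), 10 units away
p = project(P, K)                          # projecting it again recovers the pixel
```

Back-projecting every pixel of a dense depth map in this way gives the 3D points that the later steps compare with their neighbors.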
2. The normal vector equation at point P

Let the normal vector at point P be (nx, ny, nz). The normal vector equation can then be written as:
[Equation (2)]
Combining (1) and (2), and then taking the partial derivatives with respect to x and y, gives:

[Equations (3) and (4)]
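Written out, the relations in steps 1 and 2 plausibly take the form below. This is a reconstruction from the definitions in the text, not a copy of the paper's typesetting. It assumes a pinhole camera whose intrinsic matrix K holds the focal lengths $f_x, f_y$ and the principal point $(x_o, y_o)$, writes the plane offset as $d$, and calls the inverse-depth gradients $g_x, g_y$; all of these symbol names are assumptions.

$$
Z \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K P =
\begin{bmatrix} f_x & 0 & x_o \\ 0 & f_y & y_o \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \tag{1}
$$

$$
n_x X + n_y Y + n_z Z + d = 0 \tag{2}
$$

$$
\frac{1}{Z} = -\frac{1}{d}\left( n_x \frac{x - x_o}{f_x} + n_y \frac{y - y_o}{f_y} + n_z \right) \tag{3}
$$

$$
g_x = \frac{\partial (1/Z)}{\partial x} = -\frac{n_x}{d f_x}, \qquad
g_y = \frac{\partial (1/Z)}{\partial y} = -\frac{n_y}{d f_y} \tag{4}
$$

Equation (3) follows by substituting $X = Z (x - x_o)/f_x$ and $Y = Z (y - y_o)/f_y$ from (1) into (2) and dividing by $Z d$. Because $1/Z$ is then affine in $(x, y)$ on a planar patch, $g_x$ and $g_y$ can be estimated by applying horizontal and vertical gradient filters to the inverse depth image.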
3. Representing the normal vector

From (4) we obtain:
[Equation (5)]
For any $Q_i \in N_p$, where $N_p = [Q_1, Q_2, \dots, Q_k]$ contains the k neighboring points of P, substituting (5) into (2) gives the corresponding $n_{zi}$:
[Equation for $n_{zi}$]
The normal vector can therefore be expressed as:
[Equation for the normal vector]
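Step 3 can be sketched as follows. It is again a reconstruction under assumptions: $f_x, f_y$ are the focal lengths, $d$ is the plane offset, and $g_x, g_y$ are the partial derivatives of $1/Z$ with respect to x and y (all assumed names). Writing $Q_i - P = [\Delta X_i, \Delta Y_i, \Delta Z_i]^\top$ and using the fact that both points lie on the plane (2):

$$
n_x = -d f_x g_x, \qquad n_y = -d f_y g_y \tag{5}
$$

$$
n_x \Delta X_i + n_y \Delta Y_i + n_{zi} \Delta Z_i = 0
\;\Longrightarrow\;
n_{zi} = d \, \frac{f_x g_x \Delta X_i + f_y g_y \Delta Y_i}{\Delta Z_i}
$$

$$
n_i = \begin{bmatrix} n_x \\ n_y \\ n_{zi} \end{bmatrix}
= -d \begin{bmatrix} f_x g_x \\ f_y g_y \\ -\dfrac{f_x g_x \Delta X_i + f_y g_y \Delta Y_i}{\Delta Z_i} \end{bmatrix}
$$

Only the direction of $n_i$ matters, so the common factor $-d$ merely scales (and possibly flips) the vector and can be dropped before normalizing. Each neighbor $Q_i$ yields its own candidate $n_i$, which is why a further step is needed to combine the k candidates.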

4. The normal vector in the spherical coordinate system

In spherical coordinates, the normal vector at point P can be written as:
[Equation: normal vector at P in spherical coordinates]
One of its angles can be computed directly: [Equation]
The other angle is found from the relationship below, where k is the number of neighboring points of P, $\hat{n}$ is the normal vector to be estimated, and $\bar{n}_i$ is the normalized normal vector obtained from the i-th neighboring point.

[Equation: definition of E]
When E is minimized, $\hat{n}$ is as close as possible to the $\bar{n}_i$, which gives the optimal $\theta$. Setting the derivative of E to zero:

[Equation]
gives the expression for $\theta$:
[Equation: expression for $\theta$]
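The spherical-coordinate step can be written out as follows. This is a reconstruction under assumptions: $\theta$ is taken as the inclination and $\varphi$ as the azimuth, $n_x, n_y$ are the first two components of the normal from step 3, and $E$ is taken to be the negative sum of dot products, which is one energy consistent with the description above.

$$
\hat{n} = \begin{bmatrix} \sin\theta \cos\varphi \\ \sin\theta \sin\varphi \\ \cos\theta \end{bmatrix},
\qquad
\varphi = \arctan\frac{n_y}{n_x}
$$

$$
E = -\sum_{i=1}^{k} \hat{n} \cdot \bar{n}_i
$$

$$
\frac{\partial E}{\partial \theta} = 0
\;\Longrightarrow\;
\theta = \arctan
\frac{\cos\varphi \sum_{i=1}^{k} \bar{n}_{xi} + \sin\varphi \sum_{i=1}^{k} \bar{n}_{yi}}
     {\sum_{i=1}^{k} \bar{n}_{zi}}
$$

The last line follows from $\partial E / \partial \theta = -\sum_i \left( \cos\theta \cos\varphi \, \bar{n}_{xi} + \cos\theta \sin\varphi \, \bar{n}_{yi} - \sin\theta \, \bar{n}_{zi} \right)$. Since all vectors are unit length, minimizing this $E$ is equivalent to minimizing the summed squared distance between $\hat{n}$ and the $\bar{n}_i$.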
In summary, the whole computation can be illustrated as in the figure below:

[Figure: overview of the SNE computation]
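The four steps can be put together in a short NumPy sketch for a single pixel. It is an illustration under stated assumptions, not the authors' implementation (the GitHub repository linked above holds that): the function name `sne_normal_at` is hypothetical, the inverse-depth gradients use central differences rather than any particular gradient filter, the neighborhood is the 8 surrounding pixels, and the common scale factor is dropped, so the normal is recovered only up to sign.

```python
import numpy as np

def sne_normal_at(depth, K, x, y):
    """Estimate the unit surface normal at interior pixel (x, y) of a dense depth map."""
    fx, fy, xo, yo = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    inv = 1.0 / depth
    # Step 2: gradients of the inverse depth (central differences, an assumption).
    gx = (inv[y, x + 1] - inv[y, x - 1]) / 2.0
    gy = (inv[y + 1, x] - inv[y - 1, x]) / 2.0
    # Step 3: first two components of the normal, common scale factor dropped.
    nx, ny = fx * gx, fy * gy

    def point(u, v):
        # Step 1: back-project pixel (u, v) into the camera frame.
        Z = depth[v, u]
        return np.array([(u - xo) * Z / fx, (v - yo) * Z / fy, Z])

    P = point(x, y)
    candidates = []
    for dv in (-1, 0, 1):
        for du in (-1, 0, 1):
            if du == 0 and dv == 0:
                continue
            dX, dY, dZ = point(x + du, y + dv) - P
            if abs(dZ) < 1e-9:
                continue                      # n_z is undefined for a neighbor at equal depth
            nz = -(nx * dX + ny * dY) / dZ    # this neighbor's estimate of n_z
            n = np.array([nx, ny, nz])
            candidates.append(n / np.linalg.norm(n))
    # Step 4: combine the candidates in spherical coordinates.
    S = np.sum(candidates, axis=0)
    phi = np.arctan2(ny, nx)
    theta = np.arctan2(S[0] * np.cos(phi) + S[1] * np.sin(phi), S[2])
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

# Synthetic check on a planar depth map; every number here is hypothetical.
K = np.array([[100.0, 0.0, 20.0], [0.0, 100.0, 15.0], [0.0, 0.0, 1.0]])
n_true = np.array([0.2, -0.8, 0.4])
n_true = n_true / np.linalg.norm(n_true)
d = -5.0                                      # plane offset in n . P + d = 0
xs, ys = np.meshgrid(np.arange(40), np.arange(30))
inv_depth = -(n_true[0] * (xs - 20.0) / 100.0
              + n_true[1] * (ys - 15.0) / 100.0
              + n_true[2]) / d
depth = 1.0 / inv_depth
n_est = sne_normal_at(depth, K, x=10, y=12)   # should be parallel to n_true
```

On an exact plane every neighbor yields the same candidate, so the spherical-coordinate step simply returns it; on noisy depth the candidates differ and the step acts as an average on the unit sphere. The sketch does not handle border pixels or pixels whose neighbors all share the same depth.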

III. Road segmentation architecture: RoadSeg

The figure below shows the overall structure of SNE-RoadSeg. Overall it still follows an encoder-decoder design. Unlike U-Net, the paper uses DenseNet-style skip connections to achieve more flexible feature fusion in the decoder. The reasoning is that ordinary skip connections impose an unnecessary constraint by forcing aggregation only between encoder and decoder feature maps of the same scale.
[Figure: overall architecture of SNE-RoadSeg]
As the figure shows, the extracted RGB and surface normal feature maps are fused by element-wise summation. The fused feature maps are then fused again in the decoder through the DenseNet-style skip connections, which restores the resolution of the feature maps. At the end of RoadSeg, a sigmoid layer generates the probability map for semantic driving-scene segmentation.
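The two operations named above, element-wise summation and a final sigmoid, can be illustrated with plain NumPy arrays. This is only a toy sketch: the function names `fuse` and `sigmoid` and the feature-map shapes are hypothetical, and a real network would apply these operations to learned feature maps at every encoder stage.

```python
import numpy as np

def fuse(rgb_feat, normal_feat):
    """Fuse two feature maps of identical shape by element-wise summation."""
    if rgb_feat.shape != normal_feat.shape:
        raise ValueError("feature maps must have the same shape to be summed")
    return rgb_feat + normal_feat

def sigmoid(x):
    """Map raw scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical feature maps laid out as (channels, height, width).
rgb_feat = np.ones((64, 4, 4))
normal_feat = np.full((64, 4, 4), 0.5)
fused = fuse(rgb_feat, normal_feat)           # same shape, values added per element

# A raw score of 0 at every pixel corresponds to a probability of 0.5.
prob = sigmoid(np.zeros((1, 4, 4)))
```

Summation requires both branches to produce feature maps of the same shape at each stage, which is one reason to give the RGB and surface normal encoders the same backbone.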

The encoder backbone is ResNet. Specifically, the initial block consists of a convolutional layer, a batch normalization layer and a ReLU activation layer. A max pooling layer and four residual layers are then applied in turn to gradually reduce the resolution and increase the number of feature-map channels.

ResNet comes in five architectures: ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. RoadSeg follows the same naming rule as ResNet. The number of feature-map channels $c_n$ depends on the ResNet architecture. Specifically, for ResNet-18 and ResNet-34, $c_0$ to $c_4$ are 64, 64, 128, 256 and 512; for ResNet-50, ResNet-101 and ResNet-152 they are 64, 256, 512, 1024 and 2048.
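The channel counts listed above can be captured in a small lookup, which is handy when sizing decoder layers for a given backbone. The helper name `encoder_channels` is hypothetical; the numbers are the ones stated in the text.

```python
def encoder_channels(depth):
    """Return [c0, c1, c2, c3, c4] for a ResNet backbone of the given depth."""
    if depth in (18, 34):
        return [64, 64, 128, 256, 512]
    if depth in (50, 101, 152):
        return [64, 256, 512, 1024, 2048]
    raise ValueError(f"unsupported ResNet depth: {depth}")

channels_18 = encoder_channels(18)     # [64, 64, 128, 256, 512]
channels_101 = encoder_channels(101)   # [64, 256, 512, 1024, 2048]
```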

The decoder consists of two types of modules: (a) feature extractors Fi,j and (b) upsampling layers Ui,j. They are densely connected to achieve flexible feature fusion. A feature extractor extracts features from the fused feature maps while keeping their resolution unchanged. An upsampling layer increases the resolution and reduces the number of feature-map channels. The three convolutional layers in the feature extractors and the upsampling layers all use the same 3×3 kernel size, the same stride of 1 and the same padding of 1.
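The choice of a 3×3 kernel with stride 1 and padding 1 is what keeps the resolution unchanged. A quick check with the standard convolution output-size formula (the helper name `conv_out_size` is hypothetical):

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

same = conv_out_size(56)                 # 3x3, stride 1, padding 1: 56 stays 56
shrunk = conv_out_size(56, padding=0)    # without padding the map shrinks to 54
```

With these settings the convolutions never change the spatial size, so any change in resolution inside the decoder comes from the upsampling operation itself.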

IV. SNE-RoadSeg experimental results

The proposed method is compared with five state-of-the-art CNNs published on the KITTI road benchmark. The experimental results are shown in the figure below:
[Figure: experimental results on the KITTI road benchmark]

The table below gives a quantitative comparison. It shows that the proposed SNE-RoadSeg achieves the highest MaxF, AP and PRE, while LCCRF achieves the best REC. The paper's free-space detection method ranks second on the KITTI road benchmark.
[Table: quantitative comparison on the KITTI road benchmark]
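For readers unfamiliar with the metric names, PRE (precision), REC (recall) and MaxF (the maximum F-score over all probability thresholds) can be computed as in the sketch below. It uses the standard definitions on a toy example and is not the benchmark's official evaluation code; the function name `max_f_score` and the sample values are hypothetical.

```python
import numpy as np

def max_f_score(prob, gt, thresholds=None):
    """Return (MaxF, precision, recall) at the threshold that maximizes the F-score."""
    prob = np.asarray(prob, dtype=float)
    gt = np.asarray(gt, dtype=bool)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    best = (0.0, 0.0, 0.0)
    for t in thresholds:
        pred = prob >= t
        tp = np.sum(pred & gt)        # road pixels correctly detected
        fp = np.sum(pred & ~gt)       # non-road pixels labeled as road
        fn = np.sum(~pred & gt)       # road pixels that were missed
        if tp == 0:
            continue                  # F-score is zero or undefined here
        pre = tp / (tp + fp)
        rec = tp / (tp + fn)
        f = 2 * pre * rec / (pre + rec)
        if f > best[0]:
            best = (float(f), float(pre), float(rec))
    return best

# Toy example: four pixels with predicted road probabilities and ground truth.
prob = [0.9, 0.8, 0.4, 0.2]
gt = [True, True, False, True]
max_f, pre, rec = max_f_score(prob, gt)
```

Because MaxF picks the best threshold for each method, it compares the probability maps themselves rather than one particular binarization.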

V. Summary

In short, the main contributions of this paper are:

  • It proposes the SNE module, which infers surface normal information from depth/disparity images accurately and efficiently;
  • It proposes RoadSeg, a data-fusion CNN architecture that fuses RGB and surface normal information for accurate free-space detection and outperforms all other CNNs at detecting drivable road areas;
  • It releases a publicly available synthetic dataset for semantic driving-scene segmentation.

Copyright notice
This article was written by [AI bacteria]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2021/09/20210909111002651n.html
