
[RangeNet++ Explained] Fast and Accurate LiDAR Semantic Segmentation

2021-09-15 04:07:08 · AI菌


1. RangeNet++ Overview

RangeNet++, published in 2019, is a semantic segmentation network for LiDAR point clouds that can run in real time in autonomous-driving settings. To obtain accurate results, it introduces a new post-processing algorithm that deals with problems caused by the intermediate representation, such as discretization errors and blurry CNN outputs. Experiments show that the approach outperformed the state of the art at the time, while running online on a single embedded GPU.


2. Background

Semantic segmentation, an important task in semantic scene understanding, assigns a class label to each data point of the input modality. This paper explicitly addresses semantic segmentation for rotating 3D LiDAR sensors, such as the commonly used Velodyne scanners. Unfortunately, most state-of-the-art methods for semantic segmentation of LiDAR data either lack the representational capacity for the task or are too computationally expensive to run at the sensor's frame rate. This makes them unsuitable for supporting tasks in self-driving cars, and solving these problems is the goal of this work.

The main contribution of this paper is a new method for accurate and fast semantic segmentation of point clouds, operating exclusively on LiDAR data. It achieves this by performing a spherical projection of the input point cloud, i.e. a 2D image representation similar to a range image, which mirrors the way a rotating LiDAR sensor samples points. The method can use any CNN as a backbone to infer a full per-pixel semantic segmentation of the image. This yields an efficient approach, but one that can suffer from problems caused by the discretization or by blurry CNN outputs.

These problems are addressed effectively by semantically reconstructing the original points: no point of the original cloud is discarded, regardless of the resolution of the image-based CNN. This post-processing step also runs online, operates on the image representation, and is tailored for efficiency: it finds the nearest neighbors of each point in constant time and is computed on the GPU. This makes it possible to infer the complete semantic segmentation of the LiDAR point cloud accurately, and faster than the frame rate of the sensor. Because the method can run with any range-image-based CNN backbone, it is called RangeNet.

In summary, the paper makes three key claims. The proposed method can:

  • segment LiDAR point clouds semantically with an accuracy that significantly exceeds the prior state of the art;
  • infer semantic labels for the complete original point cloud, avoiding discarded points regardless of the level of discretization used in the CNN;
  • work at the frame rate of a Velodyne scanner on an embedded computer that can easily be installed in a robot or vehicle.

3. RangeNet++ Method

The goal is accurate and fast semantic segmentation of point clouds, so that an autonomous machine can make decisions in time. To achieve this, a projection-based method is proposed: a 2D CNN processes the input point cloud and performs semantic inference on a range image built from each laser scan.

The RangeNet++ pipeline consists of four steps, as shown in Figure 2 and discussed in detail in the following sections:

  • (A) Convert the input point cloud to a range-image representation;
  • (B) Perform fully convolutional semantic segmentation on the 2D image;
  • (C) Transfer the semantics from the 2D image back to all points of the original 3D cloud, regardless of the range-image discretization used;
  • (D) Clean up the point cloud with an efficient range-image-based 3D post-processing step, a fast GPU-based kNN search running over all points, to remove unwanted discretization and inference artifacts.


(1) Point cloud to range-image conversion

Several LiDAR sensors (e.g. Velodyne sensors) represent their raw input data in a way that resembles a range image: each column represents the ranges measured by an array of laser rangefinders at one point in time, and each row represents a different yaw position of each rangefinder, firing at a constant rate. However, at high vehicle speeds the rotation is not fast enough to ignore the skew caused by this "rolling shutter" behavior. To obtain a geometrically consistent representation of the environment for each scan, the motion of the vehicle must be compensated, with the result that the point cloud no longer contains exactly one range measurement per pixel, but possibly several measurements for some pixels and none for others. To obtain an accurate semantic segmentation of the full LiDAR point cloud, the first step is therefore to convert each de-skewed point cloud into a range-image representation.

To do so, each point is first expressed in spherical coordinates and then mapped to range-image coordinates. The conversion from 3D to 2D is given by:

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\left[1 - \arctan(y, x)\,\pi^{-1}\right] w \\ \left[1 - \left(\arcsin(z\,r^{-1}) + \mathrm{f_{up}}\right)\mathrm{f}^{-1}\right] h \end{pmatrix}$$

where $(u, v)$ are the image coordinates, $(h, w)$ is the desired size of the range image, $r = \|p\|_2$ is the range of each point, and $\mathrm{f} = \mathrm{f_{up}} + \mathrm{f_{down}}$ is the vertical field of view of the sensor.
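The projection above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the field-of-view values (+3°/−25°) and the image size are hypothetical placeholders for a Velodyne-like sensor. Points are written in order of decreasing range, so the closest point in each pixel's frustum wins, as described later in the article.

```python
import numpy as np

def project_to_range_image(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud to an (h, w) range image via spherical projection.

    fov_up_deg / fov_down_deg are hypothetical values for a Velodyne-like sensor.
    Returns the range image plus the (u, v) pixel index of every input point.
    """
    fov_down = np.radians(fov_down_deg)
    fov = np.radians(fov_up_deg) - fov_down  # total vertical field of view

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)       # range of each point

    yaw = np.arctan2(y, x)
    pitch = np.arcsin(z / r)

    # normalized image coordinates in [0, 1]
    u = 0.5 * (1.0 - yaw / np.pi)            # horizontal (azimuth)
    v = 1.0 - (pitch - fov_down) / fov       # vertical (elevation)

    # scale to pixel indices and clamp to the image bounds
    u = np.clip(np.floor(u * w), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v * h), 0, h - 1).astype(np.int32)

    # fill the image in order of decreasing range, so the closest point
    # falling into each pixel is the one that is kept
    order = np.argsort(r)[::-1]
    range_image = np.full((h, w), -1.0, dtype=np.float32)
    range_image[v[order], u[order]] = r[order]
    return range_image, u, v
```

The (u, v) pairs returned here are exactly what step (C) of the pipeline later needs to restore a label for every original point.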

(2) Fully convolutional segmentation network

The spherical mapping yields a 2D range image, which is passed through the designed fully convolutional segmentation network to obtain a semantic label at each position of the 2D image. The network follows a common fully convolutional encoder-decoder structure; the difference from traditional image segmentation is that downsampling is applied only along the W (width) direction of the range image, while the H (height) direction is kept at full resolution.
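The effect of this asymmetric downsampling can be illustrated with a small helper that tracks feature-map sizes through such a backbone. The input size (64 × 2048) and the number of stride-2 stages (5) are assumptions chosen for illustration, not values taken from this article:

```python
def backbone_shapes(h=64, w=2048, num_downsamples=5):
    """Feature-map sizes through a backbone that strides only along W.

    H is preserved at every stage because the vertical resolution of the
    range image (number of laser beams) is already low; only the width is
    halved by each stride-2 stage. Both the input size and the number of
    stages here are illustrative assumptions.
    """
    shapes = [(h, w)]
    for _ in range(num_downsamples):
        w //= 2                 # stride 2 along W only
        shapes.append((h, w))
    return shapes
```

For the assumed settings this yields (64, 2048) → (64, 1024) → … → (64, 64), i.e. the height never shrinks.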

During training, the network is optimized end to end using stochastic gradient descent and a weighted cross-entropy loss:

$$L = -\sum_{c=1}^{C} w_c\, y_c \log\left(\hat{y}_c\right), \qquad w_c = \frac{1}{\log\left(f_c + \epsilon\right)}$$

where the weight $w_c$ depends on the frequency $f_c$ of class $c$, so that errors on rare classes are penalized more strongly than errors on frequent ones.
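A minimal NumPy sketch of this weighted cross-entropy, assuming hard (integer) labels and precomputed log-probabilities. The choice eps = 1.02 (ENet-style) is an assumption made here so that log(f_c + eps) stays positive; the paper itself only writes w_c = 1/log(f_c + ε):

```python
import numpy as np

def weighted_xent_loss(log_probs, labels, class_freq, eps=1.02):
    """Frequency-weighted cross-entropy over N samples.

    log_probs: (N, C) log-probabilities; labels: (N,) integer class ids;
    class_freq: (C,) class frequencies f_c in [0, 1].
    eps = 1.02 is an ENet-style assumption that keeps log(f_c + eps)
    positive, so rarer classes receive larger weights.
    """
    weights = 1.0 / np.log(class_freq + eps)              # w_c = 1 / log(f_c + eps)
    nll = -log_probs[np.arange(labels.shape[0]), labels]  # per-sample negative log-likelihood
    return float(np.mean(weights[labels] * nll))
```

With two equally frequent classes the weights are equal and the loss reduces to an ordinary scaled cross-entropy; skewed frequencies tilt the loss toward the rare class.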

(3) Reconstructing the point cloud from the range image

A common way to map a range-image representation back to a point cloud is to use the range, the pixel coordinates, and the sensor's intrinsic calibration. However, because the range image was generated from the point cloud in the first place, this can discard a large number of 3D points of the original representation. This matters especially when smaller images are used to make CNN inference faster: for example, projecting a scan of 130,000 points onto a [64 × 512] range image represents only 32,768 points, keeping the closest point in each pixel's frustum. Therefore, to infer semantics for all the original points, this work keeps all the (u, v) pairs obtained during the initial rendering and uses the image coordinates of each point to index the label image. This can be executed very quickly on the GPU before the next post-processing step, and it produces a semantic label for every point of the entire input scan in a lossless way.
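Because the (u, v) pair of every original point is kept, this back-projection reduces to a single gather operation, sketched below (function and argument names are illustrative):

```python
import numpy as np

def backproject_labels(label_image, u, v):
    """Assign each original 3D point the label of the range-image pixel it
    projected into, using the (u, v) indices stored during rendering.

    label_image: (H, W) per-pixel semantic labels from the CNN.
    u, v: (N,) image coordinates of all N original points.
    Every point receives a label, regardless of the image resolution.
    """
    return label_image[v, u]  # fancy indexing: one gather for all N points
```

Even when several points share a pixel, all of them receive that pixel's label, so no point of the input scan is dropped.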

(4) Point cloud post-processing

This paper presents a fast, GPU-enabled k-nearest-neighbor (kNN) search that operates directly on the input point cloud. It finds, for every point of the semantic point cloud, the k closest points in 3D within the scan and uses them for a consensus vote on the label. As is common in kNN search, a threshold called the cut-off limits the maximum allowed distance for a point to be considered a nearest neighbor. The distance measure used to rank the k closest points can be either the absolute difference of ranges or the Euclidean distance; the authors also experimented with an additional penalty weighting, which did not help in practice. In the following, the algorithm is explained using the absolute range difference, but the Euclidean distance works similarly, only with slower computation.
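A simplified CPU sketch of this voting scheme is shown below. The real implementation is a parallel GPU kernel; the S × S window search around each point's pixel, the parameter names, and the fallback for empty vote sets are assumptions made here for illustration:

```python
import numpy as np
from collections import Counter

def knn_vote(range_image, label_image, u, v, point_ranges, k=5, s=5, cutoff=1.0):
    """kNN majority-vote label cleanup over the range image (simplified sketch).

    For each original point: look at an s x s pixel window around its
    projected location (u, v), rank valid window pixels by absolute range
    difference, keep the k closest within `cutoff`, and take a majority
    vote on their labels.
    """
    h, w = range_image.shape
    half = s // 2
    out = np.empty(len(u), dtype=label_image.dtype)
    for i in range(len(u)):
        ui, vi, r = int(u[i]), int(v[i]), float(point_ranges[i])
        cands = []
        for dv in range(-half, half + 1):
            for du in range(-half, half + 1):
                vv, uu = vi + dv, ui + du
                if 0 <= vv < h and 0 <= uu < w and range_image[vv, uu] > 0:
                    # (range difference, candidate label)
                    cands.append((abs(range_image[vv, uu] - r), label_image[vv, uu]))
        cands.sort(key=lambda t: t[0])                      # closest in range first
        votes = [lab for d, lab in cands[:k] if d <= cutoff]
        # fall back to the projected label if no neighbor passes the cut-off
        out[i] = Counter(votes).most_common(1)[0][0] if votes else label_image[vi, ui]
    return out
```

The range difference is used here as the fast proxy for Euclidean distance discussed in the ablation study; swapping in true 3D distances would only change the ranking key.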

4. Experimental Evaluation

(1) Experimental setup

1) Dataset
The experiments use the KITTI dataset, which consists of more than 43,000 scans. The more than 21,000 scans of sequences 00 to 10 are available for training, and the remaining scans of sequences 11 to 21 are used as the test set. Sequence 08 is used as the validation set for hyperparameter selection, and the proposed method is trained on the remaining training sequences. In total, the dataset provides 22 classes, of which 19 are evaluated on the test set through the official benchmark.

2) Hyperparameter selection
All hyperparameters of the RangeNet models were selected and evaluated on the validation set (sequence 08). All backbones were trained with a learning rate of $10^{-3}$, decayed by a factor of 0.99 every epoch, for 150 epochs; all CNN backbones converged within that budget. The hyperparameters of the competing state-of-the-art methods were likewise selected on the validation set.

3) Evaluation criteria
To assess labeling performance, the commonly used mean Jaccard index, i.e. the mean intersection-over-union (mIoU) over all classes, is used:

$$\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}$$

where $TP_c$, $FP_c$, and $FN_c$ denote the numbers of true positive, false positive, and false negative predictions for class $c$, and $C$ is the number of classes.
To better assess the accuracy of the predictions near object boundaries, an additional metric called the border IoU is proposed. It is defined in the same way as the standard IoU, but only over a subset of points selected by an extra parameter: how far a point lies from a self-occlusion boundary of the sensor, i.e. from a label change in the range image. This metric is intended to show how much the proposed post-processing mitigates the erroneous "shadow-like" label projections.
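The mIoU formula above translates directly into code. A minimal sketch follows; note that classes absent from both the prediction and the ground truth are skipped from the average, which is one common convention rather than anything specified by this article:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union (Jaccard index) over all classes.

    pred, gt: integer label arrays of the same shape.
    Classes with an empty union (absent from both arrays) are skipped.
    """
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # true positives for class c
        fp = np.sum((pred == c) & (gt != c))   # false positives
        fn = np.sum((pred != c) & (gt == c))   # false negatives
        denom = tp + fp + fn                   # union of prediction and truth
        if denom > 0:
            ious.append(tp / denom)
    return float(np.mean(ious))
```

The border IoU would use the same function, restricted to points within a given pixel distance of a label change in the range image.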

(2) Comparative experiments

The table below compares the RangeNet backbones with 21 and 53 layers against 7 other baseline methods. The proposed RangeNet baselines, even without the cleanup, remain reliable benchmarks at all input resolutions. RangeNet++, which adds the kNN post-processing, consistently outperforms its unprocessed RangeNet counterpart, demonstrating the effectiveness of the kNN search. Unlike the CRF, which (as also concluded in the original SqueezeSeg paper) hurts some classes even when the overall IoU is higher, the kNN cleanup improves the results for all classes except one.


(3) Ablation experiments

The second experiment shows the effect of the parameters k and S on the validation set. For each of the 4 parameters k, S, σ, and the cut-off, a wide range of values was chosen, and their effect on the post-processed inference results of the RangeNet53 backbone was evaluated at all input resolutions. Figure 5 shows the IoU on the validation set, normalized per parameter set, for all combinations of k, S, σ, and the cut-off. The results also show that similar accuracy can be obtained with a small kernel, using the absolute range difference as a proxy for the Euclidean distance. This supports the claim that the range difference is a good stand-in for the actual distance between points that lie close together in the image.


(4) Effect of post-processing

The figure below shows the IoU and the border IoU for different distances to the boundary. Note that the post-processing not only increases the IoU score by a few percentage points, it also significantly improves the border IoU at low values of the boundary-distance parameter. This means the approach is particularly useful against the label "bleeding" or "shadowing" problem described in Section III-D. Another important conclusion is that, over the whole range of boundary distances and IoU values, there is only a marginal difference between the fast-to-compute range difference and the actual Euclidean distance, which supports the claim that the former is a good approximation of the latter.

(5) Runtime

This experiment supports the claim that the method can run online on a mobile platform with a single GPU. Table II shows the runtime of each backbone, of the post-processing with different distance functions (using the best parameters), and the total time required.

5. Conclusion

This work presents a fast and accurate framework for semantic segmentation of point clouds recorded with a rotating LiDAR sensor. Its main contributions are a novel deep-learning method that operates on range images with 2D convolutions, and a novel GPU-accelerated post-processing step that recovers consistent semantic information for the entire LiDAR scan during inference.

The experimental evaluation shows that the 2D deep CNN operating on range images, improved by the proposed post-processing, outperforms the state of the art in LiDAR point cloud semantic segmentation. Furthermore, the efficient, GPU-enabled post-processing recovers important boundary information that is otherwise lost through the de-skewing of the laser scan, the lossy discretization into the proxy representation, and the inference through an hourglass-shaped CNN.

Overall, the proposed method outperforms the state of the art in both accuracy and runtime, taking a step toward sensor-redundant semantic segmentation for autonomous vehicles and robots.

Copyright notice
This article was created by [AI菌]; when reposting, please include the original link. Thank you.
https://chowdera.com/2021/09/20210909111002642k.html
