
[YOLOX Explained] An Anchor-Free Object Detector Comparable to YOLOv5!

2021-09-15 04:07:17 AI bacteria


1. YOLOX Introduction

YOLOX builds on the YOLO series. Its main contribution: on top of YOLOv3, it introduces a decoupled head, data augmentation, anchor-free prediction, and the SimOTA sample-matching method, constructing an anchor-free, end-to-end object detection framework that reaches first-class detection performance.

In addition, the YOLOX-L model proposed in the paper won first place in the Streaming Perception Challenge (CVPR 2021 Workshop on Autonomous Driving). The authors also provide deployment versions supporting ONNX, TensorRT, NCNN, and OpenVINO. The open-source code is on GitHub:
https://github.com/Megvii-BaseDetection/YOLOX


2. YOLOX Improvements

Over the past two years, the main academic progress in object detection has focused on anchor-free detectors, advanced label-assignment strategies, and end-to-end (NMS-free) detectors. Considering that YOLOv4 and YOLOv5 may be somewhat over-optimized for the anchor-based pipeline, the authors chose YOLOv3-Darknet53 as the baseline and improved upon it.

(1) Decoupled head

In object detection, the conflict between the classification and regression tasks is a well-known problem, so decoupled heads for classification and localization are widely used in most one-stage and two-stage detectors. However, while the backbones and feature pyramids of the YOLO series (e.g., FPN, PAN) have kept evolving, their detection heads have remained coupled, as shown in Figure 2:
[Figure 2: the coupled YOLO head vs. the decoupled head]
Experiments show that a decoupled head accelerates model convergence and improves detection accuracy, at the cost of some extra parameters and computation:

1) Replacing the YOLO head with a decoupled head greatly improves the convergence speed, as shown below:

[Figure: convergence curves of the coupled vs. decoupled heads]
2) The decoupled head is essential for an end-to-end YOLO. As the table below shows, with a coupled head the end-to-end YOLO loses 4.2% AP, while with a decoupled head the drop is only 0.8% AP. The authors therefore replace the YOLO detection head with the light decoupled head shown in Figure 2.
[Table: effect of coupled vs. decoupled heads on end-to-end AP]
3) Decoupling the detection head undoubtedly increases computational complexity, but after weighing speed against accuracy, the authors settled on a light design: a 1x1 convolution first reduces the channel dimension, followed by two 3x3 convolutions in each of the classification and regression branches, adding only a few parameters. The slight speed drop of the YOLOX s/m/l/x models also comes from this. The table below shows batch=1 inference time on a V100; the decoupled head brings a 1.1% performance improvement.
[Table: V100 batch=1 inference time with and without the decoupled head]
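The light head design just described (a 1x1 channel-reducing convolution, then two 3x3 convolutions in each of the classification and regression branches) can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the official YOLOX implementation; the channel counts, the SiLU activations, and the extra objectness branch are assumptions here.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Hypothetical sketch of a YOLOX-style decoupled head for one FPN level."""

    def __init__(self, in_ch=256, mid_ch=256, num_classes=80):
        super().__init__()
        # 1x1 conv first reduces the channel dimension
        self.stem = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 1), nn.SiLU())
        # two parallel branches, each with two 3x3 convolutions
        self.cls_branch = nn.Sequential(
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.SiLU())
        self.reg_branch = nn.Sequential(
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.SiLU())
        self.cls_pred = nn.Conv2d(mid_ch, num_classes, 1)  # class scores
        self.reg_pred = nn.Conv2d(mid_ch, 4, 1)            # box: dx, dy, w, h
        self.obj_pred = nn.Conv2d(mid_ch, 1, 1)            # objectness

    def forward(self, x):
        x = self.stem(x)
        c = self.cls_branch(x)
        r = self.reg_branch(x)
        return self.cls_pred(c), self.reg_pred(r), self.obj_pred(r)
```

Classification and box regression thus no longer share the same convolution outputs, which is the point of the decoupling.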
On the surface, the decoupled head improves YOLOX's accuracy and convergence speed; more deeply, it makes it possible to integrate YOLO with downstream detection tasks. For example, YOLOX + YOLACT/CondInst/SOLO could enable end-to-end instance segmentation.

(2) Data augmentation

When model capacity is large enough, posterior knowledge (data and data augmentation) has a more essential impact than prior knowledge (various tricks and hand-crafted rules). Using the ground-truth mask annotations provided by COCO, the authors tried Copy-Paste on YOLOX. As the table below shows, on the 48.6 mAP YOLOX-Large model, Copy-Paste brings a 0.8% gain.
[Table: effect of Copy-Paste on YOLOX-Large]
However, Copy-Paste depends on mask annotations, and masks are a scarce resource in routine detection work. Since MixUp has mixing behavior similar to Copy-Paste but requires no masks, the authors modified MixUp into an augmentation that is closer in spirit to Copy-Paste.

Throughout the experiments, the authors use Mosaic together with the modified MixUp. Note that both Mosaic and MixUp are turned off for the last 15 epochs of training, because the images generated by Mosaic + MixUp are far from the true distribution of natural images, and Mosaic's heavy cropping produces many inaccurate bounding boxes.
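The "turn strong augmentation off near the end" schedule can be sketched with a tiny helper like the one below (a hypothetical utility, not from the YOLOX codebase):

```python
def use_strong_aug(epoch, total_epochs=300, no_aug_epochs=15):
    """Return True while Mosaic/MixUp should stay enabled.

    Hypothetical helper: strong augmentation is disabled for the last
    `no_aug_epochs` epochs so training finishes on images closer to
    the natural distribution.
    """
    return epoch < total_epochs - no_aug_epochs
```

A training loop would consult this flag each epoch and rebuild the dataloader without Mosaic/MixUp once it turns False.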

(3) Anchor-free

To obtain the best detection performance, anchor-based detectors need a clustering analysis before training to determine an optimal set of anchors, which brings several problems:

  • The anchor boxes obtained by clustering are specific to that dataset and do not generalize;
  • Anchor boxes increase the complexity of the detection head and the number of predictions generated;
  • By contrast, anchor-free decoding logic is simpler and more readable.

Anchor-free detectors (such as FCOS and CornerNet) have therefore developed rapidly over the past two years. Several works show that anchor-free detectors can match the performance of anchor-based ones. The anchor-free mechanism significantly reduces the number of design parameters that need heuristic tuning and the many tricks involved (e.g., anchor clustering, grid sensitivity), making the detector, and especially its training and decoding stages, considerably simpler.

With the anchor-free approach, the number of predictions per location is reduced from 3 to 1, and each location directly predicts 4 values: the two offsets from the top-left corner of the grid cell, plus the height and width of the predicted box. This modification reduces the detector's parameters and GFLOPs, makes it faster, and even improves performance to 42.9% AP.
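The decoding step just described can be sketched as follows. This is a minimal NumPy sketch; the log-space width/height parameterization is an assumption for illustration (the text only says the head predicts the box height and width), as is treating the stride as the per-level downsampling factor.

```python
import numpy as np

def decode_anchor_free(pred, stride):
    """Decode raw per-cell predictions into (cx, cy, w, h) boxes.

    pred: array of shape (H, W, 4) holding (dx, dy, log_w, log_h)
    per grid cell; dx, dy are offsets from the cell's top-left corner.
    """
    h, w = pred.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cx = (pred[..., 0] + xs) * stride          # center x in image pixels
    cy = (pred[..., 1] + ys) * stride          # center y in image pixels
    bw = np.exp(pred[..., 2]) * stride         # box width (assumed log-space)
    bh = np.exp(pred[..., 3]) * stride         # box height (assumed log-space)
    return np.stack([cx, cy, bw, bh], axis=-1)
```

With one prediction per cell and no anchor templates, this is the entire decode: no anchor lookup, no per-anchor scaling tables.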

(4) SimOTA

Many people may wonder: why does anchor-free now work for YOLO, with performance going up rather than down?

This is in fact closely tied to sample matching. Compared with anchor-free design, sample matching seems to receive little attention in industry. Yet a good sample-matching algorithm can naturally alleviate detection in crowded scenes, poor detection of objects with extreme aspect ratios, and the positive-sample imbalance for objects of extreme sizes. It may even alleviate poor detection of rotated objects; these problems are, in essence, sample-matching problems.

In fact, the authors' earlier work OTA fully considers the 4 important factors of sample matching: by modeling sample matching as an optimal transport problem, it obtains the optimal matching under global information. The 4 factors are:

  • loss/quality/prediction aware: compute the match between anchor boxes/points and ground truths based on the network's own predictions, fully accounting for the fact that different structures/models may behave differently; this is truly dynamic sample matching.
  • center prior: given receptive fields, and since in most scenes the mass center of a target correlates with its geometric center, restricting positive samples to a region around the target center stabilizes the otherwise unstable convergence of pure loss/quality-aware matching.
  • dynamic k: assign a different number of positive samples to different targets.
  • global information: some anchor boxes/points lie at the boundary between positive samples, or between positives and negatives. Whether such an anchor is positive or negative, and if positive, which target's positive sample it should be, ought to be decided with full global information.

Notably, OTA's biggest problem is that it adds roughly 20-25% extra training time, which is unbearable for 300-epoch COCO training, and the Sinkhorn iterations also consume a lot of GPU memory. So on YOLOX the authors removed OTA's optimal-transport solving step and kept only the first 3 of the 4 factors above; the resulting sample-matching scheme is called SimOTA (i.e., simplified OTA).
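A heavily simplified sketch of a SimOTA-style dynamic-k assignment is shown below. The cost definition, the top-q = 10 IoU candidates, and the conflict-resolution rule are assumptions chosen for illustration; the real implementation differs in details (center prior, cost weighting, etc.).

```python
import numpy as np

def simota_assign(cost, ious, q=10):
    """Simplified SimOTA-style dynamic-k assignment (sketch).

    cost: (num_gt, num_anchors) combined classification+regression cost
    ious: (num_gt, num_anchors) IoU between each gt box and each prediction
    For each gt, k = sum of its top-q IoUs (at least 1); the k candidates
    with the lowest cost become its positives.
    """
    num_gt, num_anchors = cost.shape
    matching = np.zeros_like(cost, dtype=bool)
    topq = min(q, num_anchors)
    for g in range(num_gt):
        # dynamic k: how many positives this gt "deserves"
        k = max(1, int(np.sort(ious[g])[-topq:].sum()))
        idx = np.argsort(cost[g])[:k]
        matching[g, idx] = True
    # if an anchor is claimed by several gts, keep only the cheapest match
    for a in np.where(matching.sum(axis=0) > 1)[0]:
        best = np.argmin(cost[:, a])
        matching[:, a] = False
        matching[best, a] = True
    return matching
```

The key idea survives the simplification: k is derived from prediction quality (the IoUs), not hand-tuned, so well-localized targets collect more positives than hard ones.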

3. Experiments and Results

Besides DarkNet53, the authors also tested YOLOX on backbones of other sizes; YOLOX achieves consistent improvements over all corresponding counterparts.

(1) Training settings

YOLOX's training settings are basically consistent with the baseline: 300 epochs on COCO train2017, using stochastic gradient descent (SGD) with momentum 0.9. The learning rate is lr x batch_size / 64, with initial lr = 0.01, followed by cosine decay; the weight-decay coefficient is 0.0005.
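The schedule above can be written down directly. This is a minimal sketch of the stated rule (linear batch-size scaling plus cosine decay); warmup and the no-augmentation tail are omitted, and the function name is made up for illustration.

```python
import math

def yolox_lr(epoch, total_epochs=300, batch_size=64, base_lr=0.01):
    """Learning rate at a given epoch under the stated schedule.

    Initial lr = base_lr * batch_size / 64, then cosine decay to 0
    over total_epochs (a simplified sketch; warmup is omitted).
    """
    lr0 = base_lr * batch_size / 64
    return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / total_epochs))
```

For example, with batch_size=128 the starting rate doubles to 0.02, and halfway through training the rate has decayed to half its initial value.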

(2) YOLOv3 vs. YOLOX

The results in the table below show that YOLOX raises YOLOv3's AP on COCO at 640x640 resolution to 47.3%, far exceeding the 44.3% AP of the current best YOLOv3 practice.
[Table: YOLOX vs. YOLOv3 on COCO at 640x640]

(3) YOLOv5 vs. YOLOX

For a fair comparison, the authors adopt YOLOv5's backbone, including the modified CSPNet, the SiLU activation, and the PAN head, and follow its scaling rules to construct the YOLOX-S, YOLOX-M, YOLOX-L, and YOLOX-X models. Compared with YOLOv5, these models improve AP by 1.0%-3.0%, adding only a little extra inference time (from the decoupled head).

[Table: YOLOX-S/M/L/X vs. the corresponding YOLOv5 models]
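The YOLOv5-style scaling rule mentioned above multiplies a base block count (depth) and base channel count (width) per model size. The sketch below illustrates the idea; the specific multiplier values and base dimensions are assumptions for illustration, not figures from this article.

```python
# Assumed (depth_multiplier, width_multiplier) per model size; these
# particular numbers are illustrative, not quoted from the paper.
SCALES = {"s": (0.33, 0.50), "m": (0.67, 0.75), "l": (1.0, 1.0), "x": (1.33, 1.25)}

def scaled_dims(size, base_depth=3, base_channels=64):
    """Return (num_blocks, num_channels) for a given model size."""
    d, w = SCALES[size]
    depth = max(1, round(base_depth * d))   # repeat count of a stage's blocks
    channels = int(base_channels * w)       # channel width of that stage
    return depth, channels
```

Scaling both axes together is what lets one architecture definition generate the whole S-to-X family.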

(4) YOLOX-Tiny & YOLOX-Nano

The authors further shrink the model to YOLOX-Tiny in order to compare with YOLOv4-Tiny. For mobile devices, depthwise convolutions are used to build YOLOX-Nano, with 0.91M parameters and 1.08G FLOPs. As the table below shows, YOLOX performs very well even though its model size is smaller than that of the corresponding models.
[Table: YOLOX-Tiny and YOLOX-Nano vs. other lightweight detectors]

(5) Model size and augmentation strategy

The authors found that the appropriate augmentation strategy differs across model sizes. While applying MixUp to YOLOX-L improves AP by 0.9%, for a model as small as YOLOX-Nano it is better to weaken the augmentation.

Specifically, when training the small models YOLOX-S, YOLOX-Tiny, and YOLOX-Nano, the authors remove MixUp and weaken Mosaic. This change raises YOLOX-Nano's AP from 24.0% to 25.3%.

[Table: augmentation strategy vs. model size]

4. Summary

Building on the YOLO series, the authors make a series of improvements and construct a new high-performance anchor-free detector called YOLOX. It adopts several recent advanced detection techniques, such as the decoupled head, anchor-free prediction, and an advanced label-assignment strategy, enabling YOLOX to achieve the best trade-off between speed and accuracy. Notably, on the COCO dataset, YOLOX raises the improved YOLOv3 architecture to 47.3% AP, exceeding the current best practice by 3.0% AP.

Copyright notice
This article was written by [AI bacteria]. Please include the original link when reposting:
https://chowdera.com/2021/09/20210909111002655u.html
