
Portrait Matting

2020-12-08 08:04:12 ShellCollector

 

MODNet:

https://github.com/ZHKKKe/MODNet

 

How do you cut out a portrait without a green screen? Previously, researchers at the University of Washington proposed replacing the trimap with a background image, achieving good matting results without a green screen. But that method requires capturing and aligning both the original image and a background image, which is inconvenient in practice. Recently, City University of Hong Kong and SenseTime proposed a new portrait matting method, MODNet: no green screen, a single image, and a single model are enough for real-time portrait matting.

Portrait matting predicts an accurate foreground mask (alpha matte) and then uses it to extract the person from a given image or video. The technique is widely used, for example in photo editing and film post-production. At present, obtaining a high-quality alpha matte in real time still requires a green screen.
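The alpha matte is used via the standard compositing equation I = αF + (1 − α)B: each pixel of the output is a per-pixel blend of foreground and background weighted by α. A minimal NumPy sketch (the function name and toy values are illustrative, not from the paper):

```python
import numpy as np

def composite(foreground: np.ndarray, background: np.ndarray,
              alpha: np.ndarray) -> np.ndarray:
    """Blend a foreground onto a new background using an alpha matte.

    foreground, background: float arrays of shape (H, W, 3) in [0, 1].
    alpha: float array of shape (H, W) in [0, 1]; 1 = definite foreground.
    """
    a = alpha[..., None]  # add a channel axis so it broadcasts over RGB
    return a * foreground + (1.0 - a) * background

# Toy example: left column is fully foreground, one pixel is half-blended.
fg = np.ones((2, 2, 3))      # white "person"
bg = np.zeros((2, 2, 3))     # black new background
alpha = np.array([[1.0, 0.0],
                  [1.0, 0.5]])
out = composite(fg, bg, alpha)
```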

 

But what if there is no green screen? Most current matting methods use a predefined trimap as a prior. However, trimaps must be annotated by humans, which is costly, and when captured with a depth camera they suffer from low accuracy. Some recent work therefore tries to eliminate the model's dependence on the trimap, i.e. trimap-free methods. For example, the University of Washington's background matting method uses a separate background image in place of the trimap. Other approaches use multiple models to generate a pseudo trimap or semantic mask, which is then used as a prior to predict the alpha matte. But taking the background image as input requires capturing and aligning two images, and using multiple models significantly increases inference time. These drawbacks make all of the above matting methods unsuitable for practical applications such as camera preview. In addition, limited by insufficient labeled training data, trimap-free methods often suffer from domain shift in practice, i.e. the models do not generalize well to real-world data.

 

Can a single model and a single RGB image predict an accurate alpha matte? Recently, City University of Hong Kong and SenseTime proposed a lightweight network, MODNet, which decomposes the portrait matting task into three correlated subtasks and optimizes them simultaneously through specific constraints.

 

First, a look at MODNet's matting results:

Two insights underlie the MODNet model:

 

First, neural networks are better at learning a set of simple objectives than a single complex one. Solving multiple matting sub-objectives can therefore achieve better performance.

 

Second, applying an explicit supervisory signal to each sub-objective lets different parts of the model learn decoupled knowledge, so that a single model can solve all the sub-objectives.
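In training terms, these two insights typically translate into a weighted sum of per-branch losses, each with its own supervisory signal. The sketch below is an illustrative stand-in, not the paper's exact loss terms or weights: the supervision targets (downsampled matte for semantics, boundary-restricted L1 for detail, full-matte L1 for fusion) and the weights `w_s`, `w_d`, `w_f` are assumptions for the sake of the example.

```python
import torch
import torch.nn.functional as F

def matting_loss(pred_semantic, pred_detail, pred_matte,
                 gt_matte, boundary_mask,
                 w_s=1.0, w_d=10.0, w_f=1.0):
    """Weighted sum of three sub-objective losses (illustrative weights).

    pred_semantic: coarse low-res mask, (N, 1, h, w).
    pred_detail / pred_matte: (N, 1, H, W) predictions.
    gt_matte: ground-truth alpha matte, (N, 1, H, W).
    boundary_mask: 1 inside the transition zone (0 < alpha < 1), else 0.
    """
    # Semantic branch: supervise against a downsampled ground-truth matte.
    gt_semantic = F.interpolate(gt_matte, size=pred_semantic.shape[2:],
                                mode='bilinear', align_corners=False)
    loss_s = F.mse_loss(pred_semantic, gt_semantic)
    # Detail branch: L1 restricted to the boundary transition zone.
    loss_d = (boundary_mask * (pred_detail - gt_matte).abs()).sum() \
             / boundary_mask.sum().clamp(min=1.0)
    # Fusion branch: L1 against the full ground-truth matte.
    loss_f = F.l1_loss(pred_matte, gt_matte)
    return w_s * loss_s + w_d * loss_d + w_f * loss_f

# Smoke test with random tensors.
torch.manual_seed(0)
gt = torch.rand(2, 1, 8, 8)
bmask = ((gt > 0.1) & (gt < 0.9)).float()
loss = matting_loss(torch.rand(2, 1, 4, 4), torch.rand(2, 1, 8, 8),
                    torch.rand(2, 1, 8, 8), gt, bmask)
```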

 

To overcome the domain shift problem, the study proposes a self-supervised strategy based on sub-objective consistency (SOC), which exploits the consistency between sub-objectives to reduce artifacts in the predicted alpha matte. In addition, the study proposes a one-frame delay (OFD) post-processing trick for smoother output in video matting applications. The MODNet framework is shown in the figure below:
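The OFD idea can be sketched as a simple temporal filter: if a pixel's alpha value in the previous and next frames agree with each other but the current frame deviates from both, the current value is treated as a flicker and replaced by the neighbors' average, at the cost of one frame of latency. This is a simplified sketch under that assumption, not the paper's exact rule; the tolerance `tol` is illustrative.

```python
import numpy as np

def ofd_smooth(prev_a, cur_a, next_a, tol=0.1):
    """One-frame-delay smoothing for video mattes (simplified sketch).

    prev_a, cur_a, next_a: alpha mattes of three consecutive frames,
    float arrays of shape (H, W) in [0, 1].
    """
    neighbors_agree = np.abs(prev_a - next_a) <= tol
    cur_deviates = (np.abs(cur_a - prev_a) > tol) & \
                   (np.abs(cur_a - next_a) > tol)
    flicker = neighbors_agree & cur_deviates
    out = cur_a.copy()
    out[flicker] = 0.5 * (prev_a[flicker] + next_a[flicker])
    return out

# One flickering pixel: stable at 0 in frames t-1 and t+1, spikes at t.
prev_a = np.zeros((2, 2))
next_a = np.zeros((2, 2))
cur_a = np.zeros((2, 2)); cur_a[0, 0] = 1.0
smoothed = ofd_smooth(prev_a, cur_a, next_a)
```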

Compared with other trimap-free methods, MODNet has the following advantages:

  • MODNet is faster: it is designed for real-time applications; with a 512 × 512 input, MODNet runs at 63 fps on an Nvidia GTX 1080Ti GPU;

  • MODNet achieves new SOTA results, thanks to 1) objective decomposition and optimization, and 2) applying a specific supervisory signal to each sub-objective;

  • MODNet generalizes better, thanks to the SOC strategy.
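The single-image, fixed-resolution inference pipeline implied above can be sketched as: resize the input to the reference resolution, run the model once, and resize the matte back. The `predict_matte` helper and the stand-in `stub` model below are hypothetical; a real run would load MODNet's released checkpoint from the project repository instead of the stub.

```python
import torch
import torch.nn.functional as F

def predict_matte(model, image: torch.Tensor, ref_size: int = 512) -> torch.Tensor:
    """Run a single-image matting model at a fixed inference resolution.

    image: (1, 3, H, W) float tensor in [0, 1].
    model: any callable mapping a (1, 3, ref_size, ref_size) tensor to a
    (1, 1, ref_size, ref_size) alpha matte.
    """
    _, _, h, w = image.shape
    x = F.interpolate(image, size=(ref_size, ref_size),
                      mode='bilinear', align_corners=False)
    with torch.no_grad():
        matte = model(x)
    # Resize the matte back to the original resolution and clip to [0, 1].
    return F.interpolate(matte, size=(h, w),
                         mode='bilinear', align_corners=False).clamp(0, 1)

# Stand-in "model": mean over color channels squashed to [0, 1].
stub = lambda x: torch.sigmoid(x.mean(dim=1, keepdim=True))
matte = predict_matte(stub, torch.rand(1, 3, 720, 1280))
```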

Although MODNet's results do not surpass those of trimap-based methods, experiments show that MODNet is more stable in practical applications, precisely because it removes the trimap input. The method challenges the necessity of a green screen for real-time portrait matting.

 

Existing open-source portrait matting datasets are limited in scale or accuracy, and many previous studies trained and validated their models on private datasets of varying quality and difficulty, which makes different methods hard to compare. This study instead evaluates existing trimap-free methods under a unified protocol: all models are trained on the same dataset and validated on the portrait data from the Adobe Matting dataset and on the new benchmark proposed in this study. The researchers say their new benchmark has high-quality annotations and better diversity than previous benchmarks, so it reflects matting performance more comprehensively.

 

In summary, this study proposes a new network architecture, MODNet, which performs trimap-free portrait matting in real time. The researchers also propose two techniques, SOC and OFD, that let MODNet generalize well to new data domains and produce smoother video matting. In addition, the study constructs a new benchmark dataset for portrait matting validation.

 

The MODNet Method

 

Multi-model-based methods show that treating trimap-free matting as a trimap prediction (segmentation) step followed by a trimap-based matting step achieves better performance, which indicates that neural networks benefit from decomposing a complex objective. The study extends this idea, decomposing the trimap-free matting objective into three sub-objectives: semantic estimation, detail prediction, and semantic-detail fusion. Intuitively, semantic estimation outputs a coarse foreground mask, detail prediction produces fine-grained foreground boundaries, and semantic-detail fusion blends the features of the two.

 

As shown in Figure 2, MODNet has three branches, each learning a different sub-objective through specific constraints. Specifically:

  • the low-resolution branch estimates human semantics (supervised by the ground-truth mask);

  • the high-resolution branch detects the portrait boundary (supervised by the transition region, α ∈ (0, 1));

  • the fusion branch predicts the final alpha matte (supervised by the whole ground-truth matte).
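The three-branch layout above can be sketched as a toy PyTorch module. This is a minimal stand-in for illustration only, not the real MODNet (which builds on a MobileNetV2 backbone with additional modules); the layer sizes and the way the branches are wired together are assumptions made to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeBranchMatting(nn.Module):
    """Toy three-branch layout mirroring the decomposition in Figure 2."""

    def __init__(self, ch=16):
        super().__init__()
        # Low-resolution branch: coarse human semantics at 1/4 scale.
        self.semantic = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))
        # High-resolution branch: fine boundary detail, conditioned on
        # the image plus the upsampled semantic prediction.
        self.detail = nn.Sequential(
            nn.Conv2d(3 + 1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))
        # Fusion branch: blends semantics and detail into the final matte.
        self.fusion = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, img):
        sem = self.semantic(img)                        # (N, 1, H/4, W/4)
        sem_up = F.interpolate(sem, size=img.shape[2:],
                               mode='bilinear', align_corners=False)
        det = self.detail(torch.cat([img, sem_up], 1))  # (N, 1, H, W)
        matte = torch.sigmoid(self.fusion(torch.cat([sem_up, det], 1)))
        return sem, det, matte

sem, det, matte = ThreeBranchMatting()(torch.rand(1, 3, 64, 64))
```

Each branch returns its own output, so each can receive its own supervisory signal during training, matching the second insight above.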

Experiments

 

The study builds a new portrait matting benchmark, PPM-100, on which MODNet is compared with existing portrait matting methods; it also demonstrates the effectiveness of the SOC and OFD strategies in adapting MODNet to real-world data.

 

The PPM-100 Benchmark

 

This study proposes a new portrait matting benchmark, the Photographic Portrait Matting benchmark (PPM-100), containing 100 finely annotated portraits with varied backgrounds. As shown in Figure 4, the samples in PPM-100 have more natural backgrounds and more varied poses, so the data is more comprehensive.

Performance on PPM-100

 

On PPM-100, the researchers compared MODNet with FDMPA, LFM, SHM, BSHM, and HAtt; the results are shown in Table 1 below. MODNet outperforms the other trimap-free methods on both the MSE and MAD metrics, but falls short of the trimap-based DIM method. However, when MODNet is converted into a trimap-based method, it outperforms DIM.
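The two evaluation metrics are straightforward pixel-wise errors between the predicted and ground-truth mattes: MSE is the mean squared error and MAD the mean absolute difference (some benchmarks scale these by a constant; the sketch below reports the plain means):

```python
import numpy as np

def mse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Squared Error between predicted and ground-truth alpha mattes."""
    return float(np.mean((pred - gt) ** 2))

def mad(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Absolute Difference between predicted and ground-truth mattes."""
    return float(np.mean(np.abs(pred - gt)))

# Toy 2x2 mattes in [0, 1].
gt = np.array([[1.0, 0.0], [0.5, 0.5]])
pred = np.array([[0.9, 0.1], [0.5, 0.7]])
err_mse = mse(pred, gt)   # mean of [0.01, 0.01, 0.0, 0.04] = 0.015
err_mad = mad(pred, gt)   # mean of [0.1, 0.1, 0.0, 0.2] = 0.1
```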

The figure below compares the results of different methods:

As it shows, MODNet handles hollow structures (first row) and hair details (second row) better, but it still has difficulty with clothing (third row).

 

Performance on Real Data

 

The figure below shows MODNet's matting results on real data. As it shows, SOC is crucial for the model's generalization to real data, and OFD further smooths the output.

MODNet does not rely on a trimap and thus avoids the problems caused by erroneous trimaps. Figure 8 compares MODNet with the trimap-based DIM method:

In addition, the researchers compared MODNet with the background matting (BM) method proposed by the University of Washington, as shown in Figure 9. When a moving object suddenly appears in the background, BM's results are affected, while MODNet is robust to this kind of perturbation.

  • Paper: https://arxiv.org/pdf/2011.11961.pdf

  • Project: https://github.com/ZHKKKe/MODNet

Copyright notice
This article was written by ShellCollector. Please include the original link when reposting. Thank you.
https://chowdera.com/2020/12/202012080657596314.html