
TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss (cs.CV)

2020-12-07 19:23:51 Ling Qian

Semi-supervised video object segmentation (semi-VOS) is widely used in many applications. The task is to track class-agnostic objects from a given segmentation mask. Approaches based on optical flow, online learning, and memory networks have been developed for this purpose. They achieve high accuracy, but slow inference and high complexity make them hard to use in practice. Template matching methods were devised to speed up processing, at a substantial cost in accuracy. We introduce a new semi-VOS model based on template matching, together with a new temporal consistency loss, to narrow the performance gap with heavy models while greatly reducing inference time. Our template matching method consists of short-term and long-term matching. Short-term matching improves localization of the target object, while long-term matching refines fine details and handles changes in object shape through the newly proposed adaptive template attention module. However, long-term matching can cause error propagation, because past estimates flow into the template when it is updated. To mitigate this, we also propose a temporal consistency loss that uses the concept of a transition matrix to improve temporal coherence between neighboring frames. On the DAVIS16 benchmark, our model obtains a J&F score of 79.5% at 73.8 FPS.
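
The general idea of matching a template against the current frame can be illustrated with a minimal sketch. This is not the TTVOS architecture: the abstract does not describe how its short-term and long-term matching or its adaptive template attention module are built. The sketch assumes a plain masked-average template and cosine similarity, and the function names `masked_average_template` and `similarity_map`, the toy two-channel features, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def masked_average_template(features, mask):
    """Average the feature vectors under a binary mask.

    features: array of shape (C, H, W); mask: array of shape (H, W) with 0/1 values.
    Returns a template vector of shape (C,). Assumed design, not the paper's module.
    """
    weights = mask.astype(features.dtype)
    total = weights.sum()
    if total == 0:
        raise ValueError("mask contains no foreground pixels")
    return (features * weights).sum(axis=(1, 2)) / total

def similarity_map(features, template, eps=1e-8):
    """Cosine similarity between the template and every spatial location.

    Returns an array of shape (H, W) with values in [-1, 1].
    """
    feat_norm = np.linalg.norm(features, axis=0) + eps
    tmpl_norm = np.linalg.norm(template) + eps
    dots = np.tensordot(template, features, axes=([0], [0]))  # shape (H, W)
    return dots / (feat_norm * tmpl_norm)

def toy_frame(object_mask):
    """Toy features: channel 0 fires on the object, channel 1 on the background."""
    fg = object_mask.astype(np.float64)
    return np.stack([fg, 1.0 - fg])  # shape (2, H, W)

# Frame 0: the object sits in the top-left corner and its mask is given.
mask0 = np.zeros((4, 4), dtype=np.uint8)
mask0[:2, :2] = 1
# Frame 1: the object has moved to the bottom-right corner.
mask1 = np.zeros((4, 4), dtype=np.uint8)
mask1[2:, 2:] = 1

template = masked_average_template(toy_frame(mask0), mask0)
sim = similarity_map(toy_frame(mask1), template)
pred = (sim > 0.5).astype(np.uint8)  # threshold chosen arbitrarily for the toy data
```

In this toy setting the thresholded similarity map recovers the object's new position from the first-frame template alone. A template that is updated from past predictions, as in the long-term matching described above, would replace `mask0` with estimated masks, which is where errors could start to accumulate.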

Original title: TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

Original text: Semi-supervised video object segmentation (semi-VOS) is widely used in many applications. This task is tracking class-agnostic objects by a given segmentation mask. For doing this, various approaches have been developed based on optical flow, online-learning, and memory networks. These methods show high accuracy but are hard to be utilized in real-world applications due to slow inference time and tremendous complexity. To resolve this problem, template matching methods are devised for fast processing speed, sacrificing lots of performance. We introduce a novel semi-VOS model based on a template matching method and a novel temporal consistency loss to reduce the performance gap from heavy models while expediting inference time a lot. Our template matching method consists of short-term and long-term matching. The short-term matching enhances target object localization, while long-term matching improves fine details and handles object shape-changing through the newly proposed adaptive template attention module. However, the long-term matching causes error-propagation due to the inflow of the past estimated results when updating the template. To mitigate this problem, we also propose a temporal consistency loss for better temporal coherence between neighboring frames by adopting the concept of a transition matrix. Our model obtains 79.5% J&F score at the speed of 73.8 FPS on the DAVIS16 benchmark.
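
For readers unfamiliar with the reported figures: in the DAVIS benchmarks, J is the region similarity (the intersection over union of the predicted and ground-truth masks), F is a contour accuracy measure, and J&F is their mean. The sketch below computes only J for a single frame and converts a frame rate into a per-frame time budget. The helper names `region_similarity` and `ms_per_frame` are hypothetical, the handling of two empty masks is an assumed convention, and F is omitted because it needs boundary extraction.

```python
import numpy as np

def region_similarity(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def ms_per_frame(fps):
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

# Toy masks: one overlapping pixel out of two pixels in the union.
toy_pred = [[1, 1], [0, 0]]
toy_gt = [[1, 0], [0, 0]]
toy_j = region_similarity(toy_pred, toy_gt)

# 73.8 FPS corresponds to roughly 13.55 ms per frame.
budget_ms = ms_per_frame(73.8)
```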

Original authors: Hyojin Park, Ganesh Venkatesh, Nojun Kwak

Original address: https://arxiv.org/abs/2011.04445

Original statement: this article is published in the community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact yunjia_community@tencent.com to have it removed.

Copyright notice
This article was written by [Ling Qian]. Please include a link to the original when reposting. Thank you.
https://chowdera.com/2020/11/20201119031742518o.html