
South China University of Technology | Generative Low-bitwidth Data-Free Quantization

2021-08-08 15:43:16 Author: Juli

Paper title: Generative Low-bitwidth Data Free Quantization

Venue: ECCV 2020

Authors: Shoukai Xu, Haokun Li

Affiliations: South China University of Technology, Peng Cheng Laboratory, Monash University

Paper: https://arxiv.org/abs/2003.03603

Code: https://github.com/xushoukai/GDFQ

 

【Why Recommended】

For the data-free quantization problem, this paper mines classification-boundary knowledge and data-distribution information from a pre-trained full-precision model and uses them to train a generator that produces fake data suitable for fine-tuning the quantized model. When training the generator, batch-normalization (BN) layer statistics are introduced to constrain the data distribution, so the generated data better matches the real distribution. The generated fake data is then used to fine-tune the quantized model under supervision, and the resulting method achieves superior performance.

【Main Content】

Problem definition: in the absence of real data, fake data must be constructed to fine-tune the quantized model Q, which leads to the following optimization objective.
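In symbols, with fake samples standing in for the unavailable training data, the objective is roughly (notation assumed here for illustration, not taken verbatim from the paper):

$$\min_{\theta_Q}\ \mathbb{E}_{\hat{x},\,y}\Big[\mathcal{L}\big(Q(\hat{x};\,\theta_Q),\ y\big)\Big]$$

where $\hat{x}$ is the constructed fake data with label $y$, $\theta_Q$ are the parameters of the quantized model, and $\mathcal{L}$ is a task loss such as cross-entropy.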

Recently, a data-free quantization method called ZeroQ was proposed, which constructs fake data by updating the input with gradient information and uses it to improve quantization performance. However, the information ZeroQ exploits is insufficient, and it suffers from two problems. First, it does not consider label information when constructing fake data, thus ignoring the classification-boundary knowledge contained in the pre-trained model. Second, it matches batch-normalization statistics at individual data points rather than over the whole data distribution, so the distribution of its fake data is far from the real data distribution.

To address these problems, this paper designs a knowledge-matching generator.

Knowledge matching generator

A pre-trained DNN contains knowledge about its training data, such as classification-boundary information and distribution information.

Classification boundary information matching:

To generate fake data, we introduce a noise vector z and a conditioning label y, where z follows a standard Gaussian distribution N(0, 1) and y is a class label. The generator maps the prior noise vector z and the given label y to fake data $\hat{x} = G(z|y)$. The full-precision model should be able to classify the fake data as its given label, so we introduce a cross-entropy loss on the generator of the form $\mathcal{L}^{G}_{CE} = \mathbb{E}_{z,y}\big[\mathrm{CE}\big(M(G(z|y)),\, y\big)\big]$, where M denotes the pre-trained full-precision model and CE(·,·) is the cross-entropy loss.
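As a concrete illustration, a label-conditioned generator can be sketched in PyTorch as below; the architecture, `latent_dim`, and `num_classes` are illustrative choices, not the exact network from the GDFQ repository.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Maps a Gaussian noise vector z and a class label y to a fake image."""
    def __init__(self, latent_dim=100, num_classes=10, img_size=32, channels=3):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, latent_dim)  # condition on y
        self.init_size = img_size // 4
        self.fc = nn.Linear(latent_dim, 128 * self.init_size ** 2)
        self.conv_blocks = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, padding=1),
            nn.BatchNorm2d(64), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z, y):
        # Fuse label information into the noise vector, then decode to an image.
        h = z * self.label_emb(y)
        h = self.fc(h).view(h.size(0), 128, self.init_size, self.init_size)
        return self.conv_blocks(h)

# The generator's cross-entropy term then asks the full-precision model M
# to classify G(z|y) as its conditioning label y:
#   z = torch.randn(b, 100); y = torch.randint(0, 10, (b,))
#   loss_ce = F.cross_entropy(M(G(z, y)), y)
```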

Data distribution information matching :

In addition, the BN layers of the pre-trained model encode the distribution of the training data. Batch-normalization statistics (BNS), i.e., the mean and variance, are computed dynamically during full-precision training: for each batch, the batch-normalization (BN) layer computes the statistics of only the current batch's input and accumulates them with momentum. This eventually yields exponential-moving-average (EMA) batch-normalization statistics, which are then used during network validation and testing.
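This momentum accumulation can be written down in a few lines; the sketch below follows PyTorch's BatchNorm convention, where `momentum` weights the current batch's statistics.

```python
import torch

def update_bn_stats(running_mean, running_var, batch, momentum=0.1):
    """EMA update of batch-norm statistics during full-precision training.
    PyTorch convention: new = (1 - momentum) * old + momentum * batch."""
    batch_mean = batch.mean(dim=0)
    batch_var = batch.var(dim=0, unbiased=True)  # PyTorch stores unbiased var
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var
```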

To retain the BNS information, the mean and variance of the generated data distribution should match those of the real data distribution. We therefore train the generator G with a BNS loss of the form $\mathcal{L}^{G}_{BNS} = \sum_{l} \big( \|\mu^{fake}_{l} - \mu_{l}\|_2^2 + \|\sigma^{fake}_{l} - \sigma_{l}\|_2^2 \big)$, where $\mu^{fake}_{l}$ and $\sigma^{fake}_{l}$ are the mean and variance of the fake data at the l-th BN layer, and $\mu_{l}$ and $\sigma_{l}$ are the corresponding statistics stored in the pre-trained model.
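One way to compute such a loss is with forward hooks on the pre-trained model's BN layers, as in the hypothetical sketch below; the official repository may organize this differently.

```python
import torch
import torch.nn as nn

def bns_loss(model, fake_images):
    """Match per-channel batch statistics of fake data at each BN layer
    against the running (EMA) statistics stored in the pre-trained model.
    `model` is assumed to be in eval() mode so its stored BNS stay fixed."""
    losses, hooks = [], []

    def make_hook(bn):
        def hook(module, inputs, output):
            x = inputs[0]  # feature map entering this BN layer, (N, C, H, W)
            mean = x.mean(dim=[0, 2, 3])
            var = x.var(dim=[0, 2, 3], unbiased=False)
            losses.append(torch.norm(mean - bn.running_mean, 2) ** 2
                          + torch.norm(var - bn.running_var, 2) ** 2)
        return hook

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(make_hook(m)))
    model(fake_images)          # one forward pass collects all BN terms
    for h in hooks:
        h.remove()
    return sum(losses)
```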

Fake-data-driven low-bitwidth quantization

With the help of the generator, the data-free quantization problem can be turned into a supervised quantization problem, so the generated data can be used to fine-tune the quantized model. However, transferring knowledge from the pre-trained model to the quantized model is still difficult, so this paper presents a fake-data-driven quantization method that uses knowledge from the pre-trained model to solve the optimization problem.

Quantization: we adopt a simple and effective quantization scheme that quantizes both weights and activations. Given a full-precision weight θ and a quantization precision of k bits, a linear quantizer of the form $\theta' = \mathrm{round}\big((\theta - l)/\Delta\big)\cdot\Delta + l$ with step size $\Delta = (u - l)/(2^k - 1)$ is applied, where θ' is the discrete value produced by the linear quantization mapping, and l and u can be set to the minimum and maximum of the floating-point weights θ, respectively.
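A minimal sketch of this quantizer, assuming l and u are taken from the tensor itself:

```python
import torch

def quantize(theta, k):
    """Linear (uniform) k-bit quantization of a tensor; l and u are the
    min and max of the full-precision values. A sketch of the scheme
    described above, not necessarily the paper's exact formula."""
    l, u = theta.min(), theta.max()
    delta = (u - l) / (2 ** k - 1)          # quantization step size
    q = torch.round((theta - l) / delta)    # discrete level in [0, 2^k - 1]
    return q * delta + l                    # de-quantized value theta'
```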

Optimization: our goal is to fine-tune the quantized model so that it approximates the full-precision model. A fine-tuned quantized model Q should classify the fake data correctly, so we use a cross-entropy loss CE(·,·) to update Q, of the form $\mathcal{L}^{Q}_{CE} = \mathbb{E}_{z,y}\big[\mathrm{CE}\big(Q(G(z|y)),\, y\big)\big]$.

Second, because the data is synthetic, an ordinary classification loss alone is not enough to fine-tune the model. We therefore use knowledge distillation to recover the quantized model's performance, fine-tuning Q with a Kullback-Leibler loss KL(·,·) of the form $\mathcal{L}^{Q}_{KD} = \mathbb{E}_{\hat{x}}\big[\mathrm{KL}\big(Q(\hat{x})\,\|\,M(\hat{x})\big)\big]$, which pushes the quantized model's output distribution toward that of the full-precision model.
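Putting the two terms together, one fine-tuning step might compute a loss like the following; `beta` and the temperature `T` are hypothetical knobs, not values from the paper.

```python
import torch.nn.functional as F

def finetune_loss(q_logits, fp_logits, labels, beta=1.0, T=1.0):
    """Classification term plus distillation term for fine-tuning the
    quantized model on fake data (a sketch of the combined objective)."""
    ce = F.cross_entropy(q_logits, labels)
    # F.kl_div(log_q, p) computes KL(p || q) on softened distributions,
    # following standard knowledge distillation.
    kd = F.kl_div(F.log_softmax(q_logits / T, dim=1),
                  F.softmax(fp_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return ce + beta * kd
```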

Fine-tuning with fixed BNS: to stabilize the fine-tuning process, we fix the BNS so that the quantized model always keeps the distribution information of the real data, which improves quantization performance.
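In PyTorch terms, fixing the BNS amounts to keeping the BN layers in eval mode during fine-tuning, for example:

```python
import torch.nn as nn

def freeze_bn_stats(model):
    """Keep BN layers in eval mode so their stored (real-data) statistics
    are used and never updated by the fake data during fine-tuning."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.eval()  # use running_mean/running_var; do not accumulate
```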
