# Adversarial attack and defense (1): adversarial example generation in the image domain

2022-06-23 18:03:41 · Inge

# 1 Introduction

Compared with other fields, adversarial example generation in the image domain has the following advantages:
1) real and fake images are intuitive to a human observer;
2) the structure of image data and of image classifiers is relatively simple.
Main coverage: taking fully connected networks and convolutional neural networks as examples, and with MNIST, CIFAR10, and ImageNet as benchmark datasets, this post surveys evasion-based adversarial example generation for images, covering white-box, black-box, and gray-box settings, as well as physical-world attacks.

# 2 White box attack

Having obtained the classifier $C$ and a victim sample $(x, y)$, the attacker's goal is to synthesize a fake image that is perceptually similar to the original image but misleads the classifier into giving a wrong prediction:
$$\text{find } x' \text{ satisfying } \|x'-x\|\leq\epsilon, \text{ such that } C(x')=t\neq y, \tag{1}$$
where $\|\cdot\|$ measures the dissimilarity between $x'$ and $x$ and is usually an $l_p$ norm. The main methods under this attack setting are introduced next.

## 2.1 Biggio's attack

Biggio's attack generates adversarial examples on the MNIST dataset, targeting traditional machine learning classifiers such as SVMs and 3-layer fully connected neural networks, and misleads the classifier by optimizing its discriminant function.
For example, in Figure 1, for a linear SVM the discriminant function is $g(x)=\langle w,x\rangle+b$. Suppose a sample $x$ is correctly classified as 3. For this model, Biggio's method generates a new sample $x'$ that minimizes $g(x')$ while keeping $\|x'-x\|_1$ small. If $g(x')<0$, then $x'$ is misclassified. Figure 1: illustration of Biggio's attack on an SVM classifier.
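The idea can be sketched in a few lines of numpy. This is a minimal illustration, not Biggio et al.'s original algorithm: the weights `w`, bias `b`, and 2-dimensional "image" below are made up, and we simply follow the steepest-descent direction of $g$ until the sign flips.

```python
import numpy as np

def linear_svm_attack(x, w, b, step=0.1, max_iter=100):
    """Follow -w (the gradient of g) until g(x') = <w, x'> + b drops below 0."""
    x_adv = x.copy()
    for _ in range(max_iter):
        if np.dot(w, x_adv) + b < 0:           # already misclassified: stop
            break
        x_adv -= step * w / np.linalg.norm(w)  # steepest descent on g
    return x_adv

# toy example: a 2-pixel "image" correctly classified as the positive class
w = np.array([1.0, 2.0]); b = -1.0
x = np.array([2.0, 1.0])                       # g(x) = 2 + 2 - 1 = 3 > 0
x_adv = linear_svm_attack(x, w, b)
print(np.dot(w, x_adv) + b < 0)                # True: g(x') < 0, so x' is misclassified
```

Note that descending along $-w$ is also the direction of smallest $l_2$ change; Biggio's formulation additionally controls the $l_1$ distortion, which this sketch omits.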

## 2.2 Szegedy’s limited-memory BFGS (L-BFGS)

L-BFGS was the first attack applied to neural networks for image classification. It seeks an adversarial example by optimizing the following objective:
$$\min \|x-x'\|_2^2 \qquad \text{s.t. } C(x')=t \text{ and } x'\in[0,1]^m. \tag{2}$$
The problem is solved approximately by introducing a loss function:
$$\min\ \lambda\|x-x'\|_2^2+\mathcal{L}(\theta,x',t), \qquad\text{s.t. } x'\in[0,1]^m, \tag{3}$$
where $\lambda$ is a scale parameter. By adjusting $\lambda$, one can find an $x'$ that is sufficiently similar to $x$ yet misleads the classifier $C$.
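Objective (3) can be sketched as follows. The original work uses box-constrained L-BFGS; to keep the example self-contained we instead run plain projected gradient descent, with a hypothetical 3-class linear "network" and cross-entropy standing in for $\mathcal{L}$ (all of `W`, `x`, `t`, and the step sizes are made up for illustration).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def penalty_attack(x, W, t, lam=0.1, lr=0.05, steps=500):
    """Minimize lam*||x - x'||^2 + CE(softmax(W x'), t) over x' in [0, 1]^m."""
    x_adv = x.copy()
    onehot = np.eye(W.shape[0])[t]
    for _ in range(steps):
        p = softmax(W @ x_adv)
        grad = 2 * lam * (x_adv - x) + W.T @ (p - onehot)  # penalty + CE gradients
        x_adv = np.clip(x_adv - lr * grad, 0.0, 1.0)       # project onto the box
    return x_adv

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))   # hypothetical 3-class linear "network"
x = rng.uniform(size=8)
t = 2                         # target class
x_adv = penalty_attack(x, W, t)
```

In the real attack, a line search over $\lambda$ trades off distortion against the loss term; here $\lambda$ is simply fixed.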

## 2.3 Fast gradient sign method (FGSM)

Goodfellow et al. designed a one-step fast adversarial example generation method:
$$\begin{aligned} &x'=x+\epsilon\,\text{sign}(\nabla_x\mathcal{L}(\theta,x,y)),\qquad\text{non-targeted}\\ &x'=x-\epsilon\,\text{sign}(\nabla_x\mathcal{L}(\theta,x,t)),\qquad\text{targeted at } t \end{aligned} \tag{4}$$
Under the targeted setting, this amounts to solving the following problem with a single gradient step:
$$\min\mathcal{L}(\theta,x',t)\qquad\text{s.t. }\|x'-x\|_\infty\leq\epsilon\text{ and }x'\in[0,1]^m. \tag{5}$$
One attraction of FGSM is that it requires only one back-propagation pass, so it is well suited to generating large numbers of adversarial examples; Figure 2 shows its application on ImageNet. Figure 2: with only a small perturbation, the panda image is misclassified.
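The non-targeted rule in Equation (4) can be sketched with a model simple enough to differentiate by hand. Here a made-up binary logistic regression with loss $\log(1+e^{-y\,w\cdot x})$, $y\in\{-1,+1\}$, stands in for the network; its gradient with respect to $x$ is $-y\,\sigma(-y\,w\cdot x)\,w$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, w, y, eps):
    """Non-targeted FGSM for binary logistic regression.
    Loss = log(1 + e^{-y w.x}); its gradient w.r.t. x is -y*sigmoid(-y w.x)*w,
    so after sign() only the signs of the weights matter."""
    grad = -y * sigmoid(-y * np.dot(w, x)) * w
    return x + eps * np.sign(grad)

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.0, 1.0])     # w.x = 2.5 > 0, predicted +1
x_adv = fgsm(x, w, y=+1, eps=0.3)
print(np.max(np.abs(x_adv - x)))  # ≈ 0.3: every pixel moved by exactly eps
```

Because of the sign operator, the perturbation always lies on a corner of the $l_\infty$ ball of radius $\epsilon$, which is why FGSM pairs naturally with the $\|x'-x\|_\infty\leq\epsilon$ constraint in Equation (5).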

## 2.4 DeepFool

DeepFool studies the decision boundary of a classifier $F$ around a data point and tries to find a path across that boundary, as in Figure 3, so that the sample $x$ is misclassified. For example, to push a sample $x_0$ classified as 4 into class 3, the relevant decision boundary can be described as $\mathcal{F}_3=\{x: F(x)_4-F(x)_3=0\}$. Let $f(x)=F(x)_4-F(x)_3$. At each iteration, the attack linearizes the decision hyperplane with a first-order Taylor expansion $\mathcal{F}_3'=\{x: f(x)\approx f(x_0)+\langle\nabla_x f(x_0),\,x-x_0\rangle=0\}$ and computes the vector $\omega$ from $x_0$ orthogonal to the hyperplane $\mathcal{F}_3'$. The vector $\omega$ can serve as the perturbation that moves $x_0$ off the hyperplane. By iterating this step, the algorithm finds an adversarial example $x_0'$ that is classified as 3. Figure 3: decision boundaries.
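For an affine classifier $f(x)=w\cdot x+b$, the linearization is exact and one DeepFool step has a closed form: the minimal $l_2$ perturbation is the projection $-\frac{f(x)}{\|w\|_2^2}w$ onto the hyperplane, scaled slightly past it. The sketch below (with made-up `w`, `b`, `x`) shows that single step; the full algorithm repeats it over all competing classes of a nonlinear network.

```python
import numpy as np

def deepfool_linear_step(x, w, b, overshoot=0.02):
    """Minimal l2 perturbation pushing x onto the hyperplane f(x) = w.x + b = 0,
    scaled slightly past it so the sign of f actually flips."""
    f = np.dot(w, x) + b
    r = -f / np.dot(w, w) * w          # closed-form projection onto the boundary
    return x + (1 + overshoot) * r

w = np.array([3.0, 4.0]); b = -5.0
x = np.array([4.0, 3.0])               # f(x) = 12 + 12 - 5 = 19 > 0
x_adv = deepfool_linear_step(x, w, b)
print(np.dot(w, x_adv) + b < 0)        # True: crossed the decision boundary
```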

The DeepFool experiments show that for typical DNN image classifiers, almost all test samples lie very close to the decision boundary. For example, for LeNet trained on MNIST, a small perturbation is enough to misclassify over 90% of the test samples, which shows that DNN classifiers are not robust to perturbations.

## 2.5 Jacobian-based saliency map attack (JSMA)

JSMA introduces a scoring method based on computing the Jacobian matrix of $F$. It iteratively manipulates the pixels that most influence the model output, and can be viewed as a greedy attack algorithm.
Specifically, the authors use the Jacobian matrix $\mathcal{J}_F(x)=\frac{\partial F(x)}{\partial x}=\left\{\frac{\partial F_j(x)}{\partial x_i}\right\}_{i\times j}$ to model how $F(x)$ responds when $x$ changes. Under the targeted attack setting, the attacker tries to misclassify the sample as $t$. JSMA therefore repeatedly searches for and manipulates pixels whose increase/decrease causes $F_t(x)$ to increase while $\sum_{j\neq t}F_j(x)$ decreases. In the end, the classifier gives $x$ a higher score for class $t$.
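The saliency idea can be sketched as follows. This is a simplified single-pixel score, not Papernot et al.'s exact pairwise saliency map; the model is a made-up linear-softmax classifier so the Jacobian has a closed form.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def saliency_pick(x, W, t):
    """Score each pixel i by dF_t/dx_i - sum_{j != t} dF_j/dx_i and return the
    index whose increase most helps the target class t."""
    p = softmax(W @ x)
    # Jacobian of F = softmax(Wx) w.r.t. x: J[j, i] = ((diag(p) - p p^T) W)[j, i]
    J = (np.diag(p) - np.outer(p, p)) @ W
    scores = J[t] - (J.sum(axis=0) - J[t])
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 6))    # hypothetical 3-class model over 6 "pixels"
x = rng.uniform(size=6)
i = saliency_pick(x, W, t=0)   # the pixel JSMA would increase first
```

The real attack then bumps the chosen pixel (or pixel pair), recomputes the Jacobian, and repeats until the classifier flips or a distortion budget is exhausted.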

## 2.6 Basic iterative method (BIM) / Projected gradient descent (PGD) attack

This method is the iterative version of FGSM. Under a non-targeted attack, it iteratively generates $x'$ via
$$x^0=x;\quad x^{t+1}=Clip_{x,\epsilon}\big(x^t+\alpha\,\text{sign}(\nabla_x\mathcal{L}(\theta,x^t,y))\big), \tag{6}$$
where $Clip$ denotes the function that projects its input onto the $\epsilon$-neighborhood of $x$, $B_\epsilon(x)=\{x':\|x'-x\|_\infty\leq\epsilon\}$. The step size $\alpha$ is usually set to a fairly small value, e.g. so that each pixel changes by only one unit per step, and the number of steps is chosen so that the perturbation can reach the boundary, e.g. $step=\frac{\epsilon}{\alpha}+10$. If $x^0$ is randomly initialized, the algorithm is also called PGD.
BIM heuristically searches the $l_\infty$ neighborhood of a sample $x$ for the sample $x'$ with the largest loss. Such samples are also called "most adversarial" samples: for a given bound on the perturbation strength, they are the most aggressive and the most likely to fool the classifier. Finding such adversarial examples helps expose the flaws of deep learning models.
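Equation (6) with random initialization can be sketched as follows, reusing the made-up binary logistic-regression loss as a stand-in for $\mathcal{L}$ (all parameters are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd(x, w, y, eps=0.1, alpha=0.02, steps=10, seed=0):
    """Iterative FGSM with projection onto the l_inf ball B_eps(x).
    Loss/gradient are those of binary logistic regression, standing in for L."""
    rng = np.random.default_rng(seed)
    x_adv = x + rng.uniform(-eps, eps, x.shape)          # random start (PGD)
    for _ in range(steps):
        grad = -y * sigmoid(-y * np.dot(w, x_adv)) * w   # d loss / d x
        x_adv = x_adv + alpha * np.sign(grad)            # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)         # the Clip projection
    return x_adv

w = np.array([0.5, -1.0, 2.0]); x = np.array([0.4, 0.6, 0.5])
x_adv = pgd(x, w, y=+1)
print(np.max(np.abs(x_adv - x)) <= 0.1)                  # True: stays in B_eps(x)
```

The coordinate-wise clip is exactly the $l_\infty$ projection, which is why BIM/PGD keeps the iterate inside $B_\epsilon(x)$ no matter how many steps are taken.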

## 2.7 Carlini & Wagner's attack (C&W's attack)

C&W's attack is designed to defeat defense strategies built against FGSM and L-BFGS, and aims to find the minimum-distortion perturbation defined in L-BFGS. It approximates Equation (2) with the following strategy:
$$\min \|x-x'\|_2^2+c\cdot f(x',t),\qquad\text{s.t. }x'\in[0,1]^m, \tag{7}$$
where $f(x',t)=(\max_{i\neq t}Z(x')_i-Z(x')_t)^+$ and $Z(\cdot)$ returns the input to the softmax layer (the logits). Minimizing $f(x',t)$ finds an $x'$ whose score for class $t$ is higher than that of every other class. A search over $c$ then finds the $x'$ closest to $x$.
The function $f(x,y)$ can be viewed as a loss function for the data $(x,y)$: it penalizes the situation where some label $i$ has a score $Z(x)_i>Z(x)_y$. The only difference between C&W's attack and L-BFGS is that the former uses $f(x,t)$ in place of the latter's cross-entropy loss $\mathcal{L}(x,t)$. The benefit is that when the classifier outputs $C(x')=t$, the loss $f(x',t)=0$, and the algorithm then directly minimizes the distance from $x'$ to $x$.
The authors claim that their method is one of the strongest attack strategies, defeating many adversarial defenses. It can therefore serve as a benchmark for evaluating DNN security, or be used to assess the quality of adversarial examples.
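The margin loss $f$ itself is a one-liner; the sketch below evaluates it on made-up logits to show the property discussed above, namely that it vanishes as soon as class $t$ has the highest logit:

```python
import numpy as np

def cw_loss(z, t):
    """f(x', t) = (max_{i != t} Z(x')_i - Z(x')_t)^+ on pre-softmax logits z."""
    z_other = np.delete(z, t)                 # logits of all classes except t
    return max(z_other.max() - z[t], 0.0)

print(cw_loss(np.array([1.0, 5.0, 2.0]), t=1))  # 0.0: class t already wins
print(cw_loss(np.array([1.0, 5.0, 2.0]), t=2))  # 3.0: margin still to close
```

Unlike cross-entropy, which keeps decreasing even after the attack succeeds, this loss switches off exactly at success, leaving the optimizer free to shrink $\|x-x'\|_2$.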

## 2.8 Ground truth attack

Attacks and defenses escalate tit for tat. To break the deadlock, Carlini et al. tried to find the strongest attack: one that finds adversarial examples with the theoretical minimum distortion. The attack builds on an algorithm for verifying properties of neural networks: it encodes the model parameters $F$ and the data $(x,y)$ as the constraints of a linear-programming-like system, and solves the system by checking whether the neighborhood $B_\epsilon(x)$ of a sample $x$ contains a sample $x'$ that misleads the classifier. By shrinking the neighborhood until no such $x'$ exists, the last $x'$ found has the minimum dissimilarity to $x$; this $x'$ is called the ground truth adversarial example.
Ground truth attack is the first work to rigorously certify the robustness of a classifier. However, it relies on a satisfiability modulo theories (SMT) solver (a complex algorithm for checking the satisfiability of a set of theories), which makes it slow and unable to scale to large networks. Later work has improved its efficiency.

## 2.9 Other $l_p$ attacks

The attacks in 2.1–2.8 mainly focus on perturbations under $l_2$ or $l_\infty$ constraints. Here are some others:
1) One-pixel attack: differs from L-BFGS in that the constraint uses the $l_0$ norm, whose advantage is that it limits the number of pixels allowed to change. The work shows that on CIFAR10, changing just one pixel is enough to make a well-trained CNN classifier mispredict more than half of the test samples;
2) Elastic-net attack (ENA): differs from L-BFGS in that it uses both the $l_1$ and $l_2$ norms as constraints.

## 2.10 Universal attack

The methods in 2.1–2.9 attack only a specific sample $x$. A universal attack aims to mislead the classifier on the entire test set by finding a single perturbation $\delta$ that satisfies:
1) $\|\delta\|_p\leq\epsilon$;
2) $\mathbb{P}_{x\sim D(x)}(C(x+\delta)\neq C(x))\geq1-\sigma$.
In the corresponding experiments, the authors successfully found a perturbation $\delta$ such that $85.4\%$ of the samples in the ILSVRC 2012 dataset were misclassified by a ResNet152 network.

## 2.11 Spatially transformed attack

Traditional adversarial attack algorithms directly modify the pixels of an image, which changes its color intensities. A spatially transformed attack instead adds spatial perturbations to the image, including translation, rotation, and distortion of local image features. Such perturbations are subtle enough to evade human detection yet can still fool the classifier, as in Figure 4. Figure 4: spatially transformed attack.

## 2.12 Unconstrained adversarial examples

The works in 2.1–2.11 all add unobtrusive perturbations to an image. This work instead generates unconstrained adversarial examples: samples that need not look similar to any victim image, but are legitimate images in the eyes of an observer while still fooling the classifier.
To attack a classifier $C$, an auxiliary classifier GAN (AC-GAN) $\mathcal{G}$ first generates a legitimate sample $x$ of class $c$ from a noise vector $z^0$. The attack then finds a noise vector $z$ close to $z^0$ such that $\mathcal{G}(z)$ misleads $C$. Because $z$ is similar to $z^0$ in the latent space, the output $\mathcal{G}(z)$ still carries label $y$ to a human observer, which achieves the attack.

# 3 Physical world attack

All attack methods in Chapter 2 are applied digitally: the attacker feeds input images directly to the machine learning model. However, this is not always the case in practice, for example when inputs come from cameras or other sensors. Can such systems still be attacked by generating physical-world adversarial objects? Such attacks do exist, for example sticking stickers on road signs, which seriously threatens the sign recognizers of autonomous vehicles. Adversarial objects of this kind are more destructive to deep learning models, because they directly challenge many practical applications of DNNs, such as face recognition and autonomous driving.

## 3.1 Exploring adversarial examples in the physical world

The feasibility of making physical adversarial objects was explored by checking whether generated adversarial images (FGSM, BIM) remain "robust" under natural transformations (such as changes of viewpoint, lighting, etc.). Here, "robust" means that the crafted image remains adversarial after the transformation. To apply the transformations, the crafted images were first printed out, and subjects were asked to photograph the printouts with mobile phones. The shooting angle and lighting environment were not restricted, so the resulting photos are transformed versions of the previously generated adversarial examples. The experimental results show that after transformation, a large fraction of these adversarial examples, especially those generated by FGSM, remain adversarial to the classifier. These results demonstrate that physical adversarial objects can deceive sensors in different environments.

## 3.2 Eykholt's attack on road signs

In Figure 5, the sign recognizer is fooled by sticking tape at appropriate positions on the traffic sign. The attack consists of:
1) an $l_1$-norm-based attack to roughly locate the regions to perturb (tape will later be stuck on these regions);
2) an $l_2$-norm-based attack to generate the color of the tape within the located regions;
3) sticking tape of the specified color onto the specified regions.
Such attacks confuse autonomous driving systems from different angles and distances. Figure 5: sticking tape on a traffic sign.

## 3.3 Athalye's 3D adversarial objects

A work that successfully fabricated physical 3D adversarial objects is shown in Figure 6. The authors used 3D printing to make an adversarial turtle. To achieve this, they implemented a 3D rendering technique: given a textured 3D object, they first optimize the object's texture so that the rendered image is adversarial from any viewpoint. In the process, they also ensure that the perturbation remains adversarial across different environments: camera distance, lighting conditions, rotation, and background. After finding the 3D rendering perturbation, they print a 3D instance of the object. Figure 6: a 3D adversarial object.

# 4 Black box attack

## 4.1 Substitute model

In this setting, the attacker can only obtain the label $y$ returned after feeding a sample $x$ to the model. In addition, the attacker may have the following information available:
1) the domain of the classified data;
2) the architecture family of the classifier, e.g. CNN or RNN.
This work exploits the transferability of adversarial examples: if a sample $x'$ can attack a classifier $F_1$, it can often also attack a classifier $F_2$ with a structure similar to $F_1$. The authors therefore train a substitute model $F'$ to imitate the victim model $F$, then attack $F'$ to generate adversarial examples. The main steps are as follows:
1) Synthesize a substitute training set: for example, in a handwriting recognition task, the attacker can reproduce test samples or other handwritten data;
2) Train the substitute model: feed the synthetic dataset $X$ to the victim model to obtain labels $Y$, then train a DNN model $F'$ on $(X,Y)$. Based on their knowledge, the attacker selects for $F'$ the architecture most similar to the victim model's;
3) Augment the data: iteratively augment $(X,Y)$ and retrain $F'$. This process increases both the diversity of the reproduced data and the accuracy of $F'$;
4) Attack the substitute model: use existing methods such as FGSM to attack $F'$; the generated adversarial examples are then used to attack $F$.
Which attack should be chosen against $F'$? A successful substitute-model black-box attack must transfer, so attack methods with high transferability are preferred, such as FGSM, PGD, and momentum iterative attacks.
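Step 3's Jacobian-based dataset augmentation can be sketched as follows. This is a loose illustration, not Papernot et al.'s exact procedure: the substitute is a made-up linear model, so the gradient of class $y$'s score is simply row $y$ of its weight matrix, and the "victim query" is a stand-in.

```python
import numpy as np

def jacobian_augment(X, substitute_W, labels, lam=0.1):
    """Push each synthetic sample in the direction that most changes the
    substitute's score for its current label: x_new = x + lam*sign(dF'_y/dx).
    For a linear substitute, that gradient is row y of the weight matrix."""
    X_new = X + lam * np.sign(substitute_W[labels])
    return np.concatenate([X, X_new])      # the augmented dataset

rng = np.random.default_rng(0)
X = rng.uniform(size=(4, 5))               # synthetic seed set (step 1)
W = rng.normal(size=(3, 5))                # hypothetical trained substitute F'
y = np.argmax(X @ W.T, axis=1)             # labels, standing in for victim queries
X_aug = jacobian_augment(X, W, y)
print(X_aug.shape)                         # (8, 5): the dataset doubled
```

The new points probe the victim near $F'$'s current decision surface, so the next round of queries refines $F'$ exactly where imitation matters most.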

## 4.2 ZOO: black box attack based on zeroth-order optimization

This method assumes that prediction confidences can be obtained from the classifier, in which case there is no need to build a substitute dataset or a substitute model. Chen et al. observe the change in the confidence $F(x)$ while adjusting $x$, and thereby recover gradient information about $x$. As shown in Equation (8), by introducing a small enough perturbation $h$, the gradient can be inferred from the output alone:
$$\frac{\partial F(x)}{\partial x_i}\approx\frac{F(x+he_i)-F(x-he_i)}{2h}. \tag{8}$$
ZOO improves on the substitute-model approach in that it can exploit more of the prediction information.
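Equation (8) is a symmetric finite difference, one query pair per coordinate. The sketch below estimates the gradient of a made-up scoring function standing in for the classifier's confidence:

```python
import numpy as np

def zoo_gradient(F, x, h=1e-4):
    """Estimate dF/dx by symmetric differences (Eq. 8), one coordinate at a
    time, using only black-box queries to the scoring function F."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = 1.0          # coordinate direction e_i
        grad[i] = (F(x + h * e) - F(x - h * e)) / (2 * h)
    return grad

F = lambda x: np.sin(x[0]) + x[1] ** 2            # stand-in confidence score
x = np.array([0.5, 2.0])
g = zoo_gradient(F, x)                            # ≈ [cos(0.5), 4.0]
```

Each coordinate costs two queries, which is why the actual ZOO attack estimates only a random subset of coordinates per iteration on high-dimensional images.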

## 4.3 Query-efficient black box attacks

The methods in 4.1–4.2 require many queries of the model's output, which is prohibitive in some applications. It is therefore necessary to improve the efficiency of black-box adversarial example generation under a limited query budget. For example, natural evolution strategies (NES) can be introduced to obtain gradient information efficiently: based on query results sampled around $x$, they estimate the expectation of the gradient of $F$ at $x$. In addition, genetic algorithms have been used to search the neighborhood of the victim image for adversarial examples.
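An NES-style gradient estimate can be sketched as follows; the scoring function below is a made-up stand-in for the model's confidence, and the sample count is chosen for illustration only. Unlike the coordinate-wise scheme of Equation (8), the query cost here is independent of the input dimension.

```python
import numpy as np

def nes_gradient(F, x, sigma=0.1, n=200, seed=0):
    """NES estimate of grad F at x: average antithetic finite differences
    F(x + sigma*u) - F(x - sigma*u) over Gaussian directions u (2n queries)."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x)
    for _ in range(n):
        u = rng.normal(size=x.shape)
        grad += (F(x + sigma * u) - F(x - sigma * u)) * u / (2 * sigma)
    return grad / n

F = lambda x: -np.sum((x - 1.0) ** 2)        # stand-in black-box score
x = np.zeros(5)
g = nes_gradient(F, x)                       # true gradient is all 2s
print(np.dot(g, np.full(5, 2.0)) > 0)        # True: points roughly the right way
```

The noisy estimate is then plugged into a PGD-style update, trading gradient accuracy for far fewer queries.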

# 5 Grey box attack

A typical gray box attack strategy first trains a GAN on the model of interest, then generates adversarial examples directly with the trained generative network. The authors argue that GAN-based attack methods can speed up adversarial example generation and produce more natural, more imperceptible images. This strategy has since also been used to break face recognition systems.

# 6 Poisoning attack

The attacks discussed so far take place after the classifier is trained. A poisoning attack instead generates adversarial samples before training: it crafts samples and embeds them in the training set, so as to reduce the overall accuracy of the trained model or to affect its predictions on specific classes. Usually, the attacker in this setting knows the architecture of the model to be trained on the poisoned data. Poisoning attacks are often applied to graph neural networks, because they require specific knowledge of the graph.

## 6.1 Biggio's poisoning attack on SVM

The attack finds a sample $x_c$ which, when mixed into the training data, causes the learned SVM model $F_{x_c}$ to incur a large loss on the validation set. Such an attack works well against SVMs; for deep learning models, however, finding such a sample is difficult.

## 6.2 Koh's model interpretation

Koh and Liang introduced a method for interpreting neural networks: how would the model's predictions change if a training sample were changed? Their method can precisely quantify the change in the final loss when a single training sample is modified, without retraining the model. By finding the training samples with the greatest influence on the model's predictions, this work can naturally be used for poisoning attacks.

## 6.3 Poison frogs

Poison frogs mixes an adversarial image carrying a correct label into the training set, in order to make the trained model mispredict a target test sample. Given a target test sample $x_t$ with label $y_t$, the attacker first takes a base sample $x_b$ with label $y_b$, then finds $x'$ through the following optimization:
$$x'=\arg\min_x\|Z(x)-Z(x_t)\|_2^2+\beta\|x-x_b\|_2^2 \tag{9}$$
Because $x'$ is close to $x_b$, a model trained on $X_{train}+\{x'\}$ will predict $x'$ as $y_b$. When the new model predicts $x_t$, the optimization objective has forced the predicted scores of $x_t$ and $x'$ close together, so $x_t$ is also predicted as $y_b$.
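Objective (9) can be sketched with gradient descent. Everything below is a made-up stand-in: a linear map $Z(x)=Ax$ plays the role of the network's penultimate-layer features, and $\beta$, the learning rate, and the samples are illustrative only.

```python
import numpy as np

def poison_frog_step(x, x_t, x_b, A, beta=0.25, lr=0.01):
    """One gradient step on ||Z(x) - Z(x_t)||^2 + beta*||x - x_b||^2,
    where Z(x) = A x is a linear stand-in for the feature extractor."""
    grad = 2 * A.T @ (A @ x - A @ x_t) + 2 * beta * (x - x_b)
    return x - lr * grad

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))            # hypothetical feature extractor Z
x_t = rng.uniform(size=6)              # target test sample
x_b = rng.uniform(size=6)              # base sample carrying label y_b
obj = lambda v: np.sum((A @ v - A @ x_t) ** 2) + 0.25 * np.sum((v - x_b) ** 2)
x = x_b.copy()
for _ in range(100):
    x = poison_frog_step(x, x_t, x_b, A)
print(obj(x) < obj(x_b))               # True: feature collision improved
```

The first term pulls $x'$ toward $x_t$ in feature space (the "feature collision"), while the $\beta$ term keeps it visually close to $x_b$ so that human labelers still assign it $y_b$.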

# Reference

https://chowdera.com/2022/174/202206231638581004.html