Introduction to Twin Networks (2): Classifying the Fashion-MNIST Clothing Dataset with a Siamese Net (Python)

2020-12-07 16:16:09


In the previous article I explained the principle of the Siamese Net and the key to this architecture: the contrastive loss function. Now let's build a simple example with PyTorch. My personal takeaways from this exercise are:

• A Siamese Net is well suited to small datasets;
• So far I have only used a Siamese Net for classification tasks (if you know how to apply it to segmentation or other tasks, feel free to message me; WeChat: cyx645016617);
• The results of a Siamese Net have good interpretability.

1 Prepare the data

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split

device = 'cuda' if torch.cuda.is_available() else 'cpu'
```

The data file is in CSV format: the first column is the class label, and the remaining 784 columns are actually the 28x28 pixel values.
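The code below assumes a `data_train` DataFrame already exists. In the article it would be read from disk with `pd.read_csv`; the filename `fashion-mnist_train.csv` (the Kaggle CSV export) is an assumption. A tiny synthetic CSV stands in here so the column layout is visible:

```python
import io
import pandas as pd

# In practice the file is read from disk, e.g.:
#   data_train = pd.read_csv('fashion-mnist_train.csv')   # filename assumed (Kaggle export)
# A two-row synthetic CSV stands in here to show the format.
header = 'label,' + ','.join(f'pixel{i}' for i in range(784))
row = '3,' + ','.join('0' for _ in range(784))
data_train = pd.read_csv(io.StringIO('\n'.join([header, row, row])))

print(data_train.shape)       # (2, 785): 1 label column + 784 pixel columns
print(data_train.iloc[0, 0])  # 3 -> the class label in the first column
```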

Split the data into training and validation sets, and reshape each row into a 28x28 image:

```python
X_full = data_train.iloc[:, 1:]
y_full = data_train.iloc[:, :1]
x_train, x_test, y_train, y_test = train_test_split(X_full, y_full, test_size=0.05)
x_train = x_train.values.reshape(-1, 28, 28, 1).astype('float32') / 255.
x_test = x_test.values.reshape(-1, 28, 28, 1).astype('float32') / 255.
y_train.label.unique()
>>> array([8, 9, 7, 6, 4, 2, 3, 1, 5, 0])
```

As you can see, the Fashion-MNIST dataset follows the same layout as MNIST and is divided into 10 classes:

• 0 T-shirt/top
• 1 Trouser
• 2 Pullover
• 3 Dress
• 4 Coat
• 5 Sandal
• 6 Shirt
• 7 Sneaker
• 8 Bag
• 9 Ankle boot
```python
np.bincount(y_train.label.values), np.bincount(y_test.label.values)
>>> (array([4230, 4195, 4135, 4218, 4174, 4172, 4193, 4250, 4238, 4195]),
     array([1770, 1805, 1865, 1782, 1826, 1828, 1807, 1750, 1762, 1805]))
```

As you can see, the classes are still very well balanced.

2 Build the Dataset and visualize it

```python
class mydataset(Dataset):
    def __init__(self, x_data, y_data):
        self.x_data = x_data
        self.y_data = y_data.label.values

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        # First image of the pair.
        img1 = self.x_data[idx]
        y1 = self.y_data[idx]
        # With probability 0.5, draw the second image from the same class,
        # otherwise from a different class.
        if np.random.rand() < 0.5:
            idx2 = int(np.random.choice(np.arange(len(self.y_data))[self.y_data == y1]))
        else:
            idx2 = int(np.random.choice(np.arange(len(self.y_data))[self.y_data != y1]))
        img2 = self.x_data[idx2]
        y2 = self.y_data[idx2]
        label = 0 if y1 == y2 else 1  # 0: same class, 1: different class
        return img1, img2, label
```

I won't revisit how to build a torch.utils.data.Dataset; that was explained clearly in the earlier 《Xiaobai Xue PyTorch》 (PyTorch for beginners) series. The logic here is: given an idx, first decide whether this sample should pair two images of the same class or two images of different classes, with a 50% probability of choosing two images of the same class. The output is the two images plus a label: label 0 means the two images belong to the same class, and 1 means they belong to different classes. With pairs constructed this way the model can be trained and the loss function computed.
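Under these sampling rules roughly half of the pairs should carry label 0. A self-contained sanity check of the pair labels (the stand-in data here is illustrative, not from the article; the class definition repeats the sampling logic above):

```python
import numpy as np
import pandas as pd
from torch.utils.data import Dataset

class mydataset(Dataset):  # same sampling logic as in the article
    def __init__(self, x_data, y_data):
        self.x_data = x_data
        self.y_data = y_data.label.values
    def __len__(self):
        return len(self.x_data)
    def __getitem__(self, idx):
        y1 = self.y_data[idx]
        if np.random.rand() < 0.5:
            idx2 = int(np.random.choice(np.arange(len(self.y_data))[self.y_data == y1]))
        else:
            idx2 = int(np.random.choice(np.arange(len(self.y_data))[self.y_data != y1]))
        label = 0 if y1 == self.y_data[idx2] else 1
        return self.x_data[idx], self.x_data[idx2], label

# Stand-in data so the check runs on its own: 100 fake images, 10 classes.
np.random.seed(0)
x_demo = np.random.rand(100, 28, 28, 1).astype('float32')
y_demo = pd.DataFrame({'label': np.arange(100) % 10})

ds = mydataset(x_demo, y_demo)
labels = [ds[i % len(ds)][2] for i in range(2000)]
frac_same = 1.0 - float(np.mean(labels))
print(round(frac_same, 2))  # close to 0.5
```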

```python
train_dataset = mydataset(x_train, y_train)
val_dataset = mydataset(x_test, y_test)

# Take one batch and show the pairs side by side.
train_loader = DataLoader(train_dataset, batch_size=6, shuffle=True)
for img1, img2, target in train_loader:
    fig, axs = plt.subplots(2, img1.shape[0], figsize=(12, 6))
    for idx, (ax1, ax2) in enumerate(axs.T):
        ax1.imshow(img1[idx, :, :, 0].numpy(), cmap='gray')
        ax1.set_title('image A')
        ax2.imshow(img2[idx, :, :, 0].numpy(), cmap='gray')
        ax2.set_title('{}'.format('same' if target[idx] == 0 else 'different'))
    break
```

This section of code visualizes one batch of data. Everything up to this point should run without problems; if you have questions, feel free to contact me to discuss (WeChat: cyx645016617). I personally find that discussion solves problems and speeds up progress.

3 Build the model

```python
class siamese(nn.Module):
    def __init__(self, z_dimensions=2):
        super(siamese, self).__init__()
        # The Conv2d layers were lost in the original post's formatting; 3x3
        # kernels with padding 1 are assumed here so that the two MaxPool2d
        # layers bring a 28x28 input down to 7x7 = 49 features.
        self.feature_net = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(4),
            nn.Conv2d(4, 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(4),
            nn.MaxPool2d(2),
            nn.Conv2d(4, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(8),
            nn.Conv2d(8, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(8),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 1, kernel_size=3, padding=1),
            nn.ReLU(inplace=True)
        )
        self.linear = nn.Linear(49, z_dimensions)

    def forward(self, x):
        x = self.feature_net(x)
        x = x.view(x.shape[0], -1)
        x = self.linear(x)
        return x
```

A very simple convolutional network; the output vector has dimension z_dimensions.
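One detail worth checking: PyTorch convolutions expect NCHW input, while the data above is stored as (N, 28, 28, 1), so a permute is needed before the forward pass. A minimal stand-in with the same output contract as the `siamese` class (layer sizes assumed, chosen only to reproduce the 49-feature flatten):

```python
import torch
import torch.nn as nn

# Stand-in feature extractor: (N, 1, 28, 28) -> (N, 49) after two MaxPool2d.
feature_net = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(4, 1, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
linear = nn.Linear(49, 8)

batch = torch.rand(16, 28, 28, 1)     # NHWC, as stored by the Dataset
x = batch.permute(0, 3, 1, 2)         # -> NCHW for PyTorch conv layers
z = linear(feature_net(x).view(x.shape[0], -1))
print(z.shape)  # torch.Size([16, 8])
```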

```python
def contrastive_loss(pred1, pred2, target):
    # target = 0 for a same-class pair, 1 for a different-class pair.
    MARGIN = 2
    euclidean_dis = F.pairwise_distance(pred1, pred2)
    target = target.view(-1)
    # Same-class pairs are pulled together; different-class pairs are pushed
    # apart until their distance exceeds MARGIN. Average over the batch so
    # that backward() receives a scalar.
    loss = torch.mean((1 - target) * torch.pow(euclidean_dis, 2) +
                      target * torch.pow(torch.clamp(MARGIN - euclidean_dis, min=0), 2))
    return loss
```

This implements the contrastive loss described in the previous article, averaged over the batch.
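A tiny worked example of the loss (values chosen for illustration): identical embeddings of a same-class pair give essentially zero loss, while a different-class pair closer than the margin is penalized by the squared shortfall:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(pred1, pred2, target, margin=2.0):
    d = F.pairwise_distance(pred1, pred2)
    target = target.view(-1).float()
    return torch.mean((1 - target) * d.pow(2) +
                      target * torch.clamp(margin - d, min=0).pow(2))

a = torch.tensor([[1.0, 0.0]])
b = torch.tensor([[1.0, 0.0]])   # identical embedding
c = torch.tensor([[0.0, 0.0]])   # distance ~1 from a, inside the margin

print(contrastive_loss(a, b, torch.tensor([0])).item())  # ~0 (up to pairwise_distance's eps)
print(contrastive_loss(a, c, torch.tensor([1])).item())  # ~(2 - 1)^2 = 1.0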

4 Training

```python
model = siamese(z_dimensions=8).to(device)
# The optimizer definition was missing from the original post; Adam with the
# default learning rate is assumed here.
optimizor = torch.optim.Adam(model.parameters())
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=128)

for e in range(10):
    # Training phase.
    history = []
    for img1, img2, target in train_loader:
        img1 = img1.permute(0, 3, 1, 2).to(device)  # NHWC -> NCHW
        img2 = img2.permute(0, 3, 1, 2).to(device)
        target = target.to(device)

        pred1 = model(img1)
        pred2 = model(img2)
        loss = contrastive_loss(pred1, pred2, target)

        optimizor.zero_grad()
        loss.backward()
        optimizor.step()

        history.append(loss.detach().cpu().numpy())
    train_loss = np.mean(history)

    # Validation phase: no gradients needed.
    history = []
    with torch.no_grad():
        for img1, img2, target in val_loader:
            img1 = img1.permute(0, 3, 1, 2).to(device)
            img2 = img2.permute(0, 3, 1, 2).to(device)
            target = target.to(device)

            pred1 = model(img1)
            pred2 = model(img2)
            loss = contrastive_loss(pred1, pred2, target)
            history.append(loss.cpu().numpy())
    val_loss = np.mean(history)
    print(f'train_loss:{train_loss},val_loss:{val_loss}')
```

To speed up training I increased the batch size to 128; everything else is unchanged. These are the results after running 10 epochs. Don't forget to save the model:

```python
torch.save(model.state_dict(), 'siamese.pth')
```

That's about it for training. Now let's look at a visualization of the validation set. Here we use t-SNE to visualize the high-dimensional features, initialized with PCA:
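The feature matrix X and labels y used below are not constructed in the original post. A sketch of how they can be collected by running the trained model over the validation images (a stand-in model and random data are used here so the sketch is self-contained; in the article, `model` and `x_test` come from the code above):

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-ins for the trained `model` and validation data from the article.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 8))
x_test_t = torch.rand(256, 28, 28, 1)     # NHWC, like x_test
y = np.random.randint(0, 10, size=256)    # class labels, for colouring the plot

with torch.no_grad():  # no gradients needed for feature extraction
    X = model(x_test_t.permute(0, 3, 1, 2)).cpu().numpy()

print(X.shape)  # (256, 8): one 8-dimensional embedding per validation image
```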

```python
from sklearn import manifold

'''X is the feature matrix (no target); X_tsne is the feature after dimensionality reduction'''
tsne = manifold.TSNE(n_components=2, init='pca', random_state=501)
X_tsne = tsne.fit_transform(X)
print("Org data dimension is {}. "
      "Embedded data dimension is {}".format(X.shape[-1], X_tsne.shape[-1]))
```

```python
x_min, x_max = X_tsne.min(0), X_tsne.max(0)
X_norm = (X_tsne - x_min) / (x_max - x_min)  # normalize to [0, 1]
plt.figure(figsize=(8, 8))
for i in range(10):
    plt.scatter(X_norm[y == i][:, 0], X_norm[y == i][:, 1], alpha=0.3, label=f'{i}')
plt.legend()
```

In the resulting plot you can see that the different classes are separated quite well: the distances between different classes are fairly large, and the clusters are clearly visible. The latent dimension used here is 8.

Here's a question; I have my own answer in mind, and I wonder what you think. If I change the latent dimension z to 2, then no t-SNE or PCA dimensionality reduction is needed and the embeddings can be visualized directly. But in that case the visualization looks worse than reducing from 8 dimensions down to 2. Why is that?

Tips: one explanation is that 2 dimensions are too few and information is lost, but that doesn't quite hold up, because PCA is equivalent to a degenerate linear layer and would cause the same kind of loss. I think the key lies in the Euclidean distance inside the loss function: in higher dimensions the Euclidean distances become larger, so the MARGIN value needs to be adjusted accordingly.
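The claim that Euclidean distances grow with dimension can be checked numerically: for random standard-normal vectors the expected pairwise distance scales roughly like sqrt(2d), which is why a fixed MARGIN behaves differently at z = 2 versus z = 8 (a quick illustration, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
means = {}
for d in (2, 8, 64):
    a = rng.standard_normal((10000, d))
    b = rng.standard_normal((10000, d))
    means[d] = np.linalg.norm(a - b, axis=1).mean()
    # E[||a - b||] grows roughly like sqrt(2*d) for standard normal vectors.
    print(d, round(means[d], 2), round(np.sqrt(2 * d), 2))
```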
