
Deep learning autoencoder and its application in image denoising

2020-12-07 14:00:22 osc_z9jr2tjo

Introduction: An autoencoder is an unsupervised learning method (more precisely, a self-supervised one), used mainly for dimensionality reduction and feature extraction. In this respect it resembles PCA, dictionary learning, and compressed sensing. Dimensionality reduction here can also be understood as data compression: in short, the goal is to produce a low-dimensional representation of high-dimensional raw data that preserves as much of the important information as possible, comparable to the principal components extracted by PCA. The compression performed by an autoencoder, however, has three distinguishing features: 1) it is lossy; 2) it is specific to a particular kind of data; 3) it is learned automatically from the data. In addition, in almost every context where the term "autoencoder" appears, the compression (and decompression) or feature extraction is implemented with neural networks.

 

1. The Theoretical Model

More concretely, suppose we have some raw data, usually high-dimensional, such as images, and we use a neural network to compress it. This network is called the encoder: passing the raw data through it yields a compressed, low-dimensional representation, as shown in the figure below.

Paired with the encoder is a decoder, shown in the figure below. The decoder is also a neural network; it takes the low-dimensional representation (the compressed information) and restores it to the same dimensionality as the original data. Because the compression is lossy, the restored data will not match the original exactly; what we want is for the gap between the two to be as small as possible.

Finally, chaining the encoder and decoder together gives the complete autoencoder framework. In fact, during training the encoder and decoder are built and optimized simultaneously, as the examples below will show.

Suppose the raw data are images from the MNIST database, so each input has 784 dimensions. After compression by the autoencoder, the data can be reduced to a few dozen dimensions or even fewer. As the figure below shows, the reconstructed images are still recognizable: most of the essential information has been preserved.

But as noted in the introduction, autoencoder-based compression has several characteristics that set it apart from compression in the traditional sense:

  1. Autoencoders are data-specific: they can only compress data similar to what they were trained on. A conventional algorithm such as JPEG can be applied to any image, not just one particular type; but if the training set for an autoencoder is MNIST, the resulting model can only produce useful low-dimensional representations for handwritten digits of that kind. An autoencoder trained on face images would perform quite poorly if asked to compress pictures of flowers and trees.
  2. Autoencoders are lossy: the decompressed output is degraded relative to the original input (similar to MP3 or JPEG compression). This distinguishes them from lossless schemes such as LZW coding.
  3. Autoencoders are learned automatically from data examples. This is a useful property: a model trained on a particular dataset will perform well on inputs of that type, and the process requires little manual engineering. The MNIST examples below will demonstrate this.

To build an autoencoder you need three things: an encoder, a decoder, and a distance function that measures the information lost between the original data and its reconstruction (i.e., the loss function). The encoder and decoder are usually neural networks that are differentiable with respect to the distance function, so their parameters can be obtained with an optimization algorithm such as stochastic gradient descent, minimizing the reconstruction loss.
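To make these three ingredients concrete, here is a minimal NumPy sketch (not the Keras models built later in this article): a linear encoder and decoder trained jointly by plain gradient descent on a squared-error distance. With linear maps and this loss, the learned subspace is closely related to what PCA finds.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 784))        # toy stand-in for flattened MNIST rows

d_in, d_code, lr = 784, 32, 0.01
W_enc = rng.normal(scale=0.01, size=(d_in, d_code))   # encoder parameters
W_dec = rng.normal(scale=0.01, size=(d_code, d_in))   # decoder parameters

for step in range(200):
    z = x @ W_enc                      # encoder: 784 -> 32
    x_hat = z @ W_dec                  # decoder: 32 -> 784
    err = x_hat - x
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))    # distance function d(x, x_hat)
    # exact gradients of the loss w.r.t. both weight matrices (chain rule)
    g_dec = z.T @ err / len(x)
    g_enc = x.T @ (err @ W_dec.T) / len(x)
    W_enc -= lr * g_enc                # one gradient-descent update
    W_dec -= lr * g_dec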

2. A Caveat

Having used the word "compression" repeatedly, the reader may ask: are autoencoders actually good at data compression? Could they replace traditional algorithms such as JPEG or MP3? Usually the answer is no. In image compression, for example, it is hard to train an autoencoder that outperforms a basic algorithm like JPEG; autoencoders typically come in handy only when the images to be handled are of a very specific type (one on which JPEG clearly does poorly). The fact that autoencoders must be tailored to specific data makes them impractical for real-world compression problems: as stressed repeatedly, they can only be used on data resembling their training data. Making them more general would require a great deal of training data. Perhaps future advances will improve on this point, but for now it remains a limitation.

3. Advantages of Autoencoders

Two interesting practical applications of autoencoders are data denoising (demonstrated in the examples later in this article) and dimensionality reduction for data visualization. With appropriate size and sparsity constraints, autoencoders can learn data projections that are more interesting than PCA or other basic techniques. For 2D visualization in particular, t-SNE is probably the best-known algorithm, but it typically needs relatively low-dimensional input. A good strategy for visualizing similarity relationships in high-dimensional data is therefore to first compress the data into a low-dimensional space (say, 32 dimensions) with an autoencoder, and then map the compressed data onto the 2D plane with t-SNE.
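As a hedged sketch of this strategy, assuming scikit-learn is installed and using the `encoder` and `x_test` that will be built in the next section (the MNIST labels, discarded there, are reloaded here only to color the points):

from keras.datasets import mnist
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Reload the test labels for coloring; the article's code discards them.
(_, _), (_, y_test) = mnist.load_data()

codes = encoder.predict(x_test).reshape(len(x_test), -1)   # e.g. 784 -> 32
codes_2d = TSNE(n_components=2).fit_transform(codes)       # 32 -> 2; can take minutes

plt.scatter(codes_2d[:, 0], codes_2d[:, 1], s=2, c=y_test, cmap='tab10')
plt.colorbar()
plt.show()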

Strictly speaking, autoencoders are not an unsupervised learning technique (that would imply a fundamentally different learning process). Although they need no training labels, they are more precisely a self-supervised technique: a special case of supervised learning whose targets are generated from the input data itself. For a self-supervised model to learn meaningful features, you must devise a sensible synthetic objective and loss function, and herein lies a problem: learning to reconstruct the input faithfully may not be the right choice (or at least not the best one). One can imagine features that reconstruct the input accurately yet bring no improvement on the main task we actually care about (say, classification or localization); conversely, even if reconstruction quality is poor, the extracted features may enable a real breakthrough on that main task, and that may be exactly what we are after.

4. Worked Examples on MNIST with Keras

Let us use the MNIST dataset to demonstrate how an autoencoder is built. Implementing one in Keras is very easy. First, import the necessary packages and the dataset.

import keras
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers

import matplotlib.pyplot as plt
%matplotlib inline

from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

As a simple start, we use a single-layer, fully connected neural network for both the encoder and the decoder. The optimizer is Adadelta and the loss is the per-pixel binary crossentropy. The encoder compresses the original 784-dimensional data down to 32 dimensions.

# this is the size of our encoded representations
encoding_dim = 32  # 32 floats -> compression of factor 24.5, assuming the input is 784 floats

# this is our input placeholder
input_img = Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(784, activation='sigmoid')(encoded)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)

# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)

# create a placeholder for an encoded (32-dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# create the decoder model
decoder = Model(encoded_input, decoder_layer(encoded_input))

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

Train the model for 50 epochs.

autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

Finally, let us display the original images alongside their reconstructions.

encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

As the figure below shows, the reconstructed images preserve the most important information in the originals, though their quality is visibly degraded compared with the inputs.

In the previous example, the low-dimensional representation was constrained only to be a 32-dimensional vector; that is, the only requirement we placed on it was its dimensionality. The usual consequence is that the hidden layer merely learns something close to PCA. Beyond this, we can add a further constraint requiring the representation to be sparse. In Keras, this is done by attaching an activity_regularizer to the Dense layer.

# add a Dense layer with a L1 activity regularizer
encoded = Dense(encoding_dim, activation='relu',
                activity_regularizer=regularizers.l1(10e-8))(input_img)

Note that the Keras official tutorial 【2】 uses 10e-5 for this parameter, but in my tests convergence was then very slow, so I suggest changing it to 10e-8. Even so, the number of training epochs still has to be increased to 100. In my own runs, the training loss and validation loss converge to about 0.101 and 0.0996 respectively. As the comparison in the figure below shows, the reconstructions obtained with the sparsity constraint are not noticeably different from those given before, but the sparsity of the low-dimensional representation increases; readers can verify this for themselves with the sketch below.
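One quick way to check the sparsity claim (a hedged sketch, assuming the encoder model has been rebuilt around the regularized Dense layer exactly as in the first example):

encoded_imgs = encoder.predict(x_test)

# With the L1 activity regularizer, the mean activation should drop and the
# fraction of near-zero code units should rise relative to the plain model.
print('mean activation        :', encoded_imgs.mean())
print('fraction of near-zeros :', (encoded_imgs < 1e-3).mean())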

Going further, you can also increase the number of Dense layers. For example, when building the encoder/decoder, replace the corresponding code in the original program with the following:

# this is the size of our encoded representations
encoding_dim = 32  # 32 floats -> compression of factor 24.5, assuming the input is 784 floats

# this is our input placeholder
input_img = Input(shape=(784,))
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoder_output = Dense(encoding_dim, activation='relu')(encoded)

decoded = Dense(64, activation='relu')(encoder_output)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)

# this model maps an input to its encoded representation
encoder = Model(input_img, encoder_output)

# autoencoder.summary()

# create a placeholder for an encoded (32-dimensional) input
encoded_input = Input(shape=(encoding_dim,))

# create the decoder model
deco = autoencoder.layers[-3](encoded_input)
deco = autoencoder.layers[-2](deco)
deco = autoencoder.layers[-1](deco)
# create the decoder model
decoder = Model(encoded_input, deco)

#decoder.summary()

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

Note the implementation of the decoder: the original code worked for the single-layer case because only the last layer was the decoder; with three-layer encoders and decoders, all three decoder layers must be called in turn when defining the decoder model. The reconstruction results below show that adding hidden layers improves the quality of the restored images.

Notice that although we are dealing with image data, so far we have used only fully connected neural networks; in practice, convolutional neural networks (CNNs) are the standard choice for image processing. So next, we rewrite the previous program to build the autoencoder with a CNN. First, import the necessary packages and the dataset again; because we will use a CNN, the reshaping of the data differs from the earlier code.

from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K

import matplotlib.pyplot as plt
%matplotlib inline

from keras.datasets import mnist
import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format

Below is the neural network architecture. Note that when building the decoder we use UpSampling2D, an upsampling layer. Its (2, 2) argument gives the row and column magnification factors: a factor of 2 in the first position means every row is repeated so that one row becomes two, and a factor of 2 in the second position means every column is likewise doubled. So (2, 2) effectively enlarges the image four-fold (twice horizontally and twice vertically); for example, a 32x32 image becomes 64x64.
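A tiny check of this behavior, using the NumPy equivalent of UpSampling2D's default nearest-neighbor upsampling:

import numpy as np

a = np.array([[1, 2],
              [3, 4]])
# Repeat every row and every column twice, as UpSampling2D((2, 2)) does.
print(np.repeat(np.repeat(a, 2, axis=0), 2, axis=1))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]

With that in mind, here is the network architecture.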

input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (4, 4, 8) i.e. 128-dimensional

x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

train_history = autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test))

After training for 100 epochs, let us compare the reconstructions with the original images.

decoded_imgs = autoencoder.predict(x_test)

n = 10
plt.figure(figsize=(20, 4))
for i in range(1, n + 1):
    # display original
    ax = plt.subplot(2, n, i)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

As the figure below shows, the CNN-based autoencoder produces higher-quality reconstructions.

We can also visualize how the model's loss on the training and validation sets changed over the course of training.

plt.plot(train_history.history['loss'])
plt.plot(train_history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.show()

As the figure below shows, the model's loss on both the training and validation sets eventually converges to below 0.1.

If you are interested, you can also plot the encoded representations themselves. As noted in the code above, each representation has shape (4, 4, 8), i.e., 128 dimensions in total. To display it as an image, each one is reshaped into a 4x32 array (transposed for display) and rendered in grayscale.

encoded_imgs = encoder.predict(x_test)

n = 10
plt.figure(figsize=(10, 4))
for i in range(n):
    ax = plt.subplot(1, n, i+1)
    plt.imshow(encoded_imgs[i].reshape(4, 4 * 8).T)
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

The results are shown in the figure below.

Finally, let us look at the autoencoder's impressive denoising ability. First, read in the data and add Gaussian noise to it.

from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K

import matplotlib.pyplot as plt
%matplotlib inline

from keras.datasets import mnist
import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format

noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape) 
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape) 

x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

We can plot the noisy images first.

n = 10
plt.figure(figsize=(20, 2))
for i in range(n):
    ax = plt.subplot(1, n, i+1)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

As can be seen, the added noise affects the images considerably; their quality is severely degraded.

To get better results, we modify the previous code slightly. Note that we still use the CNN version of the autoencoder, but each convolutional layer now has more filters.

input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (7, 7, 32)

x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

train_history = autoencoder.fit(x_train_noisy, x_train,
                epochs=100,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test_noisy, x_test))

Because the autoencoder is trained with the clean MNIST images as targets, it learns what "clean" handwritten digits look like. When the trained model is then given noisy images to reconstruct, it tries to restore them to something resembling the training targets, since that is the knowledge the network has internalized; in doing so, it removes the noise.
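A hedged way to put a number on this effect is to compare the mean squared error against the clean test images, before and after passing the noisy images through the trained model:

import numpy as np

denoised = autoencoder.predict(x_test_noisy)
print('MSE noisy    vs clean:', np.mean((x_test_noisy - x_test) ** 2))
print('MSE denoised vs clean:', np.mean((denoised - x_test) ** 2))
# If denoising worked, the second number should be substantially smaller.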

decoded_imgs = autoencoder.predict(x_test_noisy)

n = 10
plt.figure(figsize=(20, 2))
for i in range(n):
    ax = plt.subplot(1, n, i+1)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

The image below shows the result after denoising. Note that this is an effect that traditional denoising methods (Gaussian blur, median filtering, wavelet transforms, JPEG-based denoising) cannot match! However, do keep firmly in mind the limitation of autoencoders discussed at the beginning: they are data-specific, meaning they can only be used on a particular type of data. Feeding this autoencoder a noisy image of flowers and plants would be pointless.
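For a visual reference point, here is a hedged comparison sketch (assuming SciPy is installed) that puts the autoencoder's output next to two classical filters on the same noisy digit, using decoded_imgs from the snippet above:

from scipy.ndimage import gaussian_filter, median_filter
import matplotlib.pyplot as plt

img = x_test_noisy[0].reshape(28, 28)
panels = [('noisy', img),
          ('gaussian blur', gaussian_filter(img, sigma=1)),
          ('median filter', median_filter(img, size=3)),
          ('autoencoder', decoded_imgs[0].reshape(28, 28))]

plt.figure(figsize=(8, 2))
for j, (title, out) in enumerate(panels):
    ax = plt.subplot(1, 4, j + 1)
    ax.imshow(out, cmap='gray')
    ax.set_title(title)
    ax.axis('off')
plt.show()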

Finally, one more note: autoencoders have seen many new developments, such as LSTM-based autoencoders; interested readers may consult the literature to learn more. A minimal sketch of such a model is given below.
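As a pointer only, here is a minimal sketch of an LSTM-based sequence autoencoder, along the lines of the sequence-to-sequence example in reference 【2】. The dimensions (timesteps, input_dim, latent_dim) are illustrative placeholders, e.g. treating each MNIST image as a sequence of 28 rows:

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

timesteps, input_dim, latent_dim = 28, 28, 32    # illustrative placeholders

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)               # encode the sequence into one vector
decoded = RepeatVector(timesteps)(encoded)       # feed that code to the decoder at every step
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(optimizer='adam', loss='mse')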

 

* The complete code from this article, as a Jupyter Notebook file, can be downloaded from the 【 Cloud link 】 (extraction code: wvni).


 

References

【1】How to reduce image noises by autoencoder (the figures in this article are taken from it)

【2】Building Autoencoders in Keras (some of the example figures and code in this article are adapted from it; the code there fails to run as published and has been corrected by the author)

【3】Dictionary learning and image denoising in practice

 
