Convolutional Autoencoder

By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

1. 2D Convolution

tf.keras.layers.Conv2D(filters, kernel_size, strides, padding, activation, kernel_regularizer, input_shape)
    filters = 32
    kernel_size = (3,3)
    strides = (1,1)
    padding = 'SAME'
    activation = 'relu'
    kernel_regularizer = tf.keras.regularizers.l2(0.04)
    input_shape = (input_h, input_w, input_ch)

filters
- the number of output channels, i.e., the number of kernels the layer learns.

kernel_size
- the height and width of the 2D convolution window.
strides
- the step size of the kernel when traversing the image.
padding
- how the border of a sample is handled.
- A padded convolution with a stride of 1 keeps the spatial output dimensions equal to the input, whereas an unpadded convolution crops away some of the borders if the kernel is larger than 1.
- 'SAME' : enable zero padding
- 'VALID' : disable zero padding
activation
- Activation function to use.
kernel_regularizer
- Regularizer function applied to the kernel weights matrix (for example, an L2 penalty).
input and output channels
- A convolutional layer takes a certain number of input channels ($C$) and calculates a specific number of output channels ($D$).
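
As a rough illustration of the relationship between input and output channels, the sketch below implements a naive, unpadded, stride-1 convolution in NumPy. The function name `conv2d_valid` and the example sizes ($C = 3$, $D = 8$, a 6x6 input) are assumptions chosen for illustration and are not part of the Keras API.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive stride-1, unpadded 2D convolution (cross-correlation, as used in deep learning).

    x: input of shape (H, W, C)
    w: kernel of shape (kh, kw, C, D)
    returns: output of shape (H - kh + 1, W - kw + 1, D)
    """
    H, W, C = x.shape
    kh, kw, C_w, D = w.shape
    assert C == C_w, "kernel depth must match the number of input channels"
    out = np.zeros((H - kh + 1, W - kw + 1, D))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]  # window of shape (kh, kw, C)
            # Sum over the window and all C input channels, once per filter
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

x = np.random.rand(6, 6, 3)      # C = 3 input channels
w = np.random.rand(3, 3, 3, 8)   # D = 8 filters, each spanning all 3 input channels
y = conv2d_valid(x, w)
print(y.shape)                   # (4, 4, 8)
```

Each of the $D$ filters spans all $C$ input channels, so the kernel tensor holds kh x kw x C x D weights.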

Examples

input = [None, 4, 4, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'VALID'

input = [None, 5, 5, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'SAME'
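
The spatial output sizes of the two examples above can be checked with a small helper. This is a minimal sketch assuming the usual TensorFlow conventions ('VALID' gives floor((n - k) / s) + 1, 'SAME' gives ceil(n / s)); the function name `conv_output_size` is hypothetical.

```python
import math

def conv_output_size(n, k, s, padding):
    """Output size along one spatial axis for input size n, kernel k, stride s."""
    if padding == 'VALID':
        return (n - k) // s + 1      # no padding: the kernel must fit inside the input
    if padding == 'SAME':
        return math.ceil(n / s)      # zero padding is chosen so that only the stride shrinks the output
    raise ValueError("padding must be 'VALID' or 'SAME'")

print(conv_output_size(4, 3, 1, 'VALID'))  # 2  (4x4 input -> 2x2 output)
print(conv_output_size(5, 3, 1, 'SAME'))   # 5  (5x5 input -> 5x5 output)
```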

2. Transposed Convolution

The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. For instance, one might use such a transformation as the decoding layer of a convolutional autoencoder or to project feature maps to a higher-dimensional space.

Some sources use the name deconvolution, which is inappropriate because the operation is not a deconvolution. To make things worse, deconvolutions do exist, but they are not common in the field of deep learning.
An actual deconvolution reverts the process of a convolution.
Imagine inputting an image into a single convolutional layer. Now take the output, throw it into a black box and out comes your original image again. This black box does a deconvolution. It is the mathematical inverse of what a convolutional layer does.

A transposed convolution is somewhat similar because it produces the same spatial resolution a hypothetical deconvolutional layer would. However, the actual mathematical operation that’s being performed on the values is different.
A transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.
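
One way to see why this is a "transpose" rather than an inverse is to write a convolution as a matrix. The 1-D sketch below is an illustration under assumed values (input length 5, kernel `[1, 2, 3]`, stride 2); the helper `conv_matrix_1d` is hypothetical. Multiplying by the matrix performs the convolution, and multiplying by its transpose restores the input shape but not the input values.

```python
import numpy as np

def conv_matrix_1d(n, kernel, stride):
    """Matrix C such that C @ x is the unpadded, strided 1-D convolution of x."""
    k = len(kernel)
    n_out = (n - k) // stride + 1
    C = np.zeros((n_out, n))
    for i in range(n_out):
        C[i, i * stride:i * stride + k] = kernel  # one kernel copy per output position
    return C

kernel = np.array([1.0, 2.0, 3.0])
C = conv_matrix_1d(5, kernel, stride=2)   # shape (2, 5)

x = np.array([1.0, 0.0, 2.0, 0.0, 1.0])
y = C @ x        # convolution: length 5 -> length 2, y = [7, 5]
x_up = C.T @ y   # transposed convolution: length 2 -> length 5, x_up = [7, 14, 26, 10, 15]

print(y.shape, x_up.shape)     # (2,) (5,)
print(np.allclose(x_up, x))    # False: the shape is restored, the values are not
```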

tf.keras.layers.Conv2DTranspose(filters, kernel_size, strides, padding = 'SAME', activation)
    filters = number of output channels (e.g., 64)
    kernel_size = (3,3)
    strides = stride of the sliding window for each spatial dimension of the input tensor
    padding = 'SAME'
    activation = activation function ('softmax', 'relu', ...)

'SAME' : enable zero padding
'VALID' : disable zero padding
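
For a transposed convolution the padding mode determines how much the output grows. The helper below is a sketch assuming the default Keras behaviour when `output_padding` is not set ('SAME' gives n * s, 'VALID' gives n * s + max(k - s, 0)); the name `conv_transpose_output_size` is hypothetical.

```python
def conv_transpose_output_size(n, k, s, padding):
    """Output size along one spatial axis of a transposed convolution."""
    if padding == 'SAME':
        return n * s                     # the stride acts as the upsampling factor
    if padding == 'VALID':
        return n * s + max(k - s, 0)     # equals (n - 1) * s + k when k >= s
    raise ValueError("padding must be 'VALID' or 'SAME'")

print(conv_transpose_output_size(2, 3, 2, 'VALID'))  # 5   (2x2 -> 5x5)
print(conv_transpose_output_size(7, 3, 2, 'SAME'))   # 14  (7x7 -> 14x14)
```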

An image of 5x5 is fed into a convolutional layer. The stride is set to 2, the padding is deactivated and the kernel is 3x3. This results in a 2x2 image.

2D convolution with no padding, stride of 2 and kernel of 3

If we wanted to reverse this process, we’d need the inverse mathematical operation so that 9 values are generated from each pixel we input. Afterward, we traverse the output image with a stride of 2. This would be a deconvolution.

A transposed convolution does not do that. The only thing in common is it guarantees that the output will be a 5x5 image as well, while still performing a normal convolution operation. To achieve this, we need to perform some fancy padding on the input.

Transposed 2D convolution with no padding, stride of 2 and kernel of 3

It merely reconstructs the spatial resolution from before and performs a convolution. This may not be the mathematical inverse, but for Encoder-Decoder architectures, it’s still very helpful. This way we can combine the upscaling of an image with a convolution, instead of doing two separate processes.
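
The "fancy padding" can be made concrete with a small NumPy sketch. It assumes square inputs and kernels, a 2x2 input, a 3x3 kernel, and a stride of 2; both function names are hypothetical. The first function applies the definition directly (each input pixel stamps a scaled copy of the kernel onto the output), and the second inserts zeros between the input pixels, pads the border, and then runs an ordinary stride-1 convolution with the flipped kernel. Under these assumptions the two give the same 5x5 result.

```python
import numpy as np

def transposed_conv_scatter(x, w, s):
    """Definition: every input pixel adds a scaled copy of the kernel to the output."""
    n, k = x.shape[0], w.shape[0]
    out = np.zeros(((n - 1) * s + k, (n - 1) * s + k))
    for i in range(n):
        for j in range(n):
            out[i * s:i * s + k, j * s:j * s + k] += x[i, j] * w
    return out

def transposed_conv_padded(x, w, s):
    """Same result via zero insertion, border padding, and a regular convolution."""
    n, k = x.shape[0], w.shape[0]
    z = np.zeros(((n - 1) * s + 1, (n - 1) * s + 1))
    z[::s, ::s] = x                            # insert s - 1 zeros between pixels
    z = np.pad(z, k - 1, mode='constant')      # pad k - 1 zeros on every side
    w_flip = w[::-1, ::-1]                     # flip the kernel by 180 degrees
    m = z.shape[0] - k + 1
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            out[i, j] = np.sum(z[i:i + k, j:j + k] * w_flip)  # stride-1 convolution
    return out

rng = np.random.default_rng(0)
x = rng.random((2, 2))
w = rng.random((3, 3))

a = transposed_conv_scatter(x, w, s=2)
b = transposed_conv_padded(x, w, s=2)
print(a.shape, np.allclose(a, b))   # (5, 5) True
```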

Another example of transposed convolution

Transposed 2D convolution with no padding, no stride and kernel of 3

Strides and padding for transposed convolution (optional)

Source
- A guide to convolution arithmetic for deep learning by Vincent Dumoulin and Francesco Visin
- https://github.com/vdumoulin/conv_arithmetic

3. Examples

A transposed 2-D convolution layer upsamples feature maps.

This layer is sometimes incorrectly known as a "deconvolution" or "deconv" layer. This layer is the transpose of convolution and does not perform deconvolution.

%%html
<center><iframe src="https://www.youtube.com/embed/nTt_ajul8NY?start=725"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

4. CAE with MNIST

4.1. Import Library

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline

4.2. Load MNIST Data

# Load Data

mnist = tf.keras.datasets.mnist
(train_imgs, train_labels), (test_imgs, test_labels) = mnist.load_data()

train_imgs, test_imgs = train_imgs/255.0, test_imgs/255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step

Use only the digits 1, 5, and 6 so that the latent space can be visualized clearly in 2-D.

# Use Only 1,5,6 Digits to Visualize

# Collect the indices of digits 1, 5, and 6 (grouped by digit, in that order)
digits = (1, 5, 6)
train_idx = np.hstack([np.where(train_labels == d)[0] for d in digits])
test_idx = np.hstack([np.where(test_labels == d)[0] for d in digits])

# Keep only the selected images and their labels
train_x, train_y = train_imgs[train_idx], train_labels[train_idx]
test_x, test_y = test_imgs[test_idx], test_labels[test_idx]

train_x = train_x.reshape(-1,28,28,1)
test_x = test_x.reshape(-1,28,28,1)
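
The selection above returns the samples grouped by digit (all 1s, then all 5s, then all 6s). The toy sketch below, with made-up labels and tiny 2x2 "images", shows that grouping and compares it with a boolean mask built by `np.isin`, which keeps the samples in their original order instead. Either approach selects the same set of samples.

```python
import numpy as np

labels = np.array([1, 0, 5, 6, 1, 9])        # made-up labels for illustration
imgs = np.arange(6 * 4).reshape(6, 2, 2)     # six tiny 2x2 "images"

# Grouped by digit: indices of the 1s, then the 5s, then the 6s
idx = np.hstack([np.where(labels == d)[0] for d in (1, 5, 6)])
print(idx)             # [0 4 2 3]
print(labels[idx])     # [1 1 5 6]

# Boolean mask: same samples, but in their original order
mask = np.isin(labels, [1, 5, 6])
print(labels[mask])    # [1 5 6 1]

# Reshape to (N, height, width, channels), as a Conv2D layer expects
selected = imgs[idx].reshape(-1, 2, 2, 1)
print(selected.shape)  # (4, 2, 2, 1)
```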

The following architecture has been implemented.

4.3. Build a Model

encoder = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3,3),
                           strides = (2,2),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),

    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3,3),
                           strides = (2,2),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (14, 14, 32)),

    tf.keras.layers.Conv2D(filters = 2,
                           kernel_size = (7,7),
                           padding = 'VALID',
                           input_shape = (7,7,64))
])

decoder = tf.keras.models.Sequential([
    tf.keras.layers.Conv2DTranspose(filters = 64,
                                    kernel_size = (7,7),
                                    strides = (1,1),
                                    activation = 'relu',
                                    padding = 'VALID',
                                    input_shape = (1, 1, 2)),

    tf.keras.layers.Conv2DTranspose(filters = 32,
                                    kernel_size = (3,3),
                                    strides = (2,2),
                                    activation = 'relu',
                                    padding = 'SAME',
                                    input_shape = (7, 7, 64)),

    tf.keras.layers.Conv2DTranspose(filters = 1,
                                    kernel_size = (7,7),
                                    strides = (2,2),
                                    padding = 'SAME',
                                    input_shape = (14,14,32))
])

latent = encoder.output
result = decoder(latent)

model = tf.keras.Model(inputs = encoder.input, outputs = result)
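
To check that the decoder mirrors the encoder, the feature-map shapes can be traced layer by layer without building the model. This is a sketch that assumes the usual TensorFlow/Keras size rules (no `output_padding`, default stride of 1 where none is given); the helper names are hypothetical, and the layer tuples are copied from the model definition above.

```python
import math

def conv_size(n, k, s, padding):
    # Convolution: 'SAME' -> ceil(n / s), 'VALID' -> floor((n - k) / s) + 1
    return math.ceil(n / s) if padding == 'SAME' else (n - k) // s + 1

def conv_transpose_size(n, k, s, padding):
    # Transposed convolution: 'SAME' -> n * s, 'VALID' -> n * s + max(k - s, 0)
    return n * s if padding == 'SAME' else n * s + max(k - s, 0)

# (filters, kernel size, stride, padding) for each layer
encoder_layers = [(32, 3, 2, 'SAME'), (64, 3, 2, 'SAME'), (2, 7, 1, 'VALID')]
decoder_layers = [(64, 7, 1, 'VALID'), (32, 3, 2, 'SAME'), (1, 7, 2, 'SAME')]

def trace(n, layers, size_fn):
    """Return the (height, width, channels) shape after each layer."""
    shapes = []
    for filters, k, s, padding in layers:
        n = size_fn(n, k, s, padding)
        shapes.append((n, n, filters))
    return shapes

enc_shapes = trace(28, encoder_layers, conv_size)
dec_shapes = trace(1, decoder_layers, conv_transpose_size)
print(enc_shapes)  # [(14, 14, 32), (7, 7, 64), (1, 1, 2)]
print(dec_shapes)  # [(7, 7, 64), (14, 14, 32), (28, 28, 1)]
```

The 7x7 'VALID' convolution collapses the 7x7x64 feature map to a single 1x1x2 latent vector, which is why the latent space can be plotted in two dimensions.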

4.4. Define Loss and Optimizer

model.compile(optimizer = 'adam',
              loss = 'mean_squared_error')
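
For an autoencoder the target is the input itself, so the mean squared error measures the average squared pixel difference between an image and its reconstruction. The NumPy sketch below illustrates the quantity on a made-up 2x2 example; it is not the Keras implementation, and the pixel values are assumptions.

```python
import numpy as np

def mean_squared_error(x, x_hat):
    """Average squared difference between an image and its reconstruction."""
    return np.mean((x - x_hat) ** 2)

x = np.array([[0.0, 1.0],
              [1.0, 0.0]])        # made-up input pixels in [0, 1]
x_hat = np.array([[0.1, 0.8],
                  [0.9, 0.0]])    # made-up reconstruction

loss = mean_squared_error(x, x_hat)
print(loss)                       # approximately 0.015
```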

4.5. Define Optimization Configuration and Then Optimize

model.fit(train_x, train_x, epochs = 10)

Epoch 1/10
566/566 [==============================] - 15s 6ms/step - loss: 0.0435
Epoch 2/10
566/566 [==============================] - 3s 4ms/step - loss: 0.0340
Epoch 3/10
566/566 [==============================] - 3s 6ms/step - loss: 0.0320
Epoch 4/10
566/566 [==============================] - 4s 7ms/step - loss: 0.0309
Epoch 5/10
566/566 [==============================] - 4s 8ms/step - loss: 0.0302
Epoch 6/10
566/566 [==============================] - 4s 7ms/step - loss: 0.0295
Epoch 7/10
566/566 [==============================] - 5s 9ms/step - loss: 0.0290
Epoch 8/10
566/566 [==============================] - 4s 8ms/step - loss: 0.0287
Epoch 9/10
566/566 [==============================] - 4s 8ms/step - loss: 0.0283
Epoch 10/10
566/566 [==============================] - 5s 8ms/step - loss: 0.0281

<keras.src.callbacks.History at 0x785ab231c490>

test_img = test_x[[6]]
x_reconst = model.predict(test_img)

plt.figure(figsize = (6, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input image', fontsize = 15)
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(x_reconst.reshape(28,28), 'gray')
plt.title('Reconstructed image', fontsize = 15)
plt.axis('off')
plt.show()

1/1 [==============================] - 0s 123ms/step

idx = np.random.choice(test_y.shape[0], 500)
rnd_x, rnd_y = test_x[idx], test_y[idx]

rnd_latent = encoder.predict(rnd_x)
rnd_latent = rnd_latent.reshape(-1,2)

plt.figure(figsize = (6, 6))
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()

16/16 [==============================] - 0s 8ms/step

new_latent = np.array([[-8, 0]]).reshape(-1,1,1,2)

fake_img = decoder.predict(new_latent)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.scatter(new_latent[:,:,:,0], new_latent[:,:,:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_img.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()

1/1 [==============================] - 0s 29ms/step
