Autoencoder


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

0. Video Lectures

In [1]:
%%html 
<center><iframe src="https://www.youtube.com/embed/KU6SLiDjoX8?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

1. Unsupervised Learning


Definition

  • Unsupervised learning refers to most attempts to extract information from a distribution that do not require human labor to annotate example
  • Main task is to find the 'best' representation of the data

Dimension Reduction

  • Attempt to compress as much information as possible in a smaller representation
  • Preserve as much information as possible while obeying some constraint aimed at keeping the representation simpler

2. Autoencoders

It is like 'deep learning version' of unsupervised learning.

Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder and a decoder that produce a reconstruction

Encoder and Decoder

  • Encoder function : $z = f(x)$
  • Decoder function : $x = g(z)$
  • We learn to set $g\left(f(x)\right) = x$





3. Autoencoder with TensorFlow

  • MNIST example
  • Use only (1, 5, 6) digits to visualize in 2-D


In [1]:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
In [2]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
In [3]:
# Load Data

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape((x_train.shape[0],28*28)) / 255
x_test = x_test.reshape((x_test.shape[0],28*28)) / 255
In [4]:
# Use Only 1,5,6 Digits to Visualize

train_idx1 = np.array(np.where(y_train == 1))
train_idx5 = np.array(np.where(y_train == 5))
train_idx6 = np.array(np.where(y_train == 6))
train_idx = np.sort(np.concatenate((train_idx1, train_idx5, train_idx6), axis= None))

test_idx1 = np.array(np.where(y_test == 1))
test_idx5 = np.array(np.where(y_test == 5))
test_idx6 = np.array(np.where(y_test == 6))
test_idx = np.sort(np.concatenate((test_idx1, test_idx5, test_idx6), axis= None))

train_imgs = x_train[train_idx]
train_labels = y_train[train_idx]
test_imgs = x_test[test_idx]
test_labels = y_test[test_idx]
n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 18081, shape : (18081, 784)
The number of testing images : 2985, shape : (2985, 784)

3.1. Define a Structure of an Autoencoder

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape


In [5]:
# Define Structure

# Encoder Structure
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(500, activation = 'relu', input_shape = (28*28,)),
    tf.keras.layers.Dense(300, activation = 'relu'),
    tf.keras.layers.Dense(2, activation = None)
    ])

# Decoder Structure
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(300, activation = 'relu', input_shape = (2,)),
    tf.keras.layers.Dense(500, activation = 'relu'),
    tf.keras.layers.Dense(28*28, activation = None)
    ])

# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])

3.2. Define Loss and Optimizer

Loss

  • Squared loss
$$ \frac{1}{m}\sum_{i=1}^{m} (t_{i} - y_{i})^2 $$

Optimizer

  • AdamOptimizer: the most popular optimizer
In [6]:
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.001),
                    loss = 'mean_squared_error',
                    metrics = ['mse'])
In [7]:
# Train Model & Evaluate Test Data

n_batch = 50
n_epoch = 10

training = autoencoder.fit(train_imgs, train_imgs, batch_size = n_batch, epochs = n_epoch)
Epoch 1/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0374 - mse: 0.0374
Epoch 2/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0304 - mse: 0.0304
Epoch 3/10
362/362 [==============================] - 3s 9ms/step - loss: 0.0289 - mse: 0.0289
Epoch 4/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0279 - mse: 0.0279
Epoch 5/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0273 - mse: 0.0273
Epoch 6/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0269 - mse: 0.0269
Epoch 7/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0265 - mse: 0.0265
Epoch 8/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0264 - mse: 0.0264
Epoch 9/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0261 - mse: 0.0261
Epoch 10/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0259 - mse: 0.0259

3.3. Test or Evaluate

  • Test reconstruction performance of the autoencoder
In [8]:
test_scores = autoencoder.evaluate(test_imgs, test_imgs, verbose = 2)
print('Test loss: {}'.format(test_scores[0]))
print('Mean Squared Error: {} %'.format(test_scores[1]*100))
94/94 - 0s - loss: 0.0265 - mse: 0.0265
Test loss: 0.026499442756175995
Mean Squared Error: 2.6499442756175995 %
In [9]:
# Visualize Evaluation on Test Data

# rand_idx = np.random.randint(1, test_imgs.shape[0])
rand_idx = 6

test_img = test_imgs[rand_idx]
reconst_img = autoencoder.predict(test_img.reshape(1,28*28))

plt.figure(figsize = (10, 8))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 12)

plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(reconst_img.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.show()

4. Latent Space and Generation

  • To see the distribution of latent variables, we make a projection of 784-dimensional image space onto 2-dimensional latent space
In [10]:
idx = np.random.randint(0, len(test_labels), 500)
test_x, test_y = test_imgs[idx], test_labels[idx]
In [11]:
test_latent = encoder.predict(test_x)

plt.figure(figsize = (10,10))
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize=15)
plt.xlabel('Z1', fontsize=15)
plt.ylabel('Z2', fontsize=15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()

Data Generation

  • It generates something that makes sense.

  • These results are unsatisfying, because the density model used on the latent space ℱ is too simple and inadequate.

  • Building a “good” model amounts to our original problem of modeling an empirical distribution, although it may now be in a lower dimension space.

  • This is a motivation to VAE or GAN.

In [12]:
new_data = np.array([[2, 4]])

fake_image = decoder.predict(new_data)

plt.figure(figsize=(16,7))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
In [13]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')