Autoencoder


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Unsupervised Learning
2. Autoencoders
3. Autoencoder with TensorFlow
4. Latent Space and Generation
5. Video Lectures


Definition

  • Unsupervised learning refers to most attempts to extract information from a distribution that do not require human labor to annotate examples
  • The main task is to find the 'best' representation of the data

Dimension Reduction

  • Attempt to compress as much information as possible into a smaller representation (a linear example is sketched below)
  • Preserve as much information as possible while obeying some constraint aimed at keeping the representation simpler
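As a point of reference (not part of the original notes), the simplest form of dimension reduction is a linear one such as PCA. A minimal NumPy sketch on toy data, to contrast with the nonlinear autoencoder developed below:

import numpy as np

X = np.random.randn(1000, 784)              # toy data standing in for flattened images
Xc = X - X.mean(axis = 0)                   # center the data

U, S, Vt = np.linalg.svd(Xc, full_matrices = False)
Z = Xc @ Vt[:2].T                           # 2-D representation (the 'encoding')
X_recon = Z @ Vt[:2] + X.mean(axis = 0)     # linear reconstruction (the 'decoding')

print(Z.shape, X_recon.shape)               # (1000, 2) (1000, 784)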

2. Autoencoders

It is like a 'deep learning version' of unsupervised learning.

Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder and a decoder that produce a reconstruction

Encoder and Decoder

  • Encoder function : $z = f(x)$
  • Decoder function : $\hat{x} = g(z)$
  • The autoencoder is trained so that the reconstruction matches the input, $g\left(f(x)\right) \approx x$ (see the objective below)
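Training amounts to minimizing a reconstruction error (a restatement consistent with the squared loss used in Section 3.2):

$$ \min_{f,\,g} \; \frac{1}{m}\sum_{i=1}^{m} \left\lVert x_{i} - g\left(f(x_{i})\right) \right\rVert^2 $$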





3. Autoencoder with TensorFlow

  • MNIST example
  • Use only (1, 5, 6) digits to visualize in 2-D


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
In [2]:
# Load Data

mnist = tf.keras.datasets.mnist
(train_imgs, train_labels), (test_imgs, test_labels) = mnist.load_data()

train_imgs, test_imgs = train_imgs/255.0, test_imgs/255.0
In [3]:
# Use Only 1,5,6 Digits to Visualize

train_x = train_imgs[np.hstack([np.where(train_labels == 1), 
                                np.where(train_labels == 5), 
                                np.where(train_labels == 6)])][0]
train_y = train_labels[np.hstack([np.where(train_labels == 1),
                                  np.where(train_labels == 5),
                                  np.where(train_labels == 6)])][0]
test_x = test_imgs[np.hstack([np.where(test_labels == 1), 
                              np.where(test_labels == 5), 
                              np.where(test_labels == 6)])][0]
test_y = test_labels[np.hstack([np.where(test_labels == 1), 
                                np.where(test_labels == 5), 
                                np.where(test_labels == 6)])][0]
In [4]:
# Flattening

train_x = train_x.reshape(-1, 28*28)
test_x = test_x.reshape(-1, 28*28)

3.1. Define a Structure of an Autoencoder

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape


In [5]:
# Define Structure

# Encoder Structure
encoder = tf.keras.models.Sequential([    
    tf.keras.layers.Dense(input_shape = (28*28,), units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None)
    ])

# Decoder Structure
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(input_shape = (2,), units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 28*28, activation = None)
    ])

# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])
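A quick sanity check (not in the original notebook) to confirm the shapes: the encoder should map a flattened 784-dimensional image to a 2-dimensional code, and the full autoencoder should map it back to 784 dimensions.

dummy = np.zeros((1, 28*28), dtype = np.float32)   # a single blank 'image'
print(encoder(dummy).shape)                        # expected: (1, 2)
print(autoencoder(dummy).shape)                    # expected: (1, 784)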

3.2. Define Loss and Optimizer

Loss

  • Squared loss between the reconstruction $y_{i}$ and the target $t_{i}$ (for an autoencoder, the target is the input image itself)
$$ \frac{1}{m}\sum_{i=1}^{m} (t_{i} - y_{i})^2 $$

Optimizer

  • Adam: one of the most popular optimizers
In [6]:
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.001),
                    loss = 'mean_squared_error')
In [7]:
training = autoencoder.fit(train_x, train_x, batch_size = 50, epochs = 10)
Epoch 1/10
362/362 [==============================] - 3s 7ms/step - loss: 0.0378
Epoch 2/10
362/362 [==============================] - 4s 10ms/step - loss: 0.0302
Epoch 3/10
362/362 [==============================] - 3s 9ms/step - loss: 0.0288
Epoch 4/10
362/362 [==============================] - 3s 10ms/step - loss: 0.0279
Epoch 5/10
362/362 [==============================] - 4s 10ms/step - loss: 0.0274
Epoch 6/10
362/362 [==============================] - 4s 11ms/step - loss: 0.0270
Epoch 7/10
362/362 [==============================] - 4s 11ms/step - loss: 0.0266
Epoch 8/10
362/362 [==============================] - 4s 11ms/step - loss: 0.0264
Epoch 9/10
362/362 [==============================] - 4s 11ms/step - loss: 0.0261
Epoch 10/10
362/362 [==============================] - 5s 14ms/step - loss: 0.0259

3.3. Test or Evaluate

  • Test reconstruction performance of the autoencoder
In [8]:
test_scores = autoencoder.evaluate(test_x, test_x)
94/94 [==============================] - 2s 6ms/step - loss: 0.0262
In [9]:
# Visualize Evaluation on Test Data

test_img = test_x[[6]]
reconst_img = autoencoder.predict(test_img)

plt.figure(figsize = (10, 8))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 12)

plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(reconst_img.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.show()

4. Latent Space and Generation

  • To see the distribution of the latent variables, we project the 784-dimensional image space onto the 2-dimensional latent space
In [10]:
idx = np.random.choice(test_y.shape[0], 500)
rnd_x, rnd_y = test_x[idx], test_y[idx]
In [11]:
rnd_latent = encoder.predict(rnd_x)

plt.figure(figsize = (10,10))
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()

Data Generation

  • Decoding a point picked in the latent space generates an image that resembles a digit.

  • These results are unsatisfying, because the density model used on the latent space $\mathcal{F}$ is too simple and inadequate (a sketch follows the code cell below).

  • Building a “good” model amounts to our original problem of modeling an empirical distribution, although it may now be in a lower-dimensional space.

  • This is a motivation for the VAE and the GAN.

In [15]:
new_latent = np.array([[2, -2]])

fake_img = decoder.predict(new_latent)

plt.figure(figsize = (16,7))
plt.subplot(1,2,1)
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.scatter(new_latent[:,0], new_latent[:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_img.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
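The same decoder can also be driven by latent points drawn from a rough density estimate. A hedged sketch (not part of the original notebook) that fits a single Gaussian to the latent codes rnd_latent computed above and decodes a few samples; this crude fit is exactly the 'too simple' density model mentioned above:

mu = rnd_latent.mean(axis = 0)              # mean of the 2-D latent codes
cov = np.cov(rnd_latent.T)                  # 2x2 covariance of the latent codes

z_samples = np.random.multivariate_normal(mu, cov, size = 3)
gen_imgs = decoder.predict(z_samples)

plt.figure(figsize = (9, 3))
for i in range(3):
    plt.subplot(1, 3, i + 1)
    plt.imshow(gen_imgs[i].reshape(28, 28), 'gray')
    plt.xticks([])
    plt.yticks([])
plt.show()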

5. Video Lectures

In [13]:
%%html
<center><iframe src="https://www.youtube.com/embed/KU6SLiDjoX8?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
In [14]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')