Autoencoder


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH


1. Unsupervised Learning


Definition

  • Unsupervised learning refers to most attempts to extract information from a distribution that do not require human labor to annotate examples
  • Main task is to find the 'best' representation of the data

Dimension Reduction

  • Attempt to compress as much information as possible in a smaller representation
  • Preserve as much information as possible while obeying some constraint aimed at keeping the representation simpler
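As a point of comparison for the neural approach that follows, the dimension-reduction idea above can be sketched with plain PCA in NumPy. The toy data, the choice of `k = 2`, and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): 200 points in 3-D that lie close to a 2-D plane.
latent_true = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
X = latent_true @ mixing + 0.01 * rng.normal(size=(200, 3))

# PCA: center the data and keep the top-k right singular vectors.
k = 2
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:k]                     # shape (k, 3)

Z = (X - mean) @ components.T           # smaller representation, shape (200, k)
X_hat = Z @ components + mean           # reconstruction, shape (200, 3)

mse = np.mean((X - X_hat) ** 2)
print("compressed shape:", Z.shape)
print("reconstruction MSE:", mse)
```

Here the constraint is the number of kept components `k`; the reconstruction error measures how much information was lost.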

2. Autoencoders

An autoencoder can be viewed as a deep learning approach to unsupervised learning.


Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder and a decoder that produce a reconstruction


Encoder and Decoder

  • Encoder function : $z = f(x)$
  • Decoder function : $\hat{x} = g(z)$
  • We train $f$ and $g$ so that $g\left(f(x)\right) \approx x$





  • An autoencoder combines an encoder $f$ from the original space $\mathscr{X}$ to a latent space $\mathscr{F}$, and a decoder $g$ to map back to $\mathscr{X}$, such that $g \circ f$ is [close to] the identity on the data


$$ \mathbb{E} \left[ \lVert X - g \circ f(X) \rVert^2 \right] \approx 0$$
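The expectation above can be estimated empirically by averaging over samples. The following is a minimal sketch with a hypothetical linear encoder and decoder (the matrix `A` and the toy data are assumptions, not part of the notebook). It illustrates that $g \circ f$ is only required to be close to the identity on the data, not on the whole space.

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(2, 5))     # hypothetical decoder matrix: latent (2-D) -> data (5-D)
A_pinv = np.linalg.pinv(A)      # shape (5, 2), used as the encoder

def f(x):
    """Encoder: data space -> latent space."""
    return x @ A_pinv

def g(z):
    """Decoder: latent space -> data space."""
    return z @ A

# Samples that lie on the 2-D subspace spanned by the rows of A.
X = rng.normal(size=(1000, 2)) @ A
on_data_error = np.mean(np.sum((X - g(f(X))) ** 2, axis=1))

# Generic points of the 5-D space, mostly off that subspace.
X_off = rng.normal(size=(1000, 5))
off_data_error = np.mean(np.sum((X_off - g(f(X_off))) ** 2, axis=1))

print("error on the data   :", on_data_error)
print("error off the data  :", off_data_error)
```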



  • A proper autoencoder has to capture a "good" parametrization of the signal, and in particular the statistical dependencies between the signal components.

3. Autoencoder with TensorFlow

  • MNIST example
  • Use only the digits 1, 5, and 6 so that the 2-D latent space is easy to visualize



3.1. Import Library

In [ ]:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline

3.2. Load MNIST Data

In [2]:
# Requires TensorFlow 1.x: this tutorial module is deprecated (see the warnings it prints)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
  • Keep only the digits 1, 5, and 6 to visualize the latent space in 2-D
In [3]:
train_idx = ((np.argmax(mnist.train.labels, 1) == 1) | \
             (np.argmax(mnist.train.labels, 1) == 5) | \
             (np.argmax(mnist.train.labels, 1) == 6))
test_idx = ((np.argmax(mnist.test.labels, 1) == 1) | \
            (np.argmax(mnist.test.labels, 1) == 5) | \
            (np.argmax(mnist.test.labels, 1) == 6))

train_imgs   = mnist.train.images[train_idx]
train_labels = mnist.train.labels[train_idx]
test_imgs    = mnist.test.images[test_idx]
test_labels  = mnist.test.labels[test_idx]
n_train      = train_imgs.shape[0]
n_test       = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 16583, shape : (16583, 784)
The number of testing images : 2985, shape : (2985, 784)
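The three OR-ed comparisons used for the masks can also be written with `np.isin`. The sketch below uses a small hypothetical one-hot label array in place of `mnist.train.labels` and checks that both forms select the same rows.

```python
import numpy as np

# Hypothetical one-hot labels for the digits 0, 1, 5, 6, 6, 9, 1.
labels_onehot = np.eye(10)[[0, 1, 5, 6, 6, 9, 1]]
digits = np.argmax(labels_onehot, axis=1)

# Mask written as in the notebook.
mask_or = (digits == 1) | (digits == 5) | (digits == 6)

# Equivalent mask using np.isin.
mask_isin = np.isin(digits, [1, 5, 6])

print(mask_isin)
print("selected digits:", digits[mask_isin])
```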

3.3. Define a Structure of an Autoencoder

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape


In [4]:
# Shape of input and latent variable

n_input = 28*28

# Encoder structure
n_encoder1 = 500
n_encoder2 = 300

n_latent = 2

# Decoder structure
n_decoder2 = 300
n_decoder1 = 500
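To get a feel for the size of this architecture, the number of trainable parameters implied by the layer sizes above can be counted directly. This is a small bookkeeping sketch; the list `layer_sizes` simply restates the values defined in the cell.

```python
# Layer sizes: input, encoder1, encoder2, latent, decoder2, decoder1, output
layer_sizes = [28 * 28, 500, 300, 2, 300, 500, 28 * 28]

# One weight matrix per pair of consecutive layers, one bias vector per non-input layer.
n_weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
n_biases = sum(layer_sizes[1:])

print("weights:", n_weights)            # 1085200
print("biases :", n_biases)             # 2386
print("total  :", n_weights + n_biases) # 1087586
```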

3.4. Define Weights, Biases, and Placeholder

  • Define weights and biases for encoder and decoder, separately
  • Based on the pre-defined layer size
  • Initialize with normal distribution of $\mu=0$ and $\sigma=0.1$
In [5]:
weights = {
    'encoder1' : tf.Variable(tf.random_normal([n_input, n_encoder1], stddev = 0.1)),
    'encoder2' : tf.Variable(tf.random_normal([n_encoder1, n_encoder2], stddev = 0.1)),
    'latent' : tf.Variable(tf.random_normal([n_encoder2, n_latent], stddev = 0.1)),
    'decoder2' : tf.Variable(tf.random_normal([n_latent, n_decoder2], stddev = 0.1)),
    'decoder1' : tf.Variable(tf.random_normal([n_decoder2, n_decoder1], stddev = 0.1)),
    'reconst' : tf.Variable(tf.random_normal([n_decoder1, n_input], stddev = 0.1))
}

biases = {
    'encoder1' : tf.Variable(tf.random_normal([n_encoder1], stddev = 0.1)),
    'encoder2' : tf.Variable(tf.random_normal([n_encoder2], stddev = 0.1)),
    'latent' : tf.Variable(tf.random_normal([n_latent], stddev = 0.1)),
    'decoder2' : tf.Variable(tf.random_normal([n_decoder2], stddev = 0.1)),
    'decoder1' : tf.Variable(tf.random_normal([n_decoder1], stddev = 0.1)),
    'reconst' : tf.Variable(tf.random_normal([n_input], stddev = 0.1))
}
In [6]:
x = tf.placeholder(tf.float32, [None, n_input])

3.5. Build a Model

Encoder

  • Simple ANN (MLP) model
  • Use tanh for a nonlinear activation function
  • No nonlinear activation function is applied to the latent layer

Decoder

  • Simple ANN (MLP) model
  • Use tanh for a nonlinear activation function
  • No nonlinear activation function is applied to the reconst (output) layer


In [7]:
def encoder(x, weights, biases):
    encoder1 = tf.add(tf.matmul(x, weights['encoder1']), biases['encoder1'])
    encoder1 = tf.nn.tanh(encoder1)
    
    encoder2 = tf.add(tf.matmul(encoder1, weights['encoder2']), biases['encoder2'])
    encoder2 = tf.nn.tanh(encoder2)
    
    latent = tf.add(tf.matmul(encoder2, weights['latent']), biases['latent'])

    return latent
In [8]:
def decoder(latent, weights, biases):
    decoder2 = tf.add(tf.matmul(latent, weights['decoder2']), biases['decoder2'])
    decoder2 = tf.nn.tanh(decoder2)
    
    decoder1 = tf.add(tf.matmul(decoder2, weights['decoder1']), biases['decoder1'])
    decoder1 = tf.nn.tanh(decoder1)
    
    reconst = tf.add(tf.matmul(decoder1, weights['reconst']), biases['reconst'])
   
    return reconst
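The same forward pass can be mirrored in NumPy to check the shapes without building a TensorFlow graph. This is only a sketch: the parameters are freshly drawn random values (mean 0, standard deviation 0.1), not the trained weights, and the names `params`, `forward`, `latent_np`, and `reconst_np` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 500, 300, 2, 300, 500, 784]

# Hypothetical (W, b) pairs, one per layer.
params = [(rng.normal(0.0, 0.1, size=(n_in, n_out)),
           rng.normal(0.0, 0.1, size=n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Apply dense layers, with tanh on every layer except the last one."""
    for k, (W, b) in enumerate(layers):
        x = x @ W + b
        if k < len(layers) - 1:
            x = np.tanh(x)
    return x

x_batch = rng.random((5, 784))                # 5 fake images with values in [0, 1)
latent_np = forward(x_batch, params[:3])      # encoder: 784 -> 500 -> 300 -> 2
reconst_np = forward(latent_np, params[3:])   # decoder: 2 -> 300 -> 500 -> 784

print(latent_np.shape, reconst_np.shape)
```

Because the last layer of each half is linear, the latent code and the reconstruction are not confined to the range of tanh.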

3.6. Define Loss and Optimizer

Loss

  • Mean squared error, where the target $t_i$ is the input itself and $y_i$ is its reconstruction
$$ \frac{1}{m}\sum_{i=1}^{m} (t_{i} - y_{i})^2 $$
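When each example is a vector (here 784 pixels), averaging the squared differences over all elements equals the mean of the per-example squared norms divided by the number of pixels. The NumPy sketch below checks this on random arrays; `x` and `x_hat` are stand-ins for a batch and its reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 784                       # batch size and number of pixels
x = rng.random((m, n))              # stand-in for a batch of inputs
x_hat = rng.random((m, n))          # stand-in for the reconstructions

# Average over every element of the (m, n) array.
loss_all = np.mean((x - x_hat) ** 2)

# Same value from per-example squared norms.
per_example = np.sum((x - x_hat) ** 2, axis=1)   # shape (m,)
loss_from_norms = np.mean(per_example) / n

print(loss_all, loss_from_norms)
```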

Optimizer

  • AdamOptimizer: a widely used optimizer with per-parameter adaptive step sizes
In [9]:
LR = 0.0001

latent = encoder(x, weights, biases)
reconst = decoder(latent, weights, biases)
loss = tf.square(tf.subtract(x, reconst))
loss = tf.reduce_mean(loss)

optm = tf.train.AdamOptimizer(LR).minimize(loss)
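For intuition about what the optimizer does at each step, here is a NumPy sketch of the textbook Adam update applied to a toy quadratic. It is not TensorFlow's implementation; the function `adam_step`, the toy objective, and the learning rate of 0.01 are assumptions for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
initial_loss = np.sum(theta ** 2)

for t in range(1, 1001):
    grad = 2 * theta                            # gradient of ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
    if t == 1:
        first_step = theta.copy()

final_loss = np.sum(theta ** 2)
print("after first step:", first_step)
print("loss:", initial_loss, "->", final_loss)
```

On the first step the update has magnitude close to `lr` for every coordinate, regardless of the gradient scale, which is one reason a small learning rate such as `LR = 0.0001` is a common choice.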

3.7. Define Optimization Configuration and Then Optimize



  • Define parameters for training autoencoder
    • n_batch : batch size for mini-batch gradient descent
    • n_iter : the number of iteration steps
    • n_prt : print and record the loss every n_prt iterations
In [10]:
n_batch = 50
n_iter = 2500
n_prt = 250
In [11]:
def train_batch_maker(batch_size):
    random_idx = np.random.randint(n_train, size = batch_size)
    return train_imgs[random_idx], train_labels[random_idx]
In [12]:
def test_batch_maker(batch_size):
    random_idx = np.random.randint(n_test, size = batch_size)
    return test_imgs[random_idx], test_labels[random_idx]
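Note that `np.random.randint` draws indices with replacement, so some images may be seen several times before others are seen once. A common alternative is to shuffle the indices and walk through them once per epoch. The helper `epoch_batches` below is a hypothetical sketch of that idea, not part of the notebook.

```python
import numpy as np

def epoch_batches(n_samples, batch_size, rng):
    """Yield index arrays that cover every sample exactly once, in random order."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(epoch_batches(10, 4, rng))

for idx in batches:
    print(idx)    # each idx could be used as train_imgs[idx]
```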
In [13]:
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

loss_record_train = []
loss_record_test = []
for epoch in range(n_iter):
    train_x, _ = train_batch_maker(n_batch)
    sess.run(optm, feed_dict = {x : train_x})  
    
    if epoch % n_prt == 0:
        test_x, _ = test_batch_maker(n_batch)
        c1 = sess.run(loss, feed_dict = {x: train_x})
        c2 = sess.run(loss, feed_dict = {x: test_x})
        loss_record_train.append(c1)
        loss_record_test.append(c2)
        print ("Iter : {}".format(epoch))
        print ("Cost : {}".format(c1))
        
plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record_train))*n_prt, loss_record_train, label = 'training')
plt.plot(np.arange(len(loss_record_test))*n_prt, loss_record_test, label = 'testing')
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.legend(fontsize = 12)
plt.ylim([0,np.max(loss_record_train)])
plt.show()
Iter : 0
Cost : 0.35919633507728577
Iter : 250
Cost : 0.046853065490722656
Iter : 500
Cost : 0.042428743094205856
Iter : 750
Cost : 0.04132849723100662
Iter : 1000
Cost : 0.04540831968188286
Iter : 1250
Cost : 0.04010439291596413
Iter : 1500
Cost : 0.03963790461421013
Iter : 1750
Cost : 0.031198592856526375
Iter : 2000
Cost : 0.036051854491233826
Iter : 2250
Cost : 0.03194524720311165

3.8. Test or Evaluate

  • Test reconstruction performance of the autoencoder
In [14]:
test_x, _ = test_batch_maker(1)
x_reconst = sess.run(reconst, feed_dict = {x: test_x})

plt.figure(figsize = (10,8))
plt.subplot(1,2,1)
plt.imshow(test_x.reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(x_reconst.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
  • To see the distribution of the latent variables, we project the 784-dimensional image space onto the 2-dimensional latent space
In [15]:
test_x, test_y = test_batch_maker(500)
test_y = np.argmax(test_y, axis = 1)
test_latent = sess.run(latent, feed_dict = {x: test_x})

plt.figure(figsize = (10,10))
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize=15)
plt.xlabel('Z1', fontsize=15)
plt.ylabel('Z2', fontsize=15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()

Data Generation

  • Decoding a point of the latent space generates an image that makes some sense.

  • These results are unsatisfying, because the density model used on the latent space ℱ is too simple and inadequate.

  • Building a “good” model amounts to our original problem of modeling an empirical distribution, although it may now be in a lower-dimensional space.

  • This motivates the variational autoencoder (VAE) and the generative adversarial network (GAN).
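One way to see why a simple density model on the latent space is inadequate is to fit a single Gaussian to clustered latent codes and sample from it. The sketch below uses synthetic 2-D codes in two clusters (an assumption, standing in for `test_latent`) and measures how many samples fall far from every cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic latent codes (assumed): two tight clusters, e.g. two digit classes.
centers = np.array([[-4.0, 0.0], [4.0, 0.0]])
codes = np.concatenate([c + 0.5 * rng.normal(size=(500, 2)) for c in centers])

# Simple density model: one Gaussian fitted to all codes.
mean = codes.mean(axis=0)
cov = np.cov(codes, rowvar=False)
samples = rng.multivariate_normal(mean, cov, size=1000)

def fraction_far(points, radius=2.0):
    """Fraction of points farther than `radius` from every cluster center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.mean(dists.min(axis=1) > radius)

far_codes = fraction_far(codes)
far_samples = fraction_far(samples)
print("codes far from clusters  :", far_codes)
print("samples far from clusters:", far_samples)
```

Samples that land in regions the encoder never used are decoded from latent points the decoder was not trained on, so the generated images there need not resemble digits.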

In [16]:
new_data = np.array([[-4, 0]])

latent_input = tf.placeholder(tf.float32, [None, n_latent])
reconst = decoder(latent_input, weights, biases)
fake_image = sess.run(reconst, feed_dict = {latent_input: new_data})

plt.figure(figsize=(16,7))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()

4. Visualization

Image Generation

  • Select an arbitrary latent variable $z$
  • Generate images using the learned decoder
In [17]:
# Initialize canvas
nx = 20
ny = 20
x_values = np.linspace(-8, 4, nx)
y_values = np.linspace(-4, 6, ny)
canvas = np.empty((28*ny, 28*nx))

# Define placeholder
latent_input = tf.placeholder(tf.float32, [None, n_latent])
reconst = decoder(latent_input, weights, biases)

for i, yi in enumerate(y_values):
    for j, xi in enumerate(x_values):
        latent_ = np.array([[xi, yi]])
        reconst_ = sess.run(reconst, feed_dict = {latent_input: latent_})
        # Canvas rows are indexed with ny (not nx) so that the top row shows the largest Z2
        canvas[(ny-i-1)*28:(ny-i)*28, j*28:(j+1)*28] = reconst_.reshape(28, 28)

plt.figure(figsize = (16, 7))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(canvas, 'gray')
plt.title('Manifold', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
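The grid of latent points can also be built once with `np.meshgrid` and tiled into the canvas with a reshape. The sketch below uses a hypothetical `toy_decoder` in place of the trained network so that it runs on its own; in the notebook, that call would presumably be replaced by a single `sess.run(reconst, feed_dict = {latent_input: latent_grid})`.

```python
import numpy as np

nx, ny = 4, 3
x_values = np.linspace(-8, 4, nx)
y_values = np.linspace(-4, 6, ny)

def toy_decoder(z):
    """Hypothetical stand-in for the decoder: one constant 784-pixel image per latent point."""
    return np.tile(z[:, :1] + 10 * z[:, 1:2], (1, 28 * 28))

# Row 0 of the grid holds the largest Z2 so that the canvas has Z2 increasing upward.
xx, yy = np.meshgrid(x_values, y_values[::-1])               # each of shape (ny, nx)
latent_grid = np.stack([xx.ravel(), yy.ravel()], axis=1)     # shape (ny*nx, 2)

images = toy_decoder(latent_grid)                            # shape (ny*nx, 784)

# (ny, nx, 28, 28) -> (ny, 28, nx, 28) -> (ny*28, nx*28)
canvas = images.reshape(ny, nx, 28, 28).transpose(0, 2, 1, 3).reshape(ny * 28, nx * 28)

print(canvas.shape)
```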

5. Latent Representation

To get an intuition of the latent representation, we can pick two samples $x$ and $x'$ at random and interpolate along the line between their codes in the latent space:

$$g((1-\alpha)f(x) + \alpha f(x'))$$
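The difference between interpolating in the data space and in the latent space can be sketched on a toy manifold. Here the data are assumed to lie on the unit circle, with a hand-written encoder `f` (point to angle) and decoder `g` (angle to point); these are illustrative stand-ins, not the trained networks.

```python
import numpy as np

def f(x):
    """Toy encoder: point on the unit circle -> angle (1-D latent code)."""
    return np.arctan2(x[..., 1], x[..., 0])

def g(z):
    """Toy decoder: angle -> point on the unit circle."""
    return np.stack([np.cos(z), np.sin(z)], axis=-1)

x = np.array([1.0, 0.0])
x_prime = np.array([0.0, 1.0])
alphas = np.linspace(0, 1, 5)

# Interpolation in the latent space: g((1 - alpha) f(x) + alpha f(x'))
latent_path = g((1 - alphas) * f(x) + alphas * f(x_prime))             # shape (5, 2)

# Interpolation directly in the data space: (1 - alpha) x + alpha x'
data_path = (1 - alphas)[:, None] * x + alphas[:, None] * x_prime      # shape (5, 2)

print("norms along latent path:", np.linalg.norm(latent_path, axis=1))
print("norms along data path  :", np.linalg.norm(data_path, axis=1))
```

Every point of the latent path stays on the circle, while the straight line in the data space cuts through its interior, where no data lie.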



  • Interpolation in High Dimension



  • Interpolation in Manifold



6. Other Tutorials

In [18]:
%%html
<center><iframe src="https://www.youtube.com/embed/QujriOAtps4?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [19]:
%%html
<center><iframe src="https://www.youtube.com/embed/nTt_ajul8NY?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [20]:
%%html
<center><iframe src="https://www.youtube.com/embed/H1AllrJ-_30?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>