Autoencoder


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Unsupervised Learning
2. Autoencoders
3. Autoencoder with Scikit Learn
4. Visualization
5. Latent Representation
6. Other Tutorials


1. Unsupervised Learning


Definition

  • Unsupervised learning refers to methods that extract information from a distribution without requiring human labor to annotate examples
  • The main task is to find the 'best' representation of the data

Dimension Reduction

  • Attempt to compress as much information as possible into a smaller representation
  • Preserve as much information as possible while obeying some constraint that keeps the representation simpler (a minimal sketch follows below)
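
As an added illustration (not part of the original notes), PCA is a simple linear instance of this idea: it encodes data into a low-dimensional representation and decodes it back, and the reconstruction error measures how much information the compressed representation preserves. The toy data and the choice of 2 components below are arbitrary.

In [ ]:
# minimal sketch: linear dimension reduction with PCA (illustrative only)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(1000, 50)             # toy data: 1000 samples in 50-D

pca = PCA(n_components = 2)         # compress to a 2-D representation
Z = pca.fit_transform(X)            # encode: 50-D -> 2-D
X_hat = pca.inverse_transform(Z)    # decode: 2-D -> 50-D

# reconstruction error shows how much information the 2-D code preserves
print(np.mean((X - X_hat)**2))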

2. Autoencoders

An autoencoder can be viewed as a 'deep learning version' of unsupervised learning.


Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder and a decoder that produce a reconstruction


Encoder and Decoder

  • Encoder function : $z = f(x)$
  • Decoder function : $\hat{x} = g(z)$
  • We learn to make $g\left(f(x)\right) \approx x$





  • An autoencoder combines an encoder $f$ from the original space $\mathscr{X}$ to a latent space $\mathscr{F}$, and a decoder $g$ to map back to $\mathscr{X}$, such that $g \circ f$ is close to the identity on the data


$$ \mathbb{E} \left[ \lVert X - g \circ f(X) \rVert^2 \right] \approx 0$$
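
To make the objective concrete, the following added sketch evaluates the empirical version of this expectation, $\frac{1}{m}\sum_{i=1}^{m} \lVert x_i - g(f(x_i)) \rVert^2$, with an arbitrary linear encoder and decoder (toy weights only, not the model trained later).

In [ ]:
# toy sketch of the reconstruction objective with a linear encoder f and decoder g
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 4)                   # m = 100 samples in 4-D

W = rng.randn(4, 2)                     # arbitrary encoder weights (4-D -> 2-D)
f = lambda x: x @ W                     # encoder: z = f(x)
g = lambda z: z @ np.linalg.pinv(W)     # decoder: map the 2-D code back to 4-D

# empirical reconstruction error E[||X - g(f(X))||^2]; an autoencoder is trained to minimize this
print(np.mean(np.sum((X - g(f(X)))**2, axis = 1)))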



  • A proper autoencoder has to capture a "good" parametrization of the signal, and in particular the statistical dependencies between the signal components.

3. Autoencoder with Scikit Learn

  • MNIST example
  • Use only the digits (1, 5, 6) so that the 2-D latent space can be visualized (a data-preparation sketch is shown below)
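
The .npy files loaded in Section 3.2 already contain only these three digits. In case similar files need to be rebuilt, a hypothetical sketch (assuming scikit-learn's fetch_openml and one-hot labels, neither of which is part of the original notes) might look like this:

In [ ]:
# hypothetical sketch: build a (1, 5, 6)-only MNIST subset with one-hot labels
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version = 1, as_frame = False)
X = mnist.data / 255.0                  # scale pixels to [0, 1]
y = mnist.target.astype(int)

mask = np.isin(y, [1, 5, 6])            # keep only the three digits
X, y = X[mask], y[mask]

Y = np.eye(10)[y]                       # one-hot labels, matching the np.argmax usage below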



3.1. Import Library

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.neural_network import MLPRegressor
from sklearn.metrics import accuracy_score

3.2. Load MNIST Data

In [2]:
train_x = np.load('./data_files/mnist_train_images.npy')
train_y = np.load('./data_files/mnist_train_labels.npy')
test_x = np.load('./data_files/mnist_test_images.npy')
test_y = np.load('./data_files/mnist_test_labels.npy')

n_train = train_x.shape[0]
n_test = test_x.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_x.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_x.shape))
The number of training images : 16583, shape : (16583, 784)
The number of testing images : 2985, shape : (2985, 784)
In [17]:
idx = np.random.randint(train_x.shape[0])
img = train_x[idx].reshape(28,28)

plt.figure(figsize = (6,6))
plt.imshow(img,'gray')
plt.title("Label : {}".format(np.argmax(train_y[idx,:])))
plt.xticks([])
plt.yticks([])
plt.show()

3.3. Define a Structure of an Autoencoder

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape


In [4]:
# Shape of input and latent variable

n_input = 28*28

# Encoder structure
n_encoder1 = 500
n_encoder2 = 300

n_latent = 2

# Decoder structure
n_decoder2 = 300
n_decoder1 = 500

3.4. Build a Model

Encoder

  • Simple ANN (MLP) model
  • tanh is used as the nonlinear activation function
  • The 2-D latent layer is just another hidden layer of the MLPRegressor, so it is also tanh-activated

Decoder

  • Simple ANN (MLP) model
  • tanh is used as the nonlinear activation function for the hidden layers
  • The output layer (reconst) has no nonlinear activation, since MLPRegressor uses an identity output activation for regression


Loss

  • Squared loss
$$ \frac{1}{m}\sum_{i=1}^{m} (t_{i} - y_{i})^2 $$

Optimizer

  • Adam (solver = 'adam'): one of the most widely used optimizers
In [5]:
reg = MLPRegressor(hidden_layer_sizes = (n_encoder1, n_encoder2, n_latent, n_decoder2, n_decoder1), 
                   activation = 'tanh', 
                   solver = 'adam', 
                   learning_rate_init = 0.0001, 
                   max_iter = 20, 
                   tol = 0.0000001, 
                   verbose = True)

3.5. Optimize



In [6]:
reg.fit(train_x, train_x)
Iteration 1, loss = 0.03294983
Iteration 2, loss = 0.02372178
Iteration 3, loss = 0.02275203
Iteration 4, loss = 0.02251063
Iteration 5, loss = 0.02237981
Iteration 6, loss = 0.02224030
Iteration 7, loss = 0.02213638
Iteration 8, loss = 0.02208943
Iteration 9, loss = 0.02206459
Iteration 10, loss = 0.02205256
Iteration 11, loss = 0.02204538
Iteration 12, loss = 0.02204093
Iteration 13, loss = 0.02203999
Iteration 14, loss = 0.02203000
Iteration 15, loss = 0.02202388
Iteration 16, loss = 0.02201583
Iteration 17, loss = 0.02200375
Iteration 18, loss = 0.02198116
Iteration 19, loss = 0.02193418
Iteration 20, loss = 0.02183408
c:\users\user\appdata\local\programs\python\python35\lib\site-packages\sklearn\neural_network\multilayer_perceptron.py:562: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (20) reached and the optimization hasn't converged yet.
  % self.max_iter, ConvergenceWarning)
Out[6]:
MLPRegressor(activation='tanh', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(500, 300, 2, 300, 500),
       learning_rate='constant', learning_rate_init=0.0001, max_iter=20,
       momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
       power_t=0.5, random_state=None, shuffle=True, solver='adam',
       tol=1e-07, validation_fraction=0.1, verbose=True, warm_start=False)

3.6. Test or Evaluate

  • Test the reconstruction performance of the autoencoder, first visually and then with a quantitative check (see the sketch after the plot below)
In [20]:
idx = np.random.randint(test_x.shape[0])
x_reconst = reg.predict(test_x[idx].reshape(-1,784))

plt.figure(figsize = (10,8))
plt.subplot(1,2,1)
plt.imshow(test_x[idx].reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(x_reconst.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
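As an added quantitative check (not in the original notebook), the mean squared reconstruction error over the whole test set summarizes how well the autoencoder reproduces unseen digits.

In [ ]:
# mean squared reconstruction error over the full test set
test_reconst = reg.predict(test_x)
print("Test reconstruction MSE : {:.5f}".format(np.mean((test_x - test_reconst)**2)))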
  • To see the distribution of the latent variables, we project the 784-dimensional image space onto the 2-dimensional latent space
In [8]:
def encoder(data):
    # replay the encoder half of the trained MLP:
    # two tanh hidden layers followed by the tanh-activated 2-D latent layer
    data = np.asmatrix(data)

    encoder1 = np.tanh(data*reg.coefs_[0] + reg.intercepts_[0])
    encoder2 = np.tanh(encoder1*reg.coefs_[1] + reg.intercepts_[1])
    latent = np.tanh(encoder2*reg.coefs_[2] + reg.intercepts_[2])

    return np.asarray(latent)
In [9]:
test_latent = encoder(test_x)

plt.figure(figsize = (10,10))
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 1,0], test_latent[np.argmax(test_y, axis = 1) == 1,1], label = '1')
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 5,0], test_latent[np.argmax(test_y, axis = 1) == 5,1], label = '5')
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 6,0], test_latent[np.argmax(test_y, axis = 1) == 6,1], label = '6')
plt.title('Latent Space', fontsize=15)
plt.xlabel('Z1', fontsize=15)
plt.ylabel('Z2', fontsize=15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()

Data Generation

In [10]:
def decoder(new_data):
    # replay the decoder half of the trained MLP: two tanh hidden layers,
    # then the linear output layer (MLPRegressor applies no activation to its output)
    new_data = np.asmatrix(new_data)

    decoder2 = np.tanh(new_data*reg.coefs_[3] + reg.intercepts_[3])
    decoder1 = np.tanh(decoder2*reg.coefs_[4] + reg.intercepts_[4])
    reconst = decoder1*reg.coefs_[5] + reg.intercepts_[5]

    return np.asarray(reconst)
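As an added sanity check, the hand-written encoder and decoder composed together should reproduce reg.predict up to numerical precision.

In [ ]:
# the manual forward pass should match the trained model's own prediction
idx = np.random.randint(test_x.shape[0])
x = test_x[idx].reshape(-1, 784)

print(np.max(np.abs(decoder(encoder(x)) - reg.predict(x))))   # should be close to 0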
In [11]:
latent = np.array([[0.5, 0.5]])
reconst = decoder(latent)

plt.figure(figsize=(16,7))
plt.subplot(1,2,1)
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 1,0], test_latent[np.argmax(test_y, axis = 1) == 1,1], label = '1')
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 5,0], test_latent[np.argmax(test_y, axis = 1) == 5,1], label = '5')
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 6,0], test_latent[np.argmax(test_y, axis = 1) == 6,1], label = '6')
plt.scatter(latent[:,0], latent[:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(reconst.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()

4. Visualization

Image Generation

  • Select an arbitrary latent variable $z$
  • Generate images using the learned decoder
In [12]:
# Initialize canvas
nx = 20
ny = 20
x_values = np.linspace(-1, 1, nx)
y_values = np.linspace(-1, 1, ny)
canvas = np.empty((28*ny, 28*nx))

for i, yi in enumerate(y_values):
    for j, xi in enumerate(x_values):
        latent = np.array([[xi, yi]])
        reconst = decoder(latent)
        canvas[(ny-i-1)*28:(ny-i)*28, j*28:(j+1)*28] = reconst.reshape(28, 28)

plt.figure(figsize = (16, 7))
plt.subplot(1,2,1)
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 1,0], test_latent[np.argmax(test_y, axis = 1) == 1,1], label = '1')
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 5,0], test_latent[np.argmax(test_y, axis = 1) == 5,1], label = '5')
plt.scatter(test_latent[np.argmax(test_y, axis = 1) == 6,0], test_latent[np.argmax(test_y, axis = 1) == 6,1], label = '6')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(canvas, 'gray')
plt.title('Manifold', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()

5. Latent Representation

To get an intuition of the latent representation, we can pick two samples $x$ and $x'$ at random and interpolate along the line connecting them in the latent space:

$$g((1-\alpha)f(x) + \alpha f(x'))$$
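
A short added sketch of this interpolation, using the encoder and decoder functions defined above (the number of steps and the random choice of samples are arbitrary):

In [ ]:
# interpolate in the latent space between two random test images and decode each point
idx1, idx2 = np.random.randint(test_x.shape[0], size = 2)
z1 = encoder(test_x[idx1].reshape(-1, 784))
z2 = encoder(test_x[idx2].reshape(-1, 784))

alphas = np.linspace(0, 1, 10)

plt.figure(figsize = (16, 2))
for k, alpha in enumerate(alphas):
    z = (1 - alpha)*z1 + alpha*z2          # (1 - alpha) f(x) + alpha f(x')
    x_interp = decoder(z)                  # g((1 - alpha) f(x) + alpha f(x'))
    plt.subplot(1, 10, k + 1)
    plt.imshow(x_interp.reshape(28, 28), 'gray')
    plt.xticks([])
    plt.yticks([])
plt.show()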



  • Interpolation in High Dimension



  • Interpolation in Manifold



6. Other Tutorials

In [13]:
%%html
<center><iframe src="https://www.youtube.com/embed/QujriOAtps4?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [14]:
%%html
<center><iframe src="https://www.youtube.com/embed/nTt_ajul8NY?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [15]:
%%html
<center><iframe src="https://www.youtube.com/embed/H1AllrJ-_30?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>