1. Unsupervised Learning

Definition

Unsupervised learning refers to most attempts to extract information from a distribution that do not require human labor to annotate examples
The main task is to find the 'best' representation of the data

Dimension Reduction

Compress as much information as possible into a smaller representation
Preserve as much information as possible while obeying some constraint that keeps the representation simpler
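As a sketch of this trade-off, the snippet below uses principal component analysis (PCA, a classic linear dimension-reduction method, swapped in here purely for illustration) on hypothetical toy data that lies close to a 2-D subspace of a 10-D space. It keeps k principal directions, maps the compressed codes back to 10-D, and reports how much variance is kept and how large the reconstruction error is. The data and all variable names are assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 300 points in 10-D lying close to a 2-D subspace
x = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(300, 10))
mean = x.mean(axis=0)

# Rows of vt are the principal directions; squared singular values give their variance
_, s, vt = np.linalg.svd(x - mean, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)

results = {}
for k in (1, 2, 3):
    z = (x - mean) @ vt[:k].T   # compressed k-D representation
    x_rec = z @ vt[:k] + mean   # map back to the original 10-D space
    results[k] = np.mean((x - x_rec) ** 2)
    print(f"k={k}: variance kept={explained[:k].sum():.4f}, reconstruction MSE={results[k]:.6f}")
```

Going from k=1 to k=2 should keep almost all of the variance, while k=3 adds little: the "simpler" representation loses almost nothing once it matches the data's true dimension.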

2. Autoencoders

An autoencoder can be thought of as the 'deep learning version' of unsupervised learning (in particular, of dimension reduction).

Definition

An autoencoder is a neural network that is trained to attempt to copy its input to its output
The network consists of two parts: an encoder that maps the input to a latent code, and a decoder that produces a reconstruction from that code

Encoder and Decoder

Encoder function: $z = f(x)$
Decoder function: $\hat{x} = g(z)$
We train $f$ and $g$ so that $g\left(f(x)\right) \approx x$
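A minimal NumPy sketch of this idea, assuming a purely linear encoder $f(x) = x W_{enc}$ and decoder $g(z) = z W_{dec}$ (hypothetical names) trained by plain gradient descent on the mean squared error between $x$ and $g(f(x))$. The toy data, learning rate, and step count are assumptions; the Keras model later in this notebook uses nonlinear layers and Adam instead.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 500 points in 5-D lying close to a 2-D subspace
x = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(500, 5))

d, k, lr = 5, 2, 0.01
W_enc = 0.1 * rng.normal(size=(d, k))  # encoder f(x) = x @ W_enc  (5-D -> 2-D)
W_dec = 0.1 * rng.normal(size=(k, d))  # decoder g(z) = z @ W_dec  (2-D -> 5-D)

def reconstruction_loss(x):
    """Mean squared error between x and g(f(x))."""
    return np.mean((x - (x @ W_enc) @ W_dec) ** 2)

initial_loss = reconstruction_loss(x)
for _ in range(3000):
    z = x @ W_enc                            # encode
    err = z @ W_dec - x                      # g(f(x)) - x
    grad_x_hat = 2 * err / err.size          # d(loss)/d(x_hat) for the mean squared error
    grad_dec = z.T @ grad_x_hat              # chain rule through the decoder
    grad_enc = x.T @ (grad_x_hat @ W_dec.T)  # chain rule through the encoder
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_loss = reconstruction_loss(x)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Both gradients are computed before either weight matrix is updated, so each step is a proper gradient-descent step on the joint loss.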

3. Autoencoder with TensorFlow

MNIST example

Use only the digits 1, 5, and 6 so that the 2-D latent space is easy to visualize

import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline

# Load Data

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape((x_train.shape[0],28*28)) / 255
x_test = x_test.reshape((x_test.shape[0],28*28)) / 255

# Use Only 1,5,6 Digits to Visualize

# Indices of the images whose label is 1, 5, or 6 (np.where returns them sorted)
train_idx = np.where(np.isin(y_train, [1, 5, 6]))[0]
test_idx = np.where(np.isin(y_test, [1, 5, 6]))[0]

train_imgs = x_train[train_idx]
train_labels = y_train[train_idx]
test_imgs = x_test[test_idx]
test_labels = y_test[test_idx]
n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))

The number of training images : 18081, shape : (18081, 784)
The number of testing images : 2985, shape : (2985, 784)

3.1. Define the Structure of an Autoencoder

Input shape (784 = 28 x 28 pixels) and latent variable shape (2)
Encoder shape: 784 -> 500 -> 300 -> 2
Decoder shape: 2 -> 300 -> 500 -> 784

# Define Structure

# Encoder Structure
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(500, activation = 'relu', input_shape = (28*28,)),
    tf.keras.layers.Dense(300, activation = 'relu'),
    tf.keras.layers.Dense(2, activation = None)
    ])

# Decoder Structure
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(300, activation = 'relu', input_shape = (2,)),
    tf.keras.layers.Dense(500, activation = 'relu'),
    tf.keras.layers.Dense(28*28, activation = None)
    ])

# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])
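As a quick sanity check on the size of this model, the plain-Python sketch below counts trainable parameters, assuming each Dense layer has an in x out weight matrix plus out biases (Keras's default use_bias=True). The helper name dense_param_count is hypothetical; the totals should agree with what autoencoder.summary() reports.

```python
# Hypothetical helper: counts weights + biases of a stack of fully connected layers
def dense_param_count(layer_sizes):
    """Sum of in*out weights plus out biases for each consecutive pair of sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

encoder_params = dense_param_count([784, 500, 300, 2])
decoder_params = dense_param_count([2, 300, 500, 784])
print(encoder_params, decoder_params, encoder_params + decoder_params)
# → 543402 544184 1087586
```

Almost all of the roughly 1.09 million parameters sit in the two 784 <-> 500 layers; the 2-D bottleneck itself is cheap.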

3.2. Define Loss and Optimizer

Loss

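Mean squared error between the target $t_{i}$ (for an autoencoder, the input itself) and the network output $y_{i}$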

$$ \frac{1}{m}\sum_{i=1}^{m} (t_{i} - y_{i})^2 $$
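A small worked example of the formula, on two hypothetical 2-pixel "images". Keras's 'mean_squared_error' averages the squared errors over the last axis (the pixels of each sample) and the default loss reduction then averages over the batch; with equal-sized samples this is the same as the plain mean over all $m$ terms.

```python
import numpy as np

# Two hypothetical "images" of 2 pixels each: targets t and reconstructions y
t = np.array([[0.0, 1.0], [1.0, 0.0]])
y = np.array([[0.0, 0.5], [1.0, 1.0]])

# Average over the pixels of each sample, then over the batch
per_sample = np.mean((t - y) ** 2, axis=-1)  # [0.125, 0.5]
batch_mse = per_sample.mean()                # 0.3125

# With equal-sized samples this equals the plain mean over all m = 4 terms
assert np.isclose(batch_mse, np.mean((t - y) ** 2))
print(per_sample, batch_mse)
```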

Optimizer

Adam: a widely used adaptive gradient-based optimizer, here with a learning rate of 0.001
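For reference, a NumPy sketch of the standard Adam update rule, applied to the toy function f(w) = sum(w**2). The values beta1=0.9, beta2=0.999, eps=1e-8 are the usual defaults; lr=0.1, the starting point, and the function adam_minimize are assumptions chosen so the loop converges quickly. TensorFlow's own implementation may differ in details.

```python
import numpy as np

def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Hypothetical helper: run the standard Adam update for a fixed number of steps."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)  # running mean of the gradient (first moment)
    v = np.zeros_like(w)  # running mean of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate adaptive step
    return w

# Minimize f(w) = sum(w**2), whose gradient is 2w
w_final = adam_minimize(lambda w: 2 * w, w0=[5.0, -3.0])
print(w_final)
```

Dividing by the square root of the second-moment estimate makes the step size roughly independent of the gradient's scale, which is one reason Adam works reasonably well with default settings.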

autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.001),
                    loss = 'mean_squared_error',
                    metrics = ['mse'])

# Train the model: the input images are also the targets

n_batch = 50
n_epoch = 10

training = autoencoder.fit(train_imgs, train_imgs, batch_size = n_batch, epochs = n_epoch)

Epoch 1/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0374 - mse: 0.0374
Epoch 2/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0304 - mse: 0.0304
Epoch 3/10
362/362 [==============================] - 3s 9ms/step - loss: 0.0289 - mse: 0.0289
Epoch 4/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0279 - mse: 0.0279
Epoch 5/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0273 - mse: 0.0273
Epoch 6/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0269 - mse: 0.0269
Epoch 7/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0265 - mse: 0.0265
Epoch 8/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0264 - mse: 0.0264
Epoch 9/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0261 - mse: 0.0261
Epoch 10/10
362/362 [==============================] - 3s 8ms/step - loss: 0.0259 - mse: 0.0259

3.3. Test or Evaluate

Evaluate the reconstruction performance of the autoencoder on the test images

test_scores = autoencoder.evaluate(test_imgs, test_imgs, verbose = 2)
print('Test loss: {}'.format(test_scores[0]))
print('Mean Squared Error: {}'.format(test_scores[1]))

94/94 - 0s - loss: 0.0265 - mse: 0.0265
Test loss: 0.026499442756175995
Mean Squared Error: 0.026499442756175995

# Visualize Evaluation on Test Data

# rand_idx = np.random.randint(0, test_imgs.shape[0])
rand_idx = 6

test_img = test_imgs[rand_idx]
reconst_img = autoencoder.predict(test_img.reshape(1,28*28))

plt.figure(figsize = (10, 8))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 12)

plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(reconst_img.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.show()

4. Latent Space and Generation

To see the distribution of the latent variables, we use the encoder to project test images from the 784-dimensional image space onto the 2-dimensional latent space

idx = np.random.randint(0, len(test_labels), 500)
test_x, test_y = test_imgs[idx], test_labels[idx]

test_latent = encoder.predict(test_x)

plt.figure(figsize = (10,10))
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize=15)
plt.xlabel('Z1', fontsize=15)
plt.ylabel('Z2', fontsize=15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()

Data Generation

Decoding a point chosen in the latent space can produce an image that looks like a plausible digit.
However, these results are often unsatisfying, because the density model used on the latent space is too simple and inadequate (here, we simply pick a point by hand).
Building a “good” density model amounts to our original problem of modeling an empirical distribution, although now in a lower-dimensional space.
This motivates variational autoencoders (VAEs) and generative adversarial networks (GANs).
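One step up from picking a point by hand is to fit an explicit density to the latent codes and sample from it. The sketch below fits a single Gaussian, assuming hypothetical synthetic codes in place of the notebook's encoder.predict(...) output. A single Gaussian is itself the kind of overly simple model described above: the codes of the digits 1, 5, and 6 form separate clusters, so many samples may land between them.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 2-D codes returned by encoder.predict(...)
latent_codes = rng.normal(loc=[1.0, -2.0], scale=[0.5, 1.5], size=(500, 2))

# Simplest density model: one Gaussian with the empirical mean and covariance
mu = latent_codes.mean(axis=0)
cov = np.cov(latent_codes, rowvar=False)

# Draw new latent points from the fitted density instead of picking one by hand
z_new = rng.multivariate_normal(mu, cov, size=5)
print(z_new.shape)  # (5, 2)
# In the notebook, these points could then be decoded with decoder.predict(z_new)
```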

new_data = np.array([[2, 4]])

fake_image = decoder.predict(new_data)

plt.figure(figsize=(16,7))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
