Autoencoder


By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Table of Contents


0. Lecture Video

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('ul27ZUdYaBY', width = "560", height = "315")
Out[ ]:

1. Unsupervised Learning

Unsupervised learning is a type of machine learning where a model learns patterns, structures, or relationships in the data without labeled outputs. Unlike supervised learning, which uses input-output pairs for training, unsupervised learning works with unlabeled data, meaning the algorithm must discover inherent features or groupings within the dataset.


Definition

  • Unsupervised learning refers to most attempts to extract information from a distribution that do not require human labor to annotate examples
  • The main task is to find the 'best' representation of the data

Dimension Reduction

  • Attempt to compress as much information as possible in a smaller representation
  • Preserve as much information as possible while obeying some constraint aimed at keeping the representation simpler

2. Autoencoders

An autoencoder is a type of neural network used for unsupervised learning that aims to learn efficient, compressed representations of input data. It consists of two main components: an encoder and a decoder. The encoder compresses the input into a lower-dimensional representation (called the latent space or bottleneck), while the decoder reconstructs the input from this compressed representation.

The objective of an autoencoder is to minimize the reconstruction error—the difference between the original input and its reconstruction.


Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder and a decoder that produce a reconstruction

Encoder and Decoder

  • Encoder function: $z = f(x)$
  • Decoder function: $\hat{x} = g(z)$
  • We train the network so that $g\left(f(x)\right) \approx x$

The top network in the figure below shows a trivial autoencoder where the input data is passed directly to the output without any significant compression or transformation. This structure may be able to reconstruct the input perfectly, but it fails to learn any meaningful abstraction or representation. The bottom network, on the other hand, illustrates a typical autoencoder with a squeezed bottleneck, which compresses the input into a lower-dimensional latent space before reconstructing it. The contrast between the two structures highlights why the bottleneck matters:

  • The top structure, though capable of reconstructing the input, does not learn useful or general features. It essentially passes information without processing or summarizing it.

  • In contrast, the bottom structure is meaningful because the bottleneck introduces a constraint that forces the network to learn relevant features instead of copying the input.






Structure of an Autoencoder

  • Encoder (Compression Path):

    • The encoder maps the input $x$ to a lower-dimensional latent representation $z$.
    • This transformation can be represented as:

    $$ z = f(x) $$

    • The encoder typically consists of fully connected (dense) or convolutional layers, followed by non-linear activations (e.g., ReLU, sigmoid).
  • Latent Space (Bottleneck):

    • The bottleneck layer $z$ represents the compressed encoding of the input.
    • The size of this layer controls how much information is retained from the input.
  • Decoder (Reconstruction Path):

    • The decoder maps the latent representation $z$ back to the original input space:

    $$ \hat{x} = g(z) $$

    • The decoder produces a reconstruction $\hat{x}$ that is as close as possible to the original input $x$.
  • Loss Function:

    • The loss function measures the reconstruction error, typically using mean squared error (MSE):

    $$ \mathcal{L}(x, \hat{x}) = \|x - \hat{x}\|^2 $$

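As a quick illustration of this loss, here is a minimal NumPy sketch with a hypothetical batch of inputs x and reconstructions x_hat (the values are random placeholders):

import numpy as np

# hypothetical batch: 4 inputs and their reconstructions, each 784-dimensional
x     = np.random.rand(4, 784)
x_hat = np.random.rand(4, 784)

# squared reconstruction error per sample, then averaged over the batch
per_sample_error = np.sum((x - x_hat)**2, axis = 1)
print(per_sample_error.mean())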

Key Features of Autoencoders

The squeezed bottleneck in an autoencoder acts as a filter that retains essential features while discarding irrelevant or redundant information. It enhances the model's ability to learn compact, robust representations of the data and improves its generalization ability, making it effective for tasks such as dimensionality reduction, denoising, anomaly detection, and representation learning.

(1) Dimensionality Reduction

  • The bottleneck layer forces the network to compress the input data into a lower-dimensional representation.

  • This reduction removes redundant and non-essential information, retaining only the most important features.

  • Example: Instead of storing all pixel values of an image, the autoencoder learns a compact representation (e.g., shape, edges, or texture patterns) to reconstruct the image.

(2) Feature Extraction

  • By limiting the size of the bottleneck, the autoencoder learns meaningful, abstract features rather than memorizing the input.

  • The compressed features in the latent space can be used for classification, clustering, or visualization tasks, similar to principal components in PCA but in a non-linear manner.

(3) Preventing Overfitting

  • A wide bottleneck can allow the autoencoder to simply copy the input data, leading to poor generalization.

  • A squeezed bottleneck enforces a constraint that forces the network to generalize the underlying structure of the data rather than learning a direct mapping of the inputs.

(4) Noise Reduction (Denoising Autoencoders)

  • In tasks like denoising and anomaly detection, the bottleneck prevents the network from reconstructing noise, as the network focuses only on core patterns.

(5) Efficient Encoding for Data Compression

  • Autoencoders are often used for data compression by representing the input data with fewer bits (from high-dimensional space to low-dimensional space).

  • The "squeezing" helps reduce data storage needs while maintaining the ability to reconstruct the original input with minimal loss.


3. Autoencoder with TensorFlow

  • MNIST example

  • Use only (1, 5, 6) digits to visualize in 2-D



In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
In [ ]:
# Load Data

mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x.reshape(-1, 784)/255.0, test_x.reshape(-1, 784)/255.0
In [ ]:
# Use only 1, 5, 6 digits to visualize the latent space

train_idx1 = np.array(np.where(train_y == 1))
train_idx5 = np.array(np.where(train_y == 5))
train_idx6 = np.array(np.where(train_y == 6))
train_idx = np.sort(np.concatenate((train_idx1, train_idx5, train_idx6), axis = None))

test_idx1 = np.array(np.where(test_y == 1))
test_idx5 = np.array(np.where(test_y == 5))
test_idx6 = np.array(np.where(test_y == 6))
test_idx = np.sort(np.concatenate((test_idx1, test_idx5, test_idx6), axis = None))

train_imgs = train_x[train_idx]
train_labels = train_y[train_idx]
test_imgs = test_x[test_idx]
test_labels = test_y[test_idx]

n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 18081, shape : (18081, 784)
The number of testing images : 2985, shape : (2985, 784)

3.1. Define a Structure of an Autoencoder

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape

Note on 2D Latent Space and Digits 1, 5, and 6 from MNIST

In this autoencoder structure, we assign only two neurons to the latent space. This choice serves a specific purpose: it is not necessarily because a 2D latent space is the optimal representation for the data, but because it provides an effective way to visualize the compressed data points in a two-dimensional space.

  • A 2D latent space allows the data points to be plotted in a 2D plane, making it easier to visualize how the autoencoder organizes and compresses the input data.

  • Instead of using all 10 digits (0 - 9), using only a subset of digits (e.g., 1, 5, 6) makes the visualization simpler and clearer.

  • Plotting all 10 digits in a 2D latent space can result in overlapping clusters, making it difficult to interpret the latent space representation.





In [ ]:
# Define Structure

# Encoder Structure
encoder = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape = (784,)),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None)
    ])

# Decoder Structure
decoder = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape = (2,)),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 28*28, activation = None)
    ])

# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])

3.2. Define Loss and Optimizer

Loss

  • Mean squared (reconstruction) loss

$$ \frac{1}{m}\sum_{i=1}^{m} \|x_{i} - \hat{x}_{i}\|^2 $$

Optimizer

  • Adam: one of the most widely used optimizers

In [ ]:
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.001),
                    loss = 'mean_squared_error',
                    metrics = ['mse'])
In [ ]:
# Train Model & Evaluate Test Data

training = autoencoder.fit(train_imgs, train_imgs, batch_size = 50, epochs = 10)
Epoch 1/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - loss: 0.0441 - mse: 0.0441
Epoch 2/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 0.0303 - mse: 0.0303
Epoch 3/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.0289 - mse: 0.0289
Epoch 4/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 0.0281 - mse: 0.0281
Epoch 5/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 0.0275 - mse: 0.0275
Epoch 6/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.0268 - mse: 0.0268
Epoch 7/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.0264 - mse: 0.0264
Epoch 8/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 0.0263 - mse: 0.0263
Epoch 9/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 0.0259 - mse: 0.0259
Epoch 10/10
362/362 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 0.0258 - mse: 0.0258

3.3. Test or Evaluate

  • Test reconstruction performance of the autoencoder
In [ ]:
test_scores = autoencoder.evaluate(test_imgs, test_imgs, verbose = 0)

print('Test loss: {}'.format(test_scores[0]))
print('Mean Squared Error: {} %'.format(test_scores[1]*100))
Test loss: 0.026385098695755005
Mean Squared Error: 2.6385098695755005 %
In [ ]:
# Visualize Evaluation on Test Data

rand_idx = np.random.randint(0, test_imgs.shape[0])
# rand_idx = 6

test_img = test_imgs[rand_idx]
reconst_img = autoencoder.predict(test_img.reshape(1, 28*28), verbose = 0)

plt.figure(figsize = (8, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 12)

plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(reconst_img.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.show()

In this autoencoder for MNIST, it is remarkable that a latent space with only two neurons can capture sufficient information to reconstruct an MNIST image, originally represented in a 784-dimensional space.



4. Latent Space


The latent space refers to a lower-dimensional, abstract feature space where input data is encoded by a machine learning model, such as an autoencoder. In this space, the essential characteristics of the input data are represented in a compressed form. The term "latent" implies that the features in this space are hidden or learned representations that capture underlying patterns, rather than explicit input features.

  • Let’s examine how the compressed features of the digits 1, 5, and 6 are distributed in the learned latent space.

  • To see the distribution of the latent variables, we project the 784-dimensional image space onto the 2-dimensional latent space


In [ ]:
idx = np.random.randint(0, len(test_labels), 500)
test_x, test_y = test_imgs[idx], test_labels[idx]
In [ ]:
test_latent = encoder.predict(test_x, verbose = 0)

plt.figure(figsize = (6, 6))
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize = 12)
plt.xlabel('$Z_1$', fontsize = 12)
plt.ylabel('$Z_2$', fontsize = 12)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.show()

From the results, we can conclude that clustering or classification can be performed in the 2D latent space, rather than in the original 784-dimensional input space. In many practical applications, dimensionality reduction is performed before applying further machine learning algorithms or analyses. This preprocessing step helps to address issues related to high-dimensional data and often improves the performance of subsequent tasks such as clustering, classification, or anomaly detection.
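
As a minimal sketch of this idea (assuming scikit-learn is available, and reusing the test_latent and test_y arrays computed above), a simple k-means clustering can be run directly on the 2D latent codes:

from sklearn.cluster import KMeans

# cluster the 2D latent codes into three groups (we used digits 1, 5, 6)
kmeans = KMeans(n_clusters = 3, n_init = 10, random_state = 0)
cluster_id = kmeans.fit_predict(test_latent)

# check how each cluster aligns with the true digit labels
for c in range(3):
    labels, counts = np.unique(test_y[cluster_id == c], return_counts = True)
    print('cluster {}: {}'.format(c, dict(zip(labels, counts))))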



5. Data Generation


While autoencoders are primarily designed for representation learning and reconstruction, they can also be used for data generation by propagating values through the decoder. The process involves selecting a latent vector (a point in the latent space) and feeding it into the decoder to reconstruct an output that resembles the original data distribution.


How Data Generation Works in an Autoencoder

(1) Latent Space Exploration:

  • After training, the latent space encodes the structure of the input data. Each point in this space represents compressed features of a particular input (e.g., digits in the MNIST dataset).

  • By selecting specific points in the latent space, we can generate corresponding outputs through the decoder.

(2) Random Latent Vector Selection:

  • Instead of using actual encoded inputs, we can randomly sample latent vectors within a meaningful range of the latent space (typically around the learned distribution).

  • These sampled vectors are fed into the decoder to produce new, synthetic outputs that resemble the patterns learned during training.

(3) Decoding Process:

  • The decoder interprets the randomly selected latent vector as a set of features and reconstructs an image or data point that corresponds to these features.

  • If the latent vector lies near a cluster representing digit "6" in the MNIST dataset, for example, the output may resemble a handwritten "6".


Key Characteristics of Data Generation with Autoencoders

(1) Interpolation:

  • By selecting latent values between two points (e.g., between representations of digit "1" and digit "5"), the decoder can generate data that smoothly transitions between these classes, creating hybrid or intermediate samples (a minimal interpolation sketch is given after the generation example below).

(2) Reconstruction vs. Generation:

  • In reconstruction, the encoder compresses the input, and the decoder reconstructs the same input.

  • In data generation, the input is replaced by a synthetic latent vector, and the decoder "imagines" a corresponding output based on the patterns learned during training.


In [ ]:
new_data = np.array([[-1, 6]])

fake_image = decoder.predict(new_data, verbose = 0)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = '*', s = 100, label = 'new data')
plt.title('Latent Space', fontsize = 10)
plt.xlabel('$Z_1$', fontsize = 10)
plt.ylabel('$Z_2$', fontsize = 10)
plt.legend(loc = 2)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 10)
plt.xticks([])
plt.yticks([])
plt.show()
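
The interpolation idea from earlier in this section can be sketched by decoding points along a straight line between two latent vectors. A minimal example, reusing the encoder outputs and decoder defined above and using the mean latent codes of digits 1 and 5 as endpoints (an arbitrary choice for illustration):

# endpoints: mean latent codes of two digit classes
z_start = test_latent[test_y == 1].mean(axis = 0)
z_end = test_latent[test_y == 5].mean(axis = 0)

# decode evenly spaced points on the segment between the two endpoints
n_steps = 7
alphas = np.linspace(0, 1, n_steps)
z_interp = np.array([(1 - a)*z_start + a*z_end for a in alphas])
interp_imgs = decoder.predict(z_interp, verbose = 0)

plt.figure(figsize = (12, 2))
for i in range(n_steps):
    plt.subplot(1, n_steps, i + 1)
    plt.imshow(interp_imgs[i].reshape(28, 28), 'gray')
    plt.xticks([])
    plt.yticks([])
plt.show()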

6. More on Autoencoders


Types of Autoencoders

(1) Vanilla Autoencoder:

  • The simplest form of autoencoder, consisting of a symmetric encoder and decoder.

(2) Denoising Autoencoder (DAE):

  • Trained to reconstruct clean inputs from corrupted (noisy) inputs, making it robust to noise (a short sketch follows this list).

(3) Sparse Autoencoder:

  • Introduces a sparsity constraint in the latent representation to enforce that only a small number of neurons are active, resulting in more interpretable encodings.

(4) Variational Autoencoder (VAE):

  • A probabilistic autoencoder that learns distributions in the latent space rather than fixed values, allowing it to generate new data points by sampling from these distributions.

(5) Convolutional Autoencoder (CAE):

  • Uses convolutional layers instead of fully connected layers, making it well-suited for image-related tasks by preserving spatial hierarchies.

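As a rough sketch of the denoising autoencoder in (2) above, one can corrupt the inputs with noise while keeping the clean images as reconstruction targets. The snippet below reuses train_imgs from Section 3; the noise level and layer sizes are arbitrary illustrative choices, not the settings used earlier:

# corrupt the inputs with Gaussian noise; targets remain the clean images
noise_level = 0.3
noisy_imgs = np.clip(train_imgs + noise_level*np.random.randn(*train_imgs.shape), 0.0, 1.0)

# a small denoising autoencoder (architecture for illustration only)
dae = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape = (784,)),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 784, activation = None)
    ])

dae.compile(optimizer = tf.keras.optimizers.Adam(0.001), loss = 'mean_squared_error')

# noisy inputs in, clean images out
dae.fit(noisy_imgs, train_imgs, batch_size = 50, epochs = 10)
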
Key Features of Autoencoders

(1) Unsupervised Learning:

  • Autoencoders learn patterns in data without labeled outputs, making them ideal for unsupervised representation learning.

(2) Compression and Reconstruction:

  • The main objective of an autoencoder is to compress data into a smaller latent representation and reconstruct the original input as accurately as possible.

(3) Non-linear Mappings:

  • Unlike PCA, which assumes linear transformations, autoencoders can learn non-linear mappings, making them more powerful for complex datasets.

Applications of Autoencoders

(1) Denoising:

  • Autoencoders can be trained to remove noise from images or signals by reconstructing clean outputs from noisy inputs.

(2) Dimensionality Reduction:

  • Similar to Principal Component Analysis (PCA), autoencoders can reduce the dimensionality of data while preserving meaningful features.

(3) Anomaly Detection:

  • Autoencoders can detect anomalies by failing to reconstruct inputs that are significantly different from the training data. This is commonly used in fraud detection, industrial defect detection, and cybersecurity (a minimal sketch follows this list).

(4) Image Generation:

  • Advanced versions of autoencoders, such as variational autoencoders (VAEs), can generate new data samples by sampling from the latent space.

(5) Feature Extraction:

  • The encoder's latent space can be used as a feature extractor for downstream tasks such as classification and clustering.
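
As a minimal sketch of the reconstruction-error-based anomaly detection mentioned in (3) above (reusing the trained autoencoder and test_imgs from Section 3; the threshold rule is an arbitrary illustrative choice):

# per-sample reconstruction error on the test images
reconst = autoencoder.predict(test_imgs, verbose = 0)
errors = np.mean((test_imgs - reconst)**2, axis = 1)

# flag samples whose error is unusually large (illustrative rule: mean + 3 std)
threshold = errors.mean() + 3*errors.std()
anomaly_idx = np.where(errors > threshold)[0]
print('Threshold: {:.4f}, flagged samples: {}'.format(threshold, len(anomaly_idx)))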

Autoencoders are powerful neural networks used for unsupervised representation learning, with applications ranging from dimensionality reduction to anomaly detection and generative modeling. By compressing data into a latent space and reconstructing it, autoencoders learn to capture the most meaningful features of the data. While simple autoencoders are effective for many tasks, advanced variants such as VAEs and convolutional autoencoders offer improved performance for more complex data, making autoencoders an essential tool in the deep learning toolkit.


In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')