Autoencoder
Table of Contents
from IPython.display import YouTubeVideo
YouTubeVideo('ul27ZUdYaBY', width = "560", height = "315")
Unsupervised learning is a type of machine learning where a model learns patterns, structures, or relationships in the data without labeled outputs. Unlike supervised learning, which uses input-output pairs for training, unsupervised learning works with unlabeled data, meaning the algorithm must discover inherent features or groupings within the dataset.
Definition
Dimension Reduction
An autoencoder is a type of neural network used for unsupervised learning that aims to learn efficient, compressed representations of input data. It consists of two main components: an encoder and a decoder. The encoder compresses the input into a lower-dimensional representation (called the latent space or bottleneck), while the decoder reconstructs the input from this compressed representation.
The objective of an autoencoder is to minimize the reconstruction error—the difference between the original input and its reconstruction.
Definition
Encoder and Decoder
The top network in the figure below shows a trivial autoencoder, where the input data is passed almost directly to the output without any significant compression or transformation. This structure may be able to reconstruct the input perfectly, but it fails to learn any meaningful abstraction or representation. On the other hand, the bottom network illustrates a typical autoencoder with a squeezed bottleneck, which compresses the input into a lower-dimensional latent space before reconstructing it. This narrow bottleneck structure offers several key benefits:
The top structure, though capable of reconstructing the input, does not learn useful or general features. It essentially passes information without processing or summarizing it.
In contrast, the bottom structure is meaningful because the bottleneck introduces a constraint that forces the network to learn relevant features instead of copying the input.
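To make this contrast concrete, the two structures could be written as the following minimal Keras sketch. The layer widths here are illustrative assumptions rather than the exact sizes in the figure.

import tensorflow as tf

# Trivial "autoencoder": the hidden layers are as wide as the input,
# so the network can simply pass the data through (learn an identity map)
trivial = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape = (784,)),
    tf.keras.layers.Dense(units = 784, activation = None),
    tf.keras.layers.Dense(units = 784, activation = None)
])

# Bottleneck autoencoder: the narrow middle layer forces the network
# to summarize the input before reconstructing it
bottleneck = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape = (784,)),
    tf.keras.layers.Dense(units = 64, activation = 'relu'),    # squeezed latent layer
    tf.keras.layers.Dense(units = 784, activation = None)
])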
Structure of an Autoencoder
Encoder (Compression Path):
$$
z = f(x)
$$
Latent Space (Bottleneck):
Decoder (Reconstruction Path):
$$
\hat{x} = g(z)
$$
Loss Function:
$$ \mathcal{L}(x, \hat{x}) = \|x - \hat{x}\|^2 $$
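As a toy numerical illustration of these three pieces, the sketch below uses random linear maps in place of the learned encoder $f$ and decoder $g$ (purely an assumption for illustration) and computes the reconstruction error defined above.

import numpy as np

# Toy example: x in R^4, latent z in R^2
np.random.seed(0)
x = np.random.rand(4)

W_enc = np.random.randn(2, 4)     # stands in for the encoder f (illustrative)
W_dec = np.random.randn(4, 2)     # stands in for the decoder g (illustrative)

z = W_enc @ x                     # z = f(x): compressed representation
x_hat = W_dec @ z                 # x_hat = g(z): reconstruction

loss = np.sum((x - x_hat)**2)     # squared reconstruction error
print('z = {}, loss = {:.4f}'.format(z, loss))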
Key Features of Autoencoders
The squeezed bottleneck in an autoencoder acts as a filter that retains essential features while discarding irrelevant or redundant information. It enhances the model's ability to learn compact, robust representations of the data and improves its generalization ability, making it effective for tasks such as dimensionality reduction, denoising, anomaly detection, and representation learning.
(1) Dimensionality Reduction
The bottleneck layer forces the network to compress the input data into a lower-dimensional representation.
This reduction removes redundant and non-essential information, retaining only the most important features.
Example: Instead of storing all pixel values of an image, the autoencoder learns a compact representation (e.g., shape, edges, or texture patterns) to reconstruct the image.
(2) Feature Extraction
By limiting the size of the bottleneck, the autoencoder learns meaningful, abstract features rather than memorizing the input.
The compressed features in the latent space can be used for classification, clustering, or visualization tasks, similar to principal components in PCA but in a non-linear manner.
(3) Preventing Overfitting
A wide bottleneck can allow the autoencoder to simply copy the input data, leading to poor generalization.
A squeezed bottleneck enforces a constraint that forces the network to generalize the underlying structure of the data rather than learning a direct mapping of the inputs.
(4) Noise Reduction (Denoising Autoencoders)
By training the network to reconstruct clean inputs from corrupted versions, the bottleneck forces it to discard noise and keep only the underlying structure of the data (a short sketch of this idea follows this list).
(5) Efficient Encoding for Data Compression
Autoencoders are often used for data compression by representing the input data with fewer bits (from high-dimensional space to low-dimensional space).
The "squeezing" helps reduce data storage needs while maintaining the ability to reconstruct the original input with minimal loss.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
# Load Data
mnist = tf.keras.datasets.mnist
(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x.reshape(-1, 784)/255.0, test_x.reshape(-1, 784)/255.0
# Use only 1, 5, 6 digits to visualize the latent space
train_idx1 = np.array(np.where(train_y == 1))
train_idx5 = np.array(np.where(train_y == 5))
train_idx6 = np.array(np.where(train_y == 6))
train_idx = np.sort(np.concatenate((train_idx1, train_idx5, train_idx6), axis = None))
test_idx1 = np.array(np.where(test_y == 1))
test_idx5 = np.array(np.where(test_y == 5))
test_idx6 = np.array(np.where(test_y == 6))
test_idx = np.sort(np.concatenate((test_idx1, test_idx5, test_idx6), axis = None))
train_imgs = train_x[train_idx]
train_labels = train_y[train_idx]
test_imgs = test_x[test_idx]
test_labels = test_y[test_idx]
n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]
print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
Note on 2D Latent Space and Digits 1, 5, and 6 from MNIST
In this autoencoder structure, we assign only two neurons to the latent space. This choice serves a specific purpose: it is not that a 2D latent space is necessarily the optimal representation of the data, but rather that it provides an effective way to visualize the compressed data points in a two-dimensional plane.
A 2D latent space allows the data points to be plotted in a 2D plane, making it easier to visualize how the autoencoder organizes and compresses the input data.
Instead of using all 10 digits (0 - 9), using only a subset of digits (e.g., 1, 5, 6) makes the visualization simpler and clearer.
Plotting all 10 digits in a 2D latent space can result in overlapping clusters, making it difficult to interpret the latent space representation.
# Define Structure
# Encoder Structure
encoder = tf.keras.models.Sequential([
tf.keras.layers.InputLayer(input_shape = (784,)),
tf.keras.layers.Dense(units = 500, activation = 'relu'),
tf.keras.layers.Dense(units = 300, activation = 'relu'),
tf.keras.layers.Dense(units = 2, activation = None)
])
# Decoder Structure
decoder = tf.keras.models.Sequential([
tf.keras.layers.InputLayer(input_shape = (2,)),
tf.keras.layers.Dense(units = 300, activation = 'relu'),
tf.keras.layers.Dense(units = 500, activation = 'relu'),
tf.keras.layers.Dense(units = 28*28, activation = None)
])
# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])
Loss
Optimizer
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.001),
loss = 'mean_squared_error',
metrics = ['mse'])
# Train Model & Evaluate Test Data
training = autoencoder.fit(train_imgs, train_imgs, batch_size = 50, epochs = 10)
test_scores = autoencoder.evaluate(test_imgs, test_imgs, verbose = 0)
print('Test loss: {}'.format(test_scores[0]))
print('Mean Squared Error: {}'.format(test_scores[1]))
# Visualize Evaluation on Test Data
rand_idx = np.random.randint(1, test_imgs.shape[0])
# rand_idx = 6
test_img = test_imgs[rand_idx]
reconst_img = autoencoder.predict(test_img.reshape(1, 28*28), verbose = 0)
plt.figure(figsize = (8, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 12)
plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(reconst_img.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 12)
plt.xticks([])
plt.yticks([])
plt.show()
In this autoencoder for MNIST, it is remarkable that a latent space with only two neurons can capture sufficient information to reconstruct an MNIST image, originally represented in a 784-dimensional space.
The latent space refers to a lower-dimensional, abstract feature space where input data is encoded by a machine learning model, such as an autoencoder. In this space, the essential characteristics of the input data are represented in a compressed form. The term "latent" implies that the features in this space are hidden or learned representations that capture underlying patterns, rather than explicit input features.
Let’s examine how the compressed features of the digits 1, 5, and 6 are distributed in the learned latent space.
To see the distribution of the latent variables, we project the 784-dimensional image space onto the 2-dimensional latent space.
idx = np.random.randint(0, len(test_labels), 500)
test_x, test_y = test_imgs[idx], test_labels[idx]
test_latent = encoder.predict(test_x, verbose = 0)
plt.figure(figsize = (6, 6))
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize = 12)
plt.xlabel('$Z_1$', fontsize = 12)
plt.ylabel('$Z_2$', fontsize = 12)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.show()
From the results, we can conclude that clustering or classification can be performed in the 2D latent space, rather than in the original 784-dimensional input space. In many practical applications, dimensionality reduction is performed before applying further machine learning algorithms or analyses. This preprocessing step helps to address issues related to high-dimensional data and often improves the performance of subsequent tasks such as clustering, classification, or anomaly detection.
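As a brief illustration of this point, a clustering algorithm can be run directly on the 2D latent codes computed above. This is a minimal sketch that assumes scikit-learn is available and reuses test_latent and test_y from the previous cell.

from sklearn.cluster import KMeans

# Cluster the 2D latent codes (3 clusters for the digits 1, 5, 6)
kmeans = KMeans(n_clusters = 3, n_init = 10, random_state = 0)
cluster_ids = kmeans.fit_predict(test_latent)

# Compare cluster assignments with the true digit labels
for c in range(3):
    labels_in_cluster = test_y[cluster_ids == c]
    values, counts = np.unique(labels_in_cluster, return_counts = True)
    print('Cluster {}: digit counts {}'.format(c, dict(zip(values, counts))))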
While autoencoders are primarily designed for representation learning and reconstruction, they can also be used for data generation by propagating values through the decoder. The process involves selecting a latent vector (a point in the latent space) and feeding it into the decoder to reconstruct an output that resembles the original data distribution.
How Data Generation Works in an Autoencoder
(1) Latent Space Exploration:
After training, the latent space encodes the structure of the input data. Each point in this space represents compressed features of a particular input (e.g., digits in the MNIST dataset).
By selecting specific points in the latent space, we can generate corresponding outputs through the decoder.
(2) Random Latent Vector Selection:
Instead of using actual encoded inputs, we can randomly sample latent vectors within a meaningful range of the latent space (typically around the learned distribution).
These sampled vectors are fed into the decoder to produce new, synthetic outputs that resemble the patterns learned during training.
(3) Decoding Process:
The decoder interprets the randomly selected latent vector as a set of features and reconstructs an image or data point that corresponds to these features.
If the latent vector lies near a cluster representing digit "6" in the MNIST dataset, for example, the output may resemble a handwritten "6".
Key Characteristics of Generative Data in Autoencoders
(1) Interpolation:
By moving smoothly between two points in the latent space and decoding each intermediate vector, the autoencoder produces outputs that gradually morph from one pattern into another (a short sketch of this appears after the generation example below).
(2) Reconstruction vs. Generation:
In reconstruction, the encoder compresses the input, and the decoder reconstructs the same input.
In data generation, the input is replaced by a synthetic latent vector, and the decoder "imagines" a corresponding output based on the patterns learned during training.
new_data = np.array([[-1, 6]])
fake_image = decoder.predict(new_data, verbose = 0)
plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = '*', s = 100, label = 'new data')
plt.title('Latent Space', fontsize = 10)
plt.xlabel('$Z_1$', fontsize = 10)
plt.ylabel('$Z_2$', fontsize = 10)
plt.legend(loc = 2)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 10)
plt.xticks([])
plt.yticks([])
plt.show()
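Following up on the interpolation idea from item (1) above, the sketch below decodes a sequence of points lying on the straight line between two latent vectors, reusing the trained decoder. The two endpoint coordinates are illustrative assumptions, chosen roughly near different clusters in the latent space plotted earlier.

# Interpolate between two (assumed) latent points and decode each step
z_start = np.array([-1.0, 6.0])
z_end = np.array([3.0, -2.0])

n_steps = 8
alphas = np.linspace(0, 1, n_steps)
z_interp = np.array([(1 - a)*z_start + a*z_end for a in alphas])

interp_imgs = decoder.predict(z_interp, verbose = 0)

plt.figure(figsize = (12, 2))
for i in range(n_steps):
    plt.subplot(1, n_steps, i + 1)
    plt.imshow(interp_imgs[i].reshape(28, 28), 'gray')
    plt.xticks([])
    plt.yticks([])
plt.show()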
Types of Autoencoders
(1) Vanilla Autoencoder:
A basic fully connected encoder-decoder trained to minimize the reconstruction error, as described above.
(2) Denoising Autoencoder (DAE):
Trained to reconstruct clean inputs from corrupted versions, which makes the learned representation robust to noise.
(3) Sparse Autoencoder:
Adds a sparsity penalty on the latent activations so that only a few units are active for any given input (a minimal sketch follows this list).
(4) Variational Autoencoder (VAE):
Learns a probabilistic latent distribution rather than a single point for each input, which allows principled sampling and generation.
(5) Convolutional Autoencoder (CAE):
Uses convolutional layers in the encoder and decoder, making it well suited to image data.
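As one concrete example of these variants, the sketch below modifies the earlier MNIST autoencoder into a sparse autoencoder by adding an L1 activity regularizer on the latent layer. The latent size and penalty weight are illustrative assumptions.

# Sparse autoencoder: an L1 penalty on the latent activations encourages a sparse code
sparse_encoder = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape = (784,)),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 64, activation = 'relu',
                          activity_regularizer = tf.keras.regularizers.l1(1e-4))
])

sparse_decoder = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape = (64,)),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 784, activation = None)
])

sparse_autoencoder = tf.keras.models.Sequential([sparse_encoder, sparse_decoder])
sparse_autoencoder.compile(optimizer = 'adam', loss = 'mean_squared_error')
# sparse_autoencoder.fit(train_imgs, train_imgs, batch_size = 50, epochs = 10)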
Key Features of Autoencoders
(1) Unsupervised Learning:
Autoencoders are trained on the inputs alone; no labels are required.
(2) Compression and Reconstruction:
The encoder compresses the input into the latent space and the decoder reconstructs it, with the reconstruction error driving learning.
(3) Non-linear Mappings:
Because the encoder and decoder are neural networks, autoencoders can capture non-linear structure that linear methods such as PCA cannot.
Applications of Autoencoders
(1) Denoising:
Removing noise from corrupted images or signals by reconstructing the clean underlying data.
(2) Dimensionality Reduction:
Compressing high-dimensional data into a compact latent representation, as in the MNIST example above.
(3) Anomaly Detection:
Flagging inputs that the model reconstructs poorly, since they do not match the patterns seen during training (see the sketch after this list).
(4) Image Generation:
Producing new samples by decoding points sampled from the latent space.
(5) Feature Extraction:
Using the latent codes as inputs to downstream models such as classifiers or clustering algorithms.
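To illustrate application (3), a common pattern is to flag inputs whose reconstruction error is unusually large. The sketch below reuses the trained autoencoder and test_imgs from above; the 95th-percentile threshold is an illustrative assumption.

# Reconstruction-error-based anomaly scoring
reconstructions = autoencoder.predict(test_imgs, verbose = 0)
errors = np.mean((test_imgs - reconstructions)**2, axis = 1)

# Flag the samples with the largest reconstruction error as potential anomalies
threshold = np.percentile(errors, 95)
anomaly_idx = np.where(errors > threshold)[0]
print('Threshold: {:.4f}, flagged {} of {} test images'.format(
    threshold, len(anomaly_idx), len(test_imgs)))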
Autoencoders are powerful neural networks used for unsupervised representation learning, with applications ranging from dimensionality reduction to anomaly detection and generative modeling. By compressing data into a latent space and reconstructing it, autoencoders learn to capture the most meaningful features of the data. While simple autoencoders are effective for many tasks, advanced variants such as VAEs and convolutional autoencoders offer improved performance for more complex data, making autoencoders an essential tool in the deep learning toolkit.
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')