Autoencoder

Table of Contents



In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('ul27ZUdYaBY', width = "560", height = "315")
Out[ ]:



1. Unsupervised Learning

Unsupervised learning is a type of machine learning where a model learns patterns, structures, or relationships in the data without labeled outputs. Unlike supervised learning, which uses input-output pairs for training, unsupervised learning works with unlabeled data, meaning the algorithm must discover inherent features or groupings within the dataset.


Definition

  • Unsupervised learning refers to most attempts to extract information from a distribution that do not require human labor to annotate examples
  • Main task is to find the 'best' representation of the data

Dimension Reduction

  • Attempt to compress as much information as possible in a smaller representation
  • Preserve as much information as possible while obeying some constraint aimed at keeping the representation simpler (a minimal linear sketch using the SVD follows below)
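
Before turning to autoencoders, it may help to see what a purely linear reduction looks like. The following minimal sketch (added here for illustration, with random toy data) uses the SVD, the computation behind PCA and the POD revisited in Section 6, to compress 10-dimensional samples to 2 dimensions and reconstruct them.

In [ ]:
import numpy as np

# Toy data: 100 samples in 10 dimensions (values are illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size = (100, 10))

# Center the data and compute a rank-2 projection via the SVD (the idea behind PCA/POD)
X_centered = X - X.mean(axis = 0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices = False)

Z = X_centered @ Vt[:2].T               # compressed 2-D representation
X_hat = Z @ Vt[:2] + X.mean(axis = 0)   # reconstruction from the 2-D representation

print(Z.shape, X_hat.shape)             # (100, 2) (100, 10)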



2. Autoencoders

An autoencoder is a type of neural network used for unsupervised learning that aims to learn efficient, compressed representations of input data. It consists of two main components: an encoder and a decoder. The encoder compresses the input into a lower-dimensional representation (called the latent space or bottleneck), while the decoder reconstructs the input from this compressed representation.

The objective of an autoencoder is to minimize the reconstruction error—the difference between the original input and its reconstruction.


Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder and a decoder that produce a reconstruction

Encoder and Decoder

  • Encoder function : $z = f(x)$
  • Decoder function : $\hat{x} = g(z)$
  • We train the network so that $g\left(f(x)\right) \approx x$ (a toy numerical sketch follows below)
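
To make the notation concrete, here is a toy sketch (added for illustration, not part of the original notes) in which the encoder and decoder are simple linear maps with random placeholder weights; training would adjust these weights so that $g(f(x)) \approx x$.

In [ ]:
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative): a 4-D input compressed to a 2-D code
W_enc = rng.normal(size = (2, 4))   # untrained placeholder weights
W_dec = rng.normal(size = (4, 2))

def f(x):        # encoder: z = f(x)
    return W_enc @ x

def g(z):        # decoder: x_hat = g(z)
    return W_dec @ z

x = rng.normal(size = 4)
x_hat = g(f(x))  # training adjusts W_enc and W_dec so that g(f(x)) ≈ x
print(x.shape, x_hat.shape)   # (4,) (4,)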

The top network in the figure below shows a trivial autoencoder in which the input data is passed directly to the output without any significant compression or transformation. This structure may be able to reconstruct the input perfectly, but it fails to learn any meaningful abstraction or representation. On the other hand, the bottom network illustrates a typical autoencoder with a squeezed bottleneck, which compresses the input into a lower-dimensional latent space before reconstructing it. This narrow bottleneck structure offers several key benefits:

  • The top structure, though capable of reconstructing the input, does not learn useful or general features. It essentially passes information without processing or summarizing it.

  • In contrast, the bottom structure is meaningful because the bottleneck introduces a constraint that forces the network to learn relevant features instead of copying the input.




Structure of an Autoencoder

  • Encoder (Compression Path):
    • The encoder maps the input $x$ to a lower-dimensional latent representation $z$.
    • This transformation can be represented as:

      $$ z = f(x) $$
    • The encoder typically consists of fully connected (dense) or convolutional layers, followed by non-linear activations (e.g., ReLU, sigmoid).

  • Latent Space (Bottleneck):
    • The bottleneck layer $z$ represents the compressed encoding of the input.
    • The size of this layer controls how much information is retained from the input.

  • Decoder (Reconstruction Path):
    • The decoder maps the latent representation $z$ back to the original input space:

      $$ \hat{x} = g(z) $$
    • The decoder reconstructs the input $x$ as closely as possible to the original input.

  • Loss Function:
    • The loss function measures the reconstruction error, typically using mean squared error (MSE), computed numerically in the short sketch after this list:

      $$ \mathcal{L}(x, \hat{x}) = \|x - \hat{x}\|^2 $$
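
As a quick numerical illustration (the vectors below are placeholders, not real data), the reconstruction loss can be computed directly:

In [ ]:
import numpy as np

x = np.array([0.0, 0.5, 1.0])        # original input
x_hat = np.array([0.1, 0.4, 0.9])    # reconstruction produced by the decoder

loss = np.mean((x - x_hat)**2)       # mean squared reconstruction error
print(loss)                          # ≈ 0.01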

Key Features of Autoencoders

The squeezed bottleneck in an autoencoder acts as a filter that retains essential features while discarding irrelevant or redundant information. It enhances the model's ability to learn compact, robust representations of the data and improves its generalization ability, making it effective for tasks such as dimensionality reduction, denoising, anomaly detection, and representation learning.

(1) Dimensionality Reduction

  • The bottleneck layer forces the network to compress the input data into a lower-dimensional representation.

  • This reduction removes redundant and non-essential information, retaining only the most important features.

  • Example: Instead of storing all pixel values of an image, the autoencoder learns a compact representation (e.g., shape, edges, or texture patterns) to reconstruct the image.

(2) Feature Extraction

  • By limiting the size of the bottleneck, the autoencoder learns meaningful, abstract features rather than memorizing the input.

  • The compressed features in the latent space can be used for classification, clustering, or visualization tasks, similar to principal components in PCA but in a non-linear manner.

(3) Preventing Overfitting

  • A wide bottleneck can allow the autoencoder to simply copy the input data, leading to poor generalization.

  • A squeezed bottleneck enforces a constraint that forces the network to generalize the underlying structure of the data rather than learning a direct mapping of the inputs.

(4) Noise Reduction (Denoising Autoencoders)

  • In tasks like denoising and anomaly detection, the bottleneck prevents the network from reconstructing noise, as the network focuses only on core patterns.

(5) Efficient Encoding for Data Compression

  • Autoencoders are often used for data compression by representing the input data with fewer bits (from high-dimensional space to low-dimensional space).

  • The "squeezing" helps reduce data storage needs while maintaining the ability to reconstruct the original input with minimal loss.



3. Autoencoder with TensorFlow

  • MNIST example

  • Use only (1, 5, 6) digits to visualize in 2-D

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline

np.random.seed(42)
In [ ]:
# Load Data

mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x.reshape(-1, 784)/255.0, test_x.reshape(-1, 784)/255.0
In [ ]:
# Use only 1, 5, 6 digits to visualize the latent space

train_idx1 = np.array(np.where(train_y == 1))
train_idx5 = np.array(np.where(train_y == 5))
train_idx6 = np.array(np.where(train_y == 6))
train_idx = np.sort(np.concatenate((train_idx1, train_idx5, train_idx6), axis = None))

test_idx1 = np.array(np.where(test_y == 1))
test_idx5 = np.array(np.where(test_y == 5))
test_idx6 = np.array(np.where(test_y == 6))
test_idx = np.sort(np.concatenate((test_idx1, test_idx5, test_idx6), axis = None))

train_imgs = train_x[train_idx]
train_labels = train_y[train_idx]
test_imgs = test_x[test_idx]
test_labels = test_y[test_idx]

n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 18081, shape : (18081, 784)
The number of testing images : 2985, shape : (2985, 784)
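
As a quick sanity check (an extra cell added here, not in the original notebook), a few of the selected digits can be displayed before defining the model:

In [ ]:
# Quick sanity check: display a few of the selected training digits
plt.figure(figsize = (6, 2))
for i in range(3):
    plt.subplot(1, 3, i + 1)
    plt.imshow(train_imgs[i].reshape(28, 28), 'gray')
    plt.title(str(train_labels[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()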

3.1. Define a Structure of an Autoencoder

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape




Note on 2D Latent Space and Digits 1, 5, and 6 from MNIST

In this autoencoder structure, we assign only two neurons in the latent space. This choice serves a specific purpose - it is not necessarily because a 2D latent space is the optimal representation for the data, but rather because it provides an effective way to visualize the compressed data points in a two-dimensional space.

  • A 2D latent space allows the data points to be plotted in a 2D plane, making it easier to visualize how the autoencoder organizes and compresses the input data.

  • Instead of using all 10 digits (0 - 9), using only a subset of digits (e.g., 1, 5, 6) makes the visualization simpler and clearer.

  • Plotting all 10 digits in a 2D latent space can result in overlapping clusters, making it difficult to interpret the latent space representation.


In [ ]:
# Define Structure

# Encoder Structure
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (784,)),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None)
    ])

# Decoder Structure
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (2,)),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 28*28, activation = None)
    ])

# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])
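
Optionally, the layer shapes and parameter counts can be inspected with the standard Keras summaries (an extra check, not in the original cells):

In [ ]:
# Optional: inspect layer shapes and parameter counts
encoder.summary()
decoder.summary()
autoencoder.summary()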

3.2. Define Loss and Optimizer

Loss

  • Squared loss (for an autoencoder, the target $t_i$ is the input $x_i$ itself and the output $y_i$ is the reconstruction $\hat{x}_i$)

$$ \frac{1}{m}\sum_{i=1}^{m} (t_{i} - y_{i})^2 $$


Optimizer

  • Adam optimizer: one of the most widely used optimizers

In [ ]:
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.001),
                    loss = 'mean_squared_error',
                    metrics = ['mse'])
In [ ]:
# Train Model & Evaluate Test Data

training = autoencoder.fit(train_imgs, train_imgs, batch_size = 50, epochs = 10, verbose = 0)
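
To check convergence, the loss recorded in the returned History object can be plotted (an optional check added here):

In [ ]:
# Optional: plot the training loss curve stored in the History object
plt.figure(figsize = (5, 3))
plt.plot(training.history['loss'])
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.title('Training Loss', fontsize = 12)
plt.show()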

3.3. Test or Evaluate

  • Test reconstruction performance of the autoencoder
In [ ]:
test_scores = autoencoder.evaluate(test_imgs, test_imgs, verbose = 0)

print('Test loss: {}'.format(test_scores[0]))
print('Mean Squared Error: {} %'.format(test_scores[1]*100))
Test loss: 0.026258012279868126
Mean Squared Error: 2.6258012279868126 %
In [ ]:
# Visualize Evaluation on Test Data

rand_idx = np.random.randint(1, test_imgs.shape[0])
# rand_idx = 6

test_img = test_imgs[rand_idx]
reconst_img = autoencoder.predict(test_img.reshape(1, 28*28), verbose = 0)

plt.figure(figsize = (6, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input Image', fontsize = 12)

plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(reconst_img.reshape(28,28), 'gray')
plt.title('Reconstructed Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.show()
[Output: input image (left) and reconstructed image (right)]

In this autoencoder for MNIST, it is remarkable that a latent space with only two neurons can capture sufficient information to reconstruct an MNIST image, originally represented in a 784-dimensional space.



4. Latent Space

The latent space refers to a lower-dimensional, abstract feature space where input data is encoded by a machine learning model, such as an autoencoder. In this space, the essential characteristics of the input data are represented in a compressed form. The term "latent" implies that the features in this space are hidden or learned representations that capture underlying patterns, rather than explicit input features.

  • Let’s examine how the compressed features of the digits 1, 5, and 6 are distributed in the learned latent space.

  • To see the distribution of the latent variables, we project the 784-dimensional image space onto the 2-dimensional latent space


In [ ]:
idx = np.random.randint(0, len(test_labels), 500)
test_x, test_y = test_imgs[idx], test_labels[idx]
test_x = np.array(test_x)
In [ ]:
test_latent = encoder.predict(test_x, verbose = 0)

plt.figure(figsize = (5, 5))
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.title('Latent Space', fontsize = 12)
plt.xlabel('$Z_1$', fontsize = 12)
plt.ylabel('$Z_2$', fontsize = 12)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.show()
[Output: scatter plot of the 2-D latent space for digits 1, 5, and 6]

From the results, we can conclude that clustering or classification can be performed in the 2D latent space, rather than in the original 784-dimensional input space. In many practical applications, dimensionality reduction is performed before applying further machine learning algorithms or analyses. This preprocessing step helps to address issues related to high-dimensional data and often improves the performance of subsequent tasks such as clustering, classification, or anomaly detection.
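
As a small illustration of this point (an extra cell added here, assuming scikit-learn is available), k-means can be run directly on the 2-D latent codes computed above:

In [ ]:
# Illustrative only: k-means clustering on the 2-D latent codes (assumes scikit-learn)
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 3, n_init = 10, random_state = 0)
cluster_id = kmeans.fit_predict(test_latent)

plt.figure(figsize = (5, 5))
plt.scatter(test_latent[:, 0], test_latent[:, 1], c = cluster_id, s = 10)
plt.title('k-means Clusters in the Latent Space', fontsize = 12)
plt.xlabel('$Z_1$', fontsize = 12)
plt.ylabel('$Z_2$', fontsize = 12)
plt.axis('equal')
plt.show()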


Note:

  • The latent space may change if the model is re-trained.

  • Each latent variable (or dimension) does not have an explicit physical meaning.




5. Data Generation

While autoencoders are primarily designed for representation learning and reconstruction, they can also be used for data generation by propagating values through the decoder. The process involves selecting a latent vector (a point in the latent space) and feeding it into the decoder to reconstruct an output that resembles the original data distribution.


How Data Generation Works in an Autoencoder

(1) Latent Space Exploration:

  • After training, the latent space encodes the structure of the input data. Each point in this space represents compressed features of a particular input (e.g., digits in the MNIST dataset).

  • By selecting specific points in the latent space, we can generate corresponding outputs through the decoder.

(2) Random Latent Vector Selection:

  • Instead of using actual encoded inputs, we can randomly sample latent vectors within a meaningful range of the latent space (typically around the learned distribution).

  • These sampled vectors are fed into the decoder to produce new, generative outputs that resemble the patterns learned during training.

(3) Decoding Process:

  • The decoder interprets the randomly selected latent vector as a set of features and reconstructs an image or data point that corresponds to these features.

  • If the latent vector lies near a cluster representing digit "6" in the MNIST dataset, for example, the output may resemble a handwritten "6".


Key Characteristics of Generative Data in Autoencoders

(1) Reconstruction vs. Generation:

  • In reconstruction, the encoder compresses the input, and the decoder reconstructs the same input.

  • In data generation, the input is replaced by a synthetic latent vector, and the decoder "imagines" a corresponding output based on the patterns learned during training.

(2) Interpolation:

  • By selecting latent values between two points (e.g., between representations of digit "1" and digit "5"), the decoder can generate data that smoothly transitions between these classes, creating hybrid or intermediate samples.

In [ ]:
new_data = np.array([[-3, -5]])

fake_image = decoder.predict(new_data, verbose = 0)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = '*', s = 100, label = 'new data')
plt.title('Latent Space', fontsize = 10)
plt.xlabel('$Z_1$', fontsize = 10)
plt.ylabel('$Z_2$', fontsize = 10)
plt.legend(loc = 2)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 10)
plt.xticks([])
plt.yticks([])
plt.show()
[Output: latent space with the new data point (left) and the generated fake image (right)]

5.1. Walk in the Latent Space

A latent space is a compressed, lower-dimensional space that a model uses to represent high-dimensional data. Each point in this space corresponds to some meaningful representation of an input.

A "walk in the latent space" refers to the process of moving smoothly through this space - typically by interpolating between latent vectors - and observing how the outputs change. This is like taking a walk through the model's internal "imagination."


In [ ]:
new_data_1 = np.array([[0, 3]])
new_data_2 = np.array([[-3, -1]])
c = 0.5*(new_data_1 + new_data_2)

fake_image_1 = decoder.predict(new_data_1, verbose = 0)
fake_image_2 = decoder.predict(new_data_2, verbose = 0)

plt.figure(figsize = (6, 4))
plt.scatter(test_latent[test_y == 1,0], test_latent[test_y == 1,1], label = '1')
plt.scatter(test_latent[test_y == 5,0], test_latent[test_y == 5,1], label = '5')
plt.scatter(test_latent[test_y == 6,0], test_latent[test_y == 6,1], label = '6')
plt.scatter(new_data_1[:,0], new_data_1[:,1], c = 'k', marker = '*', s = 100, label = 'new data')
plt.scatter(new_data_2[:,0], new_data_2[:,1], c = 'k', marker = '*', s = 100, label = 'new data')
plt.scatter(c[:,0], c[:,1], c = 'k', marker = 's', s = 100, label = 'new data')
plt.show()

plt.figure(figsize = (6, 4))
plt.subplot(1,2,1)
plt.imshow(fake_image_1.reshape(28,28), 'gray')
plt.title('Generated Image 1', fontsize = 12)

plt.xticks([])
plt.yticks([])
plt.subplot(1,2,2)
plt.imshow(fake_image_2.reshape(28,28), 'gray')
plt.title('Generated Image 2', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.show()
[Output: latent space with the two selected points and their midpoint]
[Output: the two images decoded from the selected latent points]

In this example, two points in the latent space appear to represent different tilt angles of a digit. This suggests that the latent space captures meaningful variations in the data, such as rotation. However, it is important to note again that these representations are not fixed - they will change if the model is re-trained.


In [ ]:
fake_image_lat = decoder.predict(0.5*(new_data_1 + new_data_2), verbose = 0)

plt.figure(figsize = (6, 4))
plt.subplot(1,2,1)
plt.imshow(0.5*(fake_image_1 + fake_image_2).reshape(28,28), 'gray')
plt.title('Interpolation in original')
plt.xticks([])
plt.yticks([])

plt.subplot(1,2,2)
plt.imshow(fake_image_lat.reshape(28,28), 'gray')
plt.title('Interpolation in latent')
plt.xticks([])
plt.yticks([])

plt.show()
[Output: pixel-space average (left) vs. image decoded from the averaged latent vector (right)]

In the original input space, computing $ \frac{x_1 + x_2}{2} $ results in a simple pixel-wise average of two images, which often appears blurry and lacks interpretability. In contrast, reconstructing the image from the averaged latent vector $ \frac{z_1 + z_2}{2} $ produces a digit whose tilt angle lies roughly between those of $ z_1 $ and $ z_2 $. This demonstrates that interpolation in the latent space can yield semantically meaningful results, unlike direct interpolation in the raw input space.


Vector Arithmetic in Latent Space

Latent space representations learned by generative models can be directly manipulated through simple vector arithmetic to generate new, semantically meaningful outputs. This fascinating property enables intuitive and targeted image generation, such as modifying attributes like gender, facial expression, or accessories.

A major milestone in this area was the 2015 paper by Alec Radford et al. titled "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks." The authors introduced a stable architecture for training deep convolutional neural networks within the GAN framework, known as DCGAN. (While not an autoencoder, it shares a similar underlying philosophy as a generative model.)

In their study, the authors explored the structure of the latent space by training models on various datasets - most notably, a dataset of celebrity faces - and demonstrated that semantic transformations (e.g., adding glasses or changing gender) could be achieved via linear operations in the latent space.



The image above illustrates this concept. A latent vector representing a "man with glasses" is subtracted from a "man without glasses", isolating the direction in latent space corresponding to the "glasses" feature. When this difference is added to the latent vector of a "woman without glasses", the result is a new latent vector that, when decoded, generates an image of a "woman with glasses."

This arithmetic operation can be formally written as:


$$ z_{\text{man with glasses}} - z_{\text{man without glasses}} + z_{\text{woman without glasses}} = z_{\text{woman with glasses}} $$


This example highlights the fact that GANs and similar generative models learn structured and semantically meaningful latent spaces, where directions correspond to interpretable, high-level concepts. Such properties make these models especially powerful in applications such as generative design, facial attribute editing, style transfer, and interactive AI-based creative tools.
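
The same arithmetic can be tried in the 2-D latent space of our MNIST autoencoder. The sketch below (added here for illustration) uses the mean latent codes of digits 1, 5, and 6 as stand-ins for attribute vectors; with only two latent dimensions and no attribute labels, the decoded result is merely suggestive and will not be as clean as the DCGAN example.

In [ ]:
# Illustrative only: vector arithmetic z_1 - z_5 + z_6 in the 2-D MNIST latent space
z_mean_1 = test_latent[test_y == 1].mean(axis = 0)   # average latent code of digit 1
z_mean_5 = test_latent[test_y == 5].mean(axis = 0)   # average latent code of digit 5
z_mean_6 = test_latent[test_y == 6].mean(axis = 0)   # average latent code of digit 6

z_new = z_mean_1 - z_mean_5 + z_mean_6
gen_img = decoder.predict(z_new.reshape(1, 2), verbose = 0)

plt.figure(figsize = (3, 3))
plt.imshow(gen_img.reshape(28, 28), 'gray')
plt.title('Decoded $z_1 - z_5 + z_6$', fontsize = 10)
plt.xticks([])
plt.yticks([])
plt.show()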



5.2. Generative AI in Image Modeling: From Autoencoders to Diffusion Models

Generative AI focuses on learning data distributions in order to generate new, realistic samples - such as images, audio, or text - that resemble those from the original dataset. In the field of image generation, several deep learning architectures have been developed, each marking a significant advancement in generative modeling:

  • Autoencoders
  • Variational Autoencoders (VAEs)
  • Generative Adversarial Networks (GANs)
  • Diffusion Models

Autoencoders are not probabilistic models and often produce outputs that lack sharpness or realism, as we have observed. However, they serve as an important foundational architecture for generative AI.


Generative image modeling has rapidly evolved - from simple reconstructions using autoencoders to high-fidelity and controllable image generation using diffusion models. Each step in this progression - from VAEs to GANs to diffusion models - has contributed to improvements in visual realism, diversity of outputs, and user-driven customization.

Today, tools such as DALL-E and Stable Diffusion have made these technologies widely accessible to researchers, creators, and the general public, enabling a broad range of creative and scientific applications.




6. Revisit the Problem: Flow Around a Circular Cylinder

We previously studied fluid flow past a circular cylinder at low Reynolds number in the context of dimension reduction. In that example, we applied SVD or POD as linear techniques for reducing dimensionality.

Now that we've introduced the autoencoder - a nonlinear, neural network-based approach to dimension reduction - let's apply an autoencoder to the same problem and compare the results.

This example is adapted from the textbook "Data-Driven Science and Engineering" by Steven L. Brunton and J. Nathan Kutz.



Data Description and Visualization

The dataset provided contains flow field data for fluid flow past a circular cylinder at a Reynolds number of Re = 100. The data file, CYLINDER_ALL.mat, includes 151 snapshots of velocity components and vorticity fields:

  • UALL: Horizontal velocity field (u-velocity)
  • VALL: Vertical velocity field (v-velocity)
  • VORTALL: Vorticity field

The flow is studied at a Reynolds number of Re = 100, a regime where vortex shedding occurs periodically and can be captured effectively using dimension reduction.


In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
from scipy import io

flows = io.loadmat('/content/drive/MyDrive/ML/ML_data/CYLINDER_ALL.mat')

flows_mat_u = flows['UALL']
flows_mat_v = flows['VALL']
flows_mat_vort = flows['VORTALL']

flows_mat_vort_normalized = (flows_mat_vort - flows_mat_vort.min()) / (flows_mat_vort.max() - flows_mat_vort.min())
flows_mat_vort_normalized_mean = flows_mat_vort_normalized.mean(axis = 1)
flows_mat_vort_normalized_centered = flows_mat_vort_normalized - flows_mat_vort_normalized_mean[:, None]

flows_mat_vort_normalized_centered = np.array(flows_mat_vort_normalized_centered)
print(flows_mat_vort_normalized_centered.shape)

nx = 449
ny = 199
(89351, 151)
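
Before building the autoencoder, a single raw vorticity snapshot can be visualized as a quick sanity check (an extra cell added here; the reshape convention follows the animation code used later in this section):

In [ ]:
# Quick check: visualize one raw vorticity snapshot (column 0), reshaped to (nx, ny)
plt.figure(figsize = (6, 3))
plt.imshow(flows_mat_vort[:, 0].reshape(nx, ny), 'gray')
plt.title('Vorticity Snapshot 0')
plt.axis('off')
plt.show()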

6.1. Autoencoder



Input Reshaping (or Flattening) for an Autoencoder

In [ ]:
# focus on vorticity field only

train_x = flows_mat_vort_normalized_centered.T.reshape(-1, nx*ny)

print(train_x.shape)
(151, 89351)

Autoencoder with a 2 Dimensional Latent Space

In [ ]:
import tensorflow as tf

# Encoder Structure
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (89351,)),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 64, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None)
    ])

# Decoder Structure
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (2,)),
    tf.keras.layers.Dense(units = 64, activation = 'relu'),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 449*199, activation = None)
    ])

# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])
In [ ]:
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(),
                    loss = 'mean_squared_error',
                    metrics = ['mse'])
In [ ]:
# Train Model
training = autoencoder.fit(train_x, train_x, epochs = 500, verbose = 0)

Reconstructing Inputs and Visualizing Them in 2D Latent Space

In [ ]:
reconst_x = autoencoder.predict(train_x, verbose = 0)

latent = encoder.predict(train_x, verbose = 0)
In [ ]:
# Let's see the flow and latent space in time

from matplotlib.animation import FuncAnimation
import time
from IPython import display

fig = plt.figure(figsize = (6, 4))

ax1 = fig.add_subplot(1, 3, 1)
ax2 = fig.add_subplot(1, 3, 2)
ax3 = fig.add_subplot(1, 3, 3)

def animate(i):
  ax1.clear()
  ax1.imshow(train_x[i,:].reshape(nx, ny) + flows_mat_vort_normalized_mean.reshape(nx, ny), 'gray')
  ax1.set_title('Ground Truth')
  ax1.axis('off')
  ax2.clear()
  ax2.imshow(reconst_x[i,:].reshape(nx, ny) + flows_mat_vort_normalized_mean.reshape(nx, ny), 'gray')
  ax2.set_title('Reconstructed')
  ax2.axis('off')
  ax3.clear()
  ax3.scatter(latent[:,0], latent[:,1], alpha = 0.3)
  ax3.scatter(latent[i,0], latent[i,1], color = 'r', s = 50)
  ax3.set_title('Latent Space')
  ax3.set_xticks([])
  ax3.set_yticks([])

ani = FuncAnimation(fig, animate, frames = 150, interval = 20)
video = ani.to_html5_video()
html = display.HTML(video)
display.display(html)
plt.close()
In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')