KAIST Industry-Academia Cooperation Open Lecture

Artificial Intelligence and Design: From Analysis Prediction to Design Optimization


Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Practice Aims & Objectives

  1. Implement two dimension reduction methods: PCA and autoencoder
  2. Perform data generation with an autoencoder

1. Principal Component Analysis (PCA)


Motivation: Can we describe high-dimensional data in a "simpler" way?

$\quad \rightarrow$ Dimension reduction without losing too much information
$\quad \rightarrow$ Find a low-dimensional, yet useful representation of the data

  • Why dimensionality reduction?
    • Insights into the low-dimensional structure of the data (visualization)
    • Fewer dimensions ⇒ less chance of overfitting ⇒ better generalization
    • Speeding up learning algorithms
      • Most algorithms scale badly with increasing data dimensionality
    • Lower storage requirements (data compression)
    • Note: dimensionality reduction is different from feature selection
      • ... although the goals are much the same
    • Dimensionality reduction is more like “feature extraction”
      • Constructing a small set of new features from the original features
  • Data $\rightarrow$ projection onto unit vector $\hat{u}_1$
    • PCA is used when we want projections capturing maximum variance directions
    • Principal Components (PC): directions of maximum variability in the data
    • Roughly speaking, PCA does a change of axes that can represent the data in a succinct manner
  • How?
    • Idea: highly correlated data contain redundant features





  • Each example $x$ has 2 features $\{x_1,x_2\}$

  • Consider ignoring the feature $x_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$

  • Are we losing much information by throwing away $x_2$ ?

  • Yes, the data has substantial variance along both features (i.e., both axes)



  • Now consider a change of axes

  • Each example $x$ has 2 features $\{u_1,u_2\}$

  • Consider ignoring the feature $u_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{u_1\}$

  • Are we losing much information by throwing away $u_2$ ?

  • No. Most of the data spread is along $u_1$ (very little variance along $u_2$)





  • How? Two equivalent formulations (see the sketch below):
    1. Maximize the variance of the projected data (most separable)
    2. Minimize the sum of squared reconstruction errors (minimum squared error)
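
To make these two (equivalent) formulations concrete, here is a minimal sketch, not part of the original practice code, that computes the first principal component of toy correlated 2-D data by eigendecomposition of the covariance matrix; the synthetic data and variable names are illustrative assumptions.
In [ ]:
import numpy as np

# Toy 2-D data with two highly correlated features (illustrative only)
rng = np.random.default_rng(0)
x1 = rng.normal(0, 3, size = 200)
x2 = 0.8 * x1 + rng.normal(0, 0.5, size = 200)
X = np.stack([x1, x2], axis = 1)

# 1) Center the data
Xc = X - X.mean(axis = 0)

# 2) Eigendecomposition of the sample covariance matrix
S = np.cov(Xc.T)
eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order

# The first principal component is the eigenvector with the largest eigenvalue
u1 = eigvecs[:, -1]
print('First PC direction:', u1)
print('Fraction of variance captured:', eigvals[-1] / eigvals.sum())

# 1-D representation: projection of each example onto u1
z = Xc @ u1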

2. Autoencoders


Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder that maps the input to a latent code and a decoder that produces a reconstruction


Encoder and Decoder

  • Encoder function : $z = f(x)$
  • Decoder function : $\hat{x} = g(z)$
  • We learn to make $g\left(f(x)\right) \approx x$ (see the sketch below)
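
As a minimal sketch of this composition (illustrative code, not the lab's implementation), consider a linear encoder $f(x) = W_e x$ and decoder $g(z) = W_d z$; training with squared loss drives $g(f(x))$ toward $x$, and a linear autoencoder of this form is known to span the same subspace as PCA.
In [ ]:
import numpy as np

# Minimal linear autoencoder sketch: encoder f and decoder g with g(f(x)) ≈ x
d, k = 784, 2                        # input and latent dimensions (matching the MNIST setup below)
rng = np.random.default_rng(0)
W_e = rng.normal(0, 0.01, (k, d))    # encoder weights (untrained, random)
W_d = rng.normal(0, 0.01, (d, k))    # decoder weights (untrained, random)

def f(x):                            # encoder: R^d -> R^k
    return W_e @ x

def g(z):                            # decoder: R^k -> R^d
    return W_d @ z

x = rng.normal(size = d)
x_hat = g(f(x))                      # reconstruction; training minimizes ||x - x_hat||^2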





3. Dimension Reduction with PCA and Autoencoder

3.1 Data Description: MNIST

What is MNIST

From Wikipedia

  • The MNIST database (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, NIST's complete dataset was too hard.
  • MNIST database

    - Handwritten digit database

    - $28 \times 28$ grayscale images

    - Each image is flattened into a vector of length $28 \times 28 = 784$



Objective: 784 → 2 Dimensions

3.2 Load MNIST Data

  • Use only (1, 5, 6) digits to visualize in 2-D
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
In [ ]:
# Load Data

mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x.reshape(-1, 784)/255.0, test_x.reshape(-1, 784)/255.0
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
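As a quick sanity check (not in the original notebook), we can display a single raw training image; the index 0 is arbitrary.
In [ ]:
# Visualize one raw MNIST digit to confirm the data loaded correctly
plt.figure(figsize = (3, 3))
plt.imshow(train_x[0].reshape(28, 28), 'gray')
plt.title('Label: {}'.format(train_y[0]))
plt.xticks([])
plt.yticks([])
plt.show()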
In [ ]:
# Use only 1, 5, 6 digits to visualize the latent space

train_idx1 = np.array(np.where(train_y == 1))
train_idx5 = np.array(np.where(train_y == 5))
train_idx6 = np.array(np.where(train_y == 6))
train_idx = np.sort(np.concatenate((train_idx1, train_idx5, train_idx6), axis = None))

test_idx1 = np.array(np.where(test_y == 1))
test_idx5 = np.array(np.where(test_y == 5))
test_idx6 = np.array(np.where(test_y == 6))
test_idx = np.sort(np.concatenate((test_idx1, test_idx5, test_idx6), axis = None))

train_imgs = train_x[train_idx]
train_labels = train_y[train_idx]
test_imgs = test_x[test_idx]
test_labels = test_y[test_idx]

n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 18081, shape : (18081, 784)
The number of testing images : 2985, shape : (2985, 784)

3.3 PCA

In [ ]:
from sklearn.decomposition import PCA

# Project the 784-D test images onto the first two principal components
pca = PCA(n_components = 2)

test_pca = pca.fit_transform(test_imgs)
In [ ]:
test_pca.shape
Out[ ]:
(2985, 2)
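As an aside, the fitted sklearn PCA object exposes `explained_variance_ratio_`, which reports the fraction of the total variance each component retains; the exact numbers depend on the data, so none are quoted here.
In [ ]:
# Fraction of total variance captured by each of the two principal components
print(pca.explained_variance_ratio_)
print('Total:', pca.explained_variance_ratio_.sum())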
In [ ]:
plt.figure(figsize = (6, 6))

plt.scatter(test_pca[test_labels == 1,0], test_pca[test_labels == 1,1], alpha = 0.4, label = '1')
plt.scatter(test_pca[test_labels == 5,0], test_pca[test_labels == 5,1], alpha = 0.4, label = '5')
plt.scatter(test_pca[test_labels == 6,0], test_pca[test_labels == 6,1], alpha = 0.4, label = '6')
plt.title('Latent Space', fontsize = 12)
plt.xlabel('$Z_1$', fontsize = 12)
plt.ylabel('$Z_2$', fontsize = 12)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.show()

3.4 Autoencoder: Dimension Reduction and Data Generation

Define a Structure of an Autoencoder

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape


In [ ]:
# Encoder Structure
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 500, activation = 'relu', input_dim = 784),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None)
    ])

# Decoder Structure
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 300, activation = 'relu', input_shape = (2,)),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 28*28, activation = None)
    ])

# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])
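
Before training, it can be helpful to confirm the layer shapes with Keras's `summary()`: the encoder should map $784 \rightarrow 2$ and the decoder $2 \rightarrow 784$ (output omitted here).
In [ ]:
# Check that the encoder compresses 784 -> 2 and the decoder expands 2 -> 784
encoder.summary()
decoder.summary()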

Define Loss and Optimizer

  • Loss: squared loss (for an autoencoder, the target is the input itself) $$ \frac{1}{m}\sum_{i=1}^{m} \left(x_{i} - \hat{x}_{i}\right)^2 $$
  • Optimizer: Adam
In [ ]:
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.001),
                    loss = 'mean_squared_error',
                    metrics = ['mse'])
In [ ]:
# Train Model & Evaluate Test Data

training = autoencoder.fit(train_imgs, train_imgs, batch_size = 50, epochs = 10)
Epoch 1/10
362/362 [==============================] - 8s 19ms/step - loss: 0.0371 - mse: 0.0371
Epoch 2/10
362/362 [==============================] - 8s 22ms/step - loss: 0.0303 - mse: 0.0303
Epoch 3/10
362/362 [==============================] - 10s 27ms/step - loss: 0.0290 - mse: 0.0290
Epoch 4/10
362/362 [==============================] - 7s 21ms/step - loss: 0.0280 - mse: 0.0280
Epoch 5/10
362/362 [==============================] - 14s 37ms/step - loss: 0.0274 - mse: 0.0274
Epoch 6/10
362/362 [==============================] - 13s 35ms/step - loss: 0.0270 - mse: 0.0270
Epoch 7/10
362/362 [==============================] - 12s 33ms/step - loss: 0.0266 - mse: 0.0266
Epoch 8/10
362/362 [==============================] - 12s 33ms/step - loss: 0.0263 - mse: 0.0263
Epoch 9/10
362/362 [==============================] - 11s 31ms/step - loss: 0.0261 - mse: 0.0261
Epoch 10/10
362/362 [==============================] - 12s 32ms/step - loss: 0.0259 - mse: 0.0259

Test and Evaluate

  • Test reconstruction performance of the autoencoder
In [ ]:
test_scores = autoencoder.evaluate(test_imgs, test_imgs, verbose = 2)

print('Test loss: {}'.format(test_scores[0]))
print('Test MSE: {}'.format(test_scores[1]))
94/94 - 1s - loss: 0.0259 - mse: 0.0259 - 509ms/epoch - 5ms/step
Test loss: 0.025880122557282448
Test MSE: 0.025880122557282448
In [ ]:
# Visualize Evaluation on Test Data

rand_idx = np.random.randint(0, test_imgs.shape[0])
# rand_idx = 6

test_img = test_imgs[rand_idx]
reconst_img = autoencoder.predict(test_img.reshape(1, 28*28))

plt.figure(figsize = (8, 4))
plt.subplot(1, 2, 1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.title('Input Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.subplot(1, 2, 2)
plt.imshow(reconst_img.reshape(28, 28), 'gray')
plt.title('Reconstructed Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.show()
1/1 [==============================] - 0s 106ms/step

Latent Space

In [ ]:
test_latent = encoder.predict(test_imgs)

plt.figure(figsize = (6, 6))
plt.scatter(test_latent[test_labels == 1,0], test_latent[test_labels == 1,1], alpha = 0.4, label = '1')
plt.scatter(test_latent[test_labels == 5,0], test_latent[test_labels == 5,1], alpha = 0.4, label = '5')
plt.scatter(test_latent[test_labels == 6,0], test_latent[test_labels == 6,1], alpha = 0.4, label = '6')
plt.title('Latent Space', fontsize = 12)
plt.xlabel('$Z_1$', fontsize = 12)
plt.ylabel('$Z_2$', fontsize = 12)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.show()
94/94 [==============================] - 1s 7ms/step

Data Generation

  • Decoding a point picked from the latent space generates an image that makes some sense.

  • These results are unsatisfying, however, because the density model used on the latent space is too simple and inadequate.

  • Building a “good” density model amounts to our original problem of modeling an empirical distribution, although it may now be in a lower-dimensional space.

  • This is the motivation for VAEs, GANs, and diffusion models.

In [ ]:
new_data = np.array([[-2, 4]])

fake_image = decoder.predict(new_data)

plt.figure(figsize = (9, 4))

plt.subplot(1, 2, 1)
plt.scatter(test_latent[test_labels == 1,0], test_latent[test_labels == 1,1], alpha = 0.4, label = '1')
plt.scatter(test_latent[test_labels == 5,0], test_latent[test_labels == 5,1], alpha = 0.4, label = '5')
plt.scatter(test_latent[test_labels == 6,0], test_latent[test_labels == 6,1], alpha = 0.4, label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 10)
plt.xlabel('$Z_1$', fontsize = 10)
plt.ylabel('$Z_2$', fontsize = 10)
plt.legend(loc = 2, fontsize = 10)
plt.axis('equal')

plt.subplot(1, 2, 2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 10)
plt.xticks([])
plt.yticks([])
plt.show()
1/1 [==============================] - 0s 175ms/step
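
One natural extension, sketched below rather than taken from the original practice, is to interpolate between two latent points and decode each intermediate point; this shows how smoothly the decoder fills in the latent space. The two endpoints are arbitrary assumptions and may land in empty regions of the latent space.
In [ ]:
# Interpolate between two arbitrary latent points and decode each step
z_start = np.array([-2.0, 4.0])
z_end = np.array([6.0, -2.0])

n_steps = 8
alphas = np.linspace(0, 1, n_steps)
z_interp = np.array([(1 - a) * z_start + a * z_end for a in alphas])

fake_images = decoder.predict(z_interp)

plt.figure(figsize = (12, 2))
for i in range(n_steps):
    plt.subplot(1, n_steps, i + 1)
    plt.imshow(fake_images[i].reshape(28, 28), 'gray')
    plt.xticks([])
    plt.yticks([])
plt.show()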