KAIST Industry-Academia Cooperation Open Lecture

Artificial Intelligence and Design: From Analysis Prediction to Design Optimization


Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Practice Aims & Objectives

  1. Implement two dimension reduction methods: PCA and autoencoder
  2. Perform data generation with an autoencoder

1. Principal Component Analysis (PCA)


Motivation: Can we describe high-dimensional data in a "simpler" way?

$\quad \rightarrow$ Dimension reduction without losing too much information
$\quad \rightarrow$ Find a low-dimensional, yet useful representation of the data

  • Why dimensionality reduction?
    • Insights into the low-dimensional structure of the data (visualization)
    • Fewer dimensions ⇒ less chance of overfitting ⇒ better generalization
    • Speeding up learning algorithms
      • Most algorithms scale badly with increasing data dimensionality
    • Lower storage requirements (data compression)
    • Note: dimensionality reduction is different from feature selection
      • ... although the goals are much the same
    • Dimensionality reduction is more like “feature extraction”
      • Constructing a small set of new features from the original features
  • Data $\rightarrow$ projection onto unit vector $\hat{u}_1$
    • PCA is used when we want projections capturing maximum variance directions
    • Principal Components (PC): directions of maximum variability in the data
    • Roughly speaking, PCA does a change of axes that can represent the data in a succinct manner
  • How?
    • Idea: highly correlated data contain redundant features





  • Each example $x$ has 2 features $\{x_1,x_2\}$

  • Consider ignoring the feature $x_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$

  • Are we losing much information by throwing away $x_2$ ?

  • Yes, the data has substantial variance along both features (i.e., both axes)



  • Now consider a change of axes

  • Each example $x$ has 2 features $\{u_1,u_2\}$

  • Consider ignoring the feature $u_2$ for each example

  • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{u_1\}$

  • Are we losing much information by throwing away $u_2$ ?

  • No. Most of the data spread is along $u_1$ (very little variance along $u_2$)





  • How? Two equivalent formulations (see the sketch below):
    1. Maximize the variance of the projected data (most separable)
    2. Minimize the sum of squared reconstruction errors (minimum squared error)
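
To make these two (equivalent) formulations concrete, here is a minimal sketch, not part of the original practice code, that computes the first principal component of toy correlated 2-D data by eigendecomposition of the covariance matrix; the synthetic data and variable names are illustrative assumptions.
In [ ]:
import numpy as np

# Toy 2-D data with two highly correlated features (illustrative only)
rng = np.random.default_rng(0)
x1 = rng.normal(0, 3, size = 200)
x2 = 0.8 * x1 + rng.normal(0, 0.5, size = 200)
X = np.stack([x1, x2], axis = 1)

# 1) Center the data
Xc = X - X.mean(axis = 0)

# 2) Eigendecomposition of the sample covariance matrix
S = np.cov(Xc.T)
eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order

# The first principal component is the eigenvector with the largest eigenvalue
u1 = eigvecs[:, -1]
print('First PC direction:', u1)
print('Fraction of variance captured:', eigvals[-1] / eigvals.sum())

# 1-D representation: projection of each example onto u1
z = Xc @ u1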

2. Autoencoders


Definition

  • An autoencoder is a neural network that is trained to attempt to copy its input to its output
  • The network consists of two parts: an encoder that maps the input to a latent code and a decoder that produces a reconstruction


Encoder and Decoder

  • Encoder function : $z = f(x)$
  • Decoder function : $\hat{x} = g(z)$
  • We learn to make $g\left(f(x)\right) \approx x$ (see the sketch below)
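
As a minimal sketch of this composition (illustrative code, not the lab's implementation), consider a linear encoder $f(x) = W_e x$ and decoder $g(z) = W_d z$; training with squared loss drives $g(f(x))$ toward $x$, and a linear autoencoder of this form is known to span the same subspace as PCA.
In [ ]:
import numpy as np

# Minimal linear autoencoder sketch: encoder f and decoder g with g(f(x)) ≈ x
d, k = 784, 2                        # input and latent dimensions (matching the MNIST setup below)
rng = np.random.default_rng(0)
W_e = rng.normal(0, 0.01, (k, d))    # encoder weights (untrained, random)
W_d = rng.normal(0, 0.01, (d, k))    # decoder weights (untrained, random)

def f(x):                            # encoder: R^d -> R^k
    return W_e @ x

def g(z):                            # decoder: R^k -> R^d
    return W_d @ z

x = rng.normal(size = d)
x_hat = g(f(x))                      # reconstruction; training minimizes ||x - x_hat||^2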





3. Dimension Reduction with PCA and Autoencoder

3.1 Data Description: MNIST

What is MNIST

From Wikipedia

  • The MNIST database (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, NIST's complete dataset was too hard.
  • MNIST database

    - Handwritten digit database

    - $28 \times 28$ grayscale images

    - Each image is flattened into a vector of length $28 \times 28 = 784$



Objective: 784 → 2 Dimensions

3.2 Load MNIST Data

  • Use only (1, 5, 6) digits to visualize in 2-D
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
In [ ]:
# Load Data

mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x.reshape(-1, 784)/255.0, test_x.reshape(-1, 784)/255.0
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
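As a quick sanity check (not in the original notebook), we can display a single raw training image; the index 0 is arbitrary.
In [ ]:
# Visualize one raw MNIST digit to confirm the data loaded correctly
plt.figure(figsize = (3, 3))
plt.imshow(train_x[0].reshape(28, 28), 'gray')
plt.title('Label: {}'.format(train_y[0]))
plt.xticks([])
plt.yticks([])
plt.show()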
In [ ]:
# Use only 1, 5, 6 digits to visualize the latent space

train_idx1 = np.array(np.where(train_y == 1))
train_idx5 = np.array(np.where(train_y == 5))
train_idx6 = np.array(np.where(train_y == 6))
train_idx = np.sort(np.concatenate((train_idx1, train_idx5, train_idx6), axis = None))

test_idx1 = np.array(np.where(test_y == 1))
test_idx5 = np.array(np.where(test_y == 5))
test_idx6 = np.array(np.where(test_y == 6))
test_idx = np.sort(np.concatenate((test_idx1, test_idx5, test_idx6), axis = None))

train_imgs = train_x[train_idx]
train_labels = train_y[train_idx]
test_imgs = test_x[test_idx]
test_labels = test_y[test_idx]

n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 18081, shape : (18081, 784)
The number of testing images : 2985, shape : (2985, 784)

3.3 PCA

In [ ]:
from sklearn.decomposition import PCA

# Project the 784-D test images onto the first two principal components
pca = PCA(n_components = 2)

test_pca = pca.fit_transform(test_imgs)
In [ ]:
test_pca.shape
Out[ ]:
(2985, 2)
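As an aside, the fitted sklearn PCA object exposes `explained_variance_ratio_`, which reports the fraction of the total variance each component retains; the exact numbers depend on the data, so none are quoted here.
In [ ]:
# Fraction of total variance captured by each of the two principal components
print(pca.explained_variance_ratio_)
print('Total:', pca.explained_variance_ratio_.sum())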
In [ ]:
plt.figure(figsize = (6, 6))

plt.scatter(test_pca[test_labels == 1,0], test_pca[test_labels == 1,1], alpha = 0.4, label = '1')
plt.scatter(test_pca[test_labels == 5,0], test_pca[test_labels == 5,1], alpha = 0.4, label = '5')
plt.scatter(test_pca[test_labels == 6,0], test_pca[test_labels == 6,1], alpha = 0.4, label = '6')
plt.title('Latent Space', fontsize = 12)
plt.xlabel('$Z_1$', fontsize = 12)
plt.ylabel('$Z_2$', fontsize = 12)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.show()

3.4 Autoencoder: Dimension Reduction and Data Generation

Define a Structure of an Autoencoder

  • Input shape and latent variable shape
  • Encoder shape
  • Decoder shape


In [ ]:
# Encoder Structure
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 500, activation = 'relu', input_dim = 784),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None)
    ])

# Decoder Structure
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 300, activation = 'relu', input_shape = (2,)),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 28*28, activation = None)
    ])

# Autoencoder = Encoder + Decoder
autoencoder = tf.keras.models.Sequential([encoder, decoder])
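
Before training, it can be helpful to confirm the layer shapes with Keras's `summary()`: the encoder should map $784 \rightarrow 2$ and the decoder $2 \rightarrow 784$ (output omitted here).
In [ ]:
# Check that the encoder compresses 784 -> 2 and the decoder expands 2 -> 784
encoder.summary()
decoder.summary()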

Define Loss and Optimizer

  • Loss: squared loss (for an autoencoder, the target is the input itself) $$ \frac{1}{m}\sum_{i=1}^{m} \left(x_{i} - \hat{x}_{i}\right)^2 $$
  • Optimizer: Adam
In [ ]:
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.001),
                    loss = 'mean_squared_error',
                    metrics = ['mse'])
In [ ]:
# Train Model & Evaluate Test Data

training = autoencoder.fit(train_imgs, train_imgs, batch_size = 50, epochs = 10)
Epoch 1/10
362/362 [==============================] - 8s 19ms/step - loss: 0.0371 - mse: 0.0371
Epoch 2/10
362/362 [==============================] - 8s 22ms/step - loss: 0.0303 - mse: 0.0303
Epoch 3/10
362/362 [==============================] - 10s 27ms/step - loss: 0.0290 - mse: 0.0290
Epoch 4/10
362/362 [==============================] - 7s 21ms/step - loss: 0.0280 - mse: 0.0280
Epoch 5/10
362/362 [==============================] - 14s 37ms/step - loss: 0.0274 - mse: 0.0274
Epoch 6/10
362/362 [==============================] - 13s 35ms/step - loss: 0.0270 - mse: 0.0270
Epoch 7/10
362/362 [==============================] - 12s 33ms/step - loss: 0.0266 - mse: 0.0266
Epoch 8/10
362/362 [==============================] - 12s 33ms/step - loss: 0.0263 - mse: 0.0263
Epoch 9/10
362/362 [==============================] - 11s 31ms/step - loss: 0.0261 - mse: 0.0261
Epoch 10/10
362/362 [==============================] - 12s 32ms/step - loss: 0.0259 - mse: 0.0259

Test and Evaluate

  • Test reconstruction performance of the autoencoder
In [ ]:
test_scores = autoencoder.evaluate(test_imgs, test_imgs, verbose = 2)

print('Test loss: {}'.format(test_scores[0]))
print('Test MSE: {}'.format(test_scores[1]))
94/94 - 1s - loss: 0.0259 - mse: 0.0259 - 509ms/epoch - 5ms/step
Test loss: 0.025880122557282448
Test MSE: 0.025880122557282448
In [ ]:
# Visualize Evaluation on Test Data

rand_idx = np.random.randint(0, test_imgs.shape[0])
# rand_idx = 6

test_img = test_imgs[rand_idx]
reconst_img = autoencoder.predict(test_img.reshape(1, 28*28))

plt.figure(figsize = (8, 4))
plt.subplot(1, 2, 1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.title('Input Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.subplot(1, 2, 2)
plt.imshow(reconst_img.reshape(28, 28), 'gray')
plt.title('Reconstructed Image', fontsize = 12)
plt.xticks([])
plt.yticks([])

plt.show()
1/1 [==============================] - 0s 106ms/step

Latent Space

In [ ]:
test_latent = encoder.predict(test_imgs)

plt.figure(figsize = (6, 6))
plt.scatter(test_latent[test_labels == 1,0], test_latent[test_labels == 1,1], alpha = 0.4, label = '1')
plt.scatter(test_latent[test_labels == 5,0], test_latent[test_labels == 5,1], alpha = 0.4, label = '5')
plt.scatter(test_latent[test_labels == 6,0], test_latent[test_labels == 6,1], alpha = 0.4, label = '6')
plt.title('Latent Space', fontsize = 12)
plt.xlabel('$Z_1$', fontsize = 12)
plt.ylabel('$Z_2$', fontsize = 12)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.show()
94/94 [==============================] - 1s 7ms/step

Data Generation

  • Decoding a point picked from the latent space generates an image that makes some sense.

  • These results are unsatisfying, however, because the density model used on the latent space is too simple and inadequate.

  • Building a “good” density model amounts to our original problem of modeling an empirical distribution, although it may now be in a lower-dimensional space.

  • This is the motivation for VAEs, GANs, and diffusion models.

In [ ]:
new_data = np.array([[-2, 4]])

fake_image = decoder.predict(new_data)

plt.figure(figsize = (9, 4))

plt.subplot(1, 2, 1)
plt.scatter(test_latent[test_labels == 1,0], test_latent[test_labels == 1,1], alpha = 0.4, label = '1')
plt.scatter(test_latent[test_labels == 5,0], test_latent[test_labels == 5,1], alpha = 0.4, label = '5')
plt.scatter(test_latent[test_labels == 6,0], test_latent[test_labels == 6,1], alpha = 0.4, label = '6')
plt.scatter(new_data[:,0], new_data[:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 10)
plt.xlabel('$Z_1$', fontsize = 10)
plt.ylabel('$Z_2$', fontsize = 10)
plt.legend(loc = 2, fontsize = 10)
plt.axis('equal')

plt.subplot(1, 2, 2)
plt.imshow(fake_image.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 10)
plt.xticks([])
plt.yticks([])
plt.show()
1/1 [==============================] - 0s 175ms/step
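
One natural extension, sketched below rather than taken from the original practice, is to interpolate between two latent points and decode each intermediate point; this shows how smoothly the decoder fills in the latent space. The two endpoints are arbitrary assumptions and may land in empty regions of the latent space.
In [ ]:
# Interpolate between two arbitrary latent points and decode each step
z_start = np.array([-2.0, 4.0])
z_end = np.array([6.0, -2.0])

n_steps = 8
alphas = np.linspace(0, 1, n_steps)
z_interp = np.array([(1 - a) * z_start + a * z_end for a in alphas])

fake_images = decoder.predict(z_interp)

plt.figure(figsize = (12, 2))
for i in range(n_steps):
    plt.subplot(1, n_steps, i + 1)
    plt.imshow(fake_images[i].reshape(28, 28), 'gray')
    plt.xticks([])
    plt.yticks([])
plt.show()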