Generative Adversarial Networks (GAN)


By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Source

  • CS231n: CNN for Visual Recognition

1. Discriminative Model vs. Generative Model

  • Discriminative model




  • Generative model



2. Density Function Estimation

  • Probability
  • What if $x$ is an actual image from the training data? In that case, $x$ can be represented as a (for example) $64\times 64 \times 3$-dimensional vector.
    • The following images are some realizations (samples) in this $64\times 64 \times 3$-dimensional space
  • Probability density function estimation problem
  • If $P_{\text{model}}(x)$ can be estimated as close to $P_{\text{data}}(x)$, then data can be generated by sampling from $P_{\text{model}}(x)$.

    • Note: the Kullback–Leibler divergence $D_{\text{KL}}\left(P_{\text{data}} \parallel P_{\text{model}}\right) = E_{x \sim P_{\text{data}}}\left[\log \frac{P_{\text{data}}(x)}{P_{\text{model}}(x)}\right]$ is a (non-symmetric) measure of the distance between two distributions
  • Learn a deterministic transformation via a neural network
    • Start by sampling the code vector $z$ from a simple, fixed distribution such as a uniform distribution or a standard Gaussian $\mathcal{N}(0,I)$
    • Then this code vector is passed as input to a deterministic generator network $G$, which produces an output sample $x=G(z)$
    • This is the role a neural network plays in a generative model: a nonlinear mapping to a target probability density function



  • An example of a generator network which encodes a univariate distribution with two different modes (see the sketch at the end of this section)



  • Generative model of high dimensional space
  • Generative model of images
    • learn a function which maps independent, normally distributed $z$ values to whatever latent variables might be needed by the model, and then maps those latent variables to $x$ (an image)
    • first few layers to map the normally distributed $z$ to the latent values
    • then, use later layers to map those latent values to an image
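
As a minimal illustration (a sketch of ours, not from the original notes: the nonlinearity below is hand-picked to stand in for a trained generator network), a deterministic transformation of Gaussian noise can produce a two-mode distribution:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

# Sample the code vector z from a simple, fixed distribution
z = np.random.randn(10000)

# Hand-picked deterministic nonlinear mapping standing in for a trained G:
# the steep tanh pushes most samples toward -3 or +3, giving two modes
x = 3*np.tanh(4*z) + 0.2*z

plt.hist(x, bins = 100, density = True)
plt.title('samples of x = G(z) with two modes')
plt.show()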



3. Generative Adversarial Networks (GAN)

  • In generative modeling, we'd like to train a network that models a distribution, such as a distribution over images.

  • GANs do not work with any explicit density function!

  • Instead, they take a game-theoretic approach

3.1. Adversarial Nets Framework

  • One way to judge the quality of the model is to sample from it.

  • Train the model to produce samples which are indistinguishable from the real data, as judged by a discriminator network whose job is to tell real from fake





  • The idea behind Generative Adversarial Networks (GANs): train two different networks


  • Discriminator network: try to distinguish between real and fake data


  • Generator network: try to produce realistic-looking samples to fool the discriminator network


3.2. Objective Function of GAN

  • Think of a logistic regression classifier with the cross-entropy loss between a prediction $h(x)$ and a label $y$


$$\text{loss} = -y \log h(x) - (1-y) \log (1-h(x))$$
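
As a quick sanity check (a hedged sketch of ours, not part of the original notes; the probabilities below are made up), the hand-written formula matches Keras' built-in binary cross-entropy up to its internal epsilon clipping:

In [ ]:
import tensorflow as tf

h = tf.constant([0.9, 0.2])   # predicted probabilities h(x)
y = tf.constant([1.0, 0.0])   # true labels

# Hand-computed cross-entropy loss, per sample
manual = -y*tf.math.log(h) - (1 - y)*tf.math.log(1 - h)

# Keras' built-in binary cross-entropy, per sample
builtin = tf.keras.losses.binary_crossentropy(y[:, None], h[:, None])

print(manual.numpy())    # ~ [0.105 0.223]
print(builtin.numpy())   # ~ [0.105 0.223]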

  • To train the discriminator: real data is labeled $y = 1$ and generated data $y = 0$, so the discriminator minimizes

$$\text{loss}_D = -\log D(x) - \log\left(1-D(G(z))\right)$$

  • To train the generator: $G$ is trained to fool the discriminator, i.e., to minimize $\log\left(1-D(G(z))\right)$


  • Non-saturating game when training the generator

  • Early in learning, when $G$ is poor, $D$ can reject samples with high confidence because they are clearly different from the training data. In this case, $\log(1-D(G(z)))$ saturates.



  • Rather than training $G$ to minimize $\log(1-D(G(z)))$ we can train $G$ to maximize $\log D(G(z))$. This objective function provides much stronger gradients early in learning.
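
To see the saturation numerically, here is a minimal sketch (plain NumPy, not part of the original notes) comparing the gradients of the two generator objectives with respect to $D(G(z))$: the derivative of $\log(1-D)$ is $-1/(1-D)$, which vanishes as $D \to 0$ (a poor generator), while the derivative of $-\log D$ is $-1/D$, which is large there.

In [ ]:
import numpy as np

# D(G(z)) near 0 means the discriminator confidently rejects fake samples
d_out = np.array([0.01, 0.1, 0.5, 0.9])

# Minimax objective: minimize log(1 - D(G(z)))
# d/dD log(1 - D) = -1/(1 - D)  ->  gradient vanishes as D -> 0 (saturation)
grad_minimax = -1.0/(1.0 - d_out)

# Non-saturating objective: minimize -log D(G(z))
# d/dD (-log D) = -1/D  ->  gradient is large as D -> 0
grad_nonsat = -1.0/d_out

for d, g1, g2 in zip(d_out, grad_minimax, grad_nonsat):
    print('D(G(z)) = {:4.2f} | minimax grad = {:8.2f} | non-saturating grad = {:8.2f}'.format(d, g1, g2))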

3.3. Solving a Minimax Problem


Step 1: Fix $G$ and perform a gradient step to


$$\max_{D} E_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + E_{z \sim p_{z}(z)}\left[\log (1-D(G(z)))\right]$$


Step 2: Fix $D$ and perform a gradient step to


$$\max_{G} E_{z \sim p_{z}(z)}\left[\log D(G(z))\right]$$


OR


Step 1: Fix $G$ and perform a gradient step to


$$\min_{D} E_{x \sim p_{\text{data}}(x)}\left[-\log D(x)\right] + E_{z \sim p_{z}(z)}\left[-\log (1-D(G(z)))\right]$$


Step 2: Fix $D$ and perform a gradient step to


$$\min_{G} E_{z \sim p_{z}(z)}\left[-\log D(G(z))\right]$$

4. GAN with MNIST

4.1. GAN Implementation

In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
(train_x, train_y), _ = tf.keras.datasets.mnist.load_data()

train_x = train_x[np.where(train_y == 2)]
train_x = train_x/255.0
train_x = train_x.reshape(-1, 784)

print('train_images :', train_x.shape)
train_images : (5958, 784)
In [ ]:
generator = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 256, activation = 'relu', input_dim = 100),
    tf.keras.layers.Dense(units = 784, activation = 'sigmoid')
])
In [ ]:
discriminator = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 256, activation = 'relu', input_dim = 784),
    tf.keras.layers.Dense(units = 1, activation = 'sigmoid'),
])
In [ ]:
discriminator.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0001),
                      loss = 'binary_crossentropy')
In [ ]:
discriminator.trainable = False

combined_input = tf.keras.layers.Input(shape = (100,))
generated = generator(combined_input)
combined_output = discriminator(generated)

combined = tf.keras.models.Model(inputs = combined_input, outputs = combined_output)
In [ ]:
combined.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0002),
                 loss = 'binary_crossentropy')
In [ ]:
def make_noise(samples):
    return np.random.normal(0, 1, [samples, 100])
In [ ]:
def plot_generated_images(generator, samples = 3):

    noise = make_noise(samples)

    generated_images = generator.predict(noise, verbose = 0)
    generated_images = generated_images.reshape(samples, 28, 28)

    for i in range(samples):
        plt.subplot(1, samples, i+1)
        plt.imshow(generated_images[i], 'gray', interpolation = 'nearest')
        plt.axis('off')
        plt.tight_layout()

    plt.show()

Step 1: Fix $G$ and perform a gradient step to


$$\min_{D} E_{x \sim p_{\text{data}}(x)}\left[-\log D(x)\right] + E_{z \sim p_{z}(z)}\left[-\log (1-D(G(z)))\right]$$


Step 2: Fix $D$ and perform a gradient step to


$$\min_{G} E_{z \sim p_{z}(z)}\left[-\log D(G(z))\right]$$
In [ ]:
n_iter = 20000
batch_size = 100

fake = np.zeros(batch_size)
real = np.ones(batch_size)

for i in range(n_iter):

    # Train Discriminator
    noise = make_noise(batch_size)
    generated_images = generator.predict(noise, verbose = 0)

    idx = np.random.randint(0, train_x.shape[0], batch_size)
    real_images = train_x[idx]

    D_loss_real = discriminator.train_on_batch(real_images, real)
    D_loss_fake = discriminator.train_on_batch(generated_images, fake)
    D_loss = D_loss_real + D_loss_fake

    # Train Generator
    noise = make_noise(batch_size)
    G_loss = combined.train_on_batch(noise, real)

    if i % 5000 == 0:

        print('Discriminator Loss: ', D_loss)
        print('Generator Loss: ', G_loss)

        plot_generated_images(generator)
Discriminator Loss:  1.873877376317978
Generator Loss:  0.338532954454422
Discriminator Loss:  0.18016396462917328
Generator Loss:  2.5140249729156494
Discriminator Loss:  0.44247880578041077
Generator Loss:  2.2012200355529785
Discriminator Loss:  0.5620711445808411
Generator Loss:  2.061182975769043

4.2. After Training

  • After training, use the generator network to generate new data


In [ ]:
plot_generated_images(generator)

5. Conditional GAN

  • In an unconditioned generative model, there is no control over the modes of the data being generated.
  • In the Conditional GAN (CGAN), the generator learns to generate a fake sample with a specific condition or characteristic (such as a label associated with an image or a more detailed tag) rather than a generic sample from an unknown noise distribution.




  • Simple modification to the original GAN framework that conditions the model on additional information for better multi-modal learning
  • Many practical applications of GANs when we have explicit supervision available
In [ ]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()

train_x, test_x = train_x/255.0 , test_x/255.0
train_x, test_x = train_x.reshape(-1,784), test_x.reshape(-1,784)

train_y = tf.keras.utils.to_categorical(train_y, num_classes = 10)
test_y = tf.keras.utils.to_categorical(test_y, num_classes = 10)

print('train_x: ', train_x.shape)
print('test_x: ', test_x.shape)
print('train_y: ', train_y.shape)
print('test_y: ', test_y.shape)
train_x:  (60000, 784)
test_x:  (10000, 784)
train_y:  (60000, 10)
test_y:  (10000, 10)
In [ ]:
generator_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 256, activation = 'relu', input_dim = 138),
    tf.keras.layers.Dense(units = 784, activation = 'sigmoid')
])

noise = tf.keras.layers.Input(shape = (128,))
label = tf.keras.layers.Input(shape = (10,))

model_input = tf.keras.layers.concatenate([noise, label], axis = 1)
generated_image = generator_model(model_input)

generator = tf.keras.models.Model(inputs = [noise, label], outputs = generated_image)
In [ ]:
generator.summary()
Model: "model_3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_7 (InputLayer)           [(None, 128)]        0           []                               
                                                                                                  
 input_8 (InputLayer)           [(None, 10)]         0           []                               
                                                                                                  
 concatenate_2 (Concatenate)    (None, 138)          0           ['input_7[0][0]',                
                                                                  'input_8[0][0]']                
                                                                                                  
 sequential_2 (Sequential)      (None, 784)          237072      ['concatenate_2[0][0]']          
                                                                                                  
==================================================================================================
Total params: 237,072
Trainable params: 237,072
Non-trainable params: 0
__________________________________________________________________________________________________
In [ ]:
discriminator_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 256, activation = 'relu', input_dim = 794),
    tf.keras.layers.Dense(units = 1, activation = 'sigmoid')
])

input_image = tf.keras.layers.Input(shape = (784,))
label = tf.keras.layers.Input(shape = (10,))

model_input = tf.keras.layers.concatenate([input_image, label], axis = 1)
validity = discriminator_model(model_input)

discriminator = tf.keras.models.Model(inputs = [input_image, label], outputs = validity)
In [ ]:
discriminator.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0002),
                      loss = ['binary_crossentropy'])
In [ ]:
discriminator.summary()
Model: "model_4"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_9 (InputLayer)           [(None, 784)]        0           []                               
                                                                                                  
 input_10 (InputLayer)          [(None, 10)]         0           []                               
                                                                                                  
 concatenate_3 (Concatenate)    (None, 794)          0           ['input_9[0][0]',                
                                                                  'input_10[0][0]']               
                                                                                                  
 sequential_3 (Sequential)      (None, 1)            203777      ['concatenate_3[0][0]']          
                                                                                                  
==================================================================================================
Total params: 203,777
Trainable params: 203,777
Non-trainable params: 0
__________________________________________________________________________________________________
In [ ]:
discriminator.trainable = False

noise = tf.keras.layers.Input(shape = (128,))
label = tf.keras.layers.Input(shape = (10,))

generated_image = generator([noise, label])
validity = discriminator([generated_image, label])

combined = tf.keras.models.Model(inputs = [noise, label], outputs = validity)
In [ ]:
combined.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0002),
                      loss = ['binary_crossentropy'])
In [ ]:
combined.summary()
Model: "model_5"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_11 (InputLayer)          [(None, 128)]        0           []                               
                                                                                                  
 input_12 (InputLayer)          [(None, 10)]         0           []                               
                                                                                                  
 model_3 (Functional)           (None, 784)          237072      ['input_11[0][0]',               
                                                                  'input_12[0][0]']               
                                                                                                  
 model_4 (Functional)           (None, 1)            203777      ['model_3[0][0]',                
                                                                  'input_12[0][0]']               
                                                                                                  
==================================================================================================
Total params: 440,849
Trainable params: 237,072
Non-trainable params: 203,777
__________________________________________________________________________________________________
In [ ]:
def create_noise(samples):
    return np.random.normal(0, 1, [samples, 128])
In [ ]:
def plot_generated_images(generator):

    noise = create_noise(10)
    label = np.arange(0, 10).reshape(-1, 1)
    label_onehot = np.eye(10)[label.reshape(-1)]

    generated_images = generator.predict([noise, label_onehot], verbose = 0)

    plt.figure(figsize = (12, 3))
    for i in range(generated_images.shape[0]):
        plt.subplot(1, 10, i + 1)
        plt.imshow(generated_images[i].reshape((28, 28)), 'gray', interpolation = 'nearest')
        plt.title('Digit: {}'.format(i))
        plt.axis('off')

    plt.show()
In [ ]:
n_iter = 30000
batch_size = 50

valid = np.ones(batch_size)
fake = np.zeros(batch_size)

for i in range(n_iter):

    # Train Discriminator
    idx = np.random.randint(0, train_x.shape[0], batch_size)
    real_images, labels = train_x[idx], train_y[idx]

    noise = create_noise(batch_size)
    generated_images = generator.predict([noise,labels], verbose = 0)

    d_loss_real = discriminator.train_on_batch([real_images, labels], valid)
    d_loss_fake = discriminator.train_on_batch([generated_images, labels], fake)
    d_loss = d_loss_real + d_loss_fake

    # Train Generator
    noise = create_noise(batch_size)
    labels = np.random.randint(0, 10, batch_size)
    labels_onehot = np.eye(10)[labels]

    g_loss = combined.train_on_batch([noise, labels_onehot], valid)

    if i % 5000 == 0:

        print('Discriminator Loss: ', d_loss)
        print('Generator Loss: ', g_loss)

        plot_generated_images(generator)
Discriminator Loss:  1.5396462678909302
Generator Loss:  0.8892247676849365
Discriminator Loss:  0.09676432050764561
Generator Loss:  4.2714643478393555
Discriminator Loss:  0.08658699691295624
Generator Loss:  5.2906494140625
Discriminator Loss:  0.21492188423871994
Generator Loss:  4.767454147338867
Discriminator Loss:  1.4001021981239319
Generator Loss:  1.8565527200698853
Discriminator Loss:  0.19797339290380478
Generator Loss:  4.865542888641357

6. InfoGAN (Information Maximizing GAN)

  • In a standard generative model, there is no control over the features of the data being generated.

  • In the Information Maximizing GAN (InfoGAN), the generator learns to generate a fake sample with latent codes (such as values in the range $-1$ to $1$) that carry interpretable information about the data rather than a generic sample from an unknown noise distribution.

  • The latent code in InfoGAN learns interpretable information from the data using unsupervised learning.

  • For instance, MNIST digits generated by latent code variation

  • Simple modification to the original GAN framework: the latent code $c$ is fed to the generator as an additional input, and an added Q network predicts the latent code $c$ of a fake sample $x_{\text{fake}}$.

  • The generative model learns interpretable information from the data by itself.

Generator in Conditional GAN

  • Feed a random point in latent space together with the desired digit.
  • Even if the same latent point is used for two different digits, the process works correctly, because the latent space only encodes features such as stroke width or angle.

Generator in InfoGAN

  • Feed a random point in latent space together with a latent code.
  • Even if the same latent point is used, the process works correctly, because the latent code controls the interpretable information in the data.

6.1. DCGAN (Deep Convolutional GAN)

We employed fully connected neural networks for all the previous GAN examples. DCGAN is a direct extension of GAN, differing primarily in its use of convolutional and convolutional-transpose layers in the discriminator and generator, respectively.
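
As a quick, hedged illustration (the shapes and layer arguments are ours, chosen to mirror the generator below), a transposed convolution with stride 2 and 'same' padding doubles the spatial dimensions, which is how the InfoGAN generator grows a $7 \times 7$ feature map into a $28 \times 28$ image:

In [ ]:
import tensorflow as tf

x = tf.random.normal((1, 7, 7, 128))   # a batch of one 7x7 feature map

# Stride-2 transposed convolution doubles height and width
up = tf.keras.layers.Conv2DTranspose(64, (4, 4), strides = (2, 2), padding = 'same')

print(up(x).shape)   # (1, 14, 14, 64)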

6.2. InfoGAN Implementation

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
In [ ]:
(train_x, train_y), _ = tf.keras.datasets.mnist.load_data()

train_x = train_x[np.where(train_y == 2)]
train_x = train_x/255.0
train_x = train_x.reshape(-1, 28, 28, 1)

print('train_images :', train_x.shape)
train_images : (5958, 28, 28, 1)
In [ ]:
generator = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 1024,
                          use_bias = False,
                          input_shape = (62 + 2,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Dense(units = 7*7*128,
                          use_bias = False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Reshape((7, 7, 128)),
    tf.keras.layers.Conv2DTranspose(64,
                                    (4, 4),
                                    strides = (2, 2),
                                    padding = 'same',
                                    use_bias = False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Conv2DTranspose(1,
                                    (4, 4),
                                    strides = (2, 2),
                                    padding = 'same',
                                    use_bias = False,
                                    activation = 'sigmoid')
])
In [ ]:
extractor = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64,
                           (4, 4),
                           strides = (2, 2),
                           padding = 'same',
                           use_bias = False,
                           input_shape = [28, 28, 1]),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Conv2D(128,
                           (4, 4),
                           strides = (2, 2),
                           padding = 'same',
                           use_bias = False),
    tf.keras.layers.LayerNormalization(),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 1024,
                          use_bias = False),
    tf.keras.layers.LayerNormalization(),
    tf.keras.layers.LeakyReLU()
])
In [ ]:
d_network = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 1,
                          input_shape = (1024,),
                          use_bias = False,
                          activation = 'sigmoid')
])
In [ ]:
q_network = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 128,
                          use_bias = False,
                          input_shape = (1024,)),
    tf.keras.layers.LayerNormalization(),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Dense(units = 2,
                          use_bias = False)
])
In [ ]:
combined_input = tf.keras.layers.Input(shape = (28, 28, 1))
combined_feature = extractor(combined_input)
combined_output = d_network(combined_feature)

discriminator = tf.keras.models.Model(inputs = combined_input,
                                      outputs = combined_output)
In [ ]:
discriminator.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 2e-4),
                      loss = 'binary_crossentropy')
In [ ]:
extractor.trainable = False
d_network.trainable = False

combined_input = tf.keras.layers.Input(shape = (62 + 2,))
generated = generator(combined_input)
combined_feature = extractor(generated)
combined_output = d_network(combined_feature)

combined_d = tf.keras.models.Model(inputs = combined_input,
                                   outputs = combined_output)
In [ ]:
combined_d.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 1e-3),
                   loss = 'binary_crossentropy')
In [ ]:
extractor.trainable = False
d_network.trainable = False

combined_input = tf.keras.layers.Input(shape = (62 + 2,))
generated = generator(combined_input)
combined_feature = extractor(generated)
combined_latent = q_network(combined_feature)

combined_q = tf.keras.models.Model(inputs = combined_input,
                                   outputs = combined_latent)
In [ ]:
combined_q.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 1e-3),
                   loss = 'mean_squared_error')
In [ ]:
def make_noise(samples):
    # Incompressible noise part z ~ Uniform(-1, 1), 62-dimensional
    return np.random.uniform(-1, 1, size = [samples, 62])

def make_code(samples):
    # Two continuous latent codes, each ~ Uniform(-1, 1)
    return 2*np.random.rand(samples, 2) - 1
In [ ]:
def plot_generated_images(generator):
    z = np.random.randn(1, 62).repeat(5, axis = 0)
    c = np.stack([np.linspace(-1, 1, 5), np.zeros(5)]).T
    noise = np.concatenate([z, c], -1)

    generated_images = generator.predict(noise, verbose = 0)
    generated_images = generated_images.reshape(5, 28, 28)

    print('')
    print('Continuous Latent Code 1')
    for i in range(5):
        plt.subplot(1, 5, i+1)
        plt.imshow(generated_images[i], 'gray', interpolation = 'nearest')
        plt.axis('off')
        plt.tight_layout()
    plt.show()

    z = np.random.randn(1, 62).repeat(5, axis = 0)
    c = np.stack([np.zeros(5), np.linspace(-1, 1, 5)]).T
    noise = np.concatenate([z, c], -1)

    generated_images = generator.predict(noise, verbose = 0)
    generated_images = generated_images.reshape(5, 28, 28)

    print('Continuous Latent Code 2')
    for i in range(5):
        plt.subplot(1, 5, i+1)
        plt.imshow(generated_images[i], 'gray', interpolation = 'nearest')
        plt.axis('off')
        plt.tight_layout()
    plt.show()
    print('')
In [ ]:
n_iter = 5000
batch_size = 256

real = np.ones((batch_size, 1))
fake = np.zeros((batch_size, 1))

for i in range(n_iter):

    # Train Discriminator
    for _ in range(2):
        z = make_noise(batch_size)
        c = make_code(batch_size)
        noise = np.concatenate([z, c], -1)

        generated_images = generator.predict(noise, verbose = 0)

        idx = np.random.choice(len(train_x), batch_size, replace = False)
        real_images = train_x[idx]

        D_loss_real = discriminator.train_on_batch(real_images, real)
        D_loss_fake = discriminator.train_on_batch(generated_images, fake)
        D_loss = D_loss_real + D_loss_fake

    # Train Generator & Q Net
    for _ in range(1):
        z = make_noise(batch_size)
        c = make_code(batch_size)
        noise = np.concatenate([z, c], -1)

        G_loss = combined_d.train_on_batch(noise, real)
        Q_loss = combined_q.train_on_batch(noise, c)

    # Print Loss
    if (i + 1) % 500 == 0:
        print('Epoch: {:5d} | Discriminator Loss: {:.3f} | Generator Loss: {:.3f} | Q Net Loss: {:.3f}'.format(i + 1, D_loss, G_loss, Q_loss))
        plot_generated_images(generator)
Epoch:   500 | Discriminator Loss: 0.001 | Generator Loss: 7.627 | Q Net Loss: 0.055

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  1000 | Discriminator Loss: 0.020 | Generator Loss: 4.031 | Q Net Loss: 0.020

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  1500 | Discriminator Loss: 0.127 | Generator Loss: 2.854 | Q Net Loss: 0.033

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  2000 | Discriminator Loss: 0.186 | Generator Loss: 7.292 | Q Net Loss: 0.020

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  2500 | Discriminator Loss: 0.283 | Generator Loss: 5.061 | Q Net Loss: 0.021

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  3000 | Discriminator Loss: 0.288 | Generator Loss: 5.738 | Q Net Loss: 0.012

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  3500 | Discriminator Loss: 0.172 | Generator Loss: 5.480 | Q Net Loss: 0.010

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  4000 | Discriminator Loss: 0.166 | Generator Loss: 5.732 | Q Net Loss: 0.009

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  4500 | Discriminator Loss: 0.281 | Generator Loss: 4.834 | Q Net Loss: 0.007

Continuous Latent Code 1
Continuous Latent Code 2
Epoch:  5000 | Discriminator Loss: 0.254 | Generator Loss: 5.724 | Q Net Loss: 0.007

Continuous Latent Code 1
Continuous Latent Code 2

In [ ]:
images_save_1 = []
images_save_2 = []

for i in range(8):
    z = np.random.randn(1, 62).repeat(8, axis = 0)

    # Continuous Latent Code 1
    c = np.stack([np.linspace(-1, 1, 8), np.zeros(8)]).T
    noise = np.concatenate([z, c], -1)

    generated_images = generator.predict(noise, verbose=0)
    generated_images = generated_images.reshape(8, 28, 28)

    images_save_1.append(generated_images)

    # Continuous Latent Code 2
    c = np.stack([np.zeros(8), np.linspace(-1, 1, 8)]).T
    noise = np.concatenate([z, c], -1)

    generated_images = generator.predict(noise, verbose=0)
    generated_images = generated_images.reshape(8, 28, 28)

    images_save_2.append(generated_images)
In [ ]:
print('Continuous Latent Code 1')
fig, ax = plt.subplots(8, 8, figsize = (10, 10))

for i in range(8):
    for j in range(8):
        ax[i][j].imshow(images_save_1[i][j], 'gray')
        ax[i][j].set_xticks([])
        ax[i][j].set_yticks([])

plt.show()
Continuous Latent Code 1
In [ ]:
print('Continuous Latent Code 2')
fig, ax = plt.subplots(8, 8, figsize = (10, 10))

for i in range(8):
    for j in range(8):
        ax[i][j].imshow(images_save_2[i][j], 'gray')
        ax[i][j].set_xticks([])
        ax[i][j].set_yticks([])

plt.show()
Continuous Latent Code 2

7. CycleGAN

Change the style of an image into another style

  • Monet to photos
  • Zebras to horses
  • Summer to winter

Limitation of paired datasets

  • It is impossible to collect paired datasets in most cases.

Start from naive GAN

  • Given an image X (Horse), transform it into the target image Y (Zebra)

Utilize two generators and cycle-consistency loss to preserve the contents of input images.

  • $G_{XY}(X \rightarrow Y)$ and $G_{YX}(Y \rightarrow X)$

  • Cycle-consistency loss: enforce $G_{YX}(G_{XY}(X)) \approx X$ and $G_{XY}(G_{YX}(Y)) \approx Y$ (a minimal sketch follows below)
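
The notes give no code for CycleGAN, so here is a minimal, hedged sketch of the cycle-consistency term only (the two generators are toy placeholders, and the adversarial losses are omitted):

In [ ]:
import tensorflow as tf

# Toy placeholder generators; real CycleGAN generators are deep conv nets
G_XY = tf.keras.models.Sequential([tf.keras.layers.Dense(units = 784, activation = 'sigmoid')])
G_YX = tf.keras.models.Sequential([tf.keras.layers.Dense(units = 784, activation = 'sigmoid')])

def cycle_consistency_loss(x, y):
    x_cycled = G_YX(G_XY(x))   # forward cycle X -> Y -> X
    y_cycled = G_XY(G_YX(y))   # backward cycle Y -> X -> Y
    # L1 reconstruction error in both directions
    return tf.reduce_mean(tf.abs(x - x_cycled)) + tf.reduce_mean(tf.abs(y - y_cycled))

x = tf.random.uniform((8, 784))   # a batch from domain X (e.g., horse images)
y = tf.random.uniform((8, 784))   # a batch from domain Y (e.g., zebra images)
print(cycle_consistency_loss(x, y))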

8. Adversarial Autoencoder (AAE)

8.1. Limitation of Autoencoder

Autoencoder: Manifold learning model (≠ Generative model)

  • The learned manifold comes out differently on each training run
  • Hard to generate new data from the manifold

Generate Data from Controlled Latent Space

  • Encode data into a controllable latent space
  • Generate new data from the controlled space

8.2. Adversarial Autoencoder

8.3. Incorporating Label Information

Disentangled Latent Representation

8.4. Implementation of AAE

In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import random
In [ ]:
mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x.reshape(-1, 784)/255.0, test_x.reshape(-1, 784)/255.0
In [ ]:
# Use only 0,1,2,3,4,5 digits to visualize latent space

train_idx0 = np.array(np.where(train_y == 0))
train_idx1 = np.array(np.where(train_y == 1))
train_idx2 = np.array(np.where(train_y == 2))
train_idx3 = np.array(np.where(train_y == 3))
train_idx4 = np.array(np.where(train_y == 4))
train_idx5 = np.array(np.where(train_y == 5))
train_idx = np.sort(np.concatenate((train_idx0, train_idx1, train_idx2, train_idx3, train_idx4, train_idx5), axis = None))

test_idx0 = np.array(np.where(test_y == 0))
test_idx1 = np.array(np.where(test_y == 1))
test_idx2 = np.array(np.where(test_y == 2))
test_idx3 = np.array(np.where(test_y == 3))
test_idx4 = np.array(np.where(test_y == 4))
test_idx5 = np.array(np.where(test_y == 5))
test_idx = np.sort(np.concatenate((test_idx0, test_idx1, test_idx2, test_idx3, test_idx4, test_idx5), axis = None))

train_imgs = train_x[train_idx]
train_labels = train_y[train_idx]
test_imgs = test_x[test_idx]
test_labels = test_y[test_idx]

n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 36017, shape : (36017, 784)
The number of testing images : 6031, shape : (6031, 784)
In [ ]:
# Define Structure

# Encoder
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 500, activation = 'relu', input_shape = (784,)),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None)
    ])

# Decoder
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 300, activation = 'relu', input_shape = (2,)),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 784, activation = None)
    ])

# Discriminator
discriminator = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 100, activation = 'relu', input_shape = (2,)),
    tf.keras.layers.Dense(units = 100, activation = 'relu'),
    tf.keras.layers.Dense(units = 1, activation =  'sigmoid')
    ])
In [ ]:
autoencoder = tf.keras.models.Sequential([encoder, decoder])
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.0005),
                    loss = 'mean_squared_error')

encoder.trainable = False
discriminator.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0005),
                      loss = 'binary_crossentropy')

discriminator.trainable = False
encoder.trainable = True

combined_input = tf.keras.layers.Input(shape = (28*28,))
latent_variable = encoder(combined_input)
combined_output = discriminator(latent_variable)
combined = tf.keras.models.Model(inputs = combined_input, outputs = combined_output)

combined.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0005),
                 loss = 'binary_crossentropy')
In [ ]:
def gaussian_mixture_sampler(batchsize, ndim, num_labels):
    # Sample from a 2D Gaussian mixture whose components sit on a circle,
    # one component per label; returns samples and one-hot labels
    if ndim % 2 != 0:
        raise Exception("ndim must be a multiple of 2.")

    def sample(x, y, label, num_labels):
        # Rotate an axis-aligned Gaussian sample to the component's angle
        # and shift it away from the origin
        shift = 1.4
        r = 2 * np.pi / float(num_labels) * float(label)
        new_x = x * np.cos(r) - y * np.sin(r)
        new_y = x * np.sin(r) + y * np.cos(r)
        new_x += shift * np.cos(r)
        new_y += shift * np.sin(r)
        return np.array([new_x, new_y]).reshape((2,)), tf.one_hot(label, num_labels + 1)

    # Elongated Gaussian: wide along one axis, narrow along the other,
    # before rotation
    x_var = 0.3
    y_var = 0.05
    x = np.random.normal(0, x_var, (batchsize, ndim // 2))
    y = np.random.normal(0, y_var, (batchsize, ndim // 2))
    z = np.empty((batchsize, ndim), dtype = np.float32)
    z_label = np.empty((batchsize, num_labels + 1), dtype = np.float32)

    for batch in range(batchsize):
        for zi in range(ndim // 2):
            z[batch, zi*2:zi*2+2], z_label[batch] = sample(x[batch, zi], y[batch, zi], random.randint(0, num_labels - 1), num_labels)
    return z, z_label
In [ ]:
prior, one_hot_label = gaussian_mixture_sampler(1000, 2, 6)

c = np.array(['r', 'g', 'b', 'c', 'y', 'k'])
plt.scatter(prior[:, 0], prior[:, 1], s = 1, c = c[list(np.argmax(one_hot_label, 1))])
plt.show()
In [ ]:
def plot_latent_space(encoder, samples = 1000):
    idx = np.random.randint(0, train_imgs.shape[0], samples)
    latent_fake = encoder.predict(train_imgs[idx], verbose = 0)

    c = np.array(['r', 'g', 'b', 'c', 'y', 'k'])

    for i, el in enumerate([0, 1, 2, 3, 4, 5]):
        label_idx = np.where(train_labels[idx] == el)[0]
        plt.scatter(latent_fake[label_idx, 0], latent_fake[label_idx, 1], s = 1, label = el, c = c[i])

    plt.legend()
    plt.xticks([])
    plt.yticks([])
    plt.show()
In [ ]:
n_iter = 10000
batch_size = 100

fake = np.zeros(batch_size)
real = np.ones(batch_size)

for i in range(n_iter):
    idx = np.random.randint(0, train_imgs.shape[0], batch_size)

    # Train Autoencoder
    AE_loss = autoencoder.train_on_batch(train_imgs[idx], train_imgs[idx])

    # Train Discriminator
    latent_true, _ = gaussian_mixture_sampler(batch_size, 2, 6)
    latent_fake = encoder.predict(train_imgs[idx], verbose = 0)

    D_loss_real = discriminator.train_on_batch(latent_true, real)
    D_loss_fake = discriminator.train_on_batch(latent_fake, fake)
    D_loss = D_loss_real + D_loss_fake

    # Train Generator
    idx = np.random.randint(0, train_imgs.shape[0], batch_size)

    Adv_loss = combined.train_on_batch(train_imgs[idx], real)

    if i % 1000 == 0:
        print('Autoencoder Loss: ', AE_loss)
        print('Discriminator Loss: ', D_loss)
        print('Adversarial Loss: ', Adv_loss)

        plot_latent_space(encoder)
Autoencoder Loss:  0.11396405845880508
Discriminator Loss:  1.3464518785476685
Adversarial Loss:  0.6401848793029785
Autoencoder Loss:  0.0478895865380764
Discriminator Loss:  1.2937771677970886
Adversarial Loss:  0.7229815125465393
Autoencoder Loss:  0.04791981354355812
Discriminator Loss:  1.22506183385849
Adversarial Loss:  0.8402771949768066
Autoencoder Loss:  0.046723414212465286
Discriminator Loss:  1.2455075979232788
Adversarial Loss:  0.8508409261703491
Autoencoder Loss:  0.04987465962767601
Discriminator Loss:  1.27111154794693
Adversarial Loss:  0.8673261404037476
Autoencoder Loss:  0.03924637660384178
Discriminator Loss:  1.2927919626235962
Adversarial Loss:  0.9007968306541443
Autoencoder Loss:  0.04301917925477028
Discriminator Loss:  1.289901316165924
Adversarial Loss:  0.8124166131019592
Autoencoder Loss:  0.04106577858328819
Discriminator Loss:  1.2537448406219482
Adversarial Loss:  0.8318484425544739
Autoencoder Loss:  0.04374618083238602
Discriminator Loss:  1.2305973768234253
Adversarial Loss:  0.9041962623596191
Autoencoder Loss:  0.042891405522823334
Discriminator Loss:  1.286674976348877
Adversarial Loss:  0.9378982782363892
In [ ]:
# Encoder
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 500, activation = 'relu', input_shape = (784,)),
    tf.keras.layers.Dense(units = 300, activation = 'relu'),
    tf.keras.layers.Dense(units = 2, activation = None)
    ])

# Decoder
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 300, activation = 'relu', input_shape = (2,)),
    tf.keras.layers.Dense(units = 500, activation = 'relu'),
    tf.keras.layers.Dense(units = 784, activation = None)
    ])

# discriminator_label
discriminator_label = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units = 100, activation = 'relu', input_shape = (2 + 6 + 1,)),
    tf.keras.layers.Dense(units = 100, activation = 'relu'),
    tf.keras.layers.Dense(units = 1, activation =  'sigmoid')
    ])
In [ ]:
autoencoder = tf.keras.models.Sequential([encoder, decoder])
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(0.0005),
                    loss = 'mean_squared_error')

encoder.trainable = False
discriminator_label.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0005),
                            loss = 'binary_crossentropy')

discriminator_label.trainable = False
encoder.trainable = True

combined_input = tf.keras.layers.Input(shape = (28*28 + 6 + 1,))
latent_variable = encoder(combined_input[:, :28*28])
latent_label = tf.concat([latent_variable, combined_input[:, 28*28:]], 1)
combined_label_output = discriminator_label(latent_label)
combined_label = tf.keras.models.Model(inputs = combined_input, outputs = combined_label_output)

combined_label.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0005),
                       loss = 'binary_crossentropy')
In [ ]:
n_iter = 10000
batch_size = 100

fake = np.zeros(batch_size)
real = np.ones(batch_size)

for i in range(n_iter):
    idx = np.random.randint(0, train_imgs.shape[0], batch_size)

    # Train Autoencoder
    AE_loss = autoencoder.train_on_batch(train_imgs[idx], train_imgs[idx])

    # Train Discriminator
    # positive phase
    latent_true, one_hot_true_label = gaussian_mixture_sampler(batch_size, 2, 6)
    latent_true = tf.concat([latent_true, one_hot_true_label], 1)
    D_loss_real = discriminator_label.train_on_batch(latent_true, real)

    latent_true, one_hot_true_label = gaussian_mixture_sampler(batch_size, 2, 6)
    one_hot_fake_label = tf.one_hot([6] * batch_size, 6 + 1)
    latent_true = tf.concat([latent_true, one_hot_fake_label], 1)
    D_loss_real += discriminator_label.train_on_batch(latent_true, real)

    latent_fake = encoder.predict(train_imgs[idx], verbose = 0)
    one_hot_fake_label = tf.one_hot([6] * batch_size, 6 + 1)
    latent_fake = tf.concat([latent_fake, one_hot_fake_label], 1)
    D_loss_fake = discriminator_label.train_on_batch(latent_fake, fake)

    D_loss = D_loss_real + D_loss_fake

    # Train Generator
    idx = np.random.randint(0, train_imgs.shape[0], batch_size)
    one_hot_fake_label = tf.one_hot([6] * batch_size, 6 + 1)

    train_input = tf.concat([train_imgs[idx], one_hot_fake_label], 1)

    Adv_loss = combined_label.train_on_batch(train_input, real)

    if i % 1000 == 0:
        print('Autoencoder Loss: ', AE_loss)
        print('Discriminator Loss: ', D_loss)
        print('Adversarial Loss: ', Adv_loss)

        plot_latent_space(encoder)
Autoencoder Loss:  0.10694870352745056
Discriminator Loss:  2.0750131011009216
Adversarial Loss:  0.6890000700950623
Autoencoder Loss:  0.05089759826660156
Discriminator Loss:  1.3033279494848102
Adversarial Loss:  0.7591390013694763
Autoencoder Loss:  0.044531725347042084
Discriminator Loss:  1.1924733045825633
Adversarial Loss:  0.9256393313407898
Autoencoder Loss:  0.043471693992614746
Discriminator Loss:  1.2489777825121564
Adversarial Loss:  0.8828380703926086
Autoencoder Loss:  0.04346089065074921
Discriminator Loss:  1.203024471810295
Adversarial Loss:  0.7591931819915771
Autoencoder Loss:  0.04413749650120735
Discriminator Loss:  1.1958808300436488
Adversarial Loss:  0.8612193465232849
Autoencoder Loss:  0.04201776906847954
Discriminator Loss:  1.2356403472762238
Adversarial Loss:  0.8549310564994812
Autoencoder Loss:  0.04225203022360802
Discriminator Loss:  1.2348333907218603
Adversarial Loss:  0.9510146975517273
Autoencoder Loss:  0.04253502935171127
Discriminator Loss:  1.1383356048092157
Adversarial Loss:  0.9228068590164185
Autoencoder Loss:  0.04324803501367569
Discriminator Loss:  1.1546122021292007
Adversarial Loss:  1.0049819946289062

9. Other Tutorials

In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/9JpdAg6uMXs?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
  • CS231n: CNN for Visual Recognition
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/5WoItGTWV54?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

MIT by Aaron Courville

In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/JVb54xhEw6Y?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [1]:
%%html
<center><iframe src="https://www.youtube.com/embed/odpjk7_tGY0?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>