Deep Learning for Mechanical Engineering

Homework 08

Due Monday, 11/06/2021, 4:00 PM


Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
  • For your handwritten solutions, scan them or take a picture; alternatively, you can write them in markdown if you prefer.
  • Only .ipynb files will be graded for your code.
    • Include your NAME and student ID in the .ipynb file name. ex) IljeokKim_20202467_HW08.ipynb
  • Compress all of the files into a single .zip file.
    • Include your NAME and student ID in the .zip file's name. ex) DogyeomPark_20202467_HW08.zip
    • Submit this .zip file on KLMS.
  • Do not submit a printed version of your code, as it will not be graded.

Problem 1: Convolutional Autoencoder (CAE)

In this problem, our objective is to build a convolutional autoencoder that reconstructs dog images.

(1) Load the dog dataset.

In [ ]:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
## your code here
#

dog_dataset =
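
For reference, one way this cell might be filled in (a minimal sketch: the Drive folder dog_images and the 64 x 64 target size are assumptions, not fixed by the handout):

img_dir = '/content/drive/MyDrive/dog_images'       # hypothetical path

imgs = []
for fname in sorted(os.listdir(img_dir)):
    img = cv2.imread(os.path.join(img_dir, fname))  # BGR, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)      # convert to RGB
    img = cv2.resize(img, (64, 64))                 # (64, 64, 3)
    imgs.append(img)

dog_dataset = np.array(imgs, dtype=np.float32) / 255.0   # scale to [0, 1]
print(dog_dataset.shape)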

(2) Design and train a convolutional autoencoder architecture.

In [ ]:
## your code here
#
Epoch 1/50
313/313 [==============================] - 6s 12ms/step - loss: 0.0281
Epoch 2/50
313/313 [==============================] - 4s 14ms/step - loss: 0.0120
Epoch 3/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0098
Epoch 4/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0086
Epoch 5/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0081
Epoch 6/50
313/313 [==============================] - 5s 15ms/step - loss: 0.0068
Epoch 7/50
313/313 [==============================] - 4s 11ms/step - loss: 0.0064
Epoch 8/50
313/313 [==============================] - 4s 11ms/step - loss: 0.0059
Epoch 9/50
313/313 [==============================] - 4s 13ms/step - loss: 0.0056
Epoch 10/50
313/313 [==============================] - 4s 13ms/step - loss: 0.0056
Epoch 11/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0053
Epoch 12/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0052
Epoch 13/50
313/313 [==============================] - 4s 13ms/step - loss: 0.0052
Epoch 14/50
313/313 [==============================] - 4s 13ms/step - loss: 0.0048
Epoch 15/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0049
Epoch 16/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0047
Epoch 17/50
313/313 [==============================] - 4s 14ms/step - loss: 0.0044
Epoch 18/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0044
Epoch 19/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0043
Epoch 20/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0043
Epoch 21/50
313/313 [==============================] - 4s 14ms/step - loss: 0.0042
Epoch 22/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0043
Epoch 23/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0040
Epoch 24/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0040
Epoch 25/50
313/313 [==============================] - 4s 14ms/step - loss: 0.0039
Epoch 26/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0039
Epoch 27/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0039
Epoch 28/50
313/313 [==============================] - 4s 13ms/step - loss: 0.0038
Epoch 29/50
313/313 [==============================] - 4s 13ms/step - loss: 0.0038
Epoch 30/50
313/313 [==============================] - 4s 11ms/step - loss: 0.0037
Epoch 31/50
313/313 [==============================] - 4s 11ms/step - loss: 0.0037
Epoch 32/50
313/313 [==============================] - 4s 14ms/step - loss: 0.0037
Epoch 33/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0036
Epoch 34/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0036
Epoch 35/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0035
Epoch 36/50
313/313 [==============================] - 4s 14ms/step - loss: 0.0035
Epoch 37/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0035
Epoch 38/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0035
Epoch 39/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0035
Epoch 40/50
313/313 [==============================] - 4s 14ms/step - loss: 0.0034
Epoch 41/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0034
Epoch 42/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0034
Epoch 43/50
313/313 [==============================] - 4s 13ms/step - loss: 0.0033
Epoch 44/50
313/313 [==============================] - 4s 13ms/step - loss: 0.0033
Epoch 45/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0033
Epoch 46/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0033
Epoch 47/50
313/313 [==============================] - 4s 14ms/step - loss: 0.0033
Epoch 48/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0032
Epoch 49/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0032
Epoch 50/50
313/313 [==============================] - 4s 12ms/step - loss: 0.0032
Out[ ]:
<keras.src.callbacks.History at 0x7a69502e9e40>
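
For reference, a minimal sketch of one possible design (all layer widths are assumptions; the log's 313 steps per epoch implies roughly 10,000 training images at the default batch size of 32):

# Encoder: 64x64x3 image -> 8x8x64 feature map (sizes are assumptions)
encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, (3, 3), strides=2, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(32, (3, 3), strides=2, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(64, (3, 3), strides=2, padding='same', activation='relu')
])

# Decoder: mirror of the encoder, back to 64x64x3
decoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 64)),
    tf.keras.layers.Conv2DTranspose(32, (3, 3), strides=2, padding='same', activation='relu'),
    tf.keras.layers.Conv2DTranspose(16, (3, 3), strides=2, padding='same', activation='relu'),
    tf.keras.layers.Conv2DTranspose(3, (3, 3), strides=2, padding='same', activation='sigmoid')
])

cae = tf.keras.Model(inputs=encoder.input, outputs=decoder(encoder.output))
cae.compile(optimizer='adam', loss='mse')      # pixel-wise reconstruction loss
cae.fit(dog_dataset, dog_dataset, epochs=50)   # input = target for an autoencoder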

(3) Select five random images and create plots to display both the original and reconstructed images.

In [ ]:
## your code here
#
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
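
A minimal sketch of the plotting step, reusing cae and dog_dataset from above (np.clip suppresses the imshow clipping warnings shown above):

idx = np.random.choice(dog_dataset.shape[0], 5, replace=False)
recon = cae.predict(dog_dataset[idx])

plt.figure(figsize=(15, 6))
for i in range(5):
    plt.subplot(2, 5, i + 1)               # top row: originals
    plt.imshow(dog_dataset[idx[i]])
    plt.axis('off')
    plt.subplot(2, 5, i + 6)               # bottom row: reconstructions
    plt.imshow(np.clip(recon[i], 0, 1))
    plt.axis('off')
plt.show()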

(4) Load the data for a white dog and a black dog.

  • Download the images black_dog.png and white_dog.png.

  • Resize each image to (64, 64, 3) and rescale the pixel values to the range 0 to 1 by dividing by 255.

In [ ]:
## your code here
#
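
A minimal sketch, assuming both .png files are in the working directory:

def load_dog(path):
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (64, 64))            # (64, 64, 3)
    return img.astype(np.float32) / 255.0      # rescale to [0, 1]

black_dog = load_dog('black_dog.png')
white_dog = load_dog('white_dog.png')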

(5) Walk in the latent space

  • Show the average pixel image of the black dog and white dog images in the original space.

  • Show the decoded image after averaging the encoded representations of a white dog and a black dog in the latent space.

In [ ]:
## your code here
#
1/1 [==============================] - 0s 73ms/step
1/1 [==============================] - 0s 21ms/step
WARNING:tensorflow:5 out of the last 10 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7a6950258820> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
1/1 [==============================] - 0s 112ms/step
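
A minimal sketch, assuming the encoder/decoder split and the black_dog/white_dog arrays from the sketches above:

# (a) average the two dogs in the original pixel space
pixel_avg = (black_dog + white_dog) / 2.0

# (b) average their codes in the latent space, then decode
z_black = encoder.predict(black_dog[None])     # [None] adds a batch dimension
z_white = encoder.predict(white_dog[None])
latent_avg = decoder.predict((z_black + z_white) / 2.0)[0]

plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.imshow(pixel_avg)
plt.title('pixel-space average')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(np.clip(latent_avg, 0, 1))
plt.title('latent-space average')
plt.axis('off')
plt.show()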

(6) Explain the reason for the difference between the two images in terms of the latent space.

In [ ]:
## your code here
#

Problem 2: Segmentation

We studied the Fully Convolutional Network (FCN) model with the VGG16 network in class. In this problem, you will implement an FCN that uses the VGG19 network as its encoder.

To achieve this, we will load a pre-trained VGG19 network in Problem 2 and then build the FCN model in Problem 3.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import cv2

(1) Load the provided dataset and display its shape.

In [ ]:
## your code here
#

train_imgs =
train_seg =
test_imgs =

n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 289, shape : (289, 160, 576, 3)
The number of testing images : 290, shape : (290, 160, 576, 3)
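
A minimal sketch of the loading step; the .npy file names and the Drive folder are assumptions, since only the printed shapes are given:

import os

data_dir = '/content/drive/MyDrive/seg_data'   # hypothetical path

train_imgs = np.load(os.path.join(data_dir, 'train_imgs.npy'))
train_seg = np.load(os.path.join(data_dir, 'train_seg.npy'))
test_imgs = np.load(os.path.join(data_dir, 'test_imgs.npy'))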

(2) Visualize a randomly selected image from the training dataset.

In [ ]:
## your code here
#
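
A minimal sketch:

idx = np.random.randint(n_train)
plt.figure(figsize=(10, 3))
plt.imshow(train_imgs[idx])
plt.axis('off')
plt.show()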

(3) Load the VGG19 network and display its model structure.

  • weights = 'imagenet'
  • include_top = False
In [ ]:
## your code here
#
Model: "vgg19"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, None, None, 3)]   0         
                                                                 
 block1_conv1 (Conv2D)       (None, None, None, 64)    1792      
                                                                 
 block1_conv2 (Conv2D)       (None, None, None, 64)    36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, None, None, 64)    0         
                                                                 
 block2_conv1 (Conv2D)       (None, None, None, 128)   73856     
                                                                 
 block2_conv2 (Conv2D)       (None, None, None, 128)   147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, None, None, 128)   0         
                                                                 
 block3_conv1 (Conv2D)       (None, None, None, 256)   295168    
                                                                 
 block3_conv2 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_conv3 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_conv4 (Conv2D)       (None, None, None, 256)   590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, None, None, 256)   0         
                                                                 
 block4_conv1 (Conv2D)       (None, None, None, 512)   1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_conv4 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
 block5_conv1 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_conv4 (Conv2D)       (None, None, None, 512)   2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, None, None, 512)   0         
                                                                 
=================================================================
Total params: 20024384 (76.39 MB)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 20024384 (76.39 MB)
_________________________________________________________________
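
For reference, the summary above can be reproduced with the Keras applications API; freezing the network gives the 0 trainable parameters shown:

vgg19 = tf.keras.applications.VGG19(weights='imagenet', include_top=False)
vgg19.trainable = False    # keep the pre-trained ImageNet weights fixed
vgg19.summary()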

Problem 3: FCN Model

Now that we have the pre-trained weights and biases from Problem 2, we will utilize them in this problem.

(1) Define your FCN model by incorporating the weights and biases from Problem 2.

In [ ]:
## your code here
#
In [ ]:
model.summary()
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 input_1 (InputLayer)        [(None, None, None, 3)]      0         []                            
                                                                                                  
 block1_conv1 (Conv2D)       (None, None, None, 64)       1792      ['input_1[0][0]']             
                                                                                                  
 block1_conv2 (Conv2D)       (None, None, None, 64)       36928     ['block1_conv1[0][0]']        
                                                                                                  
 block1_pool (MaxPooling2D)  (None, None, None, 64)       0         ['block1_conv2[0][0]']        
                                                                                                  
 block2_conv1 (Conv2D)       (None, None, None, 128)      73856     ['block1_pool[0][0]']         
                                                                                                  
 block2_conv2 (Conv2D)       (None, None, None, 128)      147584    ['block2_conv1[0][0]']        
                                                                                                  
 block2_pool (MaxPooling2D)  (None, None, None, 128)      0         ['block2_conv2[0][0]']        
                                                                                                  
 block3_conv1 (Conv2D)       (None, None, None, 256)      295168    ['block2_pool[0][0]']         
                                                                                                  
 block3_conv2 (Conv2D)       (None, None, None, 256)      590080    ['block3_conv1[0][0]']        
                                                                                                  
 block3_conv3 (Conv2D)       (None, None, None, 256)      590080    ['block3_conv2[0][0]']        
                                                                                                  
 block3_conv4 (Conv2D)       (None, None, None, 256)      590080    ['block3_conv3[0][0]']        
                                                                                                  
 block3_pool (MaxPooling2D)  (None, None, None, 256)      0         ['block3_conv4[0][0]']        
                                                                                                  
 block4_conv1 (Conv2D)       (None, None, None, 512)      1180160   ['block3_pool[0][0]']         
                                                                                                  
 block4_conv2 (Conv2D)       (None, None, None, 512)      2359808   ['block4_conv1[0][0]']        
                                                                                                  
 block4_conv3 (Conv2D)       (None, None, None, 512)      2359808   ['block4_conv2[0][0]']        
                                                                                                  
 block4_conv4 (Conv2D)       (None, None, None, 512)      2359808   ['block4_conv3[0][0]']        
                                                                                                  
 block4_pool (MaxPooling2D)  (None, None, None, 512)      0         ['block4_conv4[0][0]']        
                                                                                                  
 block5_conv1 (Conv2D)       (None, None, None, 512)      2359808   ['block4_pool[0][0]']         
                                                                                                  
 conv6 (Conv2D)              (None, None, None, 4096)     1027645   ['block5_conv1[0][0]']        
                                                          44                                      
                                                                                                  
 fcn4 (Conv2D)               (None, None, None, 4096)     1678131   ['conv6[0][0]']               
                                                          2                                       
                                                                                                  
 fcn3 (Conv2D)               (None, None, None, 2)        8194      ['fcn4[0][0]']                
                                                                                                  
 conv2d_transpose_4 (Conv2D  (None, None, None, 512)      16896     ['fcn3[0][0]']                
 Transpose)                                                                                       
                                                                                                  
 tf.__operators__.add (TFOp  (None, None, None, 512)      0         ['conv2d_transpose_4[0][0]',  
 Lambda)                                                             'block4_conv1[0][0]']        
                                                                                                  
 conv2d_transpose_5 (Conv2D  (None, None, None, 256)      2097408   ['tf.__operators__.add[0][0]']
 Transpose)                                                                                       
                                                                                                  
 tf.__operators__.add_1 (TF  (None, None, None, 256)      0         ['conv2d_transpose_5[0][0]',  
 OpLambda)                                                           'block3_conv1[0][0]']        
                                                                                                  
 conv2d_transpose_6 (Conv2D  (None, None, None, 2)        131074    ['tf.__operators__.add_1[0][0]
 Transpose)                                                         ']                            
                                                                                                  
==================================================================================================
Total params: 134744388 (514.01 MB)
Trainable params: 121799428 (464.63 MB)
Non-trainable params: 12944960 (49.38 MB)
__________________________________________________________________________________________________
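
For reference, a sketch consistent with the summary above. The layer names (conv6, fcn4, fcn3) follow the printout, the kernel sizes are inferred from the parameter counts (e.g., conv6 has 4096 x (512 x 7 x 7) + 4096 = 102,764,544 parameters, hence a 7 x 7 kernel), and the non-trainable total (12,944,960) matches VGG19 frozen up through block5_conv1. The final softmax is an assumption.

vgg19 = tf.keras.applications.VGG19(weights='imagenet', include_top=False)
for layer in vgg19.layers:
    layer.trainable = False                    # freeze the encoder

x = vgg19.get_layer('block5_conv1').output     # stride-16 features
x = tf.keras.layers.Conv2D(4096, (7, 7), padding='same', activation='relu', name='conv6')(x)
x = tf.keras.layers.Conv2D(4096, (1, 1), padding='same', activation='relu', name='fcn4')(x)
x = tf.keras.layers.Conv2D(2, (1, 1), padding='same', name='fcn3')(x)

# upsample x2 and fuse with the stride-8 skip connection
x = tf.keras.layers.Conv2DTranspose(512, (4, 4), strides=2, padding='same')(x)
x = x + vgg19.get_layer('block4_conv1').output

# upsample x2 and fuse with the stride-4 skip connection
x = tf.keras.layers.Conv2DTranspose(256, (4, 4), strides=2, padding='same')(x)
x = x + vgg19.get_layer('block3_conv1').output

# upsample x4 back to the input resolution, with 2 output classes
x = tf.keras.layers.Conv2DTranspose(2, (16, 16), strides=4, padding='same', activation='softmax')(x)

model = tf.keras.Model(inputs=vgg19.input, outputs=x)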

(2) Train the model. Using a GPU (e.g., on Colab) is highly recommended; training on a CPU may take more than half an hour.

In [ ]:
## your code here
#
Epoch 1/5
73/73 [==============================] - 40s 434ms/step - loss: 3.6358 - accuracy: 0.8647
Epoch 2/5
73/73 [==============================] - 30s 405ms/step - loss: 0.4007 - accuracy: 0.9356
Epoch 3/5
73/73 [==============================] - 29s 401ms/step - loss: 0.1504 - accuracy: 0.9490
Epoch 4/5
73/73 [==============================] - 29s 403ms/step - loss: 0.1190 - accuracy: 0.9556
Epoch 5/5
73/73 [==============================] - 30s 407ms/step - loss: 0.1115 - accuracy: 0.9586
Out[ ]:
<keras.src.callbacks.History at 0x7d6ec17283d0>
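
A sketch of the training call (73 steps per epoch over 289 images implies a batch size of 4; the categorical cross-entropy loss is an assumption based on the two-channel output):

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_imgs, train_seg, batch_size=4, epochs=5)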

(3) Test your model: select a random test image, segment it with your trained FCN model, and plot the segmentation and the test image together.

In [ ]:
## your code here
#
1/1 [==============================] - 0s 204ms/step
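
A minimal sketch of the test plot; the overlay style is a choice, not prescribed:

idx = np.random.randint(n_test)
test_img = test_imgs[idx]

pred = model.predict(test_img[None])[0]        # (160, 576, 2) class scores
mask = np.argmax(pred, axis=-1)                # per-pixel class index

plt.figure(figsize=(10, 4))
plt.imshow(test_img)
plt.imshow(mask, alpha=0.4)                    # overlay the segmentation
plt.axis('off')
plt.show()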

(4) Now that we can segment images, let's segment the provided highway image.

  • Download the highway image highway.png.
  • Print the image shape (h, w, c).
  • Segment it with your FCN model.
In [ ]:
## your code here
#
(320, 512, 3)
In [ ]:
## your code here
#
WARNING:tensorflow:5 out of the last 6 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7d6ec15fc9d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
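
A minimal sketch; because the model is fully convolutional, the (320, 512, 3) image can be fed directly, as both sides are multiples of the network's total stride of 16:

highway = cv2.imread('highway.png')
highway = cv2.cvtColor(highway, cv2.COLOR_BGR2RGB) / 255.0
print(highway.shape)                           # (320, 512, 3)

pred = model.predict(highway[None])[0]
mask = np.argmax(pred, axis=-1)

plt.figure(figsize=(8, 5))
plt.imshow(highway)
plt.imshow(mask, alpha=0.4)
plt.axis('off')
plt.show()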

(5) As you can see, the training images and the image in Problem 3-(4) have different shapes. Is it possible to feed differently shaped images to the same FCN model without resizing them? If so, explain why.

In [ ]:
## your code here
#