Problem 1: Convolutional Autoencoder (CAE)

In this problem, you will build a convolutional autoencoder that reconstructs dog images.

(1) Load the dog dataset.

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from google.colab import drive
Mounted at /content/drive
## your code here

dog_dataset =

(2) Design and train a convolutional autoencoder architecture.

## your code here
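One possible starting point is sketched below. It is not the reference solution: the input shape (64, 64, 3) is assumed from step (4), and the layer widths, the latent size, the `build_cae` name, and the MSE loss are illustrative choices. The encoder and decoder are kept as separate models so that step (5) can call them independently.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cae(input_shape=(64, 64, 3), latent_channels=64):
    """Return (autoencoder, encoder, decoder) as Keras models (illustrative sizes)."""
    # Encoder: three stride-2 convolutions shrink 64x64 down to 8x8.
    enc_in = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(enc_in)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    latent = layers.Conv2D(latent_channels, 3, strides=2, padding="same",
                           activation="relu")(x)
    encoder = tf.keras.Model(enc_in, latent, name="encoder")

    # Decoder: three stride-2 transposed convolutions restore 64x64.
    dec_in = layers.Input(shape=latent.shape[1:])
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(dec_in)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    # Sigmoid keeps outputs in [0, 1], matching pixels rescaled by 1/255.
    dec_out = layers.Conv2DTranspose(input_shape[-1], 3, strides=2, padding="same",
                                     activation="sigmoid")(x)
    decoder = tf.keras.Model(dec_in, dec_out, name="decoder")

    autoencoder = tf.keras.Model(enc_in, decoder(encoder(enc_in)), name="cae")
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder, decoder

# Usage, assuming dog_dataset is a float array of shape (N, 64, 64, 3) in [0, 1]:
# autoencoder, encoder, decoder = build_cae()
# autoencoder.fit(dog_dataset, dog_dataset, epochs=20, batch_size=32)
```

The input is used as its own target in `fit`, which is what makes the model an autoencoder.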
(3) Select five random images and create plots to display both the original and reconstructed images.

## your code here
(4) Load data of a white dog and a black dog.

  • Download images of black_dog.png and white_dog.png.

  • Resize each image to (64, 64, 3) and rescale the image pixels to a range of 0 to 1 by dividing them by 255.
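The resize and rescale steps can be sketched as follows. To keep the example dependency-free, it uses a nearest-neighbour resize written in NumPy as a stand-in; in the notebook, `cv2.resize(img, (64, 64))` would be the usual call. The `preprocess` name is hypothetical. Note that `cv2.imread` returns channels in BGR order, so a conversion with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` is needed before plotting with matplotlib.

```python
import numpy as np

def preprocess(img_uint8, size=(64, 64)):
    """Resize an (H, W, 3) uint8 image to `size` and scale pixels to [0, 1]."""
    h, w = img_uint8.shape[:2]
    # Nearest-neighbour sampling: pick one source row/column per target pixel.
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = img_uint8[rows][:, cols]
    # Divide by 255 so that values lie in the range 0 to 1.
    return resized.astype(np.float32) / 255.0

# Example with a synthetic image standing in for black_dog.png:
demo = np.random.randint(0, 256, size=(120, 200, 3), dtype=np.uint8)
print(preprocess(demo).shape)  # (64, 64, 3)
```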

## your code here

(5) Walk in the latent space

  • Show the average pixel image of the black dog and white dog images in the original space.

  • Show the decoded image after averaging the encoded representations of a white dog and a black dog in the latent space.
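The two averages above can be sketched as below. `encoder` and `decoder` are assumed to be callables that map a batch of images to latent codes and back (for example, the two halves of a trained autoencoder); the function names are hypothetical.

```python
import numpy as np

def pixel_average(img_a, img_b):
    """Average two images directly in pixel space."""
    return 0.5 * (img_a + img_b)

def latent_average(img_a, img_b, encoder, decoder):
    """Encode both images, average their latent codes, and decode the mean."""
    batch = np.stack([img_a, img_b])            # shape (2, H, W, C)
    z = np.asarray(encoder(batch))              # latent codes, one per image
    z_mean = z.mean(axis=0, keepdims=True)      # midpoint in latent space
    return np.asarray(decoder(z_mean))[0]       # drop the batch dimension

# Toy check with a linear encoder/decoder pair, for which both averages agree.
a, b = np.zeros((64, 64, 3)), np.ones((64, 64, 3))
mixed = latent_average(a, b, encoder=lambda x: 2.0 * x, decoder=lambda z: z / 2.0)
print(np.allclose(mixed, pixel_average(a, b)))  # True
```

With a linear encoder and decoder the two results coincide, as the toy check shows. A trained convolutional autoencoder is nonlinear, so the decoded latent midpoint generally differs from the pixel-wise blend.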

## your code here
(6) Explain the reason for the difference between the two images in terms of the latent space.

## your code here

Problem 2: Segmentation

We studied the Fully Convolutional Network (FCN) model with the VGG16 network in class. In this problem, you will implement your FCN model using the VGG19 network as the encoder part of the model.

To achieve this, Problem 2 loads a pre-trained VGG19 network, and Problem 3 builds the FCN model on top of it.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import cv2

(1) Load the provided dataset and display its shape.

## your code here

train_imgs =
train_seg =
test_imgs =

n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]

print("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
The number of training images : 289, shape : (289, 160, 576, 3)
The number of testing images : 290, shape : (290, 160, 576, 3)

(2) Visualize a randomly selected image from the training dataset.

## your code here

(3) Load the VGG19 network and display its model structure.

  • weights = 'imagenet'
  • include_top = False
## your code here
Model: "vgg19"
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, None, None, 3)]   0         
 block1_conv1 (Conv2D)       (None, None, None, 64)    1792      
 block1_conv2 (Conv2D)       (None, None, None, 64)    36928     
 block1_pool (MaxPooling2D)  (None, None, None, 64)    0         
 block2_conv1 (Conv2D)       (None, None, None, 128)   73856     
 block2_conv2 (Conv2D)       (None, None, None, 128)   147584    
 block2_pool (MaxPooling2D)  (None, None, None, 128)   0         
 block3_conv1 (Conv2D)       (None, None, None, 256)   295168    
 block3_conv2 (Conv2D)       (None, None, None, 256)   590080    
 block3_conv3 (Conv2D)       (None, None, None, 256)   590080    
 block3_conv4 (Conv2D)       (None, None, None, 256)   590080    
 block3_pool (MaxPooling2D)  (None, None, None, 256)   0         
 block4_conv1 (Conv2D)       (None, None, None, 512)   1180160   
 block4_conv2 (Conv2D)       (None, None, None, 512)   2359808   
 block4_conv3 (Conv2D)       (None, None, None, 512)   2359808   
 block4_conv4 (Conv2D)       (None, None, None, 512)   2359808   
 block4_pool (MaxPooling2D)  (None, None, None, 512)   0         
 block5_conv1 (Conv2D)       (None, None, None, 512)   2359808   
 block5_conv2 (Conv2D)       (None, None, None, 512)   2359808   
 block5_conv3 (Conv2D)       (None, None, None, 512)   2359808   
 block5_conv4 (Conv2D)       (None, None, None, 512)   2359808   
 block5_pool (MaxPooling2D)  (None, None, None, 512)   0         
Total params: 20024384 (76.39 MB)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 20024384 (76.39 MB)
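The parameter counts in the summary above can be checked by hand. A Conv2D layer with a k x k kernel has k * k * c_in * c_out weights plus c_out biases, and all VGG19 convolutions use 3 x 3 kernels. The short script below (the helper name `conv2d_params` is hypothetical) reproduces the reported total.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights (kernel * kernel * c_in * c_out) plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out

# (input channels, output channels, number of conv layers) for each VGG19 block
blocks = [(3, 64, 2), (64, 128, 2), (128, 256, 4), (256, 512, 4), (512, 512, 4)]

total = 0
for c_in, c_out, n_convs in blocks:
    # The first conv of a block changes the channel count; the rest keep it.
    total += conv2d_params(3, c_in, c_out)
    total += (n_convs - 1) * conv2d_params(3, c_out, c_out)

print(conv2d_params(3, 3, 64))  # 1792, matches block1_conv1
print(total)                    # 20024384, matches "Total params"
```

Pooling layers contribute no parameters, which is why they show 0 in the table.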

Problem 3: FCN Model

Now that we have the pre-trained weights and biases from Problem 2, we will utilize them in this problem.

(1) Define your FCN model by incorporating the weights and biases from Problem 2.

## your code here
Model: "model_1"
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_1 (InputLayer)        [(None, None, None, 3)]      0         []                            
 block1_conv1 (Conv2D)       (None, None, None, 64)       1792      ['input_1[0][0]']             
 block1_conv2 (Conv2D)       (None, None, None, 64)       36928     ['block1_conv1[0][0]']        
 block1_pool (MaxPooling2D)  (None, None, None, 64)       0         ['block1_conv2[0][0]']        
 block2_conv1 (Conv2D)       (None, None, None, 128)      73856     ['block1_pool[0][0]']         
 block2_conv2 (Conv2D)       (None, None, None, 128)      147584    ['block2_conv1[0][0]']        
 block2_pool (MaxPooling2D)  (None, None, None, 128)      0         ['block2_conv2[0][0]']        
 block3_conv1 (Conv2D)       (None, None, None, 256)      295168    ['block2_pool[0][0]']         
 block3_conv2 (Conv2D)       (None, None, None, 256)      590080    ['block3_conv1[0][0]']        
 block3_conv3 (Conv2D)       (None, None, None, 256)      590080    ['block3_conv2[0][0]']        
 block3_conv4 (Conv2D)       (None, None, None, 256)      590080    ['block3_conv3[0][0]']        
 block3_pool (MaxPooling2D)  (None, None, None, 256)      0         ['block3_conv4[0][0]']        
 block4_conv1 (Conv2D)       (None, None, None, 512)      1180160   ['block3_pool[0][0]']         
 block4_conv2 (Conv2D)       (None, None, None, 512)      2359808   ['block4_conv1[0][0]']        
 block4_conv3 (Conv2D)       (None, None, None, 512)      2359808   ['block4_conv2[0][0]']        
 block4_conv4 (Conv2D)       (None, None, None, 512)      2359808   ['block4_conv3[0][0]']        
 block4_pool (MaxPooling2D)  (None, None, None, 512)      0         ['block4_conv4[0][0]']        
 block5_conv1 (Conv2D)       (None, None, None, 512)      2359808   ['block4_pool[0][0]']         
 conv6 (Conv2D)              (None, None, None, 4096)     102764544 ['block5_conv1[0][0]']        
 fcn4 (Conv2D)               (None, None, None, 4096)     16781312  ['conv6[0][0]']               
 fcn3 (Conv2D)               (None, None, None, 2)        8194      ['fcn4[0][0]']                
 conv2d_transpose_4 (Conv2D  (None, None, None, 512)      16896     ['fcn3[0][0]']                
 Transpose)                                                                                       
 tf.__operators__.add (TFOp  (None, None, None, 512)      0         ['conv2d_transpose_4[0][0]',  
 Lambda)                                                             'block4_conv1[0][0]']        
 conv2d_transpose_5 (Conv2D  (None, None, None, 256)      2097408   ['tf.__operators__.add[0][0]']
 Transpose)                                                                                       
 tf.__operators__.add_1 (TF  (None, None, None, 256)      0         ['conv2d_transpose_5[0][0]',  
 OpLambda)                                                           'block3_conv1[0][0]']        
 conv2d_transpose_6 (Conv2D  (None, None, None, 2)        131074    ['tf.__operators__.add_1[0][0]
 Transpose)                                                         ']                            
Total params: 134744388 (514.01 MB)
Trainable params: 121799428 (464.63 MB)
Non-trainable params: 12944960 (49.38 MB)
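A model with the same layer names and parameter totals as the summary above could be built roughly as follows. This is a sketch under assumptions, not the reference solution: the kernel sizes (7 for conv6, 1 for fcn4 and fcn3, 4, 4 and 16 for the transposed convolutions) are inferred from the parameter counts, while the strides (2, 2, 4), the ReLU activations, and the `build_fcn` name are guesses. `layers.Add` is used where the summary shows the `+` operator.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_fcn(n_classes=2, weights="imagenet"):
    """FCN with a frozen VGG19 encoder (up to block5_conv1) and skip connections."""
    base = tf.keras.applications.VGG19(weights=weights, include_top=False)
    base.trainable = False  # keep the pre-trained weights and biases fixed

    x = base.get_layer("block5_conv1").output        # 1/16 of the input resolution
    x = layers.Conv2D(4096, 7, padding="same", activation="relu", name="conv6")(x)
    x = layers.Conv2D(4096, 1, padding="same", activation="relu", name="fcn4")(x)
    x = layers.Conv2D(n_classes, 1, padding="same", name="fcn3")(x)

    # Upsample by 2 and fuse with the 1/8-resolution encoder features.
    x = layers.Conv2DTranspose(512, 4, strides=2, padding="same")(x)
    x = layers.Add()([x, base.get_layer("block4_conv1").output])

    # Upsample by 2 and fuse with the 1/4-resolution encoder features.
    x = layers.Conv2DTranspose(256, 4, strides=2, padding="same")(x)
    x = layers.Add()([x, base.get_layer("block3_conv1").output])

    # Upsample by 4 back to the input resolution; outputs per-pixel class logits.
    out = layers.Conv2DTranspose(n_classes, 16, strides=4, padding="same")(x)
    return tf.keras.Model(base.input, out)

# Usage (downloads the ImageNet weights on first call):
# model = build_fcn()
# model.summary()
```

With these strides the skip connections only line up when the input height and width are multiples of 16, which holds for the 160 x 576 training images.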

(2) Train the model. Using a GPU or Colab is highly recommended, since training on a CPU may take more than half an hour.

## your code here
Epoch 1/5
73/73 [==============================] - 40s 434ms/step - loss: 3.6358 - accuracy: 0.8647
Epoch 2/5
73/73 [==============================] - 30s 405ms/step - loss: 0.4007 - accuracy: 0.9356
Epoch 3/5
73/73 [==============================] - 29s 401ms/step - loss: 0.1504 - accuracy: 0.9490
Epoch 4/5
73/73 [==============================] - 29s 403ms/step - loss: 0.1190 - accuracy: 0.9556
Epoch 5/5
73/73 [==============================] - 30s 407ms/step - loss: 0.1115 - accuracy: 0.9586
(4) Test your model by selecting a random test image, segmenting it using your trained FCN model, and then plotting the segmentation and the test image together.

## your code here
1/1 [==============================] - 0s 204ms/step

(5) Now that we can segment images, let's proceed to segment the provided highway image.

  • Download the highway image highway.png.
  • Print the image shape (h, w, c).
  • Segment it with your FCN model.
## your code here
(320, 512, 3)
## your code here
(6) As you can see, the training images and the image in Problem 3-(5) have different shapes. Is it possible to feed images of different shapes to the same FCN model without resizing them? If yes, explain why it is possible.
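As a hint for the question above, the spatial bookkeeping of a fully convolutional model can be traced with plain arithmetic. The sketch below assumes four 2 x 2 max-pooling layers in the encoder (as in VGG19 up to block5_conv1), 'same'-padded convolutions, and transposed convolutions with strides 2, 2 and 4. The strides and the function name are assumptions, not taken from the assignment.

```python
def fcn_output_size(h, w, pools=4, up_strides=(2, 2, 4)):
    """Trace the spatial size of a feature map through the encoder and decoder."""
    # Each 2x2 max-pool halves height and width; 'same' convolutions keep them.
    for _ in range(pools):
        h, w = h // 2, w // 2
    # Each 'same'-padded transposed convolution multiplies the size by its stride.
    for s in up_strides:
        h, w = h * s, w * s
    return h, w

for shape in [(160, 576), (320, 512)]:
    print(shape, "->", fcn_output_size(*shape))
# (160, 576) -> (160, 576)
# (320, 512) -> (320, 512)
```

None of the layer parameters depend on h or w, only on kernel sizes and channel counts, so the same weights can slide over an input of either size.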

## your code here