Fully Convolutional Networks (FCN)


By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Table of Contents


1. Convolutional Autoencoder
2. Fully Convolutional Networks for Segmentation
3. Super-resolution and Deblurring
4. Image Deblurring

1. Convolutional Autoencoder

In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('sKDv7yp3Jdk?si=zflwWs4XaLljHfNC', width = "560", height = "315")
Out[2]:

1.1. 2D Convolution

tf.keras.layers.Conv2D(filters, kernel_size, strides, padding, activation, kernel_regularizer, input_shape)
    filters = 32
    kernel_size = (3,3)
    strides = (1,1)
    padding = 'SAME'
    activation = 'relu'
    kernel_regularizer = tf.keras.regularizers.l2(0.04)
    input_shape = tensor of shape([input_h, input_w, input_ch])

  • filters

    • the number of output channels (filters) of the convolution.
  • kernel_size

    • the height and width of the 2D convolution window.
  • strides

    • the step size of the kernel when traversing the image.
  • padding

    • how the border of a sample is handled.
    • A padded convolution will keep the spatial output dimensions equal to the input, whereas unpadded convolutions will crop away some of the borders if the kernel is larger than 1.
    • 'SAME' : enable zero padding
    • 'VALID' : disable zero padding
  • activation

    • Activation function to use.
  • kernel_regularizer

    • Regularizer function applied to the kernel weights matrix.
  • input and output channels

    • A convolutional layer takes a certain number of input channels ($C$) and calculates a specific number of output channels ($D$).

Examples


padding = 'VALID' example

input = [None, 4, 4, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'VALID'
output = [None, 2, 2, 1]

padding = 'SAME' example

input = [None, 5, 5, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'SAME'
output = [None, 5, 5, 1]
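These output shapes can be verified directly in tf.keras. A minimal sketch (not part of the original notebook):

In [ ]:
import tensorflow as tf

# 'VALID' padding: 4x4 input, 3x3 kernel, stride 1 -> 2x2 output
x_valid = tf.random.normal([1, 4, 4, 1])
y_valid = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3), strides = (1,1), padding = 'VALID')(x_valid)

# 'SAME' padding: 5x5 input, 3x3 kernel, stride 1 -> output stays 5x5 (zero padding)
x_same = tf.random.normal([1, 5, 5, 1])
y_same = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3), strides = (1,1), padding = 'SAME')(x_same)

print(y_valid.shape, y_same.shape)    # (1, 2, 2, 1) (1, 5, 5, 1)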

1.2. Transposed Convolution


  • The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. For instance, one might use such a transformation as the decoding layer of a convolutional autoencoder or to project feature maps to a higher-dimensional space.

  • Some sources use the name deconvolution, which is inappropriate because it is not a deconvolution. To make things worse, deconvolutions do exist, but they are not common in the field of deep learning.

  • An actual deconvolution reverts the process of a convolution.

  • Imagine inputting an image into a single convolutional layer. Now take the output, throw it into a black box and out comes your original image again. This black box does a deconvolution. It is the mathematical inverse of what a convolutional layer does.

  • A transposed convolution is somewhat similar because it produces the same spatial resolution a hypothetical deconvolutional layer would. However, the actual mathematical operation that’s being performed on the values is different.

  • A transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.


tf.keras.layers.Conv2DTranspose(filters, kernel_size, strides, padding, activation)
    filters = number of output channels (e.g., 64)
    kernel_size = (3,3)
    strides = stride of the sliding window for each spatial dimension of the input tensor
    padding = 'SAME'
    activation = activation function ('softmax', 'relu', ...)

  • 'SAME' : enable zero padding
  • 'VALID' : disable zero padding

An image of 5x5 is fed into a convolutional layer. The stride is set to 2, the padding is deactivated and the kernel is 3x3. This results in a 2x2 image.

2D convolution with no padding, stride of 2 and kernel of 3

If we wanted to reverse this process, we’d need the inverse mathematical operation so that 9 values are generated from each pixel we input. Afterward, we traverse the output image with a stride of 2. This would be a deconvolution.

A transposed convolution does not do that. The only thing it has in common with a deconvolution is that it guarantees the output will also be a 5x5 image, while still performing a normal convolution operation. To achieve this, we need to perform some fancy padding on the input.

Transposed 2D convolution with no padding, stride of 2 and kernel of 3

It merely reconstructs the spatial resolution from before and performs a convolution. This may not be the mathematical inverse, but for Encoder-Decoder architectures, it’s still very helpful. This way we can combine the upscaling of an image with a convolution, instead of doing two separate processes.
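The shape bookkeeping above can be checked in tf.keras. A minimal sketch (not part of the original notebook) that maps 5x5 to 2x2 with a strided convolution and back to 5x5 with the corresponding transposed convolution:

In [ ]:
import tensorflow as tf

x = tf.random.normal([1, 5, 5, 1])

# 3x3 kernel, stride 2, no padding: 5x5 -> 2x2
down = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3), strides = (2,2), padding = 'VALID')(x)

# transposed convolution with the same settings: 2x2 -> 5x5
up = tf.keras.layers.Conv2DTranspose(filters = 1, kernel_size = (3,3), strides = (2,2), padding = 'VALID')(down)

print(down.shape, up.shape)    # (1, 2, 2, 1) (1, 5, 5, 1)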

  • Another example of transposed convolution
Transposed 2D convolution with no padding, no stride and kernel of 3

Strides and padding for transposed convolution (optional)














1.3. Examples





  • A transposed 2-D convolution layer upsamples feature maps.

  • This layer is sometimes incorrectly known as a "deconvolution" or "deconv" layer. This layer is the transpose of convolution and does not perform deconvolution.

In [ ]:
%%html
<iframe src="https://www.youtube.com/embed/nTt_ajul8NY?start=725"
width="560" height="315" frameborder="0" allowfullscreen></iframe>

1.4. CAE with MNIST

Import Library

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline

Load MNIST Data

In [ ]:
# Load Data

mnist = tf.keras.datasets.mnist
(train_imgs, train_labels), (test_imgs, test_labels) = mnist.load_data()

train_imgs, test_imgs = train_imgs/255.0, test_imgs/255.0
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
  • Only use (1, 5, 6) digits to visualize latent space in 2-D
In [ ]:
# Use Only 1,5,6 Digits to Visualize

train_x = train_imgs[np.hstack([np.where(train_labels == 1),
                                np.where(train_labels == 5),
                                np.where(train_labels == 6)])][0]
train_y = train_labels[np.hstack([np.where(train_labels == 1),
                                  np.where(train_labels == 5),
                                  np.where(train_labels == 6)])][0]
test_x = test_imgs[np.hstack([np.where(test_labels == 1),
                              np.where(test_labels == 5),
                              np.where(test_labels == 6)])][0]
test_y = test_labels[np.hstack([np.where(test_labels == 1),
                                np.where(test_labels == 5),
                                np.where(test_labels == 6)])][0]
In [ ]:
train_x = train_x.reshape(-1,28,28,1)
test_x = test_x.reshape(-1,28,28,1)

The following architecture has been implemented.





Build a Model

In [ ]:
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3,3),
                           strides = (2,2),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),

    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3,3),
                           strides = (2,2),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (14, 14, 32)),

    tf.keras.layers.Conv2D(filters = 2,
                           kernel_size = (7,7),
                           padding = 'VALID',
                           input_shape = (7,7,64))
])





In [ ]:
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Conv2DTranspose(filters = 64,
                                    kernel_size = (7,7),
                                    strides = (1,1),
                                    activation = 'relu',
                                    padding = 'VALID',
                                    input_shape = (1, 1, 2)),

    tf.keras.layers.Conv2DTranspose(filters = 32,
                                    kernel_size = (3,3),
                                    strides = (2,2),
                                    activation = 'relu',
                                    padding = 'SAME',
                                    input_shape = (7, 7, 64)),

    tf.keras.layers.Conv2DTranspose(filters = 1,
                                    kernel_size = (7,7),
                                    strides = (2,2),
                                    padding = 'SAME',
                                    input_shape = (14,14,32))
])
In [ ]:
latent = encoder.output
result = decoder(latent)
In [ ]:
model = tf.keras.Model(inputs = encoder.input, outputs = result)
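As a quick sanity check (not part of the original notebook), the encoder should map a 28x28x1 image to a 1x1x2 latent code, and the decoder should map it back to 28x28x1:

In [ ]:
print(encoder.output_shape)   # expected: (None, 1, 1, 2)
print(decoder.output_shape)   # expected: (None, 28, 28, 1)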

Define Loss and Optimizer

In [ ]:
model.compile(optimizer = 'adam',
              loss = 'mean_squared_error')

Define Optimization Configuration and Then Optimize

In [ ]:
model.fit(train_x, train_x, epochs = 10)
Epoch 1/10
566/566 [==============================] - 15s 6ms/step - loss: 0.0435
Epoch 2/10
566/566 [==============================] - 3s 4ms/step - loss: 0.0340
Epoch 3/10
566/566 [==============================] - 3s 6ms/step - loss: 0.0320
Epoch 4/10
566/566 [==============================] - 4s 7ms/step - loss: 0.0309
Epoch 5/10
566/566 [==============================] - 4s 8ms/step - loss: 0.0302
Epoch 6/10
566/566 [==============================] - 4s 7ms/step - loss: 0.0295
Epoch 7/10
566/566 [==============================] - 5s 9ms/step - loss: 0.0290
Epoch 8/10
566/566 [==============================] - 4s 8ms/step - loss: 0.0287
Epoch 9/10
566/566 [==============================] - 4s 8ms/step - loss: 0.0283
Epoch 10/10
566/566 [==============================] - 5s 8ms/step - loss: 0.0281
Out[ ]:
<keras.src.callbacks.History at 0x785ab231c490>
In [ ]:
test_img = test_x[[6]]
x_reconst = model.predict(test_img)

plt.figure(figsize = (6, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input image', fontsize = 15)
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(x_reconst.reshape(28,28), 'gray')
plt.title('Reconstructed image', fontsize = 15)
plt.axis('off')
plt.show()
1/1 [==============================] - 0s 123ms/step
In [ ]:
idx = np.random.choice(test_y.shape[0], 500)
rnd_x, rnd_y = test_x[idx], test_y[idx]
In [ ]:
rnd_latent = encoder.predict(rnd_x)
rnd_latent = rnd_latent.reshape(-1,2)

plt.figure(figsize = (6, 6))
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()
16/16 [==============================] - 0s 8ms/step
In [ ]:
new_latent = np.array([[-8, 0]]).reshape(-1,1,1,2)

fake_img = decoder.predict(new_latent)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.scatter(new_latent[:,:,:,0], new_latent[:,:,:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_img.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
1/1 [==============================] - 0s 29ms/step

2. Fully Convolutional Networks for Segmentation

In [3]:
from IPython.display import YouTubeVideo
YouTubeVideo('sKDv7yp3Jdk?si=msi2nCF34udI3bUj&start=1230', width = "560", height = "315")
Out[3]:

2.1. Segmentation


  • The segmentation task is different from the classification task because it requires predicting a class for each pixel of the input image, instead of only one class for the whole input.

  • Classification needs to understand what is in the input (namely, the context).

  • However, in order to predict what is in the input for each pixel, segmentation needs to recover not only what is in the input, but also where.

  • Segment images into regions with different semantic categories; these semantic regions label and predict objects at the pixel level, as sketched below.
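For pixel-level labels with K classes, the target is commonly stored as a one-hot tensor of shape (H, W, K); this is also the format of seg_train_labels used later in this section. A minimal sketch with assumed values (not part of the original notebook):

In [ ]:
import numpy as np

label_map = np.random.randint(0, 2, size = (224, 224))   # one class index per pixel (K = 2)
one_hot = np.eye(2)[label_map]                            # one-hot labels, shape (224, 224, 2)
print(one_hot.shape)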

2.2. Fully Convolutional Networks (FCN)


  • FCN is built only from locally connected layers, such as convolution, pooling and upsampling.

  • Note that no dense layer is used in this kind of architecture.

  • The network can work regardless of the original image size, without requiring any fixed number of units at any stage.

  • To obtain a segmentation map (output), segmentation networks usually have 2 parts

    • Downsampling path: capture semantic/contextual information
    • Upsampling path: recover spatial information
  • The downsampling path is used to extract and interpret the context (what), while the upsampling path is used to enable precise localization (where).

  • Furthermore, to fully recover the fine-grained spatial information lost in the pooling or downsampling layers, we often use skip connections.

  • Given a position in the spatial dimensions, the output along the channel dimension is a category prediction for the pixel at that location (see the sketch below).
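The per-pixel class scores of shape (H, W, K) produced by the network are turned into a segmentation map by taking the argmax over the channel (category) dimension. A minimal sketch with assumed shapes (not part of the original notebook):

In [ ]:
import numpy as np

scores = np.random.rand(224, 224, 2)          # per-pixel scores for K = 2 categories
pixel_class = np.argmax(scores, axis = -1)    # predicted class index per pixel, shape (224, 224)
print(pixel_class.shape)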

2.3. Supervised Learning for Segmentation

2.3.1. Segmented (Labeled) Images

Download data

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
seg_train_imgs = np.load('/content/drive/MyDrive/DL/DL_data/seg_train_imgs.npy')
seg_train_labels = np.load('/content/drive/MyDrive/DL/DL_data/seg_train_labels.npy')
seg_test_imgs = np.load('/content/drive/MyDrive/DL/DL_data/seg_test_imgs.npy')

n_train = seg_train_imgs.shape[0]
n_test = seg_test_imgs.shape[0]

print ("The number of training images  : {}, shape : {}".format(n_train, seg_train_imgs.shape))
print ("The number of segmented images : {}, shape : {}".format(n_train, seg_train_labels.shape))
print ("The number of testing images   : {}, shape : {}".format(n_test, seg_test_imgs.shape))
The number of training images  : 180, shape : (180, 224, 224, 3)
The number of segmented images : 180, shape : (180, 224, 224, 2)
The number of testing images   : 27, shape : (27, 224, 224, 3)
In [ ]:
## binary segmentation and one-hot encoding in this case

idx = np.random.randint(n_train)

plt.figure(figsize = (10, 4))
plt.subplot(1,3,1)
plt.imshow(seg_train_imgs[idx])
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(seg_train_labels[idx][:,:,0])
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(seg_train_labels[idx][:,:,1])
plt.axis('off')
plt.show()

2.3.2. From CAE to FCN


  • CAE






  • FCN
    • VGG16
    • Skip connections to fully recover the fine-grained spatial information lost in the pooling or downsampling layers





2.4. FCN Implementation





Utilize VGG16 Model for Encoder

In [ ]:
model_type = tf.keras.applications.vgg16
base_model = model_type.VGG16()
base_model.trainable = False
base_model.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 fc1 (Dense)                 (None, 4096)              102764544 
                                                                 
 fc2 (Dense)                 (None, 4096)              16781312  
                                                                 
 predictions (Dense)         (None, 1000)              4097000   
                                                                 
=================================================================
Total params: 138357544 (527.79 MB)
Trainable params: 0 (0.00 Byte)
Non-trainable params: 138357544 (527.79 MB)
_________________________________________________________________

Build a FCN Model

  • tf.keras.layers are used to define the upsampling part





In [ ]:
map5 = base_model.layers[-5].output

# sixth convolution layer
conv6 = tf.keras.layers.Conv2D(filters = 4096,
                               kernel_size = (7,7),
                               padding = 'SAME',
                               activation = 'relu')(map5)

# 1x1 convolution layers
fcn4 = tf.keras.layers.Conv2D(filters = 4096,
                              kernel_size = (1,1),
                              padding = 'SAME',
                              activation = 'relu')(conv6)

fcn3 = tf.keras.layers.Conv2D(filters = 2,
                              kernel_size = (1,1),
                              padding = 'SAME',
                              activation = 'relu')(fcn4)

# Upsampling layers
fcn2 =  tf.keras.layers.Conv2DTranspose(filters = 512,
                                        kernel_size = (4,4),
                                        strides = (2,2),
                                        padding = 'SAME')(fcn3)

fcn1 =  tf.keras.layers.Conv2DTranspose(filters = 256,
                                        kernel_size = (4,4),
                                        strides = (2,2),
                                        padding = 'SAME')(fcn2 + base_model.layers[14].output)

output =  tf.keras.layers.Conv2DTranspose(filters = 2,
                                          kernel_size = (16,16),
                                          strides = (8,8),
                                          padding = 'SAME',
                                          activation = 'softmax')(fcn1 + base_model.layers[10].output)

model = tf.keras.Model(inputs = base_model.inputs, outputs = output)
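The two additions above implement the skip connections: fcn2 is added to the output of block4_pool (base_model.layers[14], 14x14x512), and fcn1 to the output of block3_pool (base_model.layers[10], 28x28x256). A quick check of those layer indices (a sketch, not part of the original notebook):

In [ ]:
# confirm which VGG16 layers the skip connections tap into
for i in [10, 14]:
    print(i, base_model.layers[i].name, base_model.layers[i].output.shape)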
In [ ]:
model.summary()
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]        0         []                            
                                                                                                  
 block1_conv1 (Conv2D)       (None, 224, 224, 64)         1792      ['input_1[0][0]']             
                                                                                                  
 block1_conv2 (Conv2D)       (None, 224, 224, 64)         36928     ['block1_conv1[0][0]']        
                                                                                                  
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)         0         ['block1_conv2[0][0]']        
                                                                                                  
 block2_conv1 (Conv2D)       (None, 112, 112, 128)        73856     ['block1_pool[0][0]']         
                                                                                                  
 block2_conv2 (Conv2D)       (None, 112, 112, 128)        147584    ['block2_conv1[0][0]']        
                                                                                                  
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)          0         ['block2_conv2[0][0]']        
                                                                                                  
 block3_conv1 (Conv2D)       (None, 56, 56, 256)          295168    ['block2_pool[0][0]']         
                                                                                                  
 block3_conv2 (Conv2D)       (None, 56, 56, 256)          590080    ['block3_conv1[0][0]']        
                                                                                                  
 block3_conv3 (Conv2D)       (None, 56, 56, 256)          590080    ['block3_conv2[0][0]']        
                                                                                                  
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)          0         ['block3_conv3[0][0]']        
                                                                                                  
 block4_conv1 (Conv2D)       (None, 28, 28, 512)          1180160   ['block3_pool[0][0]']         
                                                                                                  
 block4_conv2 (Conv2D)       (None, 28, 28, 512)          2359808   ['block4_conv1[0][0]']        
                                                                                                  
 block4_conv3 (Conv2D)       (None, 28, 28, 512)          2359808   ['block4_conv2[0][0]']        
                                                                                                  
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)          0         ['block4_conv3[0][0]']        
                                                                                                  
 block5_conv1 (Conv2D)       (None, 14, 14, 512)          2359808   ['block4_pool[0][0]']         
                                                                                                  
 block5_conv2 (Conv2D)       (None, 14, 14, 512)          2359808   ['block5_conv1[0][0]']        
                                                                                                  
 block5_conv3 (Conv2D)       (None, 14, 14, 512)          2359808   ['block5_conv2[0][0]']        
                                                                                                  
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)            0         ['block5_conv3[0][0]']        
                                                                                                  
 conv2d (Conv2D)             (None, 7, 7, 4096)           1027645   ['block5_pool[0][0]']         
                                                          44                                      
                                                                                                  
 conv2d_1 (Conv2D)           (None, 7, 7, 4096)           1678131   ['conv2d[0][0]']              
                                                          2                                       
                                                                                                  
 conv2d_2 (Conv2D)           (None, 7, 7, 2)              8194      ['conv2d_1[0][0]']            
                                                                                                  
 conv2d_transpose (Conv2DTr  (None, 14, 14, 512)          16896     ['conv2d_2[0][0]']            
 anspose)                                                                                         
                                                                                                  
 tf.__operators__.add (TFOp  (None, 14, 14, 512)          0         ['conv2d_transpose[0][0]',    
 Lambda)                                                             'block4_pool[0][0]']         
                                                                                                  
 conv2d_transpose_1 (Conv2D  (None, 28, 28, 256)          2097408   ['tf.__operators__.add[0][0]']
 Transpose)                                                                                       
                                                                                                  
 tf.__operators__.add_1 (TF  (None, 28, 28, 256)          0         ['conv2d_transpose_1[0][0]',  
 OpLambda)                                                           'block3_pool[0][0]']         
                                                                                                  
 conv2d_transpose_2 (Conv2D  (None, 224, 224, 2)          131074    ['tf.__operators__.add_1[0][0]
 Transpose)                                                         ']                            
                                                                                                  
==================================================================================================
Total params: 136514116 (520.76 MB)
Trainable params: 121799428 (464.63 MB)
Non-trainable params: 14714688 (56.13 MB)
__________________________________________________________________________________________________

Training

In [ ]:
model.compile(optimizer = 'adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
In [ ]:
model.fit(seg_train_imgs, seg_train_labels, batch_size = 5, epochs = 5)
Epoch 1/5
36/36 [==============================] - 21s 206ms/step - loss: 0.4534 - accuracy: 0.8714
Epoch 2/5
36/36 [==============================] - 7s 200ms/step - loss: 0.2277 - accuracy: 0.9138
Epoch 3/5
36/36 [==============================] - 7s 200ms/step - loss: 0.2077 - accuracy: 0.9197
Epoch 4/5
36/36 [==============================] - 7s 200ms/step - loss: 0.2020 - accuracy: 0.9220
Epoch 5/5
36/36 [==============================] - 7s 201ms/step - loss: 0.1909 - accuracy: 0.9260
Out[ ]:
<keras.src.callbacks.History at 0x7fcf20508fa0>

Testing

In [ ]:
test_img = seg_test_imgs[[1]]
test_segmented = model.predict(test_img)

seg_mask = (test_segmented[:,:,:,1] > 0.5).reshape(224, 224, 1).astype(float)

plt.figure(figsize = (8,8))
plt.subplot(2,2,1)
plt.imshow(test_img[0])
plt.axis('off')
plt.subplot(2,2,2)
plt.imshow(seg_mask, cmap = 'Blues')
plt.axis('off')
plt.subplot(2,2,3)
plt.imshow(test_img[0])
plt.imshow(seg_mask, cmap = 'Blues', alpha = 0.5)
plt.axis('off')
plt.show()
1/1 [==============================] - 1s 1s/step

3. Super-resolution and Deblurring

In [4]:
from IPython.display import YouTubeVideo
YouTubeVideo('7h91Q94E7aw?si=_jEnWdl_Hw3hBx90&start=511', width = "560", height = "315")
Out[4]:

3.1. Image Restoration





  • Image restoration tries to recover the original image from a degraded one, using prior knowledge of the degradation process.

  • The sources of corruption in digital images arise during image acquisition (digitization) and transmission.

    • Imaging sensors can be affected by ambient conditions.
    • Interference can be added to an image during transmission.

3.2. Inverse Problem




  • The reconstruction is the inverse of the acquisition.

  • Inverse problems involve modeling of degradation and applying the inverse process in order to recover the original image from inadequate observations.

  • The observations contain incomplete information about the target parameter or data due to physical limitations of the measurement devices.

  • Consequently, solutions to inverse problems are non-unique (a generic formulation is sketched below).
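As a generic formulation (not specific to this notebook), the degradation is often written as a forward model

$$y = Hx + n$$

where $x$ is the original image, $H$ models the degradation (e.g., blurring or downsampling), $n$ is noise, and $y$ is the observation. Restoration then amounts to estimating $x$ from $y$, for example via a regularized inverse

$$\hat{x} = \arg\min_{x} \; \lVert y - Hx \rVert^2 + \lambda R(x)$$

where the regularizer $R(x)$ encodes prior knowledge about the image. The learning-based approaches below effectively learn this inverse mapping from pairs of degraded and original images.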

3.3. Image Super-resolution

  • Restore a high-resolution (HR) image from a low-resolution (LR) image




  • There are numerous learning-based SR approaches.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
train_lr = np.load('/content/drive/MyDrive/DL/DL_data/SR_train_lr.npy')
train_hr = np.load('/content/drive/MyDrive/DL/DL_data/SR_train_hr.npy')
test_lr = np.load('/content/drive/MyDrive/DL/DL_data/SR_test_lr.npy')

n_train = train_lr.shape[0]
n_test = test_lr.shape[0]

print ("The number of training LR images : {}, shape : {}".format(n_train, train_lr.shape))
print ("The number of training HR images : {}, shape : {}".format(n_train, train_hr.shape))
print ("The number of testing LR images  : {}, shape : {}".format(n_test, test_lr.shape))
The number of training LR images : 79, shape : (79, 112, 112, 1)
The number of training HR images : 79, shape : (79, 224, 224, 1)
The number of testing LR images  : 20, shape : (20, 112, 112, 1)
In [ ]:
idx = np.random.randint(n_train)

plt.figure(figsize = (8, 6))
plt.subplot(1,2,1)
plt.imshow(train_lr[idx][:,:,0], 'gray')
plt.title('Low-resolution image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(train_hr[idx][:,:,0], 'gray')
plt.title('High-resolution image')
plt.axis('off')
plt.show()

Build a FCN Model



In [ ]:
inputs = tf.keras.Input(shape = (112, 112, 1))

# 3x3 convolutional layer
x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(inputs)

# first residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Add()([x_skip, x])

# second residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Add()([x_skip, x])

# third residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Add()([x_skip, x])

# upsampling layer
x = tf.keras.layers.Conv2DTranspose(filters = 16,
                                    kernel_size = (4,4),
                                    strides = (2,2),
                                    padding = 'SAME',
                                    activation = 'relu')(x)

# 3x3 convolutional layer
outputs = tf.keras.layers.Conv2D(filters = 1,
                                 kernel_size = (3,3),
                                 padding = 'SAME',
                                 activation = 'sigmoid')(x)

model = tf.keras.Model(inputs, outputs)
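As a quick shape check (not part of the original notebook), the single stride-2 transposed convolution should upsample the 112x112 input to the 224x224 target resolution:

In [ ]:
print(model.output_shape)   # expected: (None, 224, 224, 1)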

Training

In [ ]:
model.compile(optimizer = 'adam',
              loss = 'mean_absolute_error',
              metrics = ['mean_squared_error'])
In [ ]:
model.fit(train_lr, train_hr, batch_size = 16, epochs = 30)
Epoch 1/30
5/5 [==============================] - 14s 143ms/step - loss: 0.1640 - mean_squared_error: 0.0454
Epoch 2/30
5/5 [==============================] - 0s 30ms/step - loss: 0.1558 - mean_squared_error: 0.0480
Epoch 3/30
5/5 [==============================] - 0s 30ms/step - loss: 0.1521 - mean_squared_error: 0.0449
Epoch 4/30
5/5 [==============================] - 0s 31ms/step - loss: 0.1479 - mean_squared_error: 0.0425
Epoch 5/30
5/5 [==============================] - 0s 30ms/step - loss: 0.1425 - mean_squared_error: 0.0403
Epoch 6/30
5/5 [==============================] - 0s 30ms/step - loss: 0.1363 - mean_squared_error: 0.0349
Epoch 7/30
5/5 [==============================] - 0s 31ms/step - loss: 0.1267 - mean_squared_error: 0.0301
Epoch 8/30
5/5 [==============================] - 0s 31ms/step - loss: 0.1195 - mean_squared_error: 0.0258
Epoch 9/30
5/5 [==============================] - 0s 29ms/step - loss: 0.1094 - mean_squared_error: 0.0211
Epoch 10/30
5/5 [==============================] - 0s 37ms/step - loss: 0.1028 - mean_squared_error: 0.0185
Epoch 11/30
5/5 [==============================] - 0s 41ms/step - loss: 0.1015 - mean_squared_error: 0.0179
Epoch 12/30
5/5 [==============================] - 0s 39ms/step - loss: 0.0997 - mean_squared_error: 0.0169
Epoch 13/30
5/5 [==============================] - 0s 36ms/step - loss: 0.0909 - mean_squared_error: 0.0143
Epoch 14/30
5/5 [==============================] - 0s 34ms/step - loss: 0.0861 - mean_squared_error: 0.0129
Epoch 15/30
5/5 [==============================] - 0s 34ms/step - loss: 0.0812 - mean_squared_error: 0.0117
Epoch 16/30
5/5 [==============================] - 0s 33ms/step - loss: 0.0767 - mean_squared_error: 0.0106
Epoch 17/30
5/5 [==============================] - 0s 36ms/step - loss: 0.0726 - mean_squared_error: 0.0097
Epoch 18/30
5/5 [==============================] - 0s 35ms/step - loss: 0.0738 - mean_squared_error: 0.0098
Epoch 19/30
5/5 [==============================] - 0s 33ms/step - loss: 0.0707 - mean_squared_error: 0.0093
Epoch 20/30
5/5 [==============================] - 0s 35ms/step - loss: 0.0738 - mean_squared_error: 0.0097
Epoch 21/30
5/5 [==============================] - 0s 34ms/step - loss: 0.0726 - mean_squared_error: 0.0095
Epoch 22/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0701 - mean_squared_error: 0.0090
Epoch 23/30
5/5 [==============================] - 0s 32ms/step - loss: 0.0686 - mean_squared_error: 0.0087
Epoch 24/30
5/5 [==============================] - 0s 40ms/step - loss: 0.0671 - mean_squared_error: 0.0083
Epoch 25/30
5/5 [==============================] - 0s 34ms/step - loss: 0.0673 - mean_squared_error: 0.0083
Epoch 26/30
5/5 [==============================] - 0s 35ms/step - loss: 0.0679 - mean_squared_error: 0.0084
Epoch 27/30
5/5 [==============================] - 0s 34ms/step - loss: 0.0661 - mean_squared_error: 0.0081
Epoch 28/30
5/5 [==============================] - 0s 35ms/step - loss: 0.0654 - mean_squared_error: 0.0079
Epoch 29/30
5/5 [==============================] - 0s 36ms/step - loss: 0.0682 - mean_squared_error: 0.0083
Epoch 30/30
5/5 [==============================] - 0s 33ms/step - loss: 0.0672 - mean_squared_error: 0.0083
Out[ ]:
<keras.src.callbacks.History at 0x7b0d28239c30>

Testing

In [ ]:
test_x = test_lr[[3]]
test_sr = model.predict(test_x)

plt.figure(figsize = (8, 6))
plt.subplot(1,2,1)
plt.imshow(test_x[0][:,:,0], 'gray')
plt.title('Low-resolution image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(test_sr[0][:,:,0], 'gray')
plt.title('Super-resolved image')
plt.axis('off')
plt.show()
1/1 [==============================] - 1s 734ms/step

4. Image Deblurring

Blurred and Deblurred Images

Download data

In [ ]:
train_blur = np.load('/content/drive/MyDrive/DL_Colab/DL_data/deblurring_train_blur.npy')
train_deblur = np.load('/content/drive/MyDrive/DL_Colab/DL_data/deblurring_train_deblur.npy')
test_blur = np.load('/content/drive/MyDrive/DL_Colab/DL_data/deblurring_test_blur.npy')

n_train = train_blur.shape[0]
n_test = test_blur.shape[0]

print ("The number of training blur images   : {}, shape : {}".format(n_train, train_blur.shape))
print ("The number of training deblur images : {}, shape : {}".format(n_train, train_deblur.shape))
print ("The number of testing blur images    : {}, shape : {}".format(n_test, test_blur.shape))
The number of training blur images   : 79, shape : (79, 224, 224, 1)
The number of training deblur images : 79, shape : (79, 224, 224, 1)
The number of testing blur images    : 20, shape : (20, 224, 224, 1)
In [ ]:
idx = np.random.randint(n_train)

plt.figure(figsize = (8, 6))
plt.subplot(1,2,1)
plt.imshow(train_blur[idx][:,:,0], 'gray')
plt.title('Blurred image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(train_deblur[idx][:,:,0], 'gray')
plt.title('Deblurred image')
plt.axis('off')
plt.show()

Build a FCN Model



In [ ]:
inputs = tf.keras.Input(shape = (224, 224, 1))

# 3x3 convolutional layer
x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(inputs)

# first residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Add()([x_skip, x])

# second residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Add()([x_skip, x])

# third residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Conv2D(filters = 16,
                           kernel_size = (3,3),
                           padding = 'SAME',
                           activation = 'relu')(x)

x = tf.keras.layers.Add()([x_skip, x])

# 3x3 convolutional layer
outputs = tf.keras.layers.Conv2D(filters = 1,
                                 kernel_size = (3,3),
                                 padding = 'SAME',
                                 activation = 'sigmoid')(x)

model = tf.keras.Model(inputs, outputs)

Training

In [ ]:
model.compile(optimizer = 'adam',
              loss ='mean_absolute_error',
              metrics = ['mean_squared_error'])
In [ ]:
model.fit(train_blur, train_deblur, batch_size = 16, epochs = 30)
Epoch 1/30
5/5 [==============================] - 4s 198ms/step - loss: 0.1618 - mean_squared_error: 0.0480
Epoch 2/30
5/5 [==============================] - 0s 71ms/step - loss: 0.1571 - mean_squared_error: 0.0486
Epoch 3/30
5/5 [==============================] - 0s 72ms/step - loss: 0.1491 - mean_squared_error: 0.0425
Epoch 4/30
5/5 [==============================] - 0s 72ms/step - loss: 0.1393 - mean_squared_error: 0.0377
Epoch 5/30
5/5 [==============================] - 0s 74ms/step - loss: 0.1262 - mean_squared_error: 0.0301
Epoch 6/30
5/5 [==============================] - 0s 81ms/step - loss: 0.1049 - mean_squared_error: 0.0215
Epoch 7/30
5/5 [==============================] - 0s 79ms/step - loss: 0.0897 - mean_squared_error: 0.0156
Epoch 8/30
5/5 [==============================] - 0s 88ms/step - loss: 0.0801 - mean_squared_error: 0.0120
Epoch 9/30
5/5 [==============================] - 0s 82ms/step - loss: 0.0749 - mean_squared_error: 0.0105
Epoch 10/30
5/5 [==============================] - 0s 80ms/step - loss: 0.0697 - mean_squared_error: 0.0095
Epoch 11/30
5/5 [==============================] - 0s 89ms/step - loss: 0.0668 - mean_squared_error: 0.0086
Epoch 12/30
5/5 [==============================] - 0s 90ms/step - loss: 0.0636 - mean_squared_error: 0.0077
Epoch 13/30
5/5 [==============================] - 0s 81ms/step - loss: 0.0606 - mean_squared_error: 0.0071
Epoch 14/30
5/5 [==============================] - 0s 77ms/step - loss: 0.0587 - mean_squared_error: 0.0066
Epoch 15/30
5/5 [==============================] - 0s 76ms/step - loss: 0.0583 - mean_squared_error: 0.0064
Epoch 16/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0572 - mean_squared_error: 0.0061
Epoch 17/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0560 - mean_squared_error: 0.0058
Epoch 18/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0517 - mean_squared_error: 0.0052
Epoch 19/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0510 - mean_squared_error: 0.0050
Epoch 20/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0499 - mean_squared_error: 0.0048
Epoch 21/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0489 - mean_squared_error: 0.0046
Epoch 22/30
5/5 [==============================] - 0s 73ms/step - loss: 0.0514 - mean_squared_error: 0.0048
Epoch 23/30
5/5 [==============================] - 0s 73ms/step - loss: 0.0508 - mean_squared_error: 0.0047
Epoch 24/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0496 - mean_squared_error: 0.0045
Epoch 25/30
5/5 [==============================] - 0s 73ms/step - loss: 0.0483 - mean_squared_error: 0.0043
Epoch 26/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0460 - mean_squared_error: 0.0040
Epoch 27/30
5/5 [==============================] - 0s 75ms/step - loss: 0.0452 - mean_squared_error: 0.0039
Epoch 28/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0459 - mean_squared_error: 0.0040
Epoch 29/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0442 - mean_squared_error: 0.0038
Epoch 30/30
5/5 [==============================] - 0s 74ms/step - loss: 0.0431 - mean_squared_error: 0.0036
Out[ ]:
<keras.src.callbacks.History at 0x7b0cfffc50c0>

Testing

In [ ]:
test_x = test_blur[[1]]
test_deblur = model.predict(test_x)

plt.figure(figsize = (8, 6))
plt.subplot(1,2,1)
plt.imshow(test_x[0][:,:,0], 'gray')
plt.title('Blurred image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(test_deblur[0][:,:,0], 'gray')
plt.title('Deblurred image')
plt.axis('off')
plt.show()
1/1 [==============================] - 0s 124ms/step
In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')