Fully Convolutional Networks (FCN)
Table of Contents
from IPython.display import YouTubeVideo
YouTubeVideo('sKDv7yp3Jdk?si=zflwWs4XaLljHfNC', width = "560", height = "315")
tf.keras.layers.Conv2D(filters, kernel_size, strides, padding, activation, kernel_regularizer, input_shape)
filters = 32
kernel_size = (3,3)
strides = (1,1)
padding = 'SAME'
activation = 'relu'
kernel_regularizer=tf.keras.regularizers.l2(0.04)
input_shape = (input_h, input_w, input_ch)
filters
kernel_size
strides
padding
'SAME': enable zero padding
'VALID': disable zero padding
activation
kernel_regularizer
input and output channels
Examples
input = [None, 4, 4, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'VALID'
input = [None, 5, 5, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'SAME'
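As a quick sanity check of the two examples above, the output spatial size is ceil((n - k + 1)/s) for 'VALID' and ceil(n/s) for 'SAME'. A minimal sketch (not part of the original cells) that verifies both cases with dummy inputs:
import tensorflow as tf
x_valid = tf.zeros([1, 4, 4, 1])   # one 4x4 single-channel image
x_same = tf.zeros([1, 5, 5, 1])    # one 5x5 single-channel image
conv_valid = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3), strides = (1,1), padding = 'VALID')
conv_same = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3), strides = (1,1), padding = 'SAME')
print(conv_valid(x_valid).shape)   # (1, 2, 2, 1): no zero padding shrinks the feature map
print(conv_same(x_same).shape)     # (1, 5, 5, 1): zero padding preserves the spatial size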
The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution. For instance, one might use such a transformation as the decoding layer of a convolutional autoencoder or to project feature maps to a higher-dimensional space.
Some sources use the name deconvolution, which is inappropriate because it's not a deconvolution. To make things worse, deconvolutions do exist, but they're not common in the field of deep learning.
An actual deconvolution reverts the process of a convolution.
Imagine inputting an image into a single convolutional layer. Now take the output, throw it into a black box and out comes your original image again. This black box does a deconvolution. It is the mathematical inverse of what a convolutional layer does.
A transposed convolution is somewhat similar because it produces the same spatial resolution a hypothetical deconvolutional layer would. However, the actual mathematical operation that’s being performed on the values is different.
A transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.
tf.keras.layers.Conv2DTranspose(filters, kernel_size, strides, padding = 'SAME', activation)
filters = number of output channels (e.g., 64)
kernel_size = size of the convolution window, e.g., (3,3)
strides = stride of the sliding window for each dimension of the input tensor
padding = 'SAME'
activation = activation function ('softmax', 'relu', ...)
'SAME': enable zero padding
'VALID': disable zero padding
An image of 5x5 is fed into a convolutional layer. The stride is set to 2, the padding is deactivated and the kernel is 3x3. This results in a 2x2 image.
If we wanted to reverse this process, we’d need the inverse mathematical operation so that 9 values are generated from each pixel we input. Afterward, we traverse the output image with a stride of 2. This would be a deconvolution.
A transposed convolution does not do that. The only thing the two operations have in common is that the output is guaranteed to be a 5x5 image as well, while still performing a normal convolution operation. To achieve this, we need to perform some fancy padding on the input.
It merely reconstructs the spatial resolution from before and performs a convolution. This may not be the mathematical inverse, but for Encoder-Decoder architectures, it’s still very helpful. This way we can combine the upscaling of an image with a convolution, instead of doing two separate processes.
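A small sketch of the 5x5 example above (illustrative, with untrained kernels): a strided convolution reduces the image to 2x2, and a transposed convolution with the same kernel size and stride brings the spatial size back to 5x5, even though the original values are not recovered.
import tensorflow as tf
x = tf.zeros([1, 5, 5, 1])
down = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3), strides = (2,2), padding = 'VALID')
up = tf.keras.layers.Conv2DTranspose(filters = 1, kernel_size = (3,3), strides = (2,2), padding = 'VALID')
y = down(x)
print(y.shape)       # (1, 2, 2, 1)
print(up(y).shape)   # (1, 5, 5, 1): spatial size restored, but not the original values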
Strides and padding for transposed convolution (optional)
A transposed 2-D convolution layer upsamples feature maps.
This layer is sometimes incorrectly known as a "deconvolution" or "deconv" layer. This layer is the transpose of convolution and does not perform deconvolution.
%%html
<iframe src="https://www.youtube.com/embed/nTt_ajul8NY?start=725"
width="560" height="315" frameborder="0" allowfullscreen></iframe>
Import Library
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
Load MNIST Data
# Load Data
mnist = tf.keras.datasets.mnist
(train_imgs, train_labels), (test_imgs, test_labels) = mnist.load_data()
train_imgs, test_imgs = train_imgs/255.0, test_imgs/255.0
# Use Only 1,5,6 Digits to Visualize
train_x = train_imgs[np.hstack([np.where(train_labels == 1),
np.where(train_labels == 5),
np.where(train_labels == 6)])][0]
train_y = train_labels[np.hstack([np.where(train_labels == 1),
np.where(train_labels == 5),
np.where(train_labels == 6)])][0]
test_x = test_imgs[np.hstack([np.where(test_labels == 1),
np.where(test_labels == 5),
np.where(test_labels == 6)])][0]
test_y = test_labels[np.hstack([np.where(test_labels == 1),
np.where(test_labels == 5),
np.where(test_labels == 6)])][0]
train_x = train_x.reshape(-1,28,28,1)
test_x = test_x.reshape(-1,28,28,1)
The following architecture has been implemented.
Build a Model
encoder = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters = 32,
kernel_size = (3,3),
strides = (2,2),
activation = 'relu',
padding = 'SAME',
input_shape = (28, 28, 1)),
tf.keras.layers.Conv2D(filters = 64,
kernel_size = (3,3),
strides = (2,2),
activation = 'relu',
padding = 'SAME',
input_shape = (14, 14, 32)),
tf.keras.layers.Conv2D(filters = 2,
kernel_size = (7,7),
padding = 'VALID',
input_shape = (7,7,64))
])
decoder = tf.keras.models.Sequential([
tf.keras.layers.Conv2DTranspose(filters = 64,
kernel_size = (7,7),
strides = (1,1),
activation = 'relu',
padding = 'VALID',
input_shape = (1, 1, 2)),
tf.keras.layers.Conv2DTranspose(filters = 32,
kernel_size = (3,3),
strides = (2,2),
activation = 'relu',
padding = 'SAME',
input_shape = (7, 7, 64)),
tf.keras.layers.Conv2DTranspose(filters = 1,
kernel_size = (7,7),
strides = (2,2),
padding = 'SAME',
input_shape = (14,14,32))
])
latent = encoder.output
result = decoder(latent)
model = tf.keras.Model(inputs = encoder.input, outputs = result)
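A quick check (not in the original cells) that the bottleneck really is a 1x1x2 latent code and that the decoder maps it back to a 28x28x1 image:
print(encoder.output_shape)   # (None, 1, 1, 2): two latent values per image
print(decoder.output_shape)   # (None, 28, 28, 1): reconstructed image size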
Define Loss and Optimizer
model.compile(optimizer = 'adam',
loss = 'mean_squared_error')
Define Optimization Configuration and Then Optimize
model.fit(train_x, train_x, epochs = 10)
test_img = test_x[[6]]
x_reconst = model.predict(test_img)
plt.figure(figsize = (6, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input image', fontsize = 15)
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(x_reconst.reshape(28,28), 'gray')
plt.title('Reconstructed image', fontsize = 15)
plt.axis('off')
plt.show()
idx = np.random.choice(test_y.shape[0], 500)
rnd_x, rnd_y = test_x[idx], test_y[idx]
rnd_latent = encoder.predict(rnd_x)
rnd_latent = rnd_latent.reshape(-1,2)
plt.figure(figsize = (6, 6))
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()
new_latent = np.array([[-8, 0]]).reshape(-1,1,1,2)
fake_img = decoder.predict(new_latent)
plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.scatter(new_latent[:,:,:,0], new_latent[:,:,:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_img.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
from IPython.display import YouTubeVideo
YouTubeVideo('sKDv7yp3Jdk?si=msi2nCF34udI3bUj&start=1230', width = "560", height = "315")
The segmentation task differs from the classification task because it requires predicting a class for each pixel of the input image, instead of a single class for the whole input.
Classification needs to understand what is in the input (namely, the context).
However, in order to predict what is in the input for each pixel, segmentation needs to recover not only what is in the input, but also where.
Segment images into regions with different semantic categories. These semantic regions label and predict objects at the pixel level.
An FCN is built only from locally connected layers, such as convolution, pooling, and upsampling.
Note that no dense layer is used in this kind of architecture.
The network can work regardless of the original image size, without requiring any fixed number of units at any stage.
To obtain a segmentation map (output), segmentation networks usually have two parts: a downsampling path and an upsampling path.
The downsampling path is used to extract and interpret the context (what), while the upsampling path is used to enable precise localization (where).
Furthermore, to fully recover the fine-grained spatial information lost in the pooling or downsampling layers, we often use skip connections.
Given a position on the spatial dimension, the output of the channel dimension will be a category prediction of the pixel corresponding to the location.
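In other words, the channel dimension holds per-class scores for every pixel, and the predicted label map is obtained by an argmax over channels. A minimal sketch (array shapes and names are illustrative):
import numpy as np
pred = np.random.rand(224, 224, 2)       # per-pixel scores for 2 classes
label_map = np.argmax(pred, axis = -1)   # (224, 224): one predicted class per pixel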
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from google.colab import drive
drive.mount('/content/drive')
seg_train_imgs = np.load('/content/drive/MyDrive/DL/DL_data/seg_train_imgs.npy')
seg_train_labels = np.load('/content/drive/MyDrive/DL/DL_data/seg_train_labels.npy')
seg_test_imgs = np.load('/content/drive/MyDrive/DL/DL_data/seg_test_imgs.npy')
n_train = seg_train_imgs.shape[0]
n_test = seg_test_imgs.shape[0]
print ("The number of training images : {}, shape : {}".format(n_train, seg_train_imgs.shape))
print ("The number of segmented images : {}, shape : {}".format(n_train, seg_train_labels.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, seg_test_imgs.shape))
## binary segmentation and one-hot encoding in this case
idx = np.random.randint(n_train)
plt.figure(figsize = (10, 4))
plt.subplot(1,3,1)
plt.imshow(seg_train_imgs[idx])
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(seg_train_labels[idx][:,:,0])
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(seg_train_labels[idx][:,:,1])
plt.axis('off')
plt.show()
Utilize VGG16 Model for Encoder
model_type = tf.keras.applications.vgg16
base_model = model_type.VGG16()
base_model.trainable = False
base_model.summary()
Build an FCN Model
map5 = base_model.layers[-5].output
# sixth convolution layer
conv6 = tf.keras.layers.Conv2D(filters = 4096,
kernel_size = (7,7),
padding = 'SAME',
activation = 'relu')(map5)
# 1x1 convolution layers
fcn4 = tf.keras.layers.Conv2D(filters = 4096,
kernel_size = (1,1),
padding = 'SAME',
activation = 'relu')(conv6)
fcn3 = tf.keras.layers.Conv2D(filters = 2,
kernel_size = (1,1),
padding = 'SAME',
activation = 'relu')(fcn4)
# Upsampling layers
fcn2 = tf.keras.layers.Conv2DTranspose(filters = 512,
kernel_size = (4,4),
strides = (2,2),
padding = 'SAME')(fcn3)
fcn1 = tf.keras.layers.Conv2DTranspose(filters = 256,
kernel_size = (4,4),
strides = (2,2),
padding = 'SAME')(fcn2 + base_model.layers[14].output)
output = tf.keras.layers.Conv2DTranspose(filters = 2,
kernel_size = (16,16),
strides = (8,8),
padding = 'SAME',
activation = 'softmax')(fcn1 + base_model.layers[10].output)
model = tf.keras.Model(inputs = base_model.inputs, outputs = output)
model.summary()
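The skip connections above tap into specific VGG16 feature maps; a quick sketch (not part of the original cells) to confirm which layers the indices refer to, assuming the stock VGG16 with include_top = True:
for i in [-5, 14, 10]:
    layer = base_model.layers[i]
    print(i, layer.name, layer.output.shape)
# should print block5_pool (7x7x512), block4_pool (14x14x512), block3_pool (28x28x256)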
Training
model.compile(optimizer = 'adam',
loss = 'categorical_crossentropy',
metrics = ['accuracy'])
model.fit(seg_train_imgs, seg_train_labels, batch_size = 5, epochs = 5)
Testing
test_img = seg_test_imgs[[1]]
test_segmented = model.predict(test_img)
seg_mask = (test_segmented[:,:,:,1] > 0.5).reshape(224, 224, 1).astype(float)
plt.figure(figsize = (8,8))
plt.subplot(2,2,1)
plt.imshow(test_img[0])
plt.axis('off')
plt.subplot(2,2,2)
plt.imshow(seg_mask, cmap = 'Blues')
plt.axis('off')
plt.subplot(2,2,3)
plt.imshow(test_img[0])
plt.imshow(seg_mask, cmap = 'Blues', alpha = 0.5)
plt.axis('off')
plt.show()
from IPython.display import YouTubeVideo
YouTubeVideo('7h91Q94E7aw?si=_jEnWdl_Hw3hBx90&start=511', width = "560", height = "315")
Image restoration tries to recover the original image from a degraded one using prior knowledge of the degradation process.
The sources of corruption in digital images arise during image acquisition (digitization) and transmission.
The reconstruction is the inverse of the acquisition.
Inverse problems involve modeling of degradation and applying the inverse process in order to recover the original image from inadequate observations.
The observations contain incomplete information about the target parameter or data due to physical limitations of the measurement devices.
Consequently, solutions to inverse problems are non-unique.
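As a concrete example of a forward degradation model, downsampling a high-resolution image produces a low-resolution observation. The sketch below (the averaging kernel and factor are illustrative assumptions) shows why the inverse is non-unique: many high-resolution images map to the same low-resolution one.
import numpy as np

def degrade(hr, factor = 2):
    # Forward model: average-pool by `factor` to simulate a low-resolution observation
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor
    return hr[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis = (1, 3))

hr = np.random.rand(224, 224)
lr = degrade(hr)   # 112x112: information is lost, so recovering hr from lr has no unique solution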
HR and LR Images
Download data from here
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from google.colab import drive
drive.mount('/content/drive')
train_lr = np.load('/content/drive/MyDrive/DL/DL_data/SR_train_lr.npy')
train_hr = np.load('/content/drive/MyDrive/DL/DL_data/SR_train_hr.npy')
test_lr = np.load('/content/drive/MyDrive/DL/DL_data/SR_test_lr.npy')
n_train = train_lr.shape[0]
n_test = test_lr.shape[0]
print ("The number of training LR images : {}, shape : {}".format(n_train, train_lr.shape))
print ("The number of training HR images : {}, shape : {}".format(n_train, train_hr.shape))
print ("The number of testing LR images : {}, shape : {}".format(n_test, test_lr.shape))
idx = np.random.randint(n_train)
plt.figure(figsize = (8, 6))
plt.subplot(1,2,1)
plt.imshow(train_lr[idx][:,:,0], 'gray')
plt.title('Low-resolution image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(train_hr[idx][:,:,0], 'gray')
plt.title('High-resolution image')
plt.axis('off')
plt.show()
Build an FCN Model
inputs = tf.keras.Input(shape = (112, 112, 1))
# 3x3 convolutional layer
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(inputs)
# first residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Add()([x_skip, x])
# second residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Add()([x_skip, x])
# third residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Add()([x_skip, x])
# upsampling layer
x = tf.keras.layers.Conv2DTranspose(filters = 16,
kernel_size = (4,4),
strides = (2,2),
padding = 'SAME',
activation = 'relu')(x)
# 3x3 convolutional layer
outputs = tf.keras.layers.Conv2D(filters = 1,
kernel_size = (3,3),
padding = 'SAME',
activation = 'sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
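The three residual blocks above are identical; if preferred, the repetition can be factored into a small helper function (a sketch equivalent to the blocks above, not part of the original cells):
def residual_block(x, filters = 16):
    # Two 3x3 convolutions followed by an identity skip connection
    x_skip = x
    x = tf.keras.layers.Conv2D(filters, (3,3), padding = 'SAME', activation = 'relu')(x)
    x = tf.keras.layers.Conv2D(filters, (3,3), padding = 'SAME', activation = 'relu')(x)
    return tf.keras.layers.Add()([x_skip, x])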
Training
model.compile(optimizer = 'adam',
loss = 'mean_absolute_error',
metrics = ['mean_squared_error'])
model.fit(train_lr, train_hr, batch_size = 16, epochs = 30)
Testing
test_x = test_lr[[3]]
test_sr = model.predict(test_x)
plt.figure(figsize = (8, 6))
plt.subplot(1,2,1)
plt.imshow(test_x[0][:,:,0], 'gray')
plt.title('Low-resolution image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(test_sr[0][:,:,0], 'gray')
plt.title('Super-resolved image')
plt.axis('off')
plt.show()
train_blur = np.load('/content/drive/MyDrive/DL_Colab/DL_data/deblurring_train_blur.npy')
train_deblur = np.load('/content/drive/MyDrive/DL_Colab/DL_data/deblurring_train_deblur.npy')
test_blur = np.load('/content/drive/MyDrive/DL_Colab/DL_data/deblurring_test_blur.npy')
n_train = train_blur.shape[0]
n_test = test_blur.shape[0]
print ("The number of training blur images : {}, shape : {}".format(n_train, train_blur.shape))
print ("The number of training deblur images : {}, shape : {}".format(n_train, train_deblur.shape))
print ("The number of testing blur images : {}, shape : {}".format(n_test, test_blur.shape))
idx = np.random.randint(n_train)
plt.figure(figsize = (8, 6))
plt.subplot(1,2,1)
plt.imshow(train_blur[idx][:,:,0], 'gray')
plt.title('Blurred image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(train_deblur[idx][:,:,0], 'gray')
plt.title('Deblurred image')
plt.axis('off')
plt.show()
Build an FCN Model
inputs = tf.keras.Input(shape = (224, 224, 1))
# 3x3 convolutional layer
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(inputs)
# first residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Add()([x_skip, x])
# second residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Add()([x_skip, x])
# third residual block
x_skip = x
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Conv2D(filters = 16,
kernel_size = (3,3),
padding = 'SAME',
activation = 'relu')(x)
x = tf.keras.layers.Add()([x_skip, x])
# 3x3 convolutional layer
outputs = tf.keras.layers.Conv2D(filters = 1,
kernel_size = (3,3),
padding = 'SAME',
activation = 'sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
Training
model.compile(optimizer = 'adam',
loss ='mean_absolute_error',
metrics = ['mean_squared_error'])
model.fit(train_blur, train_deblur, batch_size = 16, epochs = 30)
Testing
test_x = test_blur[[1]]
test_deblur = model.predict(test_x)
plt.figure(figsize = (8, 6))
plt.subplot(1,2,1)
plt.imshow(test_x[0][:,:,0], 'gray')
plt.title('Blurred image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(test_deblur[0][:,:,0], 'gray')
plt.title('Deblurred image')
plt.axis('off')
plt.show()
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')