Segmentation
tf.keras.layers.Conv2D(filters, kernel_size, strides, padding, activation, kernel_regularizer, input_shape)
filters = 32
kernel_size = (3,3)
strides = (1,1)
padding = 'SAME'
activation = 'relu'
kernel_regularizer = tf.keras.regularizers.l2(0.04)
input_shape = shape of the input tensor, (input_h, input_w, input_ch)
kernel_size: size of the convolution window
strides: stride of the sliding window for each dimension of the input
padding
'SAME': enable zero padding
'VALID': disable zero padding
kernel_regularizer: regularizer applied to the kernel weights (e.g., L2)
input and output channels: the number of input channels is inferred from the input, and filters sets the number of output channels
Examples
input = [None, 4, 4, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'VALID'
output = [None, 2, 2, 1], since (4 - 3)/1 + 1 = 2
input = [None, 5, 5, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'SAME'
output = [None, 5, 5, 1], since 'SAME' padding with stride 1 preserves the input size (see the sketch below)
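These shapes can be checked directly by running a dummy tensor through Conv2D. The following is a minimal sketch (random weights, shapes only) mirroring the two examples above:
import tensorflow as tf

# 'VALID': 4x4 input, 3x3 kernel, stride 1 -> (4 - 3)/1 + 1 = 2, i.e., a 2x2 output
conv_valid = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3),
strides = (1,1), padding = 'VALID')
print(conv_valid(tf.random.normal([1, 4, 4, 1])).shape)    # (1, 2, 2, 1)

# 'SAME': 5x5 input, 3x3 kernel, stride 1 -> zero padding keeps the 5x5 size
conv_same = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3),
strides = (1,1), padding = 'SAME')
print(conv_same(tf.random.normal([1, 5, 5, 1])).shape)    # (1, 5, 5, 1)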
Some sources use the name deconvolution, which is inappropriate because it's not a deconvolution. To make things worse, deconvolutions do exist, but they're not common in the field of deep learning.
An actual deconvolution reverts the process of a convolution.
Imagine inputting an image into a single convolutional layer. Now take the output, throw it into a black box and out comes your original image again. This black box does a deconvolution. It is the mathematical inverse of what a convolutional layer does.
A transposed convolution is somewhat similar because it produces the same spatial resolution a hypothetical deconvolutional layer would. However, the actual mathematical operation that's being performed on the values is different.
A transposed convolutional layer carries out a regular convolution but reverts its spatial transformation.
tf.keras.layers.Conv2DTranspose(filters, kernel_size, strides, padding = 'SAME', activation)
filters = number of output channels (e.g., 64)
kernel_size = size of the convolution window, e.g., (3,3)
strides = stride of the sliding window for each dimension of the input tensor
padding = 'SAME'
activation = activation function ('softmax', 'relu', ...)
'SAME': enable zero padding
'VALID': disable zero padding
An image of 5x5 is fed into a convolutional layer. The stride is set to 2, the padding is deactivated and the kernel is 3x3. This results in a 2x2 image.
If we wanted to reverse this process, we'd need the inverse mathematical operation so that 9 values are generated from each pixel we input. Afterward, we traverse the output image with a stride of 2. This would be a deconvolution.
A transposed convolution does not do that. The only thing the two have in common is that the output is guaranteed to be a 5x5 image as well, while still performing a normal convolution operation. To achieve this, we need to perform some fancy padding on the input.
It merely reconstructs the spatial resolution from before and performs a convolution. This may not be the mathematical inverse, but for encoder-decoder architectures, it's still very helpful. This way we can combine the upscaling of an image with a convolution, instead of doing two separate processes, as the sketch below illustrates.
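A shape-only sketch (random, untrained weights) of the round trip from the example above: a strided convolution takes the 5x5 image down to 2x2, and a Conv2DTranspose with the same kernel, stride, and padding settings maps it back to 5x5.
import tensorflow as tf

x = tf.random.normal([1, 5, 5, 1])    # the 5x5 image from the example

# 5x5 -> 2x2 : convolution with a 3x3 kernel, stride 2, no padding
down = tf.keras.layers.Conv2D(filters = 1, kernel_size = (3,3),
strides = (2,2), padding = 'VALID')
y = down(x)
print(y.shape)    # (1, 2, 2, 1)

# 2x2 -> 5x5 : transposed convolution with the same settings restores the resolution
up = tf.keras.layers.Conv2DTranspose(filters = 1, kernel_size = (3,3),
strides = (2,2), padding = 'VALID')
print(up(y).shape)    # (1, 5, 5, 1)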
Strides and padding for transposed convolution (optional)
%%html
<center><iframe src="https://www.youtube.com/embed/nTt_ajul8NY?start=725"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
# Load Data
mnist = tf.keras.datasets.mnist
(train_imgs, train_labels), (test_imgs, test_labels) = mnist.load_data()
train_imgs, test_imgs = train_imgs/255.0, test_imgs/255.0
# Use Only 1,5,6 Digits to Visualize
train_x = train_imgs[np.hstack([np.where(train_labels == 1),
np.where(train_labels == 5),
np.where(train_labels == 6)])][0]
train_y = train_labels[np.hstack([np.where(train_labels == 1),
np.where(train_labels == 5),
np.where(train_labels == 6)])][0]
test_x = test_imgs[np.hstack([np.where(test_labels == 1),
np.where(test_labels == 5),
np.where(test_labels == 6)])][0]
test_y = test_labels[np.hstack([np.where(test_labels == 1),
np.where(test_labels == 5),
np.where(test_labels == 6)])][0]
train_x = train_x.reshape(-1,28,28,1)
test_x = test_x.reshape(-1,28,28,1)
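A quick sanity check (not in the original) confirms the filtered shapes and classes before training:
print(train_x.shape, test_x.shape)    # (N_train, 28, 28, 1), (N_test, 28, 28, 1)
print(np.unique(train_y))             # [1 5 6]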
The following encoder-decoder (convolutional autoencoder) structure is implemented.
encoder = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters = 32,
kernel_size = (3,3),
strides = (2,2),
activation = 'relu',
padding = 'SAME',
input_shape = (28, 28, 1)),
tf.keras.layers.Conv2D(filters = 64,
kernel_size = (3,3),
strides = (2,2),
activation = 'relu',
padding = 'SAME',
input_shape = (14, 14, 32)),
tf.keras.layers.Conv2D(filters = 2,
kernel_size = (7,7),
padding = 'VALID',
input_shape = (7,7,64))
])
decoder = tf.keras.models.Sequential([
tf.keras.layers.Conv2DTranspose(filters = 64,
kernel_size = (7,7),
strides = (1,1),
activation = 'relu',
padding = 'VALID',
input_shape = (1, 1, 2)),
tf.keras.layers.Conv2DTranspose(filters = 32,
kernel_size = (3,3),
strides = (2,2),
activation = 'relu',
padding = 'SAME',
input_shape = (7, 7, 64)),
tf.keras.layers.Conv2DTranspose(filters = 1,
kernel_size = (7,7),
strides = (2,2),
padding = 'SAME',
input_shape = (14,14,32))
])
latent = encoder.output
result = decoder(latent)
model = tf.keras.Model(inputs = encoder.input, outputs = result)
model.compile(optimizer = 'adam',
loss = 'mean_squared_error')
model.fit(train_x, train_x, epochs = 10)
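As an optional check, the reconstruction error on the held-out digits can be computed with model.evaluate:
model.evaluate(test_x, test_x)    # mean squared reconstruction error on the test digits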
test_img = test_x[[6]]
x_reconst = model.predict(test_img)
plt.figure(figsize = (10,8))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28,28), 'gray')
plt.title('Input image', fontsize = 15)
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(x_reconst.reshape(28,28), 'gray')
plt.title('Reconstructed image', fontsize = 15)
plt.axis('off')
plt.show()
idx = np.random.choice(test_y.shape[0], 500)
rnd_x, rnd_y = test_x[idx], test_y[idx]
rnd_latent = encoder.predict(rnd_x)
rnd_latent = rnd_latent.reshape(-1,2)
plt.figure(figsize = (10,10))
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(fontsize = 15)
plt.axis('equal')
plt.show()
new_latent = np.array([[2, -4]]).reshape(-1,1,1,2)
fake_img = decoder.predict(new_latent)
plt.figure(figsize = (16,7))
plt.subplot(1,2,1)
plt.scatter(rnd_latent[rnd_y == 1, 0], rnd_latent[rnd_y == 1, 1], label = '1')
plt.scatter(rnd_latent[rnd_y == 5, 0], rnd_latent[rnd_y == 5, 1], label = '5')
plt.scatter(rnd_latent[rnd_y == 6, 0], rnd_latent[rnd_y == 6, 1], label = '6')
plt.scatter(new_latent[:,:,:,0], new_latent[:,:,:,1], c = 'k', marker = 'o', s = 200, label = 'new data')
plt.title('Latent Space', fontsize = 15)
plt.xlabel('Z1', fontsize = 15)
plt.ylabel('Z2', fontsize = 15)
plt.legend(loc = 2, fontsize = 12)
plt.axis('equal')
plt.subplot(1,2,2)
plt.imshow(fake_img.reshape(28,28), 'gray')
plt.title('Generated Fake Image', fontsize = 15)
plt.xticks([])
plt.yticks([])
plt.show()
To obtain a segmentation map (output), segmentation networks usually have 2 parts: a downsampling path that extracts coarse semantic features, and an upsampling path that recovers the spatial resolution of the segmentation map.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
train_imgs = np.load('./data_files/images_training.npy')
train_seg = np.load('./data_files/seg_training.npy')
test_imgs = np.load('./data_files/images_testing.npy')
n_train = train_imgs.shape[0]
n_test = test_imgs.shape[0]
print ("The number of training images : {}, shape : {}".format(n_train, train_imgs.shape))
print ("The number of segmented images : {}, shape : {}".format(n_train, train_seg.shape))
print ("The number of testing images : {}, shape : {}".format(n_test, test_imgs.shape))
idx = np.random.randint(n_train)
plt.figure(figsize = (15,10))
plt.subplot(1,3,1)
plt.imshow(train_imgs[idx])
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(train_seg[idx][:,:,0])
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(train_seg[idx][:,:,1])
plt.axis('off')
plt.show()
model_type = tf.keras.applications.vgg16
base_model = model_type.VGG16()
base_model.trainable = False
base_model.summary()
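The decoder below taps intermediate VGG16 feature maps by layer index (in this model, layers[-5] is block5_pool, layers[14] is block4_pool, and layers[10] is block3_pool). An optional inspection snippet to verify these indices:
# List layer indices, names, and output shapes to locate skip connections
for i, layer in enumerate(base_model.layers):
    print(i, layer.name, layer.output.shape)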
map5 = base_model.layers[-5].output    # block5_pool feature map (7, 7, 512)
# sixth convolution layer
conv6 = tf.keras.layers.Conv2D(filters = 4096,
kernel_size = (7,7),
padding = 'SAME',
activation = 'relu')(map5)
# 1x1 convolution layers
fcn4 = tf.keras.layers.Conv2D(filters = 4096,
kernel_size = (1,1),
padding = 'SAME',
activation = 'relu')(conv6)
fcn3 = tf.keras.layers.Conv2D(filters = 2,
kernel_size = (1,1),
padding = 'SAME',
activation = 'relu')(fcn4)
# Upsampling layers (skip connections add intermediate VGG16 feature maps)
fcn2 = tf.keras.layers.Conv2DTranspose(filters = 512,
kernel_size = (4,4),
strides = (2,2),
padding = 'SAME')(fcn3)
fcn1 = tf.keras.layers.Conv2DTranspose(filters = 256,
kernel_size = (4,4),
strides = (2,2),
padding = 'SAME')(fcn2 + base_model.layers[14].output)    # skip from block4_pool (14, 14, 512)
output = tf.keras.layers.Conv2DTranspose(filters = 2,
kernel_size = (16,16),
strides = (8,8),
padding = 'SAME',
activation = 'softmax')(fcn1 + base_model.layers[10].output)    # skip from block3_pool (28, 28, 256)
model = tf.keras.Model(inputs = base_model.inputs, outputs = output)
model.summary()
model.compile(optimizer = 'adam',
loss = 'categorical_crossentropy',
metrics = ['accuracy'])
model.fit(train_imgs, train_seg, batch_size = 5, epochs = 5)
test_x = test_imgs[[1]]
test_seg = model.predict(test_x)
seg_mask = (test_seg[:,:,:,1] > 0.5).reshape(224, 224).astype(float)
plt.figure(figsize = (14,14))
plt.subplot(2,2,1)
plt.imshow(test_x[0])
plt.axis('off')
plt.subplot(2,2,2)
plt.imshow(seg_mask, cmap = 'Blues')
plt.axis('off')
plt.subplot(2,2,3)
plt.imshow(test_x[0])
plt.imshow(seg_mask, cmap = 'Blues', alpha = 0.5)
plt.axis('off')
plt.show()