Convolutional Neural Networks (CNN)
Table of Contents
1.2. Convolution on Image (= Convolution in 2D)¶
Filter (or Kernel)
Modify or enhance an image by filtering
Filter images to emphasize certain features or remove other features
Filtering includes smoothing, sharpening and edge enhancement
Discrete convolution can be viewed as multiplication by a (sparse, structured) matrix built from the kernel; a small filtering example follows this list
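As a quick illustration of filtering, here is a minimal sketch that applies a hand-designed 3×3 edge-detection kernel to a grayscale image by 2D convolution. It assumes scipy is available and uses a random array as a stand-in for a real image; neither appears in the original lab.

import numpy as np
from scipy.signal import convolve2d

# Hand-designed 3x3 edge-detection (Laplacian-like) kernel
kernel = np.array([[ 0, -1,  0],
                   [-1,  4, -1],
                   [ 0, -1,  0]], dtype=float)

# Stand-in grayscale image (random values instead of a real photo)
img = np.random.rand(28, 28)

# 2D convolution; mode='same' keeps the output the same size as the input
edges = convolve2d(img, kernel, mode='same', boundary='symm')
print(img.shape, edges.shape)       # (28, 28) (28, 28)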
How to find the right Kernels
We have seen many different kernels, each designed to produce a specific effect on an image
Let's take the opposite approach
We are not designing the kernel, but are learning the kernel from data
A feature extractor can be learned from data using a deep learning framework
ANN structure for object detection in images
- does not seem to be the best choice
- does not make use of the fact that we are dealing with images
- Spatial organization of the input is destroyed by flattening
- Locality: objects tend to have a local spatial support
- fully connected layer → locally connected (convolutional) layer
- Translation invariance: object appearance is independent of location
- Weight sharing: units connected to different locations have the same weights (see the parameter-count sketch after this list)
- We are not designing the kernel, but are learning the kernel from data
- i.e., we are learning a visual feature extractor from data
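To see concretely why locality and weight sharing help, the sketch below compares trainable parameter counts of a fully connected layer on a flattened 28×28 image with a 3×3 convolutional layer producing the same number of units per location. The layer sizes are chosen only for this comparison and are not taken from the labs below.

import tensorflow as tf

# Fully connected: every input pixel connects to every one of the 32 units
dense = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28, 1)),
    tf.keras.layers.Dense(32)
])

# Convolutional: 32 shared 3x3 kernels, reused at every spatial location
conv = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), input_shape = (28, 28, 1))
])

print(dense.count_params())     # 784*32 + 32 = 25,120 parameters
print(conv.count_params())      # 3*3*1*32 + 32 = 320 parameters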
2.2. Convolutional Operator¶
Convolution of CNN
Local connectivity
Weight sharing
Typically have sparse interactions
Convolutional Neural Networks
- Simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers
- Multiple kernels
- Multiple channels (see the shape sketch after this list)
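A minimal sketch of the convolutional operator with multiple channels and multiple kernels: a batch with 3 input channels is convolved with 16 kernels, each spanning all input channels, giving 16 output channels. The sizes here are arbitrary, chosen only to show the shape bookkeeping.

import tensorflow as tf

x = tf.random.normal([1, 32, 32, 3])    # (batch, height, width, in_channels)
w = tf.random.normal([3, 3, 3, 16])     # (kernel_h, kernel_w, in_channels, num_kernels)

# Each of the 16 kernels covers all 3 input channels and is shared across locations
y = tf.nn.conv2d(x, w, strides = 1, padding = 'SAME')
print(y.shape)                          # (1, 32, 32, 16)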
2.3. Stride and Padding¶
Strides: increment step size for the convolution operator
- Reduces the size of the output map
No stride and no padding
- Stride example with kernel size 3×3 and a stride of 2
- Padding: artificially fill borders of image
- Useful to keep spatial dimension constant across filters
- Useful with strides and large receptive fields
- Usually filled with 0s (zero padding); an output-size check follows this list
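The output size follows the usual rule $\left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1$ for input size $n$, padding $p$, kernel size $k$, and stride $s$. A minimal check on a 7×7 input (sizes chosen arbitrarily for illustration):

import tensorflow as tf

x = tf.random.normal([1, 7, 7, 1])      # a 7x7 single-channel input

valid = tf.keras.layers.Conv2D(1, (3, 3), strides = 2, padding = 'valid')(x)
same  = tf.keras.layers.Conv2D(1, (3, 3), strides = 2, padding = 'same')(x)

print(valid.shape)      # (1, 3, 3, 1): floor((7 - 3)/2) + 1 = 3
print(same.shape)       # (1, 4, 4, 1): ceil(7/2) = 4 with zero padding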
2.4. Nonlinear Activation Function¶
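The labs below use ReLU, $\text{ReLU}(x) = \max(0, x)$, applied elementwise after each convolution. A one-line sketch (values are arbitrary):

import tensorflow as tf

x = tf.constant([[-2.0, -0.5, 0.0, 1.5]])
print(tf.nn.relu(x).numpy())    # [[0.  0.  0.  1.5]]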
2.5. Pooling¶
- Compute a maximum value in a sliding window (max pooling)
- Reduce spatial resolution for faster computation
- Achieve invariance to any permutation inside one of the cells
- Pooling size: $2\times2$, for example (a small sketch follows this list)
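A minimal sketch of $2\times2$ max pooling: each non-overlapping $2\times2$ cell is replaced by its maximum, halving the spatial resolution (the values are arbitrary).

import tensorflow as tf

x = tf.constant([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 0., 1., 2.],
                 [5., 6., 3., 4.]])
x = tf.reshape(x, [1, 4, 4, 1])                 # (batch, height, width, channels)

pooled = tf.keras.layers.MaxPool2D((2, 2))(x)
print(tf.reshape(pooled, [2, 2]).numpy())
# [[4. 8.]
#  [9. 4.]]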
2.6. CNN for Classification¶
- CONV and POOL layers output high-level features of the input
- A fully connected layer uses these features for classifying the input image
- Express the output as the probability of the image belonging to a particular class (a small softmax example follows below)
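The softmax in the final Dense layer turns raw class scores into probabilities that sum to one. A small worked example with arbitrary scores for three classes:

import numpy as np

scores = np.array([2.0, 1.0, 0.1])                  # arbitrary class scores (logits)
probs = np.exp(scores) / np.sum(np.exp(scores))     # softmax
print(probs, probs.sum())                           # ~[0.659 0.242 0.099], 1.0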
3. Lab: CNN with TensorFlow (MNIST)¶
- MNIST example
- To classify handwritten digits
3.1. Training¶
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST and scale pixel values to [0, 1]
mnist = tf.keras.datasets.mnist
(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x/255.0, test_x/255.0

# Add a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
train_x = train_x.reshape((train_x.shape[0], 28, 28, 1))
test_x = test_x.reshape((test_x.shape[0], 28, 28, 1))
model = tf.keras.models.Sequential([
    # 28x28x1 -> 28x28x32 ('SAME' padding keeps the spatial size)
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),
    # 28x28x32 -> 14x14x32
    tf.keras.layers.MaxPool2D((2,2)),
    # 14x14x32 -> 14x14x64
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),
    # 14x14x64 -> 7x7x64
    tf.keras.layers.MaxPool2D((2,2)),
    # Flatten to a vector and classify with fully connected layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 128, activation = 'relu'),
    # 10 classes (digits 0-9); softmax outputs class probabilities
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, batch_size = 50, epochs = 3)
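As an optional check (not part of the original lab), model.summary() prints each layer's output shape and parameter count for the model defined above:

model.summary()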
3.2. Testing or Evaluating¶
# Evaluate accuracy on the test set
test_loss, test_acc = model.evaluate(test_x, test_y)

# Predict a single test image (double brackets keep the batch dimension)
test_img = test_x[[1495]]
predict = model.predict(test_img, verbose = 0)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))
4. Lab: CNN with TensorFlow (NEU Steel Surface Defects)¶
Download NEU steel surface defects images and labels
4.1. Training¶
from google.colab import drive
drive.mount('/content/drive')
# Change file paths if necessary
train_x = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_train_imgs.npy')
train_y = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_train_labels.npy')
test_x = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_test_imgs.npy')
test_y = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_test_labels.npy')
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)
model = tf.keras.models.Sequential([
    # 200x200x1 -> 200x200x32 ('SAME' padding keeps the spatial size)
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (200, 200, 1)),
    # 200x200x32 -> 100x100x32
    tf.keras.layers.MaxPool2D((2,2)),
    # 100x100x32 -> 100x100x64
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),
    # 100x100x64 -> 50x50x64
    tf.keras.layers.MaxPool2D((2,2)),
    # 50x50x64 -> 50x50x128
    tf.keras.layers.Conv2D(filters = 128,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),
    # 50x50x128 -> 25x25x128
    tf.keras.layers.MaxPool2D((2,2)),
    # Flatten and classify into 6 defect classes
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 128, activation = 'relu'),
    tf.keras.layers.Dense(units = 6, activation = 'softmax')
])
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, batch_size = 50, epochs = 10)
4.2. Testing or Evaluating¶
# Evaluate accuracy on the test set
test_loss, test_acc = model.evaluate(test_x, test_y)

name = ['scratches', 'rolled-in scale', 'pitted surface', 'patches', 'inclusion', 'crazing']

# Pick a random test image and predict its class
idx = np.random.choice(test_x.shape[0], 1)
test_img = test_x[idx]
test_label = test_y[idx]

predict = model.predict(test_img, verbose = 0)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(200, 200), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(name[mypred[0]]))
print('True Label : {}'.format(name[test_label[0]]))