Convolutional Neural Networks (CNN)
Table of Contents
1.2. Convolution on Image (= Convolution in 2D)¶
Filter (or Kernel)
Modify or enhance an image by filtering
Filter images to emphasize certain features or remove other features
Filtering includes smoothing, sharpening and edge enhancement
Discrete convolution can be viewed as multiplication by a (sparse, structured) matrix built from the kernel; a small filtering example follows this list
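As a quick illustration of filtering, here is a minimal sketch that applies a hand-designed 3×3 edge-detection kernel to a grayscale image by 2D convolution. It assumes scipy is available and uses a random array as a stand-in for a real image; neither appears in the original lab.

import numpy as np
from scipy.signal import convolve2d

# Hand-designed 3x3 edge-detection (Laplacian-like) kernel
kernel = np.array([[ 0, -1,  0],
                   [-1,  4, -1],
                   [ 0, -1,  0]], dtype=float)

# Stand-in grayscale image (random values instead of a real photo)
img = np.random.rand(28, 28)

# 2D convolution; mode='same' keeps the output the same size as the input
edges = convolve2d(img, kernel, mode='same', boundary='symm')
print(img.shape, edges.shape)       # (28, 28) (28, 28)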
How to find the right Kernels
We have seen many different kernels, each designed to produce a specific effect on an image
Let's take the opposite approach
We are not designing the kernel, but are learning the kernel from data
A feature extractor can be learned from data using a deep learning framework
ANN structure for object detection in images
- does not seem to be the best choice
- does not make use of the fact that we are dealing with images
- Spatial organization of the input is destroyed by flattening
- Locality: objects tend to have a local spatial support
- fully connected layer → locally connected (convolutional) layer
- Translation invariance: object appearance is independent of location
- Weight sharing: units connected to different locations have the same weights (see the parameter-count sketch after this list)
- We are not designing the kernel, but are learning the kernel from data
- i.e., we are learning a visual feature extractor from data
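To see concretely why locality and weight sharing help, the sketch below compares trainable parameter counts of a fully connected layer on a flattened 28×28 image with a 3×3 convolutional layer producing the same number of units per location. The layer sizes are chosen only for this comparison and are not taken from the labs below.

import tensorflow as tf

# Fully connected: every input pixel connects to every one of the 32 units
dense = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28, 1)),
    tf.keras.layers.Dense(32)
])

# Convolutional: 32 shared 3x3 kernels, reused at every spatial location
conv = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), input_shape = (28, 28, 1))
])

print(dense.count_params())     # 784*32 + 32 = 25,120 parameters
print(conv.count_params())      # 3*3*1*32 + 32 = 320 parameters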
2.2. Convolutional Operator¶
Convolution of CNN
Local connectivity
Weight sharing
Typically have sparse interactions
Convolutional Neural Networks
- Simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers
- Multiple kernels
- Multiple channels (see the shape sketch after this list)
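A minimal sketch of the convolutional operator with multiple channels and multiple kernels: a batch with 3 input channels is convolved with 16 kernels, each spanning all input channels, giving 16 output channels. The sizes here are arbitrary, chosen only to show the shape bookkeeping.

import tensorflow as tf

x = tf.random.normal([1, 32, 32, 3])    # (batch, height, width, in_channels)
w = tf.random.normal([3, 3, 3, 16])     # (kernel_h, kernel_w, in_channels, num_kernels)

# Each of the 16 kernels covers all 3 input channels and is shared across locations
y = tf.nn.conv2d(x, w, strides = 1, padding = 'SAME')
print(y.shape)                          # (1, 32, 32, 16)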
2.3. Stride and Padding¶
Strides: increment step size for the convolution operator
- Reduces the size of the output map
No stride and no padding
- Stride example with kernel size 3×3 and a stride of 2
- Padding: artificially fill borders of image
- Useful to keep spatial dimension constant across filters
- Useful with strides and large receptive fields
- Usually filled with 0s (zero padding); an output-size check follows this list
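The output size follows the usual rule $\left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1$ for input size $n$, padding $p$, kernel size $k$, and stride $s$. A minimal check on a 7×7 input (sizes chosen arbitrarily for illustration):

import tensorflow as tf

x = tf.random.normal([1, 7, 7, 1])      # a 7x7 single-channel input

valid = tf.keras.layers.Conv2D(1, (3, 3), strides = 2, padding = 'valid')(x)
same  = tf.keras.layers.Conv2D(1, (3, 3), strides = 2, padding = 'same')(x)

print(valid.shape)      # (1, 3, 3, 1): floor((7 - 3)/2) + 1 = 3
print(same.shape)       # (1, 4, 4, 1): ceil(7/2) = 4 with zero padding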
2.4. Nonlinear Activation Function¶
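The labs below use ReLU, $\text{ReLU}(x) = \max(0, x)$, applied elementwise after each convolution. A one-line sketch (values are arbitrary):

import tensorflow as tf

x = tf.constant([[-2.0, -0.5, 0.0, 1.5]])
print(tf.nn.relu(x).numpy())    # [[0.  0.  0.  1.5]]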
2.5. Pooling¶
- Compute a maximum value in a sliding window (max pooling)
- Reduce spatial resolution for faster computation
- Achieve invariance to any permutation inside one of the cells
- Pooling size: $2\times2$, for example (a small sketch follows this list)
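A minimal sketch of $2\times2$ max pooling: each non-overlapping $2\times2$ cell is replaced by its maximum, halving the spatial resolution (the values are arbitrary).

import tensorflow as tf

x = tf.constant([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 0., 1., 2.],
                 [5., 6., 3., 4.]])
x = tf.reshape(x, [1, 4, 4, 1])                 # (batch, height, width, channels)

pooled = tf.keras.layers.MaxPool2D((2, 2))(x)
print(tf.reshape(pooled, [2, 2]).numpy())
# [[4. 8.]
#  [9. 4.]]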
2.6. CNN for Classification¶
- CONV and POOL layers output high-level features of the input
- A fully connected layer uses these features for classifying the input image
- Express the output as the probability of the image belonging to a particular class (a small softmax example follows below)
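The softmax in the final Dense layer turns raw class scores into probabilities that sum to one. A small worked example with arbitrary scores for three classes:

import numpy as np

scores = np.array([2.0, 1.0, 0.1])                  # arbitrary class scores (logits)
probs = np.exp(scores) / np.sum(np.exp(scores))     # softmax
print(probs, probs.sum())                           # ~[0.659 0.242 0.099], 1.0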
3. Lab: CNN with TensorFlow (MNIST)¶
- MNIST example
- To classify handwritten digits
3.1. Training¶
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST and scale pixel values to [0, 1]
mnist = tf.keras.datasets.mnist
(train_x, train_y), (test_x, test_y) = mnist.load_data()
train_x, test_x = train_x/255.0, test_x/255.0

# Add a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
train_x = train_x.reshape((train_x.shape[0], 28, 28, 1))
test_x = test_x.reshape((test_x.shape[0], 28, 28, 1))
model = tf.keras.models.Sequential([
    # 28x28x1 -> 28x28x32 ('SAME' padding keeps the spatial size)
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),
    # 28x28x32 -> 14x14x32
    tf.keras.layers.MaxPool2D((2,2)),
    # 14x14x32 -> 14x14x64
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),
    # 14x14x64 -> 7x7x64
    tf.keras.layers.MaxPool2D((2,2)),
    # Flatten to a vector and classify with fully connected layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 128, activation = 'relu'),
    # 10 classes (digits 0-9); softmax outputs class probabilities
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, batch_size = 50, epochs = 3)
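As an optional check (not part of the original lab), model.summary() prints each layer's output shape and parameter count for the model defined above:

model.summary()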
3.2. Testing or Evaluating¶
# Evaluate accuracy on the test set
test_loss, test_acc = model.evaluate(test_x, test_y)

# Predict a single test image (double brackets keep the batch dimension)
test_img = test_x[[1495]]
predict = model.predict(test_img, verbose = 0)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))
4. Lab: CNN with TensorFlow (NEU Steel Surface Defects)¶
Download NEU steel surface defects images and labels
4.1. Training¶
from google.colab import drive
drive.mount('/content/drive')
# Change file paths if necessary
train_x = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_train_imgs.npy')
train_y = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_train_labels.npy')
test_x = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_test_imgs.npy')
test_y = np.load('/content/drive/MyDrive/DL_Colab/DL_data/NEU_test_labels.npy')
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)
model = tf.keras.models.Sequential([
    # 200x200x1 -> 200x200x32 ('SAME' padding keeps the spatial size)
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (200, 200, 1)),
    # 200x200x32 -> 100x100x32
    tf.keras.layers.MaxPool2D((2,2)),
    # 100x100x32 -> 100x100x64
    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),
    # 100x100x64 -> 50x50x64
    tf.keras.layers.MaxPool2D((2,2)),
    # 50x50x64 -> 50x50x128
    tf.keras.layers.Conv2D(filters = 128,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),
    # 50x50x128 -> 25x25x128
    tf.keras.layers.MaxPool2D((2,2)),
    # Flatten and classify into 6 defect classes
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 128, activation = 'relu'),
    tf.keras.layers.Dense(units = 6, activation = 'softmax')
])
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, batch_size = 50, epochs = 10)
4.2. Testing or Evaluating¶
# Evaluate accuracy on the test set
test_loss, test_acc = model.evaluate(test_x, test_y)

name = ['scratches', 'rolled-in scale', 'pitted surface', 'patches', 'inclusion', 'crazing']

# Pick a random test image and predict its class
idx = np.random.choice(test_x.shape[0], 1)
test_img = test_x[idx]
test_label = test_y[idx]

predict = model.predict(test_img, verbose = 0)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(200, 200), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(name[mypred[0]]))
print('True Label : {}'.format(name[test_label[0]]))