Convolutional Neural Networks (CNN)


By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Table of Contents

1. Convolution

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('xSuFInvLjBo', width = "560", height = "315")
Out[ ]:

1.1. 1D Convolution
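A 1D convolution slides a kernel along a signal and sums the element-wise products at each position. A minimal NumPy sketch (the signal and kernel values here are illustrative):

```python
import numpy as np

# 1D convolution of a signal with a kernel; np.convolve flips the
# kernel, matching the mathematical definition of convolution.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input signal (illustrative)
k = np.array([1.0, 0.0, -1.0])            # a simple difference kernel

full = np.convolve(x, k, mode = 'full')    # length len(x) + len(k) - 1 = 7
valid = np.convolve(x, k, mode = 'valid')  # only positions of full overlap

print(valid)  # [2. 2. 2.]: each output is x[i+2] - x[i]
```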


1.2. Convolution on Images (= Convolution in 2D)

Filter (or Kernel)

  • Modify or enhance an image by filtering

  • Filter images to emphasize certain features or remove other features

  • Filtering includes smoothing, sharpening and edge enhancement

  • Discrete convolution can be viewed as multiplication by a sparse (Toeplitz-structured) matrix
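The filtering ideas above can be sketched directly in NumPy. This is a minimal 'valid' 2D convolution applied with a 3×3 box-blur (smoothing) kernel, not an optimized implementation:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D convolution (kernel flipped, per the mathematical definition)."""
    kf = np.flip(kernel)
    kh, kw = kf.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output pixel is a weighted sum over a local patch
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kf)
    return out

img = np.arange(25, dtype = float).reshape(5, 5)  # toy 5x5 "image"
box = np.ones((3, 3)) / 9.0                       # smoothing (box blur) kernel
smoothed = conv2d(img, box)
print(smoothed.shape)  # (3, 3): valid convolution shrinks the image
```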


How to find the right Kernels

  • Many different kernels are known to produce specific effects on images

  • Let's take the opposite approach

  • Rather than designing the kernel by hand, we learn the kernel from data

  • In other words, a feature extractor can be learned from data using a deep learning framework

2. Convolutional Neural Networks (CNN)

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('u03QN8lJsDg', width = "560", height = "315")
Out[ ]:

2.1. Motivation: Learning Visual Features


The bird occupies a local area and looks the same in different parts of an image. We should construct neural networks which exploit these properties.




  • ANN structure for object detection in images

    • does not seem the best choice
    • does not exploit the fact that we are dealing with images
    • flattening destroys the spatial organization of the input


  • Locality: objects tend to have a local spatial support
    • fully connected layer → locally connected layer


  • Translation invariance: object appearance is independent of location
    • Weight sharing: units connected to different locations have the same weights
    • We are not designing the kernel, but are learning the kernel from data
    • i.e., we are learning a visual feature extractor from data

2.2. Convolutional Operator

Convolution of CNN

  • Local connectivity

  • Weight sharing

  • Typically have sparse interactions

  • Convolutional Neural Networks

    • Simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers

  • Multiple kernels

  • Multiple channels
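These properties (local connectivity, weight sharing, multiple kernels, multiple channels) can be made concrete with a plain NumPy sketch of what a single convolutional layer computes. As in deep learning frameworks, cross-correlation is used (the kernel is not flipped) and bias terms are omitted; all sizes below are illustrative:

```python
import numpy as np

def conv_layer(x, kernels):
    # x: (H, W, C_in), kernels: (K, K, C_in, C_out)
    K = kernels.shape[0]
    H, W, _ = x.shape
    C_out = kernels.shape[3]
    out = np.zeros((H - K + 1, W - K + 1, C_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i+K, j:j+K, :]          # local connectivity
            for c in range(C_out):
                # the same kernel is reused at every location: weight sharing
                out[i, j, c] = np.sum(patch * kernels[:, :, :, c])
    return out

x = np.random.rand(8, 8, 3)       # e.g. a small RGB patch
w = np.random.rand(3, 3, 3, 16)   # 16 kernels over 3 input channels
y = conv_layer(x, w)
print(y.shape)  # (6, 6, 16): one output channel per kernel
```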

2.3. Stride and Padding

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('cr2fz5A2MXQ', width = "560", height = "315")
Out[ ]:
  • Strides: increment step size for the convolution operator

    • Reduces the size of the output map
  • No stride and no padding


  • Stride example with kernel size 3×3 and a stride of 2

  • Padding: artificially fill borders of image
    • Useful to keep spatial dimension constant across filters
    • Useful with strides and large receptive fields
    • Usually fill with 0s
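The combined effect of kernel size, stride, and padding on the output size follows the standard formula out = ⌊(n + 2p − k)/s⌋ + 1; a small sketch:

```python
# Output size of a convolution: out = floor((n + 2p - k) / s) + 1
# n: input size, k: kernel size, s: stride, p: padding on each border
def out_size(n, k, s = 1, p = 0):
    return (n + 2 * p - k) // s + 1

print(out_size(28, 3, s = 1, p = 1))  # 28: padding keeps the size ('SAME')
print(out_size(28, 3))                # 26: no padding shrinks the map
print(out_size(5, 3, s = 2))          # 2: kernel 3x3, stride 2, no padding
```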

2.4. Nonlinear Activation Function
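A nonlinear activation is applied element-wise after each convolution; the labs below use ReLU. A minimal sketch:

```python
import numpy as np

# ReLU, the nonlinearity most commonly used after convolutional layers
# (the Keras layers below select it with activation = 'relu')
def relu(x):
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]: negatives are clipped to zero
```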



2.5. Pooling

  • Compute a maximum value in a sliding window (max pooling)
    • Reduces spatial resolution for faster computation
    • Achieves invariance to any permutation inside one of the pooling cells


  • Pooling size : $2\times2$ for example
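Max pooling with a $2\times2$ window can be sketched in a few lines of NumPy (any rows or columns that do not fill a complete cell are cropped):

```python
import numpy as np

def max_pool(x, size = 2):
    # crop to a multiple of the pooling size, then take the max per cell
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]
    return x.reshape(H // size, size, W // size, size).max(axis = (1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]], dtype = float)
print(max_pool(x))
# [[4. 8.]
#  [9. 7.]]
```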

2.6. CNN for Classification

  • CONV and POOL layers output high-level features of input
  • Fully connected layer uses these features for classifying input image
  • Express output as probability of image belonging to a particular class


3. Lab: CNN with TensorFlow (MNIST)

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('cr2fz5A2MXQ', width = "560", height = "315", start = 726)
Out[ ]:
  • MNIST example
  • To classify handwritten digits


3.1. Training

In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0
In [ ]:
train_x = train_x.reshape((train_x.shape[0], 28, 28, 1))
test_x = test_x.reshape((test_x.shape[0], 28, 28, 1))
In [ ]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),

    tf.keras.layers.MaxPool2D((2,2)),

    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),

    tf.keras.layers.MaxPool2D((2,2)),

    tf.keras.layers.Flatten(),

    tf.keras.layers.Dense(units = 128, activation = 'relu'),

    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])
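As a sanity check, the feature-map sizes through this model can be traced by hand ('SAME' padding keeps height and width, and each 2×2 max pool halves them):

```python
# Trace the feature-map sizes through the model above.
h = w = 28               # input image size (28, 28, 1)
h, w = h // 2, w // 2    # Conv2D(32, 'SAME') keeps (28, 28); MaxPool2D -> (14, 14)
h, w = h // 2, w // 2    # Conv2D(64, 'SAME') keeps (14, 14); MaxPool2D -> (7, 7)
flat = h * w * 64        # Flatten -> 7 * 7 * 64 inputs to the Dense layer
print(flat)  # 3136
```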
In [ ]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
In [ ]:
model.fit(train_x, train_y, batch_size = 50, epochs = 3)
Epoch 1/3
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.9059 - loss: 0.3076
Epoch 2/3
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9866 - loss: 0.0444
Epoch 3/3
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9920 - loss: 0.0264
Out[ ]:
<keras.src.callbacks.history.History at 0x7932f7fbc7c0>

3.2. Testing or Evaluating

In [ ]:
test_loss, test_acc = model.evaluate(test_x, test_y)
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9837 - loss: 0.0555
In [ ]:
test_img = test_x[[1495]]

predict = model.predict(test_img, verbose = 0)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (9, 4))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))
[Figure: test digit image with its predicted class probabilities]
Prediction : 3

4. Lab: CNN with Tensorflow (Steel Surface Defects)

  • NEU steel surface defects example
  • To classify defects images into 6 classes


Download NEU steel surface defects images and labels

4.1. Training

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
# Change file paths if necessary

train_x = np.load('/content/drive/MyDrive/DL/DL_data/NEU_train_imgs.npy')
train_y = np.load('/content/drive/MyDrive/DL/DL_data/NEU_train_labels.npy')

test_x = np.load('/content/drive/MyDrive/DL/DL_data/NEU_test_imgs.npy')
test_y = np.load('/content/drive/MyDrive/DL/DL_data/NEU_test_labels.npy')
In [ ]:
print(train_x.shape)
print(train_y.shape)
(1500, 200, 200, 1)
(1500,)
In [ ]:
print(test_x.shape)
print(test_y.shape)
(300, 200, 200, 1)
(300,)
In [ ]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (200, 200, 1)),

    tf.keras.layers.MaxPool2D((2,2)),

    tf.keras.layers.Conv2D(filters = 64,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),

    tf.keras.layers.MaxPool2D((2,2)),

    tf.keras.layers.Conv2D(filters = 128,
                           kernel_size = (3,3),
                           activation = 'relu',
                           padding = 'SAME'),

    tf.keras.layers.MaxPool2D((2,2)),

    tf.keras.layers.Flatten(),

    tf.keras.layers.Dense(units = 128, activation = 'relu'),

    tf.keras.layers.Dense(units = 6, activation = 'softmax')
])
In [ ]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
In [ ]:
model.fit(train_x, train_y, batch_size = 50, epochs = 10)
Epoch 1/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 4s 58ms/step - accuracy: 0.2180 - loss: 1.7870
Epoch 2/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 56ms/step - accuracy: 0.6771 - loss: 0.8989
Epoch 3/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 56ms/step - accuracy: 0.8274 - loss: 0.4721
Epoch 4/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 56ms/step - accuracy: 0.8898 - loss: 0.3322
Epoch 5/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 57ms/step - accuracy: 0.8632 - loss: 0.3320
Epoch 6/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 3s 59ms/step - accuracy: 0.9026 - loss: 0.2565
Epoch 7/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 56ms/step - accuracy: 0.9475 - loss: 0.1591
Epoch 8/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 2s 57ms/step - accuracy: 0.9362 - loss: 0.1698
Epoch 9/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 3s 56ms/step - accuracy: 0.9437 - loss: 0.1645
Epoch 10/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 3s 56ms/step - accuracy: 0.9547 - loss: 0.1317
Out[ ]:
<keras.src.callbacks.history.History at 0x7932f846d3c0>

4.2. Testing or Evaluating

In [ ]:
test_loss, test_acc = model.evaluate(test_x, test_y)
10/10 ━━━━━━━━━━━━━━━━━━━━ 1s 63ms/step - accuracy: 0.8135 - loss: 0.4512
In [ ]:
name = ['scratches', 'rolled-in scale', 'pitted surface', 'patches', 'inclusion', 'crazing']

idx = np.random.choice(test_x.shape[0], 1)
test_img = test_x[idx]
test_label = test_y[idx]

predict = model.predict(test_img, verbose = 0)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (9, 4))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(200, 200), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(name[mypred[0]]))
print('True Label : {}'.format(name[test_label[0]]))
[Figure: test defect image with its predicted class probabilities]
Prediction : pitted surface
True Label : pitted surface
In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')