Convolutional Neural Networks (CNN)

By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

1. Convolution

1.1. 1D Convolution
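As a minimal sketch of 1D discrete convolution, the code below slides a flipped kernel along a signal and takes a dot product at each position, then checks the result against `np.convolve`. The signal `x`, the kernel `h`, and the helper name `conv1d_valid` are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

def conv1d_valid(x, h):
    """1D discrete convolution restricted to positions where the kernel fully overlaps x."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    k = len(h)
    h_flipped = h[::-1]  # convolution flips the kernel before sliding it
    return np.array([np.dot(x[i:i + k], h_flipped) for i in range(len(x) - k + 1)])

x = [1, 3, 2, 5, 4]   # toy signal (assumed for illustration)
h = [1, 0, -1]        # toy difference kernel (assumed for illustration)

y = conv1d_valid(x, h)
print(y)                                  # [1. 2. 2.]
print(np.convolve(x, h, mode='valid'))    # same values from NumPy
```

The output is shorter than the input (5 samples become 3) because only fully overlapping positions are kept.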

1.2. Convolution on Image (= Convolution in 2D)

Filter (or Kernel)

Modify or enhance an image by filtering
Filter images to emphasize certain features or remove other features
Filtering includes smoothing, sharpening and edge enhancement
Discrete convolution can be viewed as element-wise multiplication of the kernel with each image patch, followed by a sum
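The filtering idea above can be sketched in a few lines of NumPy. The toy image, the two kernels, and the helper name `filter2d_valid` are assumptions made for illustration. The helper does not flip the kernel, which is the convention deep learning libraries commonly use for "convolution" (strictly speaking, cross-correlation).

```python
import numpy as np

def filter2d_valid(image, kernel):
    """Slide the kernel over the image; at each position multiply element-wise and sum."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy 5x6 image: dark left half, bright right half (a vertical edge)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

edge_kernel = np.array([[-1, 0, 1]] * 3)   # responds to left-to-right intensity changes
smooth_kernel = np.ones((3, 3)) / 9.0      # 3x3 box blur (smoothing)

edges = filter2d_valid(image, edge_kernel)
smoothed = filter2d_valid(image, smooth_kernel)

print(edges[0])      # [0. 3. 3. 0.]  large only around the edge
print(smoothed[0])   # ramps gradually from 0 to 1 across the edge
```

The edge kernel emphasizes the boundary and suppresses flat regions, while the box blur spreads the sharp transition over several pixels.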

How to find the right kernels

We have seen many different kernels that produce specific effects on images
Let’s take the opposite approach
We are not designing the kernel, but are learning the kernel from data
We can learn a feature extractor from data using a deep learning framework

2. Convolutional Neural Networks (CNN)

2.1. Motivation: Learning Visual Features

The bird occupies a local area and looks the same in different parts of an image. We should construct neural networks which exploit these properties.

ANN structure for object detection in images
- Does not seem to be the best choice
- Does not make use of the fact that we are dealing with images
- Spatial organization of the input is destroyed by flattening

Locality: objects tend to have a local spatial support
- fully connected layer $\rightarrow$ locally connected layer

- __Translation invariance__: object appearance is independent of location
    - Weight sharing: units connected to different locations have the same weights
    - We are not designing the kernel, but are learning the kernel from data
    - _i.e._, we are learning a visual feature extractor from data
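A small sketch can illustrate why weight sharing suits patterns that look the same everywhere: one shared kernel gives the same response to a pattern wherever it appears, only at a shifted position. The 1D signal, the kernel, and the helper `correlate_valid` are assumptions chosen for illustration.

```python
import numpy as np

def correlate_valid(x, k):
    """Apply the same kernel weights at every location of x (weight sharing)."""
    n = len(k)
    return np.array([np.dot(x[i:i + n], k) for i in range(len(x) - n + 1)])

kernel = np.array([1.0, -1.0])   # toy kernel that detects a falling edge

signal = np.zeros(10)
signal[2:4] = 1.0                # pattern located at positions 2-3
shifted = np.roll(signal, 3)     # the same pattern moved to positions 5-6

r1 = correlate_valid(signal, kernel)
r2 = correlate_valid(shifted, kernel)

print(r1)   # [ 0. -1.  0.  1.  0.  0.  0.  0.  0.]
print(r2)   # [ 0.  0.  0.  0. -1.  0.  1.  0.  0.]
```

The response to the shifted signal is the original response moved by the same 3 positions, so no separate weights are needed for each location.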

2.2. Convolutional Operator

Convolution of CNN

Local connectivity
Weight sharing
Typically have sparse interactions
Convolutional Neural Networks
- Simply neural networks that use the convolution in place of general matrix multiplication in at least one of their layers
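To make the effect of local connectivity and weight sharing concrete, the sketch below counts trainable parameters for a fully connected layer and for a convolutional layer that produce an output of the same size. The layer sizes (a 28×28×1 input mapped to a 28×28×32 output with 3×3 kernels) are assumptions picked for illustration, and the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Fully connected: one weight per (input, output) pair, plus one bias per output."""
    return n_in * n_out + n_out

def conv2d_params(kernel_h, kernel_w, c_in, c_out):
    """Convolution: one small kernel per (input channel, output channel), plus one bias per output channel."""
    return kernel_h * kernel_w * c_in * c_out + c_out

n_in = 28 * 28 * 1      # flattened input pixels
n_out = 28 * 28 * 32    # same number of output units as a 28x28x32 feature map

print(dense_params(n_in, n_out))    # 19694080
print(conv2d_params(3, 3, 1, 32))   # 320
```

The convolutional layer needs several orders of magnitude fewer parameters, because each output unit looks only at a 3×3 neighborhood and all locations reuse the same weights.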

Multiple channels

Multiple kernels
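The following is a rough sketch of how multiple input channels and multiple kernels fit together: each kernel spans all input channels and yields one output channel. The array sizes, the random data, and the helper `conv2d_multichannel` are assumptions for illustration (no padding, stride 1, no kernel flip).

```python
import numpy as np

def conv2d_multichannel(x, kernels):
    """x: (H, W, C_in); kernels: (kh, kw, C_in, C_out) -> output: (H-kh+1, W-kw+1, C_out)."""
    h, w, c_in = x.shape
    kh, kw, k_cin, c_out = kernels.shape
    assert c_in == k_cin, "each kernel must span all input channels"
    out = np.zeros((h - kh + 1, w - kw + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]   # (kh, kw, C_in)
            # sum over height, width, and input channels; one value per kernel
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))           # e.g. a small 3-channel image
kernels = rng.standard_normal((3, 3, 3, 4))  # four 3x3 kernels over 3 channels

y = conv2d_multichannel(x, kernels)
print(y.shape)   # (6, 6, 4): one output channel per kernel
```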

2.3. Stride and Padding

Strides: increment step size for the convolution operator
- Reduces the size of the output map

No stride and no padding

Stride example with kernel size 3×3 and a stride of 2

Padding: artificially fill borders of image
- Useful to keep spatial dimension constant across filters
- Useful with strides and large receptive fields
- Usually fill with 0s
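The effect of stride and padding on the output size follows the standard relation $\lfloor (n + 2p - k)/s \rfloor + 1$ for input size $n$, kernel size $k$, stride $s$, and padding $p$ on each side. The sketch below evaluates it for a few cases; the input size of 5 is an assumption, since the figures are not reproduced here, and the helper names are hypothetical.

```python
import math

def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def same_output_size(n, stride=1):
    """Output length with TensorFlow-style 'SAME' padding: ceil(n / stride)."""
    return math.ceil(n / stride)

print(conv_output_size(5, kernel=3))               # 3: no stride, no padding
print(conv_output_size(5, kernel=3, stride=2))     # 2: stride 2 shrinks the output map
print(conv_output_size(5, kernel=3, padding=1))    # 5: zero padding keeps the size
print(same_output_size(28))                        # 28
```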

2.4. Nonlinear Activation Function
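As one common choice, the labs later in these notes use the ReLU activation (`activation = 'relu'`), which is applied element-wise to each feature map. A minimal sketch, with a made-up feature map for illustration:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: keep positive values, set negative values to zero."""
    return np.maximum(0.0, x)

feature_map = np.array([[-2.0, 0.5],
                        [ 3.0, -0.1]])   # toy convolution output (assumed)

activated = relu(feature_map)
print(activated)
# [[0.  0.5]
#  [3.  0. ]]
```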

2.5. Pooling

Compute a maximum value in a sliding window (max pooling)
- Reduce spatial resolution for faster computation
- Achieve invariance to any permutation of the values inside a pooling cell

Pooling size : $2\times2$ for example
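A minimal NumPy sketch of $2\times2$ max pooling with non-overlapping windows is given below. The input values and the helper `max_pool2d` are assumptions for illustration. The second input permutes the values inside the top-left cell to show that the pooled output does not change.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling over size x size cells (stride equal to size)."""
    h_out, w_out = x.shape[0] // size, x.shape[1] // size
    x = x[:h_out * size, :w_out * size]   # drop rows/columns that do not fill a cell
    return x.reshape(h_out, size, w_out, size).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]])

pooled = max_pool2d(x)
print(pooled)
# [[4 2]
#  [2 8]]

# same values, but shuffled inside the top-left 2x2 cell
x_permuted = x.copy()
x_permuted[:2, :2] = [[3, 1],
                      [2, 4]]
print(np.array_equal(max_pool2d(x_permuted), pooled))   # True
```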

2.6. CNN for Classification

CONV and POOL layers output high-level features of input
Fully connected layer uses these features for classifying input image
Express output as probability of image belonging to a particular class
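The last step, expressing the output as class probabilities, is usually done with a softmax on the final layer (the labs below use `activation = 'softmax'`). A small sketch with made-up scores for three classes:

```python
import numpy as np

def softmax(logits):
    """Turn raw class scores into probabilities that are positive and sum to 1."""
    z = logits - np.max(logits)   # subtracting the max improves numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # toy scores from the last dense layer (assumed)
probs = softmax(logits)

print(probs.sum())        # sums to 1 (up to floating-point rounding)
print(np.argmax(probs))   # 0: the class with the highest score
```

The predicted class is the index with the largest probability, which is what `np.argmax` extracts in the lab code.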

3. Lab: CNN with TensorFlow (MNIST)

MNIST example
To classify handwritten digits

3.1. Training

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

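mnist = tf.keras.datasets.mnist

# load the 60,000 training and 10,000 test images with their integer labels
(train_x, train_y), (test_x, test_y) = mnist.load_data()

# scale pixel values from [0, 255] to [0, 1]
train_x, test_x = train_x/255.0, test_x/255.0

# add a channel axis: (N, 28, 28) -> (N, 28, 28, 1), as expected by Conv2D
train_x = train_x.reshape((train_x.shape[0], 28, 28, 1))
test_x = test_x.reshape((test_x.shape[0], 28, 28, 1))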

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME'),  # input (14, 14, 32) is inferred from the previous layer
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Flatten(),
    
    tf.keras.layers.Dense(units = 128, activation = 'relu'),
    
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, epochs = 3)

Epoch 1/3
1875/1875 [==============================] - 23s 12ms/step - loss: 0.1257 - accuracy: 0.9614
Epoch 2/3
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0411 - accuracy: 0.9871
Epoch 3/3
1875/1875 [==============================] - 24s 13ms/step - loss: 0.0266 - accuracy: 0.9918

<tensorflow.python.keras.callbacks.History at 0x21f8d671c50>
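To see how the layers of this model fit together, the sketch below traces the feature-map shape and the number of trainable parameters layer by layer in plain Python, without TensorFlow. The helper names are hypothetical; the rules used are that `'SAME'` padding with stride 1 keeps height and width, and that $2\times2$ max pooling halves them (rounding down).

```python
def conv_same(shape, filters):
    """3x3 convolution with 'SAME' padding and stride 1: spatial size is unchanged."""
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, size=2):
    """Max pooling with non-overlapping windows: spatial size is divided by the pool size."""
    h, w, c = shape
    return (h // size, w // size, c)

def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out   # kernel weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out           # weights + biases

shape = (28, 28, 1)
total = conv_params(3, shape[2], 32)
shape = conv_same(shape, 32)              # (28, 28, 32)
shape = max_pool(shape)                   # (14, 14, 32)
total += conv_params(3, shape[2], 64)
shape = conv_same(shape, 64)              # (14, 14, 64)
shape = max_pool(shape)                   # (7, 7, 64)

n_flat = shape[0] * shape[1] * shape[2]   # Flatten: 7 * 7 * 64 = 3136
total += dense_params(n_flat, 128) + dense_params(128, 10)

print(shape, n_flat, total)               # (7, 7, 64) 3136 421642
```

Most of the parameters sit in the first dense layer (3136 × 128 weights), not in the convolutional layers.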

3.2. Testing or Evaluating

test_loss, test_acc = model.evaluate(test_x, test_y)

313/313 [==============================] - 1s 3ms/step - loss: 0.0266 - accuracy: 0.9907

test_img = test_x[[1495]]

predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))

Prediction : 3

4. Lab: CNN with TensorFlow (Steel Surface Defects)

NEU steel surface defects example
To classify defect images into 6 classes

Download NEU steel surface defects images and labels

4.1. Training

train_x = np.load('./data_files/NEU_train_imgs.npy')
train_y = np.load('./data_files/NEU_train_labels.npy')
test_x = np.load('./data_files/NEU_test_imgs.npy')
test_y = np.load('./data_files/NEU_test_labels.npy')

n_train = train_x.shape[0]
n_test = test_x.shape[0]

print("The number of training images : {}, shape : {}".format(n_train, train_x.shape))
print("The number of testing images : {}, shape : {}".format(n_test, test_x.shape))

The number of training images : 1500, shape : (1500, 200, 200, 1)
The number of testing images : 300, shape : (300, 200, 200, 1)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (200, 200, 1)),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME'),  # input (100, 100, 32) is inferred from the previous layer
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME'),  # input (50, 50, 64) is inferred from the previous layer
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Flatten(),    
    tf.keras.layers.Dense(units = 128, activation = 'relu'),    
    tf.keras.layers.Dense(units = 6, activation = 'softmax')
])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, epochs = 4)

Epoch 1/4
47/47 [==============================] - 31s 655ms/step - loss: 1.7424 - accuracy: 0.2793
Epoch 2/4
47/47 [==============================] - 30s 641ms/step - loss: 1.0081 - accuracy: 0.6240
Epoch 3/4
47/47 [==============================] - 29s 614ms/step - loss: 0.6069 - accuracy: 0.7853
Epoch 4/4
47/47 [==============================] - 30s 629ms/step - loss: 0.4119 - accuracy: 0.8447

<tensorflow.python.keras.callbacks.History at 0x21f94715668>
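The step counts in the progress bars (47 per training epoch here, 10 during evaluation) follow from the Keras default batch size of 32, since neither `fit` nor `evaluate` is given a `batch_size`. A quick arithmetic check, with a hypothetical helper name:

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Number of mini-batches needed to cover the data once; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(1500))    # 47   (NEU training set)
print(steps_per_epoch(300))     # 10   (NEU test set)
print(steps_per_epoch(60000))   # 1875 (MNIST training set)
print(steps_per_epoch(10000))   # 313  (MNIST test set)
```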

4.2. Testing or Evaluating

test_loss, test_acc = model.evaluate(test_x, test_y)

10/10 [==============================] - 2s 144ms/step - loss: 0.3800 - accuracy: 0.8633

name = ['scratches', 'rolled-in scale', 'pitted surface', 'patches', 'inclusion', 'crazing']

idx = np.random.choice(test_x.shape[0], 1)
test_img = test_x[idx]
test_label = test_y[idx]

predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(200, 200), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(name[mypred[0]]))
print('True Label : {}'.format(name[test_label[0]]))

Prediction : scratches
True Label : scratches

5. Video Lectures

%%html
<center><iframe src="https://www.youtube.com/embed/MHbgEOAbywA?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>

%%html
<center><iframe src="https://www.youtube.com/embed/5zQgad2ukik?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>

%%html
<center><iframe src="https://www.youtube.com/embed/YC-aHDmAe_g?rel=0" 
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
