Convolutional Neural Networks (CNN)

By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

0. Video Lectures

%%html 
<center><iframe src="https://www.youtube.com/embed/oK3Q6yCsXAg?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

%%html 
<center><iframe src="https://www.youtube.com/embed/RgsliaqKjRc?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

%%html
<center><iframe src="https://www.youtube.com/embed/UyGy4eu1u90?end=728&rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

1. Convolution

1.1. 1D Convolution
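
As a small illustration of 1D convolution, here is a minimal sketch using NumPy's `np.convolve`. The signal `x` and the kernel `h` are made-up values chosen only for this example. The `'full'` mode returns every position where the two sequences overlap, while `'valid'` keeps only positions where the kernel fits entirely inside the signal.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])   # hypothetical input signal
h = np.array([1, 0, -1])        # hypothetical kernel (a simple difference filter)

# Every position with any overlap: length len(x) + len(h) - 1
y_full = np.convolve(x, h, mode='full')

# Only positions where the kernel lies fully inside the signal: length len(x) - len(h) + 1
y_valid = np.convolve(x, h, mode='valid')

print(y_full)
print(y_valid)
```

Note that `np.convolve` flips the kernel before sliding it, which is the mathematical definition of convolution. Convolution layers in deep learning frameworks usually skip the flip, which makes no practical difference when the kernel is learned.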

1.2. Convolution on Image (= Convolution in 2D)

Filter (or Kernel)

Modify or enhance an image by filtering
Filter images to emphasize certain features or remove other features
Filtering includes smoothing, sharpening and edge enhancement
Discrete convolution can be viewed as sliding a kernel matrix over the image and, at each position, summing the element-wise products of the kernel and the image patch under it
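
The filtering operation described above can be sketched in a few lines of NumPy. The helper name `conv2d_valid`, the toy image, and the vertical-edge kernel are assumptions made for illustration. Like most deep learning frameworks, the sketch does not flip the kernel (strictly speaking, it computes cross-correlation), and it uses a stride of 1 with no padding.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and
    sum the element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 5x5 image: dark on the left (0), bright on the right (1)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical kernel that responds to vertical edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

edges = conv2d_valid(image, kernel)
print(edges)
```

The output is large where the window straddles the dark-to-bright boundary and zero where the window covers a uniform region, which is the edge-enhancement effect mentioned in the list above.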

How to find the right Kernels

We have seen many different kernels that produce specific effects on images
Let's take the opposite approach
We are not designing the kernel, but are learning the kernel from data
Can learn feature extractor from data using a deep learning framework

2. Convolutional Neural Networks (CNN)

2.1. Motivation: Learning Visual Features

The bird occupies a local area and looks the same in different parts of an image. We should construct neural networks which exploit these properties.

ANN structure for object detection in images
- does not seem the best
- does not make use of the fact that we are dealing with images
- Spatial organization of the input is destroyed by flattening

Locality: objects tend to have a local spatial support
- fully connected layer $\rightarrow$ locally connected layer

Translation invariance: object appearance is independent of location
- Weight sharing: units connected to different locations have the same weights
- We are not designing the kernel, but are learning the kernel from data
- i.e., we are learning a visual feature extractor from data
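
To see what locality and weight sharing buy in practice, the following sketch compares parameter counts. All sizes are assumptions for illustration: a 28×28 grayscale input, 32 output feature maps, a 3×3 kernel, and a fully connected layer that produces as many outputs as the convolutional layer would with the spatial size preserved.

```python
# Assumed toy setting (not taken from a specific model)
height, width, channels = 28, 28, 1
num_filters, k = 32, 3

# Fully connected: every output unit is connected to every input pixel
num_inputs = height * width * channels
num_outputs = height * width * num_filters
dense_params = num_inputs * num_outputs + num_outputs   # weights + biases

# Convolution: one k x k x channels kernel (plus one bias) per filter,
# and the same kernel is reused at every image location
conv_params = (k * k * channels) * num_filters + num_filters

print('fully connected :', dense_params)
print('convolutional   :', conv_params)
```

Under these assumptions the fully connected layer needs roughly 19.7 million parameters, while the convolutional layer needs 320.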

2.2. Convolutional Operator

Convolution of CNN

Local connectivity
Weight sharing
Typically have sparse interactions
Convolutional Neural Networks
- Simply neural networks that use the convolution in place of general matrix multiplication in at least one of their layers

Multiple channels

Multiple kernels

2.3. Stride and Padding

Strides: increment step size for the convolution operator
- Reduces the size of the output map

No stride and no padding

Stride example with kernel size 3×3 and a stride of 2

Padding: artificially fill borders of image
- Useful to keep spatial dimension constant across filters
- Useful with strides and large receptive fields
- Usually fill with 0s
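
The effect of stride and padding on the output size follows the standard formula $\lfloor (n + 2p - k)/s \rfloor + 1$ along each spatial dimension, where $n$ is the input size, $k$ the kernel size, $s$ the stride, and $p$ the padding on each side. The helper below is a minimal sketch of that formula; the function name and the 5×5 input size are assumptions for illustration.

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Output length along one spatial dimension:
    floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

no_stride_no_pad = conv_output_size(5, 3)                  # stride 1, no padding
stride_two = conv_output_size(5, 3, stride=2)              # 3x3 kernel, stride 2
same_padding = conv_output_size(5, 3, stride=1, pad=1)     # zero padding of 1

print(no_stride_no_pad, stride_two, same_padding)
```

With a 3×3 kernel, a padding of 1 keeps the spatial size unchanged, which is what `padding = 'SAME'` achieves at stride 1 in the Keras layers used later.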

2.4. Nonlinear Activation Function

2.5. Pooling

Compute a maximum value in a sliding window (max pooling)
- Reduce spatial resolution for faster computation
- Achieve invariance to any permutation inside one of the cells

Pooling size : $2\times2$ for example
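
A minimal NumPy sketch of non-overlapping $2\times2$ max pooling is shown below. The helper name `max_pool_2x2` and the input values are assumptions for illustration, and the sketch only handles a single 2D array with even height and width. It also checks the permutation invariance mentioned above.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2D array with even height and width."""
    h, w = x.shape
    # Split into (h/2, 2, w/2, 2) blocks, then take the max inside each 2x2 cell
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]])

pooled = max_pool_2x2(x)
print(pooled)

# Shuffle the values inside the top-left 2x2 cell: the pooled output is unchanged
x_perm = x.copy()
x_perm[0:2, 0:2] = [[2, 4], [3, 1]]
print(np.array_equal(max_pool_2x2(x_perm), pooled))
```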

2.6. CNN for Classification

CONV and POOL layers output high-level features of input
Fully connected layer uses these features for classifying input image
Express output as probability of image belonging to a particular class
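
Expressing the output as class probabilities is commonly done with a softmax over the final layer's scores, which is also the activation used in the labs below. The sketch assumes three made-up class scores; the function name `softmax` is defined here for illustration.

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

logits = np.array([2.0, 1.0, 0.1])     # hypothetical scores for 3 classes
probs = softmax(logits)

print(probs)
print('predicted class :', np.argmax(probs))
```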

3. Lab: CNN with TensorFlow (MNIST)

MNIST example
To classify handwritten digits

3.1. Training

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0

train_x = train_x.reshape((train_x.shape[0], 28, 28, 1))
test_x = test_x.reshape((test_x.shape[0], 28, 28, 1))

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Flatten(),
    
    tf.keras.layers.Dense(units = 128, activation = 'relu'),
    
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])
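
To follow how the feature map shrinks through the layers above, the sketch below traces the shapes and parameter counts by hand in plain Python. The helper names `conv_params` and `dense_params` are assumptions for illustration; the layer sizes are taken from the model definition.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

size = 28                        # input is 28 x 28 x 1
p1 = conv_params(3, 1, 32)       # 'SAME' padding keeps 28 x 28, 32 channels
size //= 2                       # 2 x 2 max pooling -> 14 x 14
p2 = conv_params(3, 32, 64)      # 'SAME' padding keeps 14 x 14, 64 channels
size //= 2                       # 2 x 2 max pooling -> 7 x 7
flat = size * size * 64          # Flatten -> 7 * 7 * 64 features
p3 = dense_params(flat, 128)
p4 = dense_params(128, 10)
total = p1 + p2 + p3 + p4

print('flattened features :', flat)
print('parameters per layer:', p1, p2, p3, p4)
print('total parameters    :', total)
```

Calling `model.summary()` on the Keras model should report the same per-layer counts. Most of the parameters sit in the first fully connected layer rather than in the convolution layers.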

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, batch_size = 50, epochs = 3)

Epoch 1/3
1200/1200 [==============================] - 46s 36ms/step - loss: 0.1394 - accuracy: 0.9572
Epoch 2/3
1200/1200 [==============================] - 40s 33ms/step - loss: 0.0436 - accuracy: 0.9868
Epoch 3/3
1200/1200 [==============================] - 42s 35ms/step - loss: 0.0294 - accuracy: 0.9904

<tensorflow.python.keras.callbacks.History at 0x29c84722d68>

3.2. Testing or Evaluating

test_loss, test_acc = model.evaluate(test_x, test_y)

313/313 [==============================] - 3s 9ms/step - loss: 0.0318 - accuracy: 0.9894

test_img = test_x[[1495]]

predict = model.predict(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))

Prediction : 3

4. Lab: CNN with TensorFlow (Steel Surface Defects)

NEU steel surface defects example
To classify defect images into 6 classes

Download NEU steel surface defects images and labels

4.1. Training

train_x = np.load('./data_files/NEU_train_imgs.npy')
train_y = np.load('./data_files/NEU_train_labels.npy')

test_x = np.load('./data_files/NEU_test_imgs.npy')
test_y = np.load('./data_files/NEU_test_labels.npy')

print(train_x.shape)
print(train_y.shape)

(1500, 200, 200, 1)
(1500,)

print(test_x.shape)
print(test_y.shape)

(300, 200, 200, 1)
(300,)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (200, 200, 1)),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Flatten(),    
    
    tf.keras.layers.Dense(units = 128, activation = 'relu'),    
    
    tf.keras.layers.Dense(units = 6, activation = 'softmax')
])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, batch_size = 50, epochs = 4)

Epoch 1/4
30/30 [==============================] - 48s 2s/step - loss: 1.8941 - accuracy: 0.2180
Epoch 2/4
30/30 [==============================] - 46s 2s/step - loss: 1.1320 - accuracy: 0.5267
Epoch 3/4
30/30 [==============================] - 45s 2s/step - loss: 0.8194 - accuracy: 0.6733
Epoch 4/4
30/30 [==============================] - 45s 2s/step - loss: 0.5473 - accuracy: 0.8000

<tensorflow.python.keras.callbacks.History at 0x29c8b7eb898>

4.2. Testing or Evaluating

test_loss, test_acc = model.evaluate(test_x, test_y)

10/10 [==============================] - 2s 200ms/step - loss: 0.5172 - accuracy: 0.8400

name = ['scratches', 'rolled-in scale', 'pitted surface', 'patches', 'inclusion', 'crazing']

idx = np.random.choice(test_x.shape[0], 1)
test_img = test_x[idx]
test_label = test_y[idx]

predict = model.predict(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))
plt.subplot(1,2,1)
plt.imshow(test_img.reshape(200, 200), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(name[mypred[0]]))
print('True Label : {}'.format(name[test_label[0]]))

Prediction : pitted surface
True Label : pitted surface
