Convolutional Neural Networks (CNN)

By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Convolution¶

1.1. 1D Convolution¶

1.2. Convolution on Image (= Convolution in 2D)¶

Filter (or Kernel)

Modify or enhance an image by filtering
Filter images to emphasize certain features or remove other features
Filtering includes smoothing, sharpening and edge enhancement
Discrete convolution can be viewed as element-wise multiplication by a matrix

How to find the right Kernels

We learn many different kernels that make specific effect on images
Let’s apply an opposite approach
We are not designing the kernel, but are learning the kernel from data
Can learn feature extractor from data using a deep learning framework

2. Convolutional Neural Networks (CNN)¶

2.1. Motivation: Learning Visual Features¶

The bird occupies a local area and looks the same in different parts of an image. We should construct neural networks which exploit these properties.

ANN structure for object detecion in image
- does not seem the best
- did not make use of the fact that we are dealing with images
- Spatial organization of the input is destroyed by flattening

Locality: objects tend to have a local spatial support
- fully and convolutionally connected layer $\rightarrow$ locally and convolutionally connected layer

- __Translation invariance__: object appearance is independent of location - Weight sharing: untis connected to different locations have the same weights - We are not designing the kernel, but are learning the kernel from data - _i.e._ We are learning visual feature extractor from data

2.2. Convolutional Operator¶

Convolution of CNN

Local connectivity
Weight sharing
Typically have sparse interactions
Convolutional Neural Networks
- Simply neural networks that use the convolution in place of general matrix multiplication in at least one of their layers

Multiple channels

Multiple kernels

2.3 Stride and Padding¶

Strides: increment step size for the convolution operator
- Reduces the size of the output map

No stride and no padding

Stride example with kernel size 3×3 and a stride of 2

Padding: artificially fill borders of image
- Useful to keep spatial dimension constant across filters
- Useful with strides and large receptive fields
- Usually fill with 0s

2.4. Nonlinear Activation Function¶

2.5. Pooling¶

Compute a maximum value in a sliding window (max pooling)
- Reduce spatial resolution for faster computation
- Achieve invariance to any permutation inside one of the cell

Pooling size : $2\times2$ for example

2.6. CNN for Classification¶

CONV and POOL layers output high-level features of input
Fully connected layer uses these features for classifying input image
Express output as probability of image belonging to a particular class

3. Lab: CNN with TensorFlow¶

MNIST example
To classify handwritten digits

3.1. Training¶

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0
train_x = train_x.reshape((train_x.shape[0], 28, 28, 1))
test_x = test_x.reshape((test_x.shape[0], 28, 28, 1))

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, 
                           (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(64, 
                           (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (14, 14, 32)),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Flatten(),
    
    tf.keras.layers.Dense(128, activation = 'relu'),
    
    tf.keras.layers.Dense(10, activation = 'softmax')
])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y)

1875/1875 [==============================] - 8s 4ms/step - loss: 0.1210 - accuracy: 0.9623

<tensorflow.python.keras.callbacks.History at 0x2b5d6428348>

3.2. Testing or Evaluating¶

test_loss, test_acc = model.evaluate(test_x, test_y)

print('loss = {}, Accuracy = {} %'.format(round(test_loss,2), round(test_acc*100)))

313/313 [==============================] - 1s 3ms/step - loss: 0.0339 - accuracy: 0.9889
loss = 0.03, Accuracy = 99 %

test_img = test_x[np.random.choice(test_x.shape[0], 1)]

predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))

C:\Users\user\tensorflow2\lib\site-packages\ipykernel_launcher.py:12: UserWarning: In Matplotlib 3.3 individual lines on a stem plot will be added as a LineCollection instead of individual lines. This significantly improves the performance of a stem plot. To remove this warning and switch to the new behaviour, set the "use_line_collection" keyword argument to True.
  if sys.path[0] == '':

Prediction : 2

4. CNN on STFT Images¶

Binary Classification
Acoustic data
Class A, B

4.1. Data¶

File format: pkl
Information: time, signal, label
Load as dictionary type
Labels based on one-hot encoding
- A: [1,0]
- B: [0,1]

Download

from scipy.signal import spectrogram
from six.moves import cPickle

data = cPickle.load(open('./data_files/data.pkl','rb'))

# data saved as dictionary
print(data.keys())

# 2000 vibration signals
print(len(data['time']))

# 10000 points in each signal
print(len(data['time'][0]))
print(len(data['signal'][0]))

# binary classes
print(len(data['label'][0]))

dict_keys(['label', 'time', 'signal'])
2000
10000
10000
2

# plot ramdonly selected three vibration signals

N = 2000

for _ in range(3):
    idx = np.random.randint(N)   
    
    plt.plot(data['time'][idx], data['signal'][idx])
    plt.title(np.argmax(data['label'][idx]))
    plt.show()

# from vibration signal in 1D to STFT image in 2D

Fs = 12800

idx = np.random.randint(N)
x = data['signal'][idx]

f, t, Sxx = spectrogram(x, 
                        Fs, 
                        scaling = 'spectrum', 
                        mode = 'magnitude')

plt.figure(figsize = (6, 6))
plt.pcolormesh(t, f, Sxx)
plt.title(np.argmax(data['label'][idx]))
plt.ylabel('Frequency [Hz]', fontsize = 15)
plt.xlabel('Time [sec]', fontsize = 15)
plt.show()

# Convert all data to STFT image in 2D

stft_data = []

for i in range(N):    
    f, t, Sxx = spectrogram(data['signal'][i], 
                        Fs, 
                        scaling = 'spectrum', 
                        mode = 'magnitude')
    
    stft_data.append(Sxx)

np.shape(stft_data)

(2000, 129, 44)

from sklearn.model_selection import train_test_split

# train: test = 0.7: 0.3
train_x, test_x, train_y, test_y = train_test_split(stft_data, data['label'], test_size = 0.3)

# data type and shape conversion 
train_x = np.asarray(train_x)
train_y = np.asarray(train_y)
train_x = np.expand_dims(train_x, 3)
train_y = np.argmax(train_y, axis = 1)

test_x = np.asarray(test_x)
test_y = np.asarray(test_y)
test_x = np.expand_dims(test_x, 3)
test_y = np.argmax(test_y, axis = 1)

4.2. STFT to CNN¶

Simple CNN model
- input: $129 \times 44$ , 1 ch
- ouput: class A or B

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu',
                           padding = 'SAME',
                           input_shape = (129, 44, 1)),    
    tf.keras.layers.Conv2D(16, (3,3), activation = 'relu',
                           padding = 'SAME',
                           input_shape = (129, 44, 16)),
    tf.keras.layers.MaxPool2D((2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation = 'relu'),    
    tf.keras.layers.Dense(2, activation = 'softmax')
])

model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])

model.fit(train_x, train_y, epochs = 20)

Epoch 1/20
44/44 [==============================] - 1s 18ms/step - loss: 0.6936 - accuracy: 0.5093
Epoch 2/20
44/44 [==============================] - 1s 14ms/step - loss: 0.6919 - accuracy: 0.5221
Epoch 3/20
44/44 [==============================] - 1s 14ms/step - loss: 0.6858 - accuracy: 0.5536
Epoch 4/20
44/44 [==============================] - 1s 14ms/step - loss: 0.6518 - accuracy: 0.6250
Epoch 5/20
44/44 [==============================] - 1s 15ms/step - loss: 0.5659 - accuracy: 0.7514
Epoch 6/20
44/44 [==============================] - 1s 15ms/step - loss: 0.5294 - accuracy: 0.7400
Epoch 7/20
44/44 [==============================] - 1s 16ms/step - loss: 0.4847 - accuracy: 0.7586
Epoch 8/20
44/44 [==============================] - 1s 14ms/step - loss: 0.4489 - accuracy: 0.7879
Epoch 9/20
44/44 [==============================] - 1s 14ms/step - loss: 0.4257 - accuracy: 0.7979 0s - loss: 0.4283 - accuracy: 0.
Epoch 10/20
44/44 [==============================] - 1s 15ms/step - loss: 0.4166 - accuracy: 0.8121
Epoch 11/20
44/44 [==============================] - 1s 16ms/step - loss: 0.3969 - accuracy: 0.8143 0s - loss: 0.3959 - 
Epoch 12/20
44/44 [==============================] - 1s 16ms/step - loss: 0.3651 - accuracy: 0.8293
Epoch 13/20
44/44 [==============================] - 1s 15ms/step - loss: 0.3623 - accuracy: 0.8364
Epoch 14/20
44/44 [==============================] - 1s 14ms/step - loss: 0.3855 - accuracy: 0.8207
Epoch 15/20
44/44 [==============================] - 1s 14ms/step - loss: 0.3774 - accuracy: 0.8364
Epoch 16/20
44/44 [==============================] - 1s 15ms/step - loss: 0.4055 - accuracy: 0.8093
Epoch 17/20
44/44 [==============================] - 1s 15ms/step - loss: 0.4166 - accuracy: 0.8114
Epoch 18/20
44/44 [==============================] - 1s 15ms/step - loss: 0.3596 - accuracy: 0.8386
Epoch 19/20
44/44 [==============================] - 1s 15ms/step - loss: 0.3667 - accuracy: 0.8229
Epoch 20/20
44/44 [==============================] - 1s 14ms/step - loss: 0.3447 - accuracy: 0.8393 0s - loss: 0.3353 - accuracy

<tensorflow.python.keras.callbacks.History at 0x2b5eca64448>

test_loss, test_acc = model.evaluate(test_x,  test_y)

print('loss = {}, Accuracy = {} %'.format(round(test_loss, 8), round(test_acc * 100)))

19/19 [==============================] - 0s 5ms/step - loss: 0.3499 - accuracy: 0.8450
loss = 0.3498885, Accuracy = 85 %

test_img = test_x[np.random.choice(test_x.shape[0], 1)]

pred_ohe = model.predict_on_batch(test_img)
pred = np.argmax(pred_ohe, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(129, 44), 'jet', origin = 'lower')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(pred_ohe[0])
plt.xticks([0,1])
plt.show()

print('Prediction : {}'.format(pred[0]))
print('Probability : {}'.format(pred_ohe[0]))

C:\Users\user\tensorflow2\lib\site-packages\ipykernel_launcher.py:12: UserWarning: In Matplotlib 3.3 individual lines on a stem plot will be added as a LineCollection instead of individual lines. This significantly improves the performance of a stem plot. To remove this warning and switch to the new behaviour, set the "use_line_collection" keyword argument to True.
  if sys.path[0] == '':

Prediction : 0
Probability : [0.9719915  0.02800849]

%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')