Convolutional Neural Networks (CNN)


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

1. Convolution

1.1. 1D Convolution
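A 1D discrete convolution slides a flipped kernel $h$ along a signal $x$ and sums the products at every shift: $(x * h)[n] = \sum_k x[k]\,h[n-k]$. The minimal NumPy sketch below writes this sum out as an explicit loop and checks it against `np.convolve`. The signal and kernel values are arbitrary examples.

```python
import numpy as np

def conv1d_full(x, h):
    """Full 1D convolution: y[n] = sum_k x[k] * h[n - k]."""
    n_out = len(x) + len(h) - 1
    y = np.zeros(n_out)
    for n in range(n_out):
        for k in range(len(x)):
            if 0 <= n - k < len(h):      # only shifts where the kernel overlaps
                y[n] += x[k] * h[n - k]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])   # example signal
h = np.array([1.0, 0.0, -1.0])       # example kernel (a simple difference filter)

y_loop = conv1d_full(x, h)
y_np = np.convolve(x, h)             # mode='full' by default
print(y_loop)                        # values: 1, 2, 2, 2, -3, -4
print(np.allclose(y_loop, y_np))
```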


1.2. Convolution on Image (= Convolution in 2D)

Filter (or Kernel)

  • Modify or enhance an image by filtering
  • Filter images to emphasize certain features or remove other features
  • Filtering includes smoothing, sharpening and edge enhancement

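  • Discrete 2D convolution slides the kernel (a small matrix) over the image; at each position it multiplies the kernel element-wise with the overlapping image patch and sums the products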
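The sliding multiply-and-sum can be sketched in NumPy as follows. This is an illustrative implementation with no padding and stride 1; like CNN libraries, it does not flip the kernel (strictly speaking, cross-correlation), which makes no difference for symmetric kernels such as the box blur. The toy image and both kernels are example choices.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding).

    At each position the kernel is multiplied element-wise with the
    overlapping patch and the products are summed. The kernel is not
    flipped (cross-correlation), as in CNN libraries.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark left part, bright right part (a vertical edge)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

blur = np.ones((3, 3)) / 9.0                 # smoothing (box) filter
edge = np.array([[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]], dtype=float)   # vertical-edge filter

print(conv2d_valid(image, blur))   # every row: 1/3, 2/3, 1
print(conv2d_valid(image, edge))   # every row: -3, -3, 0
```

The blur turns the sharp step into a gradual ramp, while the edge filter responds only where the $3\times3$ window straddles the dark/bright boundary.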


How to find the right Kernels

  • We have seen many different hand-designed kernels that produce specific effects on images

  • Let’s take the opposite approach

  • We do not design the kernel; we learn it from data

  • A deep learning framework can learn the feature extractor from data
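To make "learning the kernel from data" concrete, here is a toy sketch (not a CNN, and not how any framework implements it): random images are filtered with a hidden $3\times3$ kernel, and the nine kernel weights are then recovered purely from the input/output pairs by gradient descent on the mean squared error. The hidden kernel, image sizes, learning rate, and the helper `extract_patches` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(image, k):
    """Hypothetical helper: every k x k patch (stride 1, no padding) as a row."""
    h, w = image.shape
    return np.array([image[i:i + k, j:j + k].ravel()
                     for i in range(h - k + 1)
                     for j in range(w - k + 1)])

# "Unknown" kernel that we pretend not to know (a vertical-edge filter)
true_kernel = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])

# Training data: random images and their filtered outputs
images = [rng.standard_normal((16, 16)) for _ in range(20)]
X = np.vstack([extract_patches(img, 3) for img in images])   # (N, 9) patches
y = X @ true_kernel.ravel()                                   # filter responses

# Learn the 9 kernel weights by gradient descent on the mean squared error
w = np.zeros(9)
lr = 0.1
for step in range(500):
    grad = 2.0 / len(X) * X.T @ (X @ w - y)
    w -= lr * grad

learned_kernel = w.reshape(3, 3)
print(np.round(learned_kernel, 3))
```

A CNN does the same thing at scale: the kernel weights are parameters, and backpropagation adjusts them to reduce the training loss instead of matching a known target filter.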

2. Convolutional Neural Networks (CNN)

2.1. Motivation: Learning Visual Features


The bird occupies a local area and looks the same in different parts of an image. We should construct neural networks which exploit these properties.



  • A fully connected ANN structure for object detection in images

    • does not seem to be the best choice
    • does not make use of the fact that we are dealing with images
    • destroys the spatial organization of the input by flattening it



  • Locality: objects tend to have a local spatial support
    • fully connected layer $\rightarrow$ locally connected layer



  • Translation invariance: object appearance is independent of location
    • Weight sharing: units connected to different locations have the same weights
    • We are not designing the kernel, but are learning the kernel from data
    • i.e., we are learning a visual feature extractor from data
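Locality and weight sharing shrink the number of parameters dramatically. The arithmetic below assumes, for illustration, a $28\times28$ grayscale input mapped to a $28\times28$ single-channel output with a $3\times3$ receptive field. Biases are ignored, and for the locally connected count the borders are assumed zero-padded so that every output unit still has 9 weights.

```python
# Weight counts (biases ignored) for mapping a 28x28 single-channel input
# to a 28x28 single-channel output.
H, W = 28, 28
k = 3  # receptive field (kernel) size

n_inputs = H * W
n_outputs = H * W

fully_connected = n_outputs * n_inputs    # every output sees every input
locally_connected = n_outputs * k * k     # each output has its own 3x3 weights
convolutional = k * k                     # one 3x3 kernel shared by all outputs

print(fully_connected, locally_connected, convolutional)
```

Locality cuts the count from 614,656 to 7,056 weights, and weight sharing cuts it further to just 9.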

2.2. Convolutional Operator

Convolution in a CNN

  • Local connectivity
  • Weight sharing
  • Typically have sparse interactions

  • Convolutional Neural Networks

    • Simply neural networks that use the convolution in place of general matrix multiplication in at least one of their layers
  • Multiple channels


  • Multiple kernels
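With multiple input channels, each kernel extends through all $C_{in}$ channels and sums over them, so one kernel produces one output channel; stacking $C_{out}$ kernels gives $C_{out}$ output maps. A minimal NumPy sketch (stride 1, no padding; the sizes are arbitrary) follows. The kernel array uses the `(k, k, C_in, C_out)` layout, which is also how Keras stores `Conv2D` kernels.

```python
import numpy as np

def conv2d_multi(x, kernels):
    """Multi-channel, multi-kernel convolution (stride 1, no padding).

    x:       input of shape (H, W, C_in)
    kernels: weights of shape (k, k, C_in, C_out)
    returns: output of shape (H - k + 1, W - k + 1, C_out)
    """
    H, W, _ = x.shape
    k, _, _, c_out = kernels.shape
    out = np.zeros((H - k + 1, W - k + 1, c_out))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            patch = x[i:i + k, j:j + k, :]          # (k, k, C_in)
            # sum over kernel rows, columns and input channels,
            # leaving one value per kernel
            out[i, j, :] = np.tensordot(patch, kernels,
                                        axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))            # e.g. an 8x8 input with 3 channels
kernels = rng.standard_normal((3, 3, 3, 4))   # 4 kernels, each 3x3x3

y = conv2d_multi(x, kernels)
print(y.shape)   # (6, 6, 4): one output map per kernel
```

Each kernel has $k \cdot k \cdot C_{in}$ weights plus one bias, so a layer like this one would have $(3 \cdot 3 \cdot 3 + 1) \cdot 4 = 112$ parameters.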


2.3. Stride and Padding

  • Strides: increment step size for the convolution operator
    • Reduces the size of the output map
  • No stride and no padding


  • Stride example with kernel size 3×3 and a stride of 2


  • Padding: artificially fill borders of image
    • Useful to keep spatial dimension constant across filters
    • Useful with strides and large receptive fields
    • Usually fill with 0s
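Stride and padding together determine the output size. For input size $n$, kernel size $k$, stride $s$, and zero padding $p$ on each side, the output size along one dimension is $n_{\text{out}} = \lfloor (n + 2p - k)/s \rfloor + 1$. The helper below is a small sketch of this formula; the example sizes are arbitrary.

```python
def conv_output_size(n, k, s=1, p=0):
    """Spatial output size of a convolution along one dimension.

    n: input size, k: kernel size, s: stride, p: zero padding per side.
    """
    return (n + 2 * p - k) // s + 1

print(conv_output_size(5, 3))             # no stride, no padding    -> 3
print(conv_output_size(5, 3, s=2))        # 3x3 kernel, stride 2     -> 2
print(conv_output_size(5, 3, s=2, p=1))   # stride 2 with padding 1  -> 3
print(conv_output_size(28, 3, s=1, p=1))  # padding keeps 28 as 28   -> 28
```

With stride 1 and an odd kernel size, choosing $p = (k-1)/2$ keeps the spatial size unchanged, which is what `padding = 'SAME'` achieves in the Keras models of the lab below.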


2.4. Nonlinear Activation Function
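After each convolution, a nonlinear activation is applied element-wise to the feature map. The Keras models in the lab below use ReLU (`activation = 'relu'`), which keeps positive responses and zeroes out negative ones. A minimal NumPy sketch on an example feature map:

```python
import numpy as np

def relu(x):
    """Rectified linear unit, applied element-wise: max(0, x)."""
    return np.maximum(0, x)

feature_map = np.array([[-2.0, 0.5],
                        [ 1.5, -0.1]])   # example convolution output
print(relu(feature_map))                 # negatives become 0
```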


2.5. Pooling

  • Compute a maximum value in a sliding window (max pooling)
    • Reduce spatial resolution for faster computation
    • Achieve invariance to any permutation within a pooling cell


  • Pooling size: $2\times2$, for example
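A $2\times2$ max pooling with stride 2 keeps the largest value in each non-overlapping $2\times2$ cell, halving each spatial dimension. The NumPy sketch below (input values are an arbitrary example) uses a reshape to group each cell, and also checks the permutation-invariance property: shuffling values inside a cell does not change the output.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    H, W = x.shape
    # group pixels into non-overlapping 2x2 cells, then take each cell's max
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 2, 0, 1],
              [5, 1, 7, 8],
              [0, 6, 3, 2]])
print(max_pool_2x2(x))   # [[4 2]
                         #  [6 8]]

# swap two values inside the top-left cell: the pooled output is unchanged
x_perm = x.copy()
x_perm[0, 0], x_perm[1, 1] = x[1, 1], x[0, 0]
print(np.array_equal(max_pool_2x2(x), max_pool_2x2(x_perm)))   # True
```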

2.6. CNN for Classification

  • CONV and POOL layers output high-level features of the input
  • A fully connected layer uses these features to classify the input image
  • The output is expressed as the probability of the image belonging to each class
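The class probabilities come from a softmax on the last layer (the lab's final `Dense` layer uses `activation = 'softmax'`), which maps arbitrary scores to positive values that sum to 1. A minimal NumPy sketch with hypothetical scores for three classes:

```python
import numpy as np

def softmax(z):
    """Convert a vector of scores (logits) into class probabilities."""
    z = z - np.max(z)        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical last-layer scores
probs = softmax(logits)
print(probs, probs.sum())            # probabilities summing to 1
print(np.argmax(probs))              # predicted class index: 0
```

Subtracting the maximum score leaves the result unchanged mathematically but avoids overflow in `np.exp` for large scores.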



3. Lab: CNN with TensorFlow

  • MNIST example
  • To classify handwritten digits



3.1. Training

In [ ]:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
mnist = tf.keras.datasets.mnist

(train_x, train_y), (test_x, test_y) = mnist.load_data()

train_x, test_x = train_x/255.0, test_x/255.0
train_x = train_x.reshape((train_x.shape[0], 28, 28, 1))
test_x = test_x.reshape((test_x.shape[0], 28, 28, 1))
In [ ]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, 
                           (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (28, 28, 1)),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    # input_shape is only needed on the first layer; this layer
    # receives the (14, 14, 32) output of the pooling layer above
    tf.keras.layers.Conv2D(64, 
                           (3,3), 
                           activation = 'relu',
                           padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Flatten(),
    
    tf.keras.layers.Dense(128, activation = 'relu'),
    
    tf.keras.layers.Dense(10, activation = 'softmax')
])
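As a sanity check on the architecture above, the layer output shapes and trainable parameter counts can be worked out by hand; `model.summary()` should report the same totals. The sketch below assumes stride-1 `'SAME'` convolutions and the default $2\times2$ pooling stride, as in the model definition.

```python
# Layer-by-layer shapes and parameter counts of the MNIST model, by hand.
def conv_params(k, c_in, c_out):
    return (k * k * c_in + 1) * c_out      # weights + one bias per kernel

def dense_params(n_in, n_out):
    return (n_in + 1) * n_out              # weights + biases

h = w = 28
p1 = conv_params(3, 1, 32)                 # 'SAME' conv: 28x28x32
h, w = h // 2, w // 2                      # 2x2 max pooling: 14x14x32
p2 = conv_params(3, 32, 64)                # 'SAME' conv: 14x14x64
h, w = h // 2, w // 2                      # 2x2 max pooling: 7x7x64
n_flat = h * w * 64                        # Flatten: 3136 features
p3 = dense_params(n_flat, 128)
p4 = dense_params(128, 10)

print(n_flat, [p1, p2, p3, p4], p1 + p2 + p3 + p4)
```

Almost all of the 421,642 parameters sit in the first `Dense` layer (401,536), while the two convolutional layers need fewer than 19,000 thanks to weight sharing.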
In [ ]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
In [ ]:
model.fit(train_x, train_y)
1875/1875 [==============================] - 8s 4ms/step - loss: 0.1210 - accuracy: 0.9623

3.2. Testing or Evaluating

In [ ]:
test_loss, test_acc = model.evaluate(test_x, test_y)

print('loss = {}, Accuracy = {} %'.format(round(test_loss,2), round(test_acc*100)))
313/313 [==============================] - 1s 3ms/step - loss: 0.0339 - accuracy: 0.9889
loss = 0.03, Accuracy = 99 %
In [ ]:
test_img = test_x[np.random.choice(test_x.shape[0], 1)]

predict = model.predict_on_batch(test_img)
mypred = np.argmax(predict, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(28, 28), 'gray')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(predict[0])
plt.show()

print('Prediction : {}'.format(mypred[0]))
Prediction : 2

4. CNN on STFT Images

  • Binary Classification
  • Acoustic data
  • Class A, B

4.1. Data

  • File format: pkl

  • Information: time, signal, label

  • Load as dictionary type

  • Labels based on one-hot encoding

    • A: [1,0]
    • B: [0,1]
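The training code below converts the one-hot labels to integer class indices with `np.argmax(..., axis = 1)`, because `'sparse_categorical_crossentropy'` expects integer labels. A small NumPy sketch of the conversion in both directions, using three hypothetical labels:

```python
import numpy as np

# One-hot labels in the same format as the data file: A -> [1, 0], B -> [0, 1]
labels_ohe = np.array([[1, 0],
                       [0, 1],
                       [0, 1]])

# argmax recovers the integer class index (A -> 0, B -> 1)
labels_int = np.argmax(labels_ohe, axis=1)
print(labels_int)                        # [0 1 1]

# indexing an identity matrix converts back to one-hot,
# e.g. if 'categorical_crossentropy' were used instead
print(np.eye(2, dtype=int)[labels_int])
```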
In [ ]:
import pickle
from scipy.signal import spectrogram

# load the dictionary saved in the pickle file (the file is closed afterwards)
with open('./data_files/data.pkl', 'rb') as f:
    data = pickle.load(f)

# data saved as dictionary
print(data.keys())

# 2000 vibration signals
print(len(data['time']))

# 10000 points in each signal
print(len(data['time'][0]))
print(len(data['signal'][0]))

# binary classes
print(len(data['label'][0]))
dict_keys(['label', 'time', 'signal'])
2000
10000
10000
2
In [ ]:
# plot three randomly selected vibration signals

N = 2000

for _ in range(3):
    idx = np.random.randint(N)   
    
    plt.plot(data['time'][idx], data['signal'][idx])
    plt.title(np.argmax(data['label'][idx]))
    plt.show()
In [ ]:
# from vibration signal in 1D to STFT image in 2D

Fs = 12800

idx = np.random.randint(N)
x = data['signal'][idx]

f, t, Sxx = spectrogram(x, 
                        Fs, 
                        scaling = 'spectrum', 
                        mode = 'magnitude')

plt.figure(figsize = (6, 6))
plt.pcolormesh(t, f, Sxx)
plt.title(np.argmax(data['label'][idx]))
plt.ylabel('Frequency [Hz]', fontsize = 15)
plt.xlabel('Time [sec]', fontsize = 15)
plt.show()
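The size of each STFT image follows from how the signal is split into segments. The sketch below assumes `scipy.signal.spectrogram` is called with its default segment length (`nperseg=256`) and overlap (`noverlap = nperseg // 8`), as in the cell above, which yields a one-sided spectrum; `spectrogram_shape` is a hypothetical helper, not part of SciPy.

```python
# Expected spectrogram shape under scipy.signal.spectrogram's defaults:
# nperseg = 256 samples per segment, noverlap = nperseg // 8, one-sided spectrum.
def spectrogram_shape(n_samples, nperseg=256, noverlap=None):
    if noverlap is None:
        noverlap = nperseg // 8
    step = nperseg - noverlap                  # hop between segments
    n_freqs = nperseg // 2 + 1                 # one-sided frequency bins
    n_times = (n_samples - noverlap) // step   # number of full segments
    return n_freqs, n_times

Fs = 12800
n_freqs, n_times = spectrogram_shape(10000)    # 10000 points per signal
print(n_freqs, n_times)                        # 129 44
print(Fs / 256, 'Hz per frequency bin')
print((256 - 256 // 8) / Fs, 's between time frames')
```

So each 10000-point signal becomes a $129 \times 44$ image with 50 Hz frequency resolution, covering 0 to $F_s/2 = 6400$ Hz.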
In [ ]:
# Convert all data to STFT image in 2D

stft_data = []

for i in range(N):    
    f, t, Sxx = spectrogram(data['signal'][i], 
                            Fs, 
                            scaling = 'spectrum', 
                            mode = 'magnitude')
    
    stft_data.append(Sxx)

np.shape(stft_data)    
Out[ ]:
(2000, 129, 44)
In [ ]:
from sklearn.model_selection import train_test_split

# train: test = 0.7: 0.3
train_x, test_x, train_y, test_y = train_test_split(stft_data, data['label'], test_size = 0.3)

# data type and shape conversion 
train_x = np.asarray(train_x)
train_y = np.asarray(train_y)
train_x = np.expand_dims(train_x, 3)
train_y = np.argmax(train_y, axis = 1)

test_x = np.asarray(test_x)
test_y = np.asarray(test_y)
test_x = np.expand_dims(test_x, 3)
test_y = np.argmax(test_y, axis = 1)

4.2. STFT to CNN

  • Simple CNN model

    • input: $129 \times 44$, 1 ch

    • output: class A or B


In [ ]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3,3), activation='relu',
                           padding = 'SAME',
                           input_shape = (129, 44, 1)),    
    # input_shape is only needed on the first layer
    tf.keras.layers.Conv2D(16, (3,3), activation = 'relu',
                           padding = 'SAME'),
    tf.keras.layers.MaxPool2D((2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation = 'relu'),    
    tf.keras.layers.Dense(2, activation = 'softmax')
])
In [ ]:
model.compile(optimizer = 'adam',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
In [ ]:
model.fit(train_x, train_y, epochs = 20)
Epoch 1/20
44/44 [==============================] - 1s 18ms/step - loss: 0.6936 - accuracy: 0.5093
Epoch 2/20
44/44 [==============================] - 1s 14ms/step - loss: 0.6919 - accuracy: 0.5221
Epoch 3/20
44/44 [==============================] - 1s 14ms/step - loss: 0.6858 - accuracy: 0.5536
Epoch 4/20
44/44 [==============================] - 1s 14ms/step - loss: 0.6518 - accuracy: 0.6250
Epoch 5/20
44/44 [==============================] - 1s 15ms/step - loss: 0.5659 - accuracy: 0.7514
Epoch 6/20
44/44 [==============================] - 1s 15ms/step - loss: 0.5294 - accuracy: 0.7400
Epoch 7/20
44/44 [==============================] - 1s 16ms/step - loss: 0.4847 - accuracy: 0.7586
Epoch 8/20
44/44 [==============================] - 1s 14ms/step - loss: 0.4489 - accuracy: 0.7879
Epoch 9/20
44/44 [==============================] - 1s 14ms/step - loss: 0.4257 - accuracy: 0.7979
Epoch 10/20
44/44 [==============================] - 1s 15ms/step - loss: 0.4166 - accuracy: 0.8121
Epoch 11/20
44/44 [==============================] - 1s 16ms/step - loss: 0.3969 - accuracy: 0.8143
Epoch 12/20
44/44 [==============================] - 1s 16ms/step - loss: 0.3651 - accuracy: 0.8293
Epoch 13/20
44/44 [==============================] - 1s 15ms/step - loss: 0.3623 - accuracy: 0.8364
Epoch 14/20
44/44 [==============================] - 1s 14ms/step - loss: 0.3855 - accuracy: 0.8207
Epoch 15/20
44/44 [==============================] - 1s 14ms/step - loss: 0.3774 - accuracy: 0.8364
Epoch 16/20
44/44 [==============================] - 1s 15ms/step - loss: 0.4055 - accuracy: 0.8093
Epoch 17/20
44/44 [==============================] - 1s 15ms/step - loss: 0.4166 - accuracy: 0.8114
Epoch 18/20
44/44 [==============================] - 1s 15ms/step - loss: 0.3596 - accuracy: 0.8386
Epoch 19/20
44/44 [==============================] - 1s 15ms/step - loss: 0.3667 - accuracy: 0.8229
Epoch 20/20
44/44 [==============================] - 1s 14ms/step - loss: 0.3447 - accuracy: 0.8393
In [ ]:
test_loss, test_acc = model.evaluate(test_x,  test_y)

print('loss = {}, Accuracy = {} %'.format(round(test_loss, 8), round(test_acc * 100)))
19/19 [==============================] - 0s 5ms/step - loss: 0.3499 - accuracy: 0.8450
loss = 0.3498885, Accuracy = 85 %
In [ ]:
test_img = test_x[np.random.choice(test_x.shape[0], 1)]

pred_ohe = model.predict_on_batch(test_img)
pred = np.argmax(pred_ohe, axis = 1)

plt.figure(figsize = (12,5))

plt.subplot(1,2,1)
plt.imshow(test_img.reshape(129, 44), 'jet', origin = 'lower')
plt.axis('off')
plt.subplot(1,2,2)
plt.stem(pred_ohe[0])
plt.xticks([0,1])
plt.show()

print('Prediction : {}'.format(pred[0]))
print('Probability : {}'.format(pred_ohe[0]))
Prediction : 0
Probability : [0.9719915  0.02800849]