Object Detection


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. 2D ConvolutionĀ¶

tf.keras.layers.Conv2D(filters, kernel_size, strides, padding, activation, kernel_regularizer, input_shape)
    filters = 32 
    kernel_size = (3,3)
    strides = (1,1)
    padding = 'SAME'
    activeation='relu'
    kernel_regularizer=tf.keras.regularizers.l2(0.04)
    input_shape = tensor of shape([input_h, input_w, input_ch])


  • filter size
    • the number of channels.
  • kernel_size

    • the height and width of the 2D convolution window.
  • stride

    • the step size of the kernel when traversing the image.
  • padding

    • how the border of a sample is handled.
    • A padded convolution will keep the spatial output dimensions equal to the input, whereas unpadded convolutions will crop away some of the borders if the kernel is larger than 1.
    • 'SAME' : enable zero padding
    • 'VALID' : disable zero padding
  • activation
    • Activation function to use.
  • kernel_regularizer

    • Initializer for the kernel weights matrix.
  • input and output channels

    • A convolutional layer takes a certain number of input channels ($C$) and calculates a specific number of output channels ($D$).

Examples


input = [None, 4, 4, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'VALID'

input = [None, 5, 5, 1]
filter size = [3, 3, 1, 1]
strides = [1, 1, 1, 1]
padding = 'SAME'

2. Object DetectionĀ¶



2.1. Localization MethodsĀ¶

  • Histogram of oriented Gradients (HOG) with SVM


  • Selective search


3. Object Detection AlgorithmsĀ¶



3.1. One-stage Object DetectionĀ¶

  • YOLO



  • SSD



3.2. Two-stage Object DetectionĀ¶

  • R-CNN

  • Faster R-CNN

  • Mask R-CNN

4. ExamplesĀ¶

InĀ [1]:
%%html
<center><iframe 
width="560" height="315" src="https://www.youtube.com/embed/Cgxsv1riJhI" frameborder="0" allowfullscreen>
</iframe><center>
InĀ [2]:
%%html
<center><iframe 
width="560" height="315" src="https://www.youtube.com/embed/vRqSO6RsptU" frameborder="0">
</iframe><center>

5. Object Detection with Machinery Parts DatasetĀ¶

  • Simplified version of two-stage object detection model for tutorial



  • 2-D convolution layers extract features from the input image.
  • Extracted features are utilized for object bounding box detection and object classification
  • Both classifier and bounding box regressor share the same features acuired from the 2-D convolution layers

5.1. Import LibraryĀ¶

InĀ [3]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

%matplotlib inline 
InĀ [4]:
train_imgs = np.load('data_files/object_detction_trn_data.npy')
train_labels = np.load('data_files/object_detction_trn_label.npy')

test_imgs = np.load('data_files/object_detction_eval_data.npy')
test_labels = np.load('data_files/object_detction_eval_label.npy')

# input image: 240 by 320
# output label: class, x, y, h, w

classes = ['Axis',
           'Bearing',
           'Bearing_Box',
           'Distance_Tube',
           'F20_20_B']
InĀ [5]:
print(train_imgs.shape)
print(train_labels.shape)
print(test_imgs.shape)
print(test_labels.shape)
(240, 240, 320, 3)
(240, 5)
(60, 240, 320, 3)
(60, 5)
  • Five classes images are availabe: Axis, bearing, bearing box, distance tube, beam

  • 250 images are used for training (50 images per class)

  • 50 images are avalialbe for evaluation (10 images per class)

  • One object per image (240 by 320)

  • Labeled with class and bounding box location(normalizsed): class, $x, y, h, \omega$

InĀ [6]:
idx = 138

train_img = train_imgs[idx]
c, x, y, h, w = train_labels[idx]

# rescaling 
x, w = x*320, w*320
y, h = y*240, h*240

rect = patches.Rectangle((x, y), 
                         w,
                         h, 
                         linewidth = 2, 
                         edgecolor = 'r', 
                         facecolor = 'none')

fig, ax = plt.subplots(figsize = (8,8))
plt.title(classes[int(c)])
plt.imshow(train_img)
ax.add_patch(rect)
plt.axis('off')
plt.show()
InĀ [7]:
# rescaling output labels

train_labels = np.multiply(train_labels, [1, 320, 240, 320, 240])
test_labels = np.multiply(test_labels, [1, 320, 240, 320, 240])

5.3. Define and Build an Object Detection ModelĀ¶




InĀ [8]:
feature_extractor = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, 
                           kernel_size = (3,3), 
                           activation = 'relu',
                           padding = 'SAME',
                           input_shape = (240, 320, 3)),
    
    tf.keras.layers.MaxPool2D(pool_size = (2,2)),
    
    tf.keras.layers.Conv2D(64, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(64, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(128, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(128, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.MaxPool2D((2,2)),
    
    tf.keras.layers.Conv2D(256, (3,3), activation = 'relu', padding = 'SAME'),
    
    tf.keras.layers.GlobalAveragePooling2D()
])
WARNING:tensorflow:From C:\Users\user\Anaconda3\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
InĀ [9]:
classifier = tf.keras.layers.Dense(256, activation = 'relu')(feature_extractor.output)
classifier = tf.keras.layers.Dense(256, activation = 'relu')(classifier)
classifier = tf.keras.layers.Dense(5, activation = 'softmax', name = 'cls')(classifier)
InĀ [10]:
bb_regressor = tf.keras.layers.Dense(256, activation = 'relu')(feature_extractor.output)
bb_regressor = tf.keras.layers.Dense(256, activation = 'relu')(bb_regressor)
bb_regressor = tf.keras.layers.Dense(4, name = 'bbox')(bb_regressor)
InĀ [11]:
object_detection = tf.keras.models.Model(inputs = feature_extractor.input, 
                                         outputs = [classifier, bb_regressor])
InĀ [12]:
object_detection.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
conv2d_input (InputLayer)       [(None, 240, 320, 3) 0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 240, 320, 32) 896         conv2d_input[0][0]               
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 120, 160, 32) 0           conv2d[0][0]                     
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 120, 160, 64) 18496       max_pooling2d[0][0]              
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 60, 80, 64)   0           conv2d_1[0][0]                   
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 60, 80, 64)   36928       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 30, 40, 64)   0           conv2d_2[0][0]                   
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 30, 40, 128)  73856       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 15, 20, 128)  0           conv2d_3[0][0]                   
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 15, 20, 128)  147584      max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 7, 10, 128)   0           conv2d_4[0][0]                   
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 7, 10, 256)   295168      max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
global_average_pooling2d (Globa (None, 256)          0           conv2d_5[0][0]                   
__________________________________________________________________________________________________
dense (Dense)                   (None, 256)          65792       global_average_pooling2d[0][0]   
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 256)          65792       global_average_pooling2d[0][0]   
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 256)          65792       dense[0][0]                      
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 256)          65792       dense_2[0][0]                    
__________________________________________________________________________________________________
cls (Dense)                     (None, 5)            1285        dense_1[0][0]                    
__________________________________________________________________________________________________
bbox (Dense)                    (None, 4)            1028        dense_3[0][0]                    
==================================================================================================
Total params: 838,409
Trainable params: 838,409
Non-trainable params: 0
__________________________________________________________________________________________________

5.4. Define Losses and Optimization ConfigurationĀ¶

InĀ [13]:
object_detection.compile(optimizer = 'adam', 
                         loss = {'cls': 'sparse_categorical_crossentropy', 
                                 'bbox': 'mse'}, 
                         loss_weights = {'cls': 1, 
                                         'bbox': 1})
InĀ [14]:
# divide labels to cls and bbox labels

train_cls = train_labels[:,:1]
train_bbox = train_labels[:,1:]

print(train_labels.shape)
print(train_cls.shape)
print(train_bbox.shape)
(240, 5)
(240, 1)
(240, 4)
InĀ [15]:
object_detection.fit(x = train_imgs, 
                     y = {'cls': train_cls, 'bbox': train_bbox}, 
                     epochs = 100)
Train on 240 samples
Epoch 1/100
240/240 [==============================] - 5s 22ms/sample - loss: 5681.9053 - cls_loss: 2.1066 - bbox_loss: 5478.7485
Epoch 2/100
240/240 [==============================] - 2s 7ms/sample - loss: 2519.5910 - cls_loss: 3.1534 - bbox_loss: 2484.2515
Epoch 3/100
240/240 [==============================] - 2s 7ms/sample - loss: 1912.8109 - cls_loss: 2.1321 - bbox_loss: 1968.1179
Epoch 4/100
240/240 [==============================] - 2s 7ms/sample - loss: 1603.1883 - cls_loss: 1.7777 - bbox_loss: 1615.7990
Epoch 5/100
240/240 [==============================] - 2s 7ms/sample - loss: 1569.6289 - cls_loss: 1.6685 - bbox_loss: 1542.0385
Epoch 6/100
240/240 [==============================] - 2s 7ms/sample - loss: 1553.8370 - cls_loss: 1.6862 - bbox_loss: 1549.2543
Epoch 7/100
240/240 [==============================] - 2s 7ms/sample - loss: 1573.0939 - cls_loss: 1.6687 - bbox_loss: 1558.2192
Epoch 8/100
240/240 [==============================] - 2s 7ms/sample - loss: 1576.4488 - cls_loss: 1.6931 - bbox_loss: 1503.7844
Epoch 9/100
240/240 [==============================] - 2s 7ms/sample - loss: 1673.4876 - cls_loss: 1.6903 - bbox_loss: 1611.8004
Epoch 10/100
240/240 [==============================] - 2s 7ms/sample - loss: 1687.8713 - cls_loss: 1.7049 - bbox_loss: 1643.6080
Epoch 11/100
240/240 [==============================] - 2s 7ms/sample - loss: 1642.9322 - cls_loss: 1.6776 - bbox_loss: 1623.3585
Epoch 12/100
240/240 [==============================] - 2s 7ms/sample - loss: 1538.1259 - cls_loss: 1.8715 - bbox_loss: 1523.6127
Epoch 13/100
240/240 [==============================] - 2s 7ms/sample - loss: 1472.6313 - cls_loss: 1.8658 - bbox_loss: 1521.6079
Epoch 14/100
240/240 [==============================] - 2s 7ms/sample - loss: 1458.4132 - cls_loss: 1.7374 - bbox_loss: 1456.1985
Epoch 15/100
240/240 [==============================] - 2s 7ms/sample - loss: 1466.0161 - cls_loss: 1.8461 - bbox_loss: 1454.2386
Epoch 16/100
240/240 [==============================] - 2s 7ms/sample - loss: 1383.2426 - cls_loss: 1.8098 - bbox_loss: 1362.2913
Epoch 17/100
240/240 [==============================] - 2s 7ms/sample - loss: 1336.6668 - cls_loss: 1.6955 - bbox_loss: 1319.4119
Epoch 18/100
240/240 [==============================] - 2s 7ms/sample - loss: 1310.7240 - cls_loss: 1.7181 - bbox_loss: 1341.3883
Epoch 19/100
240/240 [==============================] - 2s 7ms/sample - loss: 1287.0337 - cls_loss: 1.7373 - bbox_loss: 1249.7339
Epoch 20/100
240/240 [==============================] - 2s 7ms/sample - loss: 1226.5164 - cls_loss: 1.6869 - bbox_loss: 1236.3241
Epoch 21/100
240/240 [==============================] - 2s 7ms/sample - loss: 1161.5149 - cls_loss: 1.6082 - bbox_loss: 1141.3182
Epoch 22/100
240/240 [==============================] - 2s 7ms/sample - loss: 1074.1955 - cls_loss: 1.6012 - bbox_loss: 1051.2124
Epoch 23/100
240/240 [==============================] - 2s 9ms/sample - loss: 931.8647 - cls_loss: 1.5887 - bbox_loss: 922.6182
Epoch 24/100
240/240 [==============================] - 2s 7ms/sample - loss: 712.6288 - cls_loss: 1.6184 - bbox_loss: 719.2822
Epoch 25/100
240/240 [==============================] - 2s 7ms/sample - loss: 438.5539 - cls_loss: 1.6662 - bbox_loss: 439.8396
Epoch 26/100
240/240 [==============================] - 2s 7ms/sample - loss: 379.9687 - cls_loss: 1.6578 - bbox_loss: 373.2707
Epoch 27/100
240/240 [==============================] - 2s 7ms/sample - loss: 360.7031 - cls_loss: 1.6198 - bbox_loss: 355.4231
Epoch 28/100
240/240 [==============================] - 2s 7ms/sample - loss: 369.6588 - cls_loss: 1.5353 - bbox_loss: 365.2220
Epoch 29/100
240/240 [==============================] - 2s 7ms/sample - loss: 355.7213 - cls_loss: 1.5603 - bbox_loss: 352.9035
Epoch 30/100
240/240 [==============================] - 2s 7ms/sample - loss: 344.3340 - cls_loss: 1.5451 - bbox_loss: 339.7810
Epoch 31/100
240/240 [==============================] - 2s 7ms/sample - loss: 345.8651 - cls_loss: 1.5034 - bbox_loss: 335.5944
Epoch 32/100
240/240 [==============================] - 2s 7ms/sample - loss: 315.3388 - cls_loss: 1.4201 - bbox_loss: 312.2047
Epoch 33/100
240/240 [==============================] - 2s 7ms/sample - loss: 342.3046 - cls_loss: 1.4377 - bbox_loss: 346.1867
Epoch 34/100
240/240 [==============================] - 2s 7ms/sample - loss: 366.2266 - cls_loss: 1.3021 - bbox_loss: 361.7388
Epoch 35/100
240/240 [==============================] - 2s 7ms/sample - loss: 313.2007 - cls_loss: 1.2609 - bbox_loss: 307.6383
Epoch 36/100
240/240 [==============================] - 2s 7ms/sample - loss: 332.3002 - cls_loss: 1.1343 - bbox_loss: 337.3712
Epoch 37/100
240/240 [==============================] - 2s 7ms/sample - loss: 309.7456 - cls_loss: 1.1239 - bbox_loss: 308.2162
Epoch 38/100
240/240 [==============================] - 2s 7ms/sample - loss: 279.3494 - cls_loss: 1.0364 - bbox_loss: 275.0319
Epoch 39/100
240/240 [==============================] - 2s 7ms/sample - loss: 267.5514 - cls_loss: 0.9872 - bbox_loss: 264.2712
Epoch 40/100
240/240 [==============================] - 2s 7ms/sample - loss: 267.8042 - cls_loss: 0.9695 - bbox_loss: 268.5967
Epoch 41/100
240/240 [==============================] - 2s 7ms/sample - loss: 261.5930 - cls_loss: 0.9303 - bbox_loss: 263.9668
Epoch 42/100
240/240 [==============================] - 2s 7ms/sample - loss: 268.0326 - cls_loss: 0.9869 - bbox_loss: 273.0218
Epoch 43/100
240/240 [==============================] - 2s 7ms/sample - loss: 279.4425 - cls_loss: 0.9973 - bbox_loss: 282.1924
Epoch 44/100
240/240 [==============================] - 2s 7ms/sample - loss: 274.8469 - cls_loss: 0.9333 - bbox_loss: 268.3581
Epoch 45/100
240/240 [==============================] - 2s 7ms/sample - loss: 253.7743 - cls_loss: 0.8289 - bbox_loss: 251.4204
Epoch 46/100
240/240 [==============================] - 2s 7ms/sample - loss: 254.4157 - cls_loss: 0.8152 - bbox_loss: 254.9213
Epoch 47/100
240/240 [==============================] - 2s 7ms/sample - loss: 259.5771 - cls_loss: 0.8273 - bbox_loss: 261.1655
Epoch 48/100
240/240 [==============================] - 2s 7ms/sample - loss: 285.6501 - cls_loss: 0.9176 - bbox_loss: 281.0610
Epoch 49/100
240/240 [==============================] - 2s 7ms/sample - loss: 287.9973 - cls_loss: 0.7932 - bbox_loss: 283.7470
Epoch 50/100
240/240 [==============================] - 2s 7ms/sample - loss: 290.1907 - cls_loss: 0.8772 - bbox_loss: 285.2358
Epoch 51/100
240/240 [==============================] - 2s 7ms/sample - loss: 279.6186 - cls_loss: 0.8930 - bbox_loss: 274.7966
Epoch 52/100
240/240 [==============================] - 2s 7ms/sample - loss: 236.0780 - cls_loss: 0.8321 - bbox_loss: 231.8707
Epoch 53/100
240/240 [==============================] - 2s 7ms/sample - loss: 230.5969 - cls_loss: 0.7158 - bbox_loss: 230.7700
Epoch 54/100
240/240 [==============================] - 2s 7ms/sample - loss: 225.5499 - cls_loss: 0.6770 - bbox_loss: 224.0228
Epoch 55/100
240/240 [==============================] - 2s 7ms/sample - loss: 223.6896 - cls_loss: 0.7898 - bbox_loss: 221.3662
Epoch 56/100
240/240 [==============================] - 2s 7ms/sample - loss: 227.9123 - cls_loss: 0.7208 - bbox_loss: 227.8507
Epoch 57/100
240/240 [==============================] - 2s 9ms/sample - loss: 231.7455 - cls_loss: 0.7491 - bbox_loss: 239.7567
Epoch 58/100
240/240 [==============================] - 2s 8ms/sample - loss: 246.2499 - cls_loss: 0.6987 - bbox_loss: 241.5783
Epoch 59/100
240/240 [==============================] - 2s 7ms/sample - loss: 214.3431 - cls_loss: 0.5759 - bbox_loss: 214.6638
Epoch 60/100
240/240 [==============================] - 2s 7ms/sample - loss: 217.3363 - cls_loss: 0.5928 - bbox_loss: 215.8679
Epoch 61/100
240/240 [==============================] - 2s 7ms/sample - loss: 213.7671 - cls_loss: 0.6726 - bbox_loss: 212.3249
Epoch 62/100
240/240 [==============================] - 2s 7ms/sample - loss: 216.2981 - cls_loss: 0.6824 - bbox_loss: 218.8326
Epoch 63/100
240/240 [==============================] - 2s 7ms/sample - loss: 193.7031 - cls_loss: 0.5550 - bbox_loss: 195.2289
Epoch 64/100
240/240 [==============================] - 2s 7ms/sample - loss: 189.3222 - cls_loss: 0.5815 - bbox_loss: 186.4016
Epoch 65/100
240/240 [==============================] - 2s 7ms/sample - loss: 193.4746 - cls_loss: 0.4889 - bbox_loss: 194.9882
Epoch 66/100
240/240 [==============================] - 2s 7ms/sample - loss: 175.7661 - cls_loss: 0.4673 - bbox_loss: 178.4992
Epoch 67/100
240/240 [==============================] - 2s 7ms/sample - loss: 173.5051 - cls_loss: 0.4566 - bbox_loss: 172.7797
Epoch 68/100
240/240 [==============================] - 2s 7ms/sample - loss: 169.0080 - cls_loss: 0.4137 - bbox_loss: 168.3112
Epoch 69/100
240/240 [==============================] - 2s 7ms/sample - loss: 164.3058 - cls_loss: 0.4024 - bbox_loss: 162.8996
Epoch 70/100
240/240 [==============================] - 2s 7ms/sample - loss: 177.2798 - cls_loss: 0.4096 - bbox_loss: 178.4565
Epoch 71/100
240/240 [==============================] - 2s 7ms/sample - loss: 156.9721 - cls_loss: 0.4963 - bbox_loss: 154.5094
Epoch 72/100
240/240 [==============================] - 2s 7ms/sample - loss: 174.0985 - cls_loss: 0.5727 - bbox_loss: 168.3836
Epoch 73/100
240/240 [==============================] - 2s 7ms/sample - loss: 154.8273 - cls_loss: 0.5213 - bbox_loss: 149.5416
Epoch 74/100
240/240 [==============================] - 2s 7ms/sample - loss: 157.6376 - cls_loss: 0.4879 - bbox_loss: 162.2568
Epoch 75/100
240/240 [==============================] - 2s 7ms/sample - loss: 144.1187 - cls_loss: 0.4194 - bbox_loss: 149.5612
Epoch 76/100
240/240 [==============================] - 2s 7ms/sample - loss: 142.7248 - cls_loss: 0.3394 - bbox_loss: 141.7545
Epoch 77/100
240/240 [==============================] - 2s 7ms/sample - loss: 135.5648 - cls_loss: 0.3134 - bbox_loss: 138.6658
Epoch 78/100
240/240 [==============================] - 2s 7ms/sample - loss: 133.0346 - cls_loss: 0.2892 - bbox_loss: 131.8698
Epoch 79/100
240/240 [==============================] - 2s 7ms/sample - loss: 122.7888 - cls_loss: 0.2636 - bbox_loss: 131.9355
Epoch 80/100
240/240 [==============================] - 2s 7ms/sample - loss: 120.4464 - cls_loss: 0.2841 - bbox_loss: 118.5327
Epoch 81/100
240/240 [==============================] - 2s 7ms/sample - loss: 117.4342 - cls_loss: 0.2874 - bbox_loss: 118.6687
Epoch 82/100
240/240 [==============================] - 2s 8ms/sample - loss: 116.6355 - cls_loss: 0.2136 - bbox_loss: 112.5214
Epoch 83/100
240/240 [==============================] - 2s 7ms/sample - loss: 120.1364 - cls_loss: 0.2401 - bbox_loss: 117.7729
Epoch 84/100
240/240 [==============================] - 2s 7ms/sample - loss: 121.4577 - cls_loss: 0.2207 - bbox_loss: 122.1717
Epoch 85/100
240/240 [==============================] - 2s 7ms/sample - loss: 108.9384 - cls_loss: 0.2650 - bbox_loss: 106.7923
Epoch 86/100
240/240 [==============================] - 2s 8ms/sample - loss: 100.1374 - cls_loss: 0.3263 - bbox_loss: 99.6020
Epoch 87/100
240/240 [==============================] - 2s 7ms/sample - loss: 102.0943 - cls_loss: 0.2177 - bbox_loss: 99.8301
Epoch 88/100
240/240 [==============================] - 2s 7ms/sample - loss: 104.8665 - cls_loss: 0.1854 - bbox_loss: 107.3786
Epoch 89/100
240/240 [==============================] - 2s 7ms/sample - loss: 104.8277 - cls_loss: 0.1844 - bbox_loss: 102.3633
Epoch 90/100
240/240 [==============================] - 2s 7ms/sample - loss: 98.9953 - cls_loss: 0.1644 - bbox_loss: 95.7922
Epoch 91/100
240/240 [==============================] - 2s 8ms/sample - loss: 104.4962 - cls_loss: 0.1471 - bbox_loss: 102.4389
Epoch 92/100
240/240 [==============================] - 2s 7ms/sample - loss: 110.0887 - cls_loss: 0.1988 - bbox_loss: 109.5863
Epoch 93/100
240/240 [==============================] - 2s 7ms/sample - loss: 96.4573 - cls_loss: 0.1555 - bbox_loss: 99.0975
Epoch 94/100
240/240 [==============================] - 2s 7ms/sample - loss: 95.9991 - cls_loss: 0.1333 - bbox_loss: 98.3554
Epoch 95/100
240/240 [==============================] - 2s 7ms/sample - loss: 105.1611 - cls_loss: 0.1262 - bbox_loss: 107.4358
Epoch 96/100
240/240 [==============================] - 2s 7ms/sample - loss: 95.0764 - cls_loss: 0.1138 - bbox_loss: 90.5917
Epoch 97/100
240/240 [==============================] - 2s 7ms/sample - loss: 86.4985 - cls_loss: 0.0973 - bbox_loss: 87.0234
Epoch 98/100
240/240 [==============================] - 2s 7ms/sample - loss: 100.9962 - cls_loss: 0.0940 - bbox_loss: 99.5692
Epoch 99/100
240/240 [==============================] - 2s 7ms/sample - loss: 105.1858 - cls_loss: 0.0796 - bbox_loss: 102.3532
Epoch 100/100
240/240 [==============================] - 2s 7ms/sample - loss: 90.0616 - cls_loss: 0.0890 - bbox_loss: 89.4728
Out[15]:
<tensorflow.python.keras.callbacks.History at 0x2a835a95a88>

5.5. Check Training ResultsĀ¶

InĀ [16]:
idx = 110

# true label
c_label, x_label, y_label, h_label, w_label = train_labels[idx]

rect_label = patches.Rectangle((x_label, y_label),
                               w_label,
                               h_label,
                               linewidth = 2,
                               edgecolor = 'r',
                               facecolor = 'none')

# predict
c_pred, bbox = object_detection.predict(train_imgs[[idx]])

x, y, h, w = bbox[0]
rect = patches.Rectangle((x, y),
                         w,
                         h,
                         linewidth = 2,
                         edgecolor = 'b',
                         facecolor = 'none')
InĀ [17]:
print(classes[int(c_label)])
print(classes[np.argmax(c_pred)])
Bearing
Bearing
InĀ [18]:
fig, ax = plt.subplots(figsize = (8,8))
plt.imshow(train_imgs[idx])
ax.add_patch(rect_label)
ax.add_patch(rect)
plt.axis('off')
plt.show()

5.6. Check Evaluation ResultsĀ¶

InĀ [19]:
idx = 50

# true label
c_label, x_label, y_label, h_label, w_label = test_labels[idx]

rect_label = patches.Rectangle((x_label, y_label),
                               w_label,
                               h_label,
                               linewidth = 2,
                               edgecolor = 'r',
                               facecolor = 'none')

# predict
c_pred, bbox = object_detection.predict(test_imgs[[idx]])

x, y, h, w = bbox[0]
rect = patches.Rectangle((x, y),
                         w,
                         h,
                         linewidth = 2,
                         edgecolor = 'b',
                         facecolor = 'none')
InĀ [20]:
print(classes[int(c_label)])
print(classes[np.argmax(c_pred)])
Distance_Tube
Distance_Tube
InĀ [21]:
fig, ax = plt.subplots(figsize = (8,8))
plt.imshow(test_imgs[idx])
ax.add_patch(rect_label)
ax.add_patch(rect)
plt.axis('off')
plt.show()
InĀ [22]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')