Deep Learning for Mechanical Engineering

Homework 10

Due Wed., 11/29/2023, 4:00 PM


Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
  • For your handwritten solutions, please scan or take a picture of them. Alternatively, you can write them in markdown if you prefer.

  • Only .ipynb files will be graded for your code.

    • Ensure that your NAME and student ID are included in your .ipynb files. ex) IljeokKim_20202467_HW08.ipynb
  • Compress all the files into a single .zip file.

    • In the .zip file's name, include your NAME and student ID. ex) DogyeomPark_20202467_HW08.zip
    • Submit this .zip file on KLMS
  • Do not submit a printed version of your code, as it will not be graded.

Problem 1: LSTM with TensorFlow

  • In this problem, you will build an LSTM model that predicts one half of an MNIST image from the other half.

  • You will split each MNIST image into 28 row slices.

  • Each MNIST image is 28 x 28. The model predicts one 1 x 28 row at a time.

  • First, the top 14 x 28 half is fed into the model as a time series; the model then recursively predicts the bottom 14 x 28 half, one row at a time, as illustrated in the sketch below.
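The following is a minimal sketch (not part of the required solution) of how such sliding-window (input, target) pairs can be formed from a single image; the array img is a random stand-in for one normalized MNIST image.

In [ ]:
import numpy as np

img = np.random.rand(28, 28)   # stand-in for one 28 x 28 MNIST image
n_step = 14

windows, targets = [], []
for j in range(28 - n_step):             # j = 0, ..., 13
    windows.append(img[j:j + n_step])    # a 14 x 28 input window
    targets.append(img[j + n_step])      # the 1 x 28 row right below it

windows = np.stack(windows)
targets = np.stack(targets)
print(windows.shape, targets.shape)      # (14, 14, 28) (14, 28)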



In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

(1) Load MNIST Data

  • Download the MNIST dataset from the TensorFlow tutorial example
In [ ]:
(train_imgs, train_labels), (test_imgs, test_labels) = tf.keras.datasets.mnist.load_data()

train_imgs = train_imgs/255.0
test_imgs = test_imgs/255.0

print('train_x: ', train_imgs.shape)
print('test_x: ', test_imgs.shape)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
train_x:  (60000, 28, 28)
test_x:  (10000, 28, 28)

(2) Plot a randomly selected image with its label

In [ ]:
def batchmaker(x, y, n_batch):
    # randomly sample n_batch (image, label) pairs
    idx = np.random.randint(len(x), size = n_batch)
    return x[idx], y[idx]
In [ ]:
train_x, train_y = batchmaker(train_imgs, train_labels, 1)
img = train_x[0].reshape(28, 28)

plt.figure(figsize = (5, 3))
plt.imshow(img,'gray')
plt.title("Label : {}".format(train_y[0]))
plt.xticks([])
plt.yticks([])
plt.show()

(3) Define LSTM Structure



In [ ]:
n_step = 14
n_input = 28

## LSTM shape
n_lstm1 = 10
n_lstm2 = 10

## Fully connected
n_hidden = 100
n_output = 28
In [ ]:
lstm_network = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape = (n_step, n_input)),
    tf.keras.layers.LSTM(n_lstm1, return_sequences = True),
    tf.keras.layers.LSTM(n_lstm2),
    tf.keras.layers.Dense(n_hidden, activation = 'relu'),
    tf.keras.layers.Dense(n_output),
])

lstm_network.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm (LSTM)                 (None, 14, 10)            1560      
                                                                 
 lstm_1 (LSTM)               (None, 10)                840       
                                                                 
 dense (Dense)               (None, 100)               1100      
                                                                 
 dense_1 (Dense)             (None, 28)                2828      
                                                                 
=================================================================
Total params: 6,328
Trainable params: 6,328
Non-trainable params: 0
_________________________________________________________________

(4) Define Cost, Initializer, and Optimizer

Loss

  • Regression: squared loss
$$ \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

Initializer

  • Keras initializes all layer variables automatically when the model is built

Optimizer

  • Adam: one of the most popular optimizers
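As a quick sanity check with made-up values (a sketch, not part of the required solution), the built-in 'mean_squared_error' loss agrees with the formula above:

In [ ]:
import numpy as np
import tensorflow as tf

y_true = np.array([[1.0, 2.0, 3.0]])
y_pred = np.array([[1.5, 1.5, 2.0]])

manual = np.mean((y_pred - y_true)**2)                         # formula by hand
keras_mse = tf.keras.losses.MeanSquaredError()(y_true, y_pred) # built-in loss
print(manual, float(keras_mse))                                # both ~0.5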
In [ ]:
lstm_network.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = 0.0005),
                     loss = 'mean_squared_error')

(5) Define optimization configuration and then optimize

In [ ]:
n_iter = 2000
n_prt = 200
batch_size = 10

for i in range(n_iter+1):
    train_x, train_y = batchmaker(train_imgs, train_labels, batch_size)

    # slide a 14-row window down each image; the target is the row just below it
    for j in range(n_step):
        loss = lstm_network.train_on_batch(train_x[:, j:j+n_step, :], train_x[:, j+n_step, :])

    if i % n_prt == 0:
        # report the loss on the last window (rows 13-26 predicting row 27)
        train_loss = lstm_network.test_on_batch(train_x[:, 13:13+n_step, :], train_x[:, 13+n_step, :])
        print(train_loss)
0.0014543014112859964
0.0032421338837593794
0.008472677320241928
0.00035746616777032614
0.001835529925301671
0.010492042638361454
0.0002733171568252146
0.0002736546448431909
0.0003306472790427506
0.00020183045126032084
0.00015898968558758497

(6) Test or Evaluate

  • Predict the MNIST image
  • Each MNIST image is 28 x 28. The model predicts one 1 x 28 row at a time.
  • First, the top 14 x 28 half is fed into the model; the model then recursively predicts the bottom 14 x 28 half.
In [ ]:
test_x, test_y = batchmaker(test_imgs, test_labels, 5)

for idx in range(5):
    gen_img = []

    sample = test_x[idx, 0:14, :]          # top half (model input)
    input_img = sample.copy()

    feeding_img = test_x[idx, 0:0+n_step, :]

    # recursively predict the bottom 14 rows, one row at a time
    for i in range(n_step):
        test_pred = lstm_network.predict(feeding_img.reshape(1, 14, 28), verbose = 0)
        feeding_img = np.delete(feeding_img, 0, 0)          # drop the oldest row
        feeding_img = np.vstack([feeding_img, test_pred])   # append the new prediction
        gen_img.append(test_pred)

    # stack the generated rows below the input half
    for i in range(n_step):
        sample = np.vstack([sample, gen_img[i]])

    plt.figure(figsize = (12, 3))
    plt.subplot(1,3,1)
    plt.imshow(test_x[idx], 'gray')
    plt.title('Original Img')
    plt.xticks([])
    plt.yticks([])

    plt.subplot(1,3,2)
    plt.imshow(input_img, 'gray')
    plt.title('Input')
    plt.xticks([])
    plt.yticks([])

    plt.subplot(1,3,3)
    plt.imshow(sample, 'gray')
    plt.title('Generated Img')
    plt.xticks([])
    plt.yticks([])
    plt.show()

Problem 2

  • In this problem, we have bearing data with 3 classes (healthy, inner fault, outer fault).

  • The objective is to classify the given data using deep learning models.


Dataset Description

The bearing data is collected by a sensory system with 2 channels: vibration and rotational speed. You can refer to the paper for the detailed specification. The experimental setup is shown in the figure below. The dataset contains 36 files spanning 3 classes, 2 sensor positions, and 4 varying-speed conditions. Every recording is sampled at 200,000 Hz for 10 seconds, i.e., 2,000,000 samples per file. For simplicity, we will use only the increasing-speed condition and channel 1 (vibration data).



Download & Load Data

The data has already been prepared for you and can be downloaded in .npy format. Three files are provided for the train set and three for the test set.

(1) Plot all data

In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
Healthy_train = np.load('/content/drive/MyDrive/DL_Colab/DL_data/Bearing_Healthy_train.npy')
InnerFault_train = np.load('/content/drive/MyDrive/DL_Colab/DL_data/Bearing_InnerFault_train.npy')
OuterFault_train = np.load('/content/drive/MyDrive/DL_Colab/DL_data/Bearing_OuterFault_train.npy')

Healthy_test = np.load('/content/drive/MyDrive/DL_Colab/DL_data/Bearing_Healthy_test.npy')
InnerFault_test = np.load('/content/drive/MyDrive/DL_Colab/DL_data/Bearing_InnerFault_test.npy')
OuterFault_test = np.load('/content/drive/MyDrive/DL_Colab/DL_data/Bearing_OuterFault_test.npy')
In [ ]:
idx = np.random.choice(Healthy_train.shape[0])
tindx = np.linspace(0, Healthy_train[idx].shape[0]/2000, Healthy_train[idx].shape[0])

plt.figure(figsize = (10, 9))

plt.subplot(3,1,1);  plt.plot(tindx, Healthy_train[idx]);  plt.ylim([-1, 1]);  plt.grid('on', alpha = 0.2)
plt.subplot(3,1,2);  plt.plot(tindx, InnerFault_train[idx]);  plt.ylim([-1, 1]);  plt.grid('on', alpha = 0.2)
plt.subplot(3,1,3);  plt.plot(tindx, OuterFault_train[idx]);  plt.ylim([-1, 1]);  plt.grid('on', alpha = 0.2)

plt.show()
In [ ]:
idx = np.random.choice(Healthy_test.shape[0])

plt.figure(figsize = (10, 9))

plt.subplot(3,1,1);  plt.plot(tindx, Healthy_test[idx]);  plt.ylim([-1, 1]);  plt.grid('on', alpha = 0.2)
plt.subplot(3,1,2);  plt.plot(tindx, InnerFault_test[idx]);  plt.ylim([-1, 1]);  plt.grid('on', alpha = 0.2)
plt.subplot(3,1,3);  plt.plot(tindx, OuterFault_test[idx]);  plt.ylim([-1, 1]);  plt.grid('on', alpha = 0.2)

plt.show()

In the following subproblems, you will be introduced to deep learning models used for time-series data classification. You are asked to build a model with the given architecture and evaluate its performance.

Various deep learning models for time-series data classification

In the deep learning field, there are plenty of models for time-series data classification. Compared to conventional machine learning algorithms, recent deep neural network models achieve much higher accuracy, as reported in the review cited below. In this problem, we will build a CNN-LSTM model whose building blocks are covered in this course.

"Deep Learning Algorithms for Bearing Fault Diagnostics – A Comprehensive Review", 2020, S Zhang et al.

Implementation of CNN-LSTM model

The CNN-LSTM model is introduced in "An Improved Bearing Fault Diagnosis Method using One-Dimensional CNN and LSTM." The authors successfully combined a 1D CNN and an LSTM, and the resulting model performs well in terms of both computation time and accuracy. The following configuration shows part of the model's structure. The model takes segmented data as input: each segment, of length 2,000, is randomly cropped from the original time-series data, as illustrated in the sketch below.
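A minimal sketch of the random cropping described above (the helper name random_segments, the stand-in signal, and the segment count are illustrative assumptions, not from the paper):

In [ ]:
import numpy as np

def random_segments(signal, n_segments, seg_len = 2000):
    # randomly crop n_segments windows of length seg_len from a 1-D signal
    starts = np.random.randint(0, len(signal) - seg_len + 1, size = n_segments)
    return np.stack([signal[s:s + seg_len] for s in starts])

# 10 s at 200 kHz gives 2,000,000 samples per recording
signal = np.random.randn(2_000_000)   # stand-in for one vibration channel
segments = random_segments(signal, n_segments = 4000)
print(segments.shape)                 # (4000, 2000)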





(2) Create trainset and testset

Create a trainset and a testset to feed the model you design. Set the number of segments to 4,000 for the trainset and 1,000 for the testset (for each class).

In [ ]:
train_x = np.vstack((Healthy_train, InnerFault_train, OuterFault_train))

# integer class labels: 0 = healthy, 1 = inner fault, 2 = outer fault
train_y = np.concatenate((np.zeros(Healthy_train.shape[0]),
                          np.ones(InnerFault_train.shape[0]),
                          2*np.ones(OuterFault_train.shape[0])))
In [ ]:
test_x = np.vstack((Healthy_test, InnerFault_test, OuterFault_test))
test_y = np.concatenate((np.zeros(Healthy_test.shape[0]),
                         np.ones(InnerFault_test.shape[0]),
                         2*np.ones(OuterFault_test.shape[0])))

(3) Build the model based on the above information and print the structure.

You can freely assign the parameters that are not specified in the above configuration in order to achieve better performance. (You don't need to refer to the original paper for those.) Please refer to the summary of the model structure below.

You will use a 1D convolution layer for this problem. We learned 2D convolution in class, and 1D convolution is nothing but 2D convolution with a height of 1; see the shape check after the pipeline below.

Input_shape of data = (4000, 2000, 1)

2000 $\rightarrow$ 20 $\times$ 100 $\rightarrow$ CNN layers $\rightarrow$ LSTM layer
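As a minimal sketch of the height-1 equivalence (dummy data; the filter count and kernel size are arbitrary assumptions, not the required model parameters):

In [ ]:
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 20, 100).astype('float32')   # (batch, steps, channels)

conv1d = tf.keras.layers.Conv1D(32, kernel_size = 3, padding = 'same')
conv2d = tf.keras.layers.Conv2D(32, kernel_size = (1, 3), padding = 'same')

print(conv1d(x).shape)                 # (1, 20, 32)
print(conv2d(x[:, None, :, :]).shape)  # (1, 1, 20, 32): the 1D output with a height-1 axis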

In [ ]:
cnn_lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.Reshape((20, 100),
                            input_shape = (2000,)),
    tf.keras.layers.Conv1D(filters = 32,
                           kernel_size = 64,
                           activation = 'relu',
                           padding = 'SAME'),
    tf.keras.layers.MaxPool1D(pool_size = 2, strides = 2),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(3, activation = 'softmax')
])

cnn_lstm_model.compile(optimizer = tf.keras.optimizers.Adam(0.004),
                       loss = 'sparse_categorical_crossentropy',
                       metrics = ['accuracy'])

cnn_lstm_model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 reshape (Reshape)           (None, 20, 100)           0         
                                                                 
 conv1d (Conv1D)             (None, 20, 32)            204832    
                                                                 
 max_pooling1d (MaxPooling1D  (None, 10, 32)           0         
 )                                                               
                                                                 
 lstm_2 (LSTM)               (None, 128)               82432     
                                                                 
 dense_2 (Dense)             (None, 3)                 387       
                                                                 
=================================================================
Total params: 287,651
Trainable params: 287,651
Non-trainable params: 0
_________________________________________________________________

(4) Train the model and print the training procedure.

In [ ]:
cnn_lstm_train_record = cnn_lstm_model.fit(train_x, train_y, epochs = 10, batch_size = 50)
Epoch 1/10
240/240 [==============================] - 22s 81ms/step - loss: 0.2283 - accuracy: 0.8917
Epoch 2/10
240/240 [==============================] - 19s 81ms/step - loss: 0.0329 - accuracy: 0.9877
Epoch 3/10
240/240 [==============================] - 19s 81ms/step - loss: 0.0155 - accuracy: 0.9947
Epoch 4/10
240/240 [==============================] - 17s 72ms/step - loss: 0.0209 - accuracy: 0.9927
Epoch 5/10
240/240 [==============================] - 18s 74ms/step - loss: 0.0041 - accuracy: 0.9988
Epoch 6/10
240/240 [==============================] - 19s 78ms/step - loss: 0.0094 - accuracy: 0.9970
Epoch 7/10
240/240 [==============================] - 20s 82ms/step - loss: 0.0081 - accuracy: 0.9978
Epoch 8/10
240/240 [==============================] - 19s 81ms/step - loss: 0.0114 - accuracy: 0.9957
Epoch 9/10
240/240 [==============================] - 19s 80ms/step - loss: 0.0190 - accuracy: 0.9940
Epoch 10/10
240/240 [==============================] - 18s 74ms/step - loss: 0.0190 - accuracy: 0.9936

(5) Evaluate the model in terms of accuracy of the testset.

In [ ]:
cnn_lstm_model.evaluate(test_x, test_y, batch_size = 50)
60/60 [==============================] - 3s 31ms/step - loss: 0.0060 - accuracy: 0.9973
Out[ ]:
[0.006006747018545866, 0.9973333477973938]

(6) Build a new model without the 1D CNN, train it on the same data used for the CNN+LSTM model, and test it on the test dataset used previously

In [ ]:
lstm_model = tf.keras.Sequential([
    tf.keras.layers.Reshape((20, 100), input_shape = (2000,)),
    tf.keras.layers.LSTM(128, return_sequences = True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(3, activation = 'softmax'),
])

lstm_model.compile(optimizer = tf.keras.optimizers.Adam(0.004),
                   loss = 'sparse_categorical_crossentropy',
                   metrics = ['accuracy'])

lstm_model.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 reshape_1 (Reshape)         (None, 20, 100)           0         
                                                                 
 lstm_3 (LSTM)               (None, 20, 128)           117248    
                                                                 
 lstm_4 (LSTM)               (None, 32)                20608     
                                                                 
 dense_3 (Dense)             (None, 3)                 99        
                                                                 
=================================================================
Total params: 137,955
Trainable params: 137,955
Non-trainable params: 0
_________________________________________________________________
In [ ]:
lstm_train_record = lstm_model.fit(train_x, train_y, epochs = 10, batch_size = 50)
Epoch 1/10
240/240 [==============================] - 22s 62ms/step - loss: 0.5576 - accuracy: 0.6979
Epoch 2/10
240/240 [==============================] - 29s 120ms/step - loss: 0.3143 - accuracy: 0.8520
Epoch 3/10
240/240 [==============================] - 15s 62ms/step - loss: 0.0958 - accuracy: 0.9646
Epoch 4/10
240/240 [==============================] - 19s 78ms/step - loss: 0.0368 - accuracy: 0.9864
Epoch 5/10
240/240 [==============================] - 19s 80ms/step - loss: 0.0156 - accuracy: 0.9948
Epoch 6/10
240/240 [==============================] - 17s 73ms/step - loss: 0.0200 - accuracy: 0.9933
Epoch 7/10
240/240 [==============================] - 15s 64ms/step - loss: 0.0084 - accuracy: 0.9967
Epoch 8/10
240/240 [==============================] - 18s 76ms/step - loss: 0.0049 - accuracy: 0.9984
Epoch 9/10
240/240 [==============================] - 27s 112ms/step - loss: 4.4060e-04 - accuracy: 0.9999
Epoch 10/10
240/240 [==============================] - 20s 82ms/step - loss: 1.6579e-04 - accuracy: 1.0000
In [ ]:
lstm_model.evaluate(test_x, test_y, batch_size = 50)
60/60 [==============================] - 4s 43ms/step - loss: 0.0054 - accuracy: 0.9983
Out[ ]:
[0.005438053049147129, 0.9983333349227905]

(7) Compare loss and accuracy curves of two models (one with LSTM only and the other with CNN+LSTM)

Note that the two models do not show a big difference in accuracy due to the simplicity of the dataset.

In [ ]:
plt.figure(figsize = (6, 4))

plt.plot(cnn_lstm_train_record.history['loss'], label = 'CNN+LSTM Loss')
plt.plot(lstm_train_record.history['loss'], label = 'LSTM Loss')
plt.title('Comparison of Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(loc = 'upper right')
plt.show()
In [ ]:
plt.figure(figsize = (6, 4))

plt.plot(cnn_lstm_train_record.history['accuracy'], label = 'CNN+LSTM Accuracy')
plt.plot(lstm_train_record.history['accuracy'], label = 'LSTM Accuracy')
plt.title('Comparison of Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(loc = 'lower right')
plt.show()