Machine Learning for Mechanical Engineering

ANN

Instructor: Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Problem 1

1) Explain why the perceptron cannot solve the XOR problem.

2) An ANN (artificial neural network) is also called an MLP (multilayer perceptron). Explain why an MLP is able to solve the XOR problem.
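
For intuition, below is a minimal numpy sketch (with hand-picked weights and a hard-threshold activation, not a trained model; all names are illustrative) showing that one hidden layer of two perceptrons, computing OR and AND, makes XOR solvable by the output unit.

In [ ]:
import numpy as np

step = lambda z: (z >= 0).astype(float)   # hard-threshold activation

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# hidden layer: h1 = OR(x1, x2), h2 = AND(x1, x2)
W = np.array([[1., 1.],
              [1., 1.]])
b = np.array([-0.5, -1.5])
H = step(X @ W + b)

# output layer: y = h1 AND (NOT h2) = XOR(x1, x2)
v = np.array([1., -1.])
c = -0.5
print(step(H @ v + c))   # [0. 1. 1. 0.]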

Problem 2

2) (Choose correct answers) Jonathan has now switched to multilayer neural networks and notices that the training error goes down and converges to a local minimum. However, when he tests on new data, the test error is abnormally high. What is probably going wrong, and what do you recommend he do? (Choose 3.)

$\quad$ a. The training data size is not large enough. Collect larger training data and retrain.

$\quad$ b. Tune the learning rate and add a regularization term to the objective function.

$\quad$ c. Use a different initialization and train the network several times. Use the average of the predictions from all nets to predict the test data.

$\quad$ d. Use the same training data but add two more hidden layers.

Problem 3

1) Answer true or false for the following statements. (Correct +1, Wrong -1)

$\quad$ d. A single perceptron can solve a linearly inseparable problem with a kernel function.

$\quad$ e. Gradient descent trains neural networks to the global optimum.

2) (Choose all the correct answers) Jonathan is trying to solve the XOR problem using a multilayer perceptron (MLP) with the ReLU activation function. However, as he retrains the MLP model, the results vary from run to run: they are correct in some runs and wrong in others. What is probably going wrong, and what do you recommend he do?

$\quad$ a. There are not enough training data points. Collect more training data and retrain.

$\quad$ b. The number of perceptron layers is too large. Remove some layers and retrain.

$\quad$ c. The number of perceptron layers is too small. Add more layers and retrain.

$\quad$ d. The learning rate is too high. Reduce the learning rate and retrain.

3) Explain the difference between the sigmoid (or hyperbolic tangent) and rectified linear unit (ReLU) activation functions in gradient backpropagation.
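
As a numerical illustration (a sketch, assuming the standard derivative formulas), compare the local gradients each activation contributes during backpropagation:

In [ ]:
import numpy as np

z = np.linspace(-10, 10, 5)

# sigmoid derivative: sigma'(z) = sigma(z)(1 - sigma(z)) <= 0.25,
# and it vanishes for large |z| (vanishing gradient)
sig = 1/(1 + np.exp(-z))
print(sig*(1 - sig))           # at most 0.25, nearly 0 at |z| = 10

# ReLU derivative: exactly 1 for z > 0 and 0 otherwise (no decay)
print((z > 0).astype(float))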

Problem 4

To deal with a nonlinearly distributed data set, we need to design an appropriate kernel that maps the original data into a space where it is linearly separable. However, in artificial neural networks, this step is not necessary. Discuss why. (Hint: see the following figure.)


Problem 5

To train neural networks, backpropagation is used. Briefly explain what backpropagation is. In your discussion, use keywords such as recursive, memoization, dynamic programming, and chain rule.
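
Below is a tiny numeric sketch of the idea (two layers, scalar values; all variable names are illustrative): the forward pass memoizes intermediate values, and the backward pass reuses them recursively through the chain rule, which is exactly a dynamic-programming structure.

In [ ]:
import numpy as np

# forward pass: cache (memoize) the intermediate values
x, w1, w2 = 1.5, 0.4, -0.7
a = w1*x                      # pre-activation, cached
h = np.tanh(a)                # hidden activation, cached
y = w2*h                      # output, cached
loss = 0.5*y**2

# backward pass: apply the chain rule recursively, reusing the cache
dL_dy = y
dL_dw2 = dL_dy*h              # dL/dw2 = dL/dy * dy/dw2
dL_dh = dL_dy*w2              # propagated error, reused below
dL_da = dL_dh*(1 - h**2)      # tanh'(a) = 1 - tanh(a)^2, uses cached h
dL_dw1 = dL_da*x              # dL/dw1 = dL/da * da/dw1

print(dL_dw1, dL_dw2)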

Problem 6

1) You are given a training set of five real-valued points and their two-class labels (positive and negative):

data label
1.5 positive
3.2 positive
5.4 negative
6.2 negative
8.5 negative
  • What is the predicted class for a test example at the point 4.0 using 3-NN? (A numerical check is sketched after this list.)

  • What is the decision boundary associated with this training set using 3-NN?

  • (True or False): For any two-class, linearly separable training set (e.g., the one given above), a 3-NN classifier will always have 100% accuracy on the training set. Why or why not?
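
A minimal sketch for checking the first question numerically (using sklearn's KNeighborsClassifier, an assumption on tooling rather than part of the given problem):

In [ ]:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.5], [3.2], [5.4], [6.2], [8.5]])
y = np.array(['positive', 'positive', 'negative', 'negative', 'negative'])

knn = KNeighborsClassifier(n_neighbors = 3).fit(X, y)

# the 3 nearest neighbors of 4.0 are 3.2, 5.4, and 6.2
print(knn.predict([[4.0]]))   # majority vote of those three labels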


2) Say we have a training set consisting of 30 positive examples and 10 negative examples, where each example is a point in a two-dimensional, real-valued feature space.

  • What will the classification accuracy be on the training set with 1-NN?

  • What will the classification accuracy be on the training set with 40-NN? (A quick simulation is sketched below.)
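
A quick simulation of both cases (the random points are an illustrative assumption; only the class counts come from the problem):

In [ ]:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size = (40, 2))          # 40 distinct random 2-D points
y = np.array([1]*30 + [0]*10)           # 30 positive, 10 negative

for k in [1, 40]:
    knn = KNeighborsClassifier(n_neighbors = k).fit(X, y)
    print(k, knn.score(X, y))
# 1-NN: each training point is its own nearest neighbor
# 40-NN: every vote uses all 40 points, so it always predicts the majority class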

Problem 7

1) Classify the given four points in the 2D plane into two classes using a single-layer structure, as shown below. Plot the linear boundary even if it fails to classify them.

$\;\;\,$ Note that bias units are not indicated here.



In [12]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

x_data = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=np.float32)
y_data = np.array([[0], [0], [1], [1]], dtype=np.float32)

plt.figure(figsize = (8,6))
plt.scatter(x_data[:2,0], x_data[:2,1], marker='+', s=100, label='A')
plt.scatter(x_data[2:,0], x_data[2:,1], marker='x', s=100, label='B')
plt.axis('equal')
plt.ylim([-0.5, 1.5]); 
plt.grid(alpha=0.15); 
plt.legend();
plt.show()
In [3]:
## write your code here
#

2) Classify the given four points in the 2D plane using two layers, as shown below.

$\;\;\,$ Note that bias units are not indicated here.



In [ ]:
## write your code here
#

Problem 8

1) In Problem 7-2), the first layer can be seen as a kernel function $\phi$. Show the locations of the four points in the 2D plane after the first layer.

In [ ]:
## write your code here
#

2) Visualize the kernel space on the 2D plane.

$\;\;\,$ Hint: make 2D grid points and apply the kernel, as in the sketch below.
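
A sketch of the grid construction the hint refers to; the map phi below is only a placeholder for whatever first layer you obtained in 1).

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

# 2-D grid points covering the input region
g = np.linspace(-0.5, 1.5, 50)
G1, G2 = np.meshgrid(g, g)
grid = np.column_stack([G1.ravel(), G2.ravel()])   # shape (2500, 2)

# placeholder kernel: replace with the first-layer map from 1)
phi = lambda X: np.maximum(X @ np.array([[1., 1.], [1., 1.]]) + np.array([-0.5, -1.5]), 0)

Z = phi(grid)
plt.scatter(Z[:,0], Z[:,1], s = 2)
plt.xlabel('$z_1$'); plt.ylabel('$z_2$')
plt.show()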

In [ ]:
## write your code here
#

3) Plot the decision boundary in the kernel space.

In [ ]:
## write your code here
#

Problem 9

Here, we will solve the same problem that we covered in class.

In [37]:
from sklearn.preprocessing import OneHotEncoder

m = 1000
x1 = 10*np.random.rand(m, 1) - 5
x2 = 8*np.random.rand(m, 1) - 4

g = - 0.5*(x1-1)**2 + 2*x2 + 5

C1 = np.where(g >= 0)[0]
C0 = np.where(g < 0)[0]
N = C1.shape[0]
M = C0.shape[0]
m = N + M

X1 = np.hstack([x1[C1], x2[C1]])
X0 = np.hstack([x1[C0], x2[C0]])

train_X = np.vstack([X1, X0])
train_X = np.asmatrix(train_X)

train_y = np.vstack([np.ones([N,1]), np.zeros([M,1])])
ohe = OneHotEncoder(handle_unknown='ignore')
train_y = ohe.fit_transform(train_y).toarray()

plt.figure(figsize=(10, 8))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.title('Nonlinearly Distributed Data', fontsize = 15)
plt.legend(loc = 1, fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 12)
plt.show()

1) Classify the given data above in the 2D plane using MLPClassifier with a hidden layer of size 3, as shown below. (One possible setup is sketched next.)
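
A sketch of the model setup, assuming sklearn's MLPClassifier (the hyperparameters other than the hidden layer size are illustrative):

In [ ]:
import numpy as np
from sklearn.neural_network import MLPClassifier

# a single hidden layer with 3 units, as the problem specifies;
# argmax converts the one-hot train_y back to class indices
clf = MLPClassifier(hidden_layer_sizes = (3,), activation = 'logistic',
                    max_iter = 5000, random_state = 0)
clf.fit(np.asarray(train_X), np.argmax(train_y, axis = 1))
print(clf.score(np.asarray(train_X), np.argmax(train_y, axis = 1)))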




In [40]:
## write your code here
#

2) Plot the data in the Z-plane (hidden layer).

In [42]:
## write your code here
#

3) Plot the 2D hyperplane that separates the data into C1 and C0.

In [46]:
## write your code here
#

Problem 10

With the dataset below, you are asked to apply an ANN to the multiclass classification problem. You should design your own ANN structure. Plot the linear classification boundaries. (A starting-point sketch follows the data cell.)

In [13]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## generate three simulated clusters

mu1 = np.array([1, 7])
SIGMA1 = 0.8*np.array([[1, 1.5],
                       [1.5, 3]])
X1 = np.random.multivariate_normal(mu1, SIGMA1, 100)

mu2 = np.array([3, 4])
SIGMA2 = 0.3*np.array([[1, 0],
                       [0, 1]])
X2 = np.random.multivariate_normal(mu2, SIGMA2, 100)

mu3 = np.array([7, 5])
SIGMA3 = 0.3*np.array([[1, -1],
                       [-1, 2]])
X3 = np.random.multivariate_normal(mu3, SIGMA3, 50)

plt.figure(figsize = (10, 8))
plt.title('Generated Data', fontsize=15)
plt.plot(X1[:,0], X1[:,1], '.')
plt.plot(X2[:,0], X2[:,1], '.')
plt.plot(X3[:,0], X3[:,1], '.')
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 1, 12])
plt.show()
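
One possible starting point (a sketch assuming sklearn's MLPClassifier and the clusters X1, X2, X3 from the cell above; the structure and hyperparameters are yours to design):

In [ ]:
from sklearn.neural_network import MLPClassifier

X = np.vstack([X1, X2, X3])
y = np.array([0]*100 + [1]*100 + [2]*50)   # cluster indices as class labels

clf = MLPClassifier(hidden_layer_sizes = (10,), max_iter = 5000, random_state = 0)
clf.fit(X, y)
print(clf.score(X, y))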
In [ ]:
## write your code here
#

Problem 11

[hand-written] Suppose a multilayer perceptron has an input layer with 10 neurons, a hidden layer with 50 neurons, and an output layer with 3 neurons. The nonlinear activation function for every neuron is ReLU. Answer the following questions; a shape-check sketch follows the list.

(1) Size of input $X$?

(2) Size of weights and biases ($W_h, b_h$) for the hidden layer?

(3) Size of weights and biases ($W_o, b_o$) for the output layer?

(4) Size of output $Y$?
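
A shape-check sketch, assuming each example is stored as a 1 × 10 row vector (with a column-vector convention, report the transposed sizes):

In [ ]:
import numpy as np

X = np.zeros((1, 10))       # (1) input: 1 x 10
W_h = np.zeros((10, 50))    # (2) hidden weights: 10 x 50
b_h = np.zeros((1, 50))     #     hidden biases:  1 x 50
W_o = np.zeros((50, 3))     # (3) output weights: 50 x 3
b_o = np.zeros((1, 3))      #     output biases:  1 x 3

H = np.maximum(X @ W_h + b_h, 0)    # ReLU hidden layer
Y = np.maximum(H @ W_o + b_o, 0)    # (4) output: 1 x 3
print(Y.shape)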

Problem 12

We are going to do 3-class classification: the class of each digit is its remainder when divided by 3.

(e.g., 0 $\Rightarrow$ class 0, 1 $\Rightarrow$ class 1, 2 $\Rightarrow$ class 2, 3 $\Rightarrow$ class 0, 4 $\Rightarrow$ class 1, $\cdots$)

1) Plot random images and their labels.

Let's load the dataset.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

mnist_train_images = np.load('./data_files/mnist_train_images_rev.npy')
mnist_train_labels = np.load('./data_files/mnist_train_labels_rev.npy')
mnist_test_images = np.load('./data_files/mnist_test_images_rev.npy')
mnist_test_labels = np.load('./data_files/mnist_test_labels_rev.npy')

train_labels = mnist_train_labels
train_imgs = mnist_train_images
test_labels = mnist_test_labels
test_imgs = mnist_test_images
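
A label-conversion sketch, assuming the loaded labels are integer digits 0-9 (if they are one-hot encoded, recover the digits with np.argmax first):

In [ ]:
# class = digit mod 3, e.g., 0 -> 0, 1 -> 1, 2 -> 2, 3 -> 0, ...
train_labels_3 = train_labels % 3
test_labels_3 = test_labels % 3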
In [ ]:
## write your code here
#

2) Make your own ANN model that classifies MNIST images into 3 classes.

In [ ]:
## write your code here
#

3) Train your model and check its accuracy.

In [ ]:
## write your code here
#

Problem 13

In this problem, we want to conduct regression for nonlinearly distributed data using a multilayer perceptron.

Use an MLP (or ANN) to find a regression curve, and then plot it with the data. (One possible model is sketched after the data cell.)

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
%matplotlib inline

# 10 data points
n = 10
x = np.linspace(-4.5, 4.5, 10).reshape(-1,1)
y = np.array([0.9819, 0.7973, 1.9737, 0.1838, 1.3180, -0.8361, -0.6591, -2.4701, -2.8122, -6.2512]).reshape(-1,1)

plt.figure(figsize = (10, 8))
plt.plot(x, y, 'o', label = 'Data')
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.grid(alpha = 0.3)
plt.show()
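
One possible model (a sketch; the hidden layer size, activation, and solver are illustrative choices, and the plotting cell below assumes the fitted model is named reg):

In [ ]:
# lbfgs tends to fit a small data set like this one reliably
reg = MLPRegressor(hidden_layer_sizes = (50,), activation = 'tanh',
                   solver = 'lbfgs', max_iter = 10000, random_state = 0)
reg.fit(x, y.ravel())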
In [ ]:
## write your code here
#
In [ ]:
xp = np.arange(-5, 5, 0.01).reshape(-1, 1)
yp = reg.predict(xp)

plt.figure(figsize = (10, 8))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp, yp, 'r-', label = 'Regression')
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.grid(alpha = 0.3)
plt.legend(fontsize=12)
plt.show()

Problem 14

Rotating Machinery Diagnosis with Logistic Regression

Mechanical systems always vibrate when operating, so vibration analysis is one of the most popular and conventional ways to diagnose a mechanical system. In this problem, we are going to use the logistic regression algorithm to identify abnormality in a mechanical system.

Data information

  • File format: npy

  • Information: signal, label

  • Sampling rate: 12,800 Hz

  • File length (time): 0.78 sec

  • Labels based on one-hot encoding

    • A (Normal): [1,0]
    • B (Abnormal): [0,1]

Data Download Link

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import IPython.display as ipd
import scipy.stats
import scipy
%matplotlib inline

# data load
data = {
    'signal': np.load('./data_files/signal.npy'),
    'label': np.load('./data_files/label.npy')
}

idx_A = []
idx_B = []
for i in range(len(data['label'])):
    if data['label'][i] == False:
        idx_A.append(i)
    else:
        idx_B.append(i)

# one-hot encoding
data['label'] = tf.keras.utils.to_categorical(data['label'])

Run the cells below to hear the sounds of A and B.

In [ ]:
fs = 12800
signal_A = data['signal'][np.random.choice(idx_A),:]
ipd.Audio(signal_A, rate = fs)
In [ ]:
signal_B = data['signal'][np.random.choice(idx_B),:]
ipd.Audio(signal_B, rate = fs)

(1) Plot 2 randomly selected vibration signals.

In [ ]:
## Your code here
#

Now, we need to extract features from the raw signal.

  • Use the following statistical features
    • peak, rms, kurtosis, crest factor, impulse factor, shape factor, skewness, smr, peak-peak

The equations are as follows:

$$
\begin{align*}
\text{Peak Value } &= \max_{n = 1, \dots, N} \lvert x_n \rvert \\ \\
\text{RMS } &= \sqrt{\frac{1}{N}\sum_{n=1}^N x_n^2} \\ \\
\text{Kurtosis } &= \frac{\frac{1}{N}\sum_{n=1}^N (x_n-\bar{x})^4}{\text{Var}^2} \\ \\
\text{Var } &= \frac{1}{N}\sum_{n=1}^N (x_n-\bar{x})^2 \\ \\
\text{Crest Factor } &= \frac{\text{Peak}}{\text{RMS}} \\ \\
\text{Impulse Factor } &= \frac{\text{Peak}}{\text{Mean}} \\ \\
\text{Mean } &= \frac{1}{N}\sum_{n=1}^N \lvert x_n \rvert \\ \\
\text{Shape Factor } &= \frac{\text{RMS}}{\text{Mean}} \\ \\
\text{Skewness } &= \frac{\frac{1}{N}\sum_{n=1}^N(x_n-\bar{x})^3}{\text{Var}^{3/2}} \\ \\
\text{SMR } &= \left(\frac{1}{N}\sum_{n=1}^N\sqrt{\lvert x_n \rvert} \right)^2 \\ \\
\text{Peak-Peak Value } &= \max_{n = 1, \dots, N} x_n - \min_{n = 1, \dots, N} x_n
\end{align*}
$$

$$x_n: \text{signal data}$$


The feature extraction function is provided for you. Its input and output are the data $x$ and the horizontally stacked feature vector, respectively.

In [ ]:
def extfeat(x):
    fvector = []

    # time-domain features
    peak = np.max(np.abs(x))
    fvector.append(peak)

    rms = np.sqrt(np.mean(x**2))
    fvector.append(rms)

    # fisher=False gives the non-excess kurtosis defined in the equation above
    kurtosis = scipy.stats.kurtosis(x, fisher=False)
    fvector.append(kurtosis)

    mean_abs = np.mean(np.abs(x))   # "Mean" in the equations above

    crest_factor = peak/rms
    fvector.append(crest_factor)

    impulse_factor = peak/mean_abs
    fvector.append(impulse_factor)

    shape_factor = rms/mean_abs
    fvector.append(shape_factor)

    skewness = scipy.stats.skew(x)
    fvector.append(skewness)

    smr = np.mean(np.sqrt(np.abs(x)))**2
    fvector.append(smr)

    pp = np.max(x) - np.min(x)
    fvector.append(pp)

    return fvector

feature_name = ['Peak', 'RMS', 'Kurtosis', 'Crest Factor', 'Impulse Factor','Shape Factor','Skewness','SMR', 'Peak-Peak']

(2) Print out the features of a randomly selected signal. You can use the extfeat function above.

In [ ]:
## Your code here
#

(3) Split the data into train and test sets with a split ratio of your own choice.

In [ ]:
## Your code here
#

(4) Design your logistic regression model as an ANN and train it. (A minimal sketch follows.)
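
A minimal sketch of a single-layer (logistic regression) network in tf.keras; the names train_X and train_y below are placeholders for the feature matrix and one-hot labels you built in (2) and (3):

In [ ]:
# one dense layer with a 2-class softmax is exactly logistic regression
# on the 9 extracted features
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation = 'softmax', input_shape = (9,))
])
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
# model.fit(train_X, train_y, epochs = 100)   # placeholder names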

In [ ]:
## Your code here
#

(5) Print out the learned $\omega$.

In [ ]:
## Your code here
#

(6) Compute the test accuracy. You should use the learned $\omega$ to predict the labels for the unseen test data.

Note: the test accuracy may be somewhat low (around 80%).




In [ ]:
## Your code here
#

Problem 15

Build an ANN model that receives three binary-valued (i.e., $0$ or $1$) inputs $x_1, x_2, x_3$ and outputs $1$ if exactly two of the inputs are $1$, and outputs $0$ otherwise. All of the units use the hard-threshold activation function $\phi$:


$$\phi(z) = \begin{cases} 1 \quad \text{if } z \geq 0\\ 0 \quad \text{if } z < 0 \end{cases} $$

Suggest one possible set of weights and biases that correctly implements this function. (An enumeration check is sketched after the notation below.)




Denote by

  • $\mathbf{W}_{2 \times 3}$ and $\mathbf{V}_{1 \times 2}$ the weight matrices connecting the input to the hidden layer and the hidden layer to the output, respectively;

  • $\mathbf{b}^{(1)}_{2 \times 1}$ and $\mathbf{b}^{(2)}_{1 \times 1}$ the bias vectors at the hidden layer and the output, respectively;

  • $x_{3 \times 1}$ and $h_{2 \times 1}$ the node values at the input and hidden layer, respectively.
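
An enumeration check of one candidate solution (the weights below are one possibility, not the unique answer): h1 fires when at least two inputs are 1, h2 fires when all three are, and the output keeps h1 while vetoing h2.

In [ ]:
import numpy as np

step = lambda z: (z >= 0).astype(float)

W = np.array([[1., 1., 1.],    # h1 = 1 when x1 + x2 + x3 >= 2
              [1., 1., 1.]])   # h2 = 1 when x1 + x2 + x3 >= 3
b1 = np.array([-2., -3.])
V = np.array([[1., -1.]])      # y = 1 iff h1 = 1 and h2 = 0
b2 = np.array([-1.])

for x in np.ndindex(2, 2, 2):
    x = np.array(x, dtype = float)
    h = step(W @ x + b1)
    y = step(V @ h + b2)
    print(x, int(y[0]))        # y = 1 exactly when two of the inputs are 1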