Explain why the perceptron cannot solve the XOR problem.
An ANN (Artificial Neural Network) is also called an MLP (Multilayer Perceptron). Explain why the MLP is able to solve the XOR problem.
a) The training data set is not large enough. Collect more training data and retrain.
b) Tune the learning rate and add a regularization term to the objective function.
c) Use a different initialization and train the network several times. Use the average of predictions from all nets to predict test data.
d) Use the same training data but add two more hidden layers.
a) A single perceptron can solve a linearly inseparable problem with a kernel function.
b) Gradient descent trains neural networks to the global optimum.
a) The training data set is not large enough. Collect more training data points and re-train.
b) The number of perceptron layers is too large. Remove some perceptron layers and re-train.
c) The number of perceptron layers is too small. Add more perceptron layers and re-train.
d) The learning rate is too high. Reduce the learning rate and re-train.
To deal with a nonlinearly distributed data set, we need to design an appropriate kernel to map the original data so that it becomes linearly separable. However, in artificial neural networks, this step is not necessary. Discuss why. (Hint: see the following figure.)
Suppose a multilayer perceptron has an input layer with 10 neurons, a hidden layer with 50 neurons, and an output layer with 3 neurons. The non-linear activation function of every neuron is ReLU. Write your answers to the following questions.
Size of input $X$?
Size of weights and biases ($W_h, b_h$) for the hidden layer?
Size of weights and biases ($W_o, b_o$) for the output layer?
Size of output $Y$?
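As a quick sanity check (not required for the answer), these sizes can be verified in Keras; the batch size m below is an arbitrary assumption.
import numpy as np
import tensorflow as tf

m = 5                                          # arbitrary batch size
X = np.random.rand(m, 10).astype(np.float32)   # input X: m x 10

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(50, activation='relu', input_shape=(10,)),  # W_h: 10 x 50, b_h: 50
    tf.keras.layers.Dense(3, activation='relu')                       # W_o: 50 x 3, b_o: 3
])

Y = model.predict(X)
print([w.shape for w in model.get_weights()])  # [(10, 50), (50,), (50, 3), (3,)]
print(Y.shape)                                 # (m, 3)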
To train neural networks, backpropagation is used. Briefly explain what backpropagation is. In your discussion, use keywords such as recursive, memorized, dynamic programming, and chain rule.
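For intuition only (a minimal numeric sketch with made-up weights, not the required written answer): the forward pass memorizes the intermediate values h and y, and the backward pass reuses them recursively through the chain rule, which is exactly the dynamic-programming idea.
import numpy as np

sigmoid = lambda z: 1/(1 + np.exp(-z))

# tiny 2-2-1 network; the weights and target below are arbitrary assumptions
x = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.2], [0.3, 0.4]])
W2 = np.array([0.5, 0.6])
t = 1.0

# forward pass: compute and memorize intermediate values
h = sigmoid(W1 @ x)            # hidden activations
y = sigmoid(W2 @ h)            # network output

# backward pass: chain rule reusing the memorized h and y (dynamic programming)
delta_out = -(t - y) * y * (1 - y)          # error signal at the output, E = 1/2 (t - y)^2
dE_dW2 = delta_out * h                      # gradient for W2 reuses the cached h
delta_hid = delta_out * W2 * h * (1 - h)    # error propagated recursively to the hidden layer
dE_dW1 = np.outer(delta_hid, x)             # gradient for W1 reuses the cached error signal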
Build an ANN model that receives three binary-valued (i.e., $0$ or $1$) inputs $x_1, x_2, x_3$ and outputs $1$ if exactly two of the inputs are $1$, and outputs $0$ otherwise. All of the units use a hard threshold activation function:
Suggest one possible set of weights and biases that correctly implements this function.
Denote by
$\mathbf{W}_{2 \times 3}$ and $\mathbf{V}_{1 \times 2}$ the weight matrices connecting the input to the hidden layer and the hidden layer to the output, respectively,
$\mathbf{b}^{(1)}_{2 \times 1}$ and $\mathbf{b}^{(2)}_{1 \times 1}$ the bias vectors at the hidden layer and the output, respectively,
$x_{3 \times 1}$ and $h_{2 \times 1}$ the node values at the input and hidden layer, respectively.
In this problem, we are going to compute the gradient using the chain rule and dynamic programming, and update the weights $\omega \rightarrow \omega^+$. After one back-propagation pass, we compare the error after the update with the error before the update.
Neural Network Model
We use $\frac{1}{2}$ MSE for calculation convenience, i.e., $E = \frac{1}{2}\sum(\text{target} - \text{output})^2$, so that the factor of $2$ from differentiation cancels and $\frac{\partial E}{\partial \text{output}} = -(\text{target} - \text{output})$. Note that bias units are not indicated here.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x_data = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=np.float32)
y_data = np.array([[0], [0], [1], [1]], dtype=np.float32)
plt.figure(figsize = (8,6))
plt.scatter(x_data[:2,0], x_data[:2,1], marker='+', s=100, label='A')
plt.scatter(x_data[2:,0], x_data[2:,1], marker='x', s=100, label='B')
plt.axis('equal')
plt.ylim([-0.5, 1.5]);
plt.grid(alpha=0.15);
plt.legend();
plt.show()
## write your code here
#
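One possible sketch (layer sizes, learning rate, and epochs are assumptions; with only two hidden units, training may occasionally need a re-run to escape a poor initialization). A single sigmoid output with binary cross-entropy is used here; a two-unit softmax with sparse_categorical_crossentropy would also work.
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(2, activation='sigmoid', input_shape=(2,)),  # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid')                     # P(class B)
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
              loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_data, y_data, epochs=500, verbose=0)
print(model.predict(x_data).round())   # expected: 0, 0, 1, 1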
Note that bias units are not indicated here, and you can use either one-hot encoding or sparse_categorical_crossentropy.
## write your code here
#
## write your code here
#
Hint: Make 2d grid points and apply the kernel.
## write your code here
#
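A minimal sketch for the boundary plot, assuming a trained Keras model named model from the previous part; the grid range is an assumption.
import numpy as np
import matplotlib.pyplot as plt

# 2d grid points over the input space
x1g, x2g = np.meshgrid(np.linspace(-0.5, 1.5, 200), np.linspace(-0.5, 1.5, 200))
grid = np.hstack([x1g.reshape(-1, 1), x2g.reshape(-1, 1)])

# classify every grid point and shade the two decision regions
pred = model.predict(grid).reshape(x1g.shape)
plt.figure(figsize=(8, 6))
plt.contourf(x1g, x2g, pred > 0.5, alpha=0.3)
plt.scatter(x_data[:2, 0], x_data[:2, 1], marker='+', s=100, label='A')
plt.scatter(x_data[2:, 0], x_data[2:, 1], marker='x', s=100, label='B')
plt.legend()
plt.show()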
## write your code here
#
You will do binary classification for nonlinearly separable data using an MLP. Plot the given data first.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
N = 200
M = 2*N
gamma = 0.01
G0 = np.random.multivariate_normal([0, 0], gamma*np.eye(2), N)
G1 = np.random.multivariate_normal([1, 1], gamma*np.eye(2), N)
G2 = np.random.multivariate_normal([0, 1], gamma*np.eye(2), N)
G3 = np.random.multivariate_normal([1, 0], gamma*np.eye(2), N)
train_X = np.vstack([G0, G1, G2, G3])
train_y = np.vstack([np.ones([M,1]), np.zeros([M,1])])
train_X = np.asmatrix(train_X)
train_y = np.asmatrix(train_y)
print(train_X.shape)
print(train_y.shape)
plt.figure(figsize = (6, 4))
plt.plot(train_X[:M,0], train_X[:M,1], 'b.', alpha = 0.4, label = 'A')
plt.plot(train_X[M:,0], train_X[M:,1], 'r.', alpha = 0.4, label = 'B')
plt.axis('equal')
plt.xlim([-1, 2]); plt.ylim([-1, 2]);
plt.grid(alpha = 0.15)
plt.legend(fontsize = 12)
plt.show()
model = tf.keras.models.Sequential([
## your code here
])
model.summary()
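One possible way to fill in the layers above, plus compile and fit (all sizes and settings are assumptions); np.asarray is used because Keras can be picky about np.matrix inputs.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(4, activation='sigmoid', input_shape=(2,)),  # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid')                     # P(class A)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(np.asarray(train_X), np.asarray(train_y), epochs=100, verbose=0)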
## your code here
#
## your code here
#
## write down your discussion here
#
#
#
model = tf.keras.models.Sequential([
## your code here
])
model.summary()
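If this second experiment is meant to try a different capacity (an assumption about the intent), a deeper variant could look like the following; everything else stays the same as before.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(np.asarray(train_X), np.asarray(train_y), epochs=100, verbose=0)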
## your code here
#
## your code here
#
## your code here
#
Here, we will solve the same problem that we covered in class.
# training data generation
m = 1000
x1 = 10*np.random.rand(m, 1) - 5
x2 = 8*np.random.rand(m, 1) - 4
g = - 0.5*(x1-1)**2 + 2*x2 + 5
C1 = np.where(g >= 0)[0]
C0 = np.where(g < 0)[0]
N = C1.shape[0]
M = C0.shape[0]
m = N + M
X1 = np.hstack([x1[C1], x2[C1]])
X0 = np.hstack([x1[C0], x2[C0]])
train_X = np.vstack([X1, X0])
train_X = np.asmatrix(train_X)
train_y = np.vstack([np.ones([N,1]), np.zeros([M,1])])
plt.figure(figsize = (6, 4))
plt.plot(x1[C1], x2[C1], 'ro', alpha = 0.4, label = 'C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha = 0.4, label = 'C0')
plt.legend(loc = 1, fontsize = 15)
plt.xlabel(r'$x_1$', fontsize = 15)
plt.ylabel(r'$x_2$', fontsize = 15)
plt.xlim([-5, 5])
plt.ylim([-4, 4])
plt.show()
## write your code here
#
## write your code here
#
## write your code here
#
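A possible end-to-end sketch (architecture, epochs, and the grid range are assumptions): train a small MLP and draw the learned nonlinear boundary at probability 0.5.
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(np.asarray(train_X), train_y, epochs=200, verbose=0)

# draw the decision boundary on a 2d grid
x1g, x2g = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-4, 4, 200))
grid = np.hstack([x1g.reshape(-1, 1), x2g.reshape(-1, 1)])
pred = model.predict(grid).reshape(x1g.shape)

plt.figure(figsize=(6, 4))
plt.plot(x1[C1], x2[C1], 'ro', alpha=0.4, label='C1')
plt.plot(x1[C0], x2[C0], 'bo', alpha=0.4, label='C0')
plt.contour(x1g, x2g, pred, levels=[0.5], colors='k')
plt.legend(loc=1, fontsize=15)
plt.show()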
With the dataset below, you are asked to apply an ANN to the multiclass classification problem. You are supposed to design your own ANN structure. Plot the linear classification boundaries.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
## generate three simulated clusters
mu1 = np.array([1, 7])
SIGMA1 = 0.8*np.array([[1, 1.5],
[1.5, 3]])
X1 = np.random.multivariate_normal(mu1, SIGMA1, 100)
mu2 = np.array([3, 4])
SIGMA2 = 0.3*np.array([[1, 0],
[0, 1]])
X2 = np.random.multivariate_normal(mu2, SIGMA2, 100)
mu3 = np.array([7, 5])
SIGMA3 = 0.3*np.array([[1, -1],
[-1, 2]])
X3 = np.random.multivariate_normal(mu3, SIGMA3, 50)
plt.figure(figsize = (6, 4))
plt.title('Generated Data', fontsize=15)
plt.plot(X1[:,0], X1[:,1], '.')
plt.plot(X2[:,0], X2[:,1], '.')
plt.plot(X3[:,0], X3[:,1], '.')
plt.xlabel('$X_1$', fontsize = 15)
plt.ylabel('$X_2$', fontsize = 15)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.axis([-2, 10, 1, 12])
plt.show()
## write your code here
#
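A possible sketch (labels, architecture, and epochs are assumptions): assign integer labels 0, 1, 2 to the three clusters and train a softmax classifier with sparse_categorical_crossentropy.
import tensorflow as tf

# stack the clusters and assign integer class labels
X = np.vstack([X1, X2, X3]).astype(np.float32)
y = np.concatenate([np.zeros(100), np.ones(100), 2*np.ones(50)])

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(3, activation='softmax')   # one probability per class
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=200, verbose=0)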
We are going to do 3-class classification. The class of each digit is decided by its remainder when divided by 3.
(ex., 0 $\Rightarrow$ class 0, 1 $\Rightarrow$ class 1, 2 $\Rightarrow$ class 2, 3 $\Rightarrow$ class 0, 4 $\Rightarrow$ class 1, $\cdots$)
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
mnist_train_images = np.load('./data_files/mnist_train_images_rev.npy')
mnist_train_labels = np.load('./data_files/mnist_train_labels_rev.npy')
mnist_test_images = np.load('./data_files/mnist_test_images_rev.npy')
mnist_test_labels = np.load('./data_files/mnist_test_labels_rev.npy')
train_labels = mnist_train_labels
train_imgs = mnist_train_images
test_labels = mnist_test_labels
test_imgs = mnist_test_images
## write your code here
#
## write your code here
#
## write your code here
#
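A possible sketch, assuming the labels in the _rev files are integer digits 0-9 and the images are already flattened to 784-dimensional vectors (both assumptions about the file layout): remap the labels with mod 3 and train a small softmax network.
import tensorflow as tf

# remap digit labels to {0, 1, 2} by the remainder mod 3
train_y3 = train_labels % 3
test_y3 = test_labels % 3

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(100, activation='relu', input_shape=(784,)),  # assumes flattened 28x28 images
    tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_imgs, train_y3, epochs=5, verbose=1)
print(model.evaluate(test_imgs, test_y3))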
In this problem, we want to conduct regression for nonlinearly distributed data using a multilayer perceptron.
Use an MLP (or ANN) to find a regression curve, and then plot it together with the data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
%matplotlib inline
# 10 data points
n = 10
x = np.linspace(-4.5, 4.5, n).reshape(-1,1)
y = np.array([0.9819, 0.7973, 1.9737, 0.1838, 1.3180, -0.8361, -0.6591, -2.4701, -2.8122, -6.2512]).reshape(-1,1)
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.grid(alpha = 0.3)
plt.show()
## write your code here
#
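The plotting cell below expects a fitted regressor named reg; a minimal sketch (hidden sizes, solver, and iteration budget are assumptions):
# a small MLP regressor; lbfgs is convenient for tiny data sets
reg = MLPRegressor(hidden_layer_sizes=(30, 30), activation='tanh',
                   solver='lbfgs', max_iter=5000, random_state=0)
reg.fit(x, y.ravel())   # sklearn expects a 1d target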
xp = np.arange(-5, 5, 0.01).reshape(-1, 1)
yp = reg.predict(xp)
plt.figure(figsize = (10, 8))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp, yp, 'r-', label = 'Regression')
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.grid(alpha = 0.3)
plt.legend(fontsize=12)
plt.show()
Rotating Machinery Diagnosis with Logistic Regression
Mechanical systems always vibrate when operating, so vibration analysis is one of the most popular and conventional ways to diagnose a mechanical system. In this problem, we are going to use the logistic regression algorithm to identify abnormalities in a mechanical system.
Data information
File format: npy
Information: signal, label
Sampling rate: 12,800 Hz
File length (time): 0.78 sec
Labels based on one-hot encoding
Data Download Link
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import IPython.display as ipd
import scipy.stats
import scipy
%matplotlib inline
# data load
data = {
'signal': np.load('./data_files/signal.npy'),
'label': np.load('./data_files/label.npy')
}
idx_A = []
idx_B = []
for i in range(len(data['label'])):
    if not data['label'][i]:
        idx_A.append(i)
    else:
        idx_B.append(i)
# one-hot encoding
data['label'] = tf.keras.utils.to_categorical(data['label'])
Run the cells below to hear the sounds of A and B.
fs = 12800
signal_A = data['signal'][np.random.choice(idx_A),:]
ipd.Audio(signal_A, rate = fs)
signal_B = data['signal'][np.random.choice(idx_B),:]
ipd.Audio(signal_B, rate = fs)
## Your code here
#
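One way to visualize the two signals in the time domain (a sketch; the time axis uses the given sampling rate fs):
t = np.arange(len(signal_A)) / fs   # time axis in seconds

plt.figure(figsize=(10, 6))
plt.subplot(2, 1, 1)
plt.plot(t, signal_A)
plt.title('Signal A')
plt.subplot(2, 1, 2)
plt.plot(t, signal_B)
plt.title('Signal B')
plt.xlabel('Time (s)')
plt.tight_layout()
plt.show()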
Now, we need to extract features from the raw signal.
The equations are as follows:
$$
\begin{align*}
\text{Peak Value } &= \max_{n = 1, \dots, N} \lvert x_n \rvert \\ \\
\text{RMS } &= \sqrt{\frac{1}{N}\sum_{n=1}^N x_n^2} \\ \\
\text{Kurtosis } &= \frac{\frac{1}{N}\sum_{n=1}^N (x_n-\bar{x})^4}{\text{Var}^2} \\ \\
\text{Var } &= \frac{1}{N}\sum_{n=1}^N (x_n-\bar{x})^2 \\ \\
\text{Crest Factor } &= \frac{\text{Peak}}{\text{RMS}} \\ \\
\text{Impulse Factor } &= \frac{\text{Peak}}{\text{Mean}} \\ \\
\text{Mean } &= \frac{1}{N}\sum_{n=1}^N \lvert x_n \rvert \\ \\
\text{Shape Factor } &= \frac{\text{RMS}}{\text{Mean}} \\ \\
\text{Skewness } &= \frac{\frac{1}{N}\sum_{n=1}^N(x_n-\bar{x})^3}{\text{Var}^{3/2}} \\ \\
\text{SMR } &= \left(\frac{1}{N}\sum_{n=1}^N\sqrt{\lvert x_n \rvert} \right)^2 \\ \\
\text{Peak-Peak Value } &= \max_{n = 1, \dots, N} x_n - \min_{n = 1, \dots, N} x_n
\end{align*}
$$

$$x_n: \text{signal data}$$

def extfeat(x):
    fvector = []
    # time domain features
    peak = np.max(np.abs(x))
    fvector.append(peak)
    rms = np.sqrt(np.mean(x**2))
    fvector.append(rms)
    kurtosis = scipy.stats.kurtosis(x)
    fvector.append(kurtosis)
    crest_factor = fvector[0]/fvector[1]
    fvector.append(crest_factor)
    impulse_factor = fvector[0]/(np.sum(np.abs(x))/len(x))
    fvector.append(impulse_factor)
    shape_factor = fvector[1]/(np.sum(np.abs(x))/len(x))
    fvector.append(shape_factor)
    skewness = scipy.stats.skew(x)
    fvector.append(skewness)
    smr = (np.sum(np.sqrt(np.abs(x)))/len(x))**2
    fvector.append(smr)
    pp = np.max(x) - np.min(x)
    fvector.append(pp)
    return fvector
feature_name = ['Peak', 'RMS', 'Kurtosis', 'Crest Factor', 'Impulse Factor','Shape Factor','Skewness','SMR', 'Peak-Peak']
Apply the extfeat function.
## Your code here
#
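A minimal sketch: apply extfeat to every row of the signal array to build an (n_samples, 9) feature matrix.
# each row of data['signal'] is one measured signal
features = np.array([extfeat(s) for s in data['signal']])
print(features.shape)   # (n_samples, 9)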
## Your code here
#
## Your code here
#
## Your code here
#
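A possible sketch for the classifier (the split ratio is an assumption, and features comes from the previous step): since the labels were one-hot encoded above, a single softmax layer on the extracted features is exactly multiclass logistic regression.
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(features, data['label'],
                                                    test_size=0.2, random_state=0)

# softmax over two classes = logistic regression on the features
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(2, activation='softmax', input_shape=(9,))
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_x, train_y, epochs=100, verbose=0)
print(model.evaluate(test_x, test_y))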
Note: test accuracy would be somewhat low (around 80%).
## Your code here
#