Machine Learning for Mechanical Engineering

Midterm Exam: Part II

04/17/2024, 8:00 PM to 10:00 PM (120 minutes)


Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Problem 01


Jessica has been working on modeling the temperature (= $y$) of a semiconductor. She discovered that the semiconductor's temperature is related to the distance between atoms (= $x$). She assumed that the temperature has an exponential relationship with the distance between atoms (i.e., $\exp(x)$), and also a periodic relationship with it (i.e., $\sin(x)$ or $\cos(x)$). Therefore, she formulates the semiconductor temperature as a linear combination of $\sin(x)$, $\cos(x)$, and $\exp(x)$.


  1. Formulate the temperature of the semiconductor ($y$) from the given distance between atoms ($x$) using (nonlinear) regression.

$$ y = \theta_0 + \theta_1 \exp(x) + \theta_2 \sin(x) + \theta_3 \cos(x)$$
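With the four basis functions $1$, $\exp(x)$, $\sin(x)$, $\cos(x)$ stacked column-wise into a matrix $A$, the least-squares coefficients used in the cell below follow from the normal equations (a standard result, restated here for reference):

$$\hat\theta = \left(A^TA\right)^{-1}A^Ty$$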
In [ ]:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5,5,30)
y = np.array([15.16043501, 14.93790538, 14.3694888, 13.52210637, 12.49552255,
              11.41059962, 10.39506827, 9.56848955, 9.02817864, 8.8377477,
              9.01961664, 9.55237357, 10.37329569, 11.38573378, 12.47049097,
              13.49985608, 14.35263938, 14.92844055, 15.15946909, 15.01852543,
              14.52220321, 13.72893569, 12.73211624, 11.64910292, 10.60740159,
              9.72965436, 9.1192006, 8.84791056, 8.94772391, 9.40688936])
In [ ]:
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = "Data")
plt.xlabel("X", fontsize = 12)
plt.ylabel("Y", fontsize = 12)
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
m = np.shape(x)[0]
x = np.reshape(x,(-1,1))
y = np.reshape(y,(-1,1))

## write your code here to get coefficients of regression (theta)
#
A = np.hstack([x**0, np.exp(x), np.sin(x), np.cos(x)])
A = np.asmatrix(A)
y = np.asmatrix(y)
theta = (A.T*A).I*A.T*y
print(theta)
[[ 1.20000000e+01]
 [-1.67932404e-11]
 [ 3.00000000e+00]
 [ 9.99999999e-01]]
  2. Discuss the results of the estimated coefficients in terms of feature importance.

The estimated coefficients $ \theta_0 $, $ \theta_1 $, $ \theta_2 $, and $ \theta_3 $ quantify the strength and type of contribution of each term in the nonlinear regression model for predicting the temperature of the semiconductor. These coefficients correspond to the following terms, respectively (a numerical summary of the fitted values follows the list):

  • $ 1 $: The coefficient $ \theta_0 $ is the intercept term, representing the baseline value of $y$ when $ x = 0$

  • $ e^{x} $: The coefficient $ \theta_1 $ scales the exponential effect of the distance between atoms. A larger absolute value of $ \theta_1 $ indicates a stronger exponential influence on the temperature.

  • $ \sin(x) $: The coefficient $ \theta_2 $ is associated with the sine of the distance between atoms. Since sine oscillates between -1 and 1, $ \theta_2 $ reflects the amplitude of this periodic influence.

  • $ \cos(x) $: Similarly, the coefficient $ \theta_3 $ corresponds to the cosine of the distance. The magnitude of $ \theta_3 $ indicates the significance of the cosine's oscillation effect on temperature.

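Plugging in the estimated values from the cell above ($\theta_0 \approx 12$, $\theta_1 \approx 0$, $\theta_2 \approx 3$, $\theta_3 \approx 1$), the exponential feature contributes essentially nothing and the fitted model reduces to approximately

$$\hat y \approx 12 + 3\sin(x) + \cos(x)$$

so the periodic terms (and the intercept) carry virtually all of the predictive information.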
  3. Plot your regression function in the range of xp below.
In [ ]:
xp = np.linspace(-5,5,30).reshape(-1,1)

## write down your code
#
yp = theta[0,0] + theta[1,0]*np.exp(xp) + theta[2,0]*np.sin(xp) + theta[3,0]*np.cos(xp)

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = "Data")
plt.plot(xp, yp, 'r')
plt.xlabel("X", fontsize = 12)
plt.ylabel("y", fontsize = 12)
plt.grid(alpha = 0.3)
plt.show()
  4. Jessica finds that when the distance between atoms is 8, the temperature of the semiconductor is 14.822. Evaluate whether your model can predict outside the range of the training dataset using the following metric.

$$\left\lvert \frac{y_{\text{true}} - y_{\text{pred}}}{y_{\text{true}}} \right\rvert \times 100$$
In [ ]:
y_true = 14.822
xp = 8

## write your code here
#
yp = theta[0,0] + theta[1,0]*np.exp(xp) + theta[2,0]*np.sin(xp) + theta[3,0]*np.cos(xp)
abs(((yp-y_true)/y_true))*100
Out[ ]:
0.0038770455245837423
  5. Peterson was also curious about the relationship between semiconductor temperature and atomic distance. Peterson opted to use polynomial regression to determine the link between the two variables. Conduct the regression again with polynomial basis functions of degree 7.

$$\hat y = \theta_0 + \theta_1 x + \cdots + \theta_7 x^7$$
In [ ]:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5,5,30)
y = np.array([15.16043501, 14.93790538, 14.3694888, 13.52210637, 12.49552255,
              11.41059962, 10.39506827, 9.56848955, 9.02817864, 8.8377477,
              9.01961664, 9.55237357, 10.37329569, 11.38573378, 12.47049097,
              13.49985608, 14.35263938, 14.92844055, 15.15946909, 15.01852543,
              14.52220321, 13.72893569, 12.73211624, 11.64910292, 10.60740159,
              9.72965436, 9.1192006, 8.84791056, 8.94772391, 9.40688936])

m = np.shape(x)[0]
x = np.reshape(x,(-1,1))
y = np.reshape(y,(-1,1))

## write down your code to get coefficients of polynomial regression (theta)
#
B = np.hstack([x**i for i in range(8)])
B = np.asmatrix(B)
y = np.asmatrix(y)

theta = (B.T*B).I*B.T*y
print(theta)
[[ 1.29664035e+01]
 [ 2.94403829e+00]
 [-4.51534179e-01]
 [-4.67089236e-01]
 [ 3.06877594e-02]
 [ 1.97298120e-02]
 [-5.50768675e-04]
 [-2.67372620e-04]]
  6. Plot your polynomial regression function using the range of xp below.
In [ ]:
xp = np.linspace(-5,5,30).reshape(-1,1)

## write down your code
#
yp = theta[0,0] + theta[1,0]*xp + theta[2,0]*xp**2 + theta[3,0]*xp**3 + theta[4,0]*xp**4 + theta[5,0]*xp**5 + theta[6,0]*xp**6 + theta[7,0]*xp**7

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = "Data")
plt.plot(xp, yp, 'r')
plt.xlabel("X", fontsize = 12)
plt.ylabel("y", fontsize = 12)
plt.grid(alpha = 0.3)
plt.show()
  7. Peterson also found that when the distance between atoms is 8, the temperature of the semiconductor is 14.822. Evaluate whether your model can predict outside the range of the training dataset using the following metric.

$$\left\lvert \frac{y_{\text{true}} - y_{\text{pred}}}{y_{\text{true}}} \right\rvert \times 100$$
In [ ]:
y_true = 14.822
xp = 8

## write your code here
#
yp = theta[0,0] + theta[1,0]*xp + theta[2,0]*xp**2 + theta[3,0]*xp**3 + theta[4,0]*xp**4 + theta[5,0]*xp**5 + theta[6,0]*xp**6 + theta[7,0]*xp**7
print(abs(((yp-y_true)/y_true))*100)
507.6988681611356
  8. Discuss the two models in terms of generalization.
  • Jessica's model generalizes well: it extrapolates accurately beyond the training range (about 0.004% error at $x = 8$). Peterson's polynomial model, in contrast, is overfitted; it fits the data within the training range but produces a very large extrapolation error at $x = 8$, so it does not generalize.

Problem 02

Load lasso_input_data.pkl and lasso_target_data.pkl using the following command.

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
from six.moves import cPickle

x = cPickle.load(open('/content/drive/MyDrive/ML_Colab/ML_data/lasso_input_data.pkl', 'rb'))
y = cPickle.load(open('/content/drive/MyDrive/ML_Colab/ML_data/lasso_target_data.pkl', 'rb'))

print(x.shape)
print(y.shape)
(506, 13)
(506,)

In this problem, you are asked to predict a target value $\hat y$ from 13 features (input data). Assume that your model has the following relationship ($\omega_0 = 0$):


$$\omega_1 x_1 + \omega_2 x_2 + \cdots + \omega_{13} x_{13} = \hat y$$
  1. Build the linear regression model using the Lasso regularizer.

(Hint: You can use the cvxpy module to solve the corresponding optimization problem, stated below.)

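A common Lasso formulation (assumed here, with $A$ the input matrix and $\lambda$ the regularization weight) is

$$\min_{\omega}\; \lVert A\omega - y\rVert_2^2 + \lambda \lVert \omega \rVert_1$$

which is what the CVXPY code below solves with $\lambda = 20$.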
In [ ]:
# Write your code here
#
import cvxpy as cvx

A = x
y = y.reshape(506,1)
d = 13

lamb = 20
theta = cvx.Variable([d, 1])
obj = cvx.Minimize(cvx.sum_squares(A@theta - y) + lamb*cvx.norm(theta, 1))
prob = cvx.Problem(obj, [])
result = prob.solve()
print(theta.value)
[[-9.04219506e-02]
 [ 4.91200936e-02]
 [-1.49506846e-02]
 [ 2.50044065e+00]
 [ 1.12400830e-20]
 [ 5.76784573e+00]
 [-9.14331903e-03]
 [-9.34004299e-01]
 [ 1.72291748e-01]
 [-1.00519860e-02]
 [-3.89478763e-01]
 [ 1.48170211e-02]
 [-4.33160161e-01]]
  2. After applying Lasso, we can select meaningful features. Plot the magnitudes of all $\omega$'s and identify which features are important.
In [ ]:
# Write your code here
#
plt.figure(figsize = (6, 4))
plt.title(r'magnitude of $\theta$', fontsize = 12)
plt.xlabel(r'$\theta$', fontsize = 12)
plt.ylabel('magnitude', fontsize = 12)
plt.stem(np.arange(13), np.array(theta.value).ravel())
plt.xlim([-0.5, 12.5])
plt.grid(alpha = 0.3)
plt.show()
Based on the plot, the features with relatively large coefficient magnitudes (here features 3, 5, 7, and 10) are treated as the important ones; most of the remaining coefficients are shrunk to small values by the L1 penalty.

  3. Update the linear regression model with the selected features only.
In [ ]:
# Write your code here
#
d = 4
features = [3, 5, 7, 10]
low_A = A[:, features]

theta = cvx.Variable([d, 1])
obj = cvx.Minimize(cvx.sum_squares(low_A@theta - y))
prob = cvx.Problem(obj, [])
result = prob.solve()
print(theta.value)
[[ 3.45366475]
 [ 7.13766324]
 [ 0.32004063]
 [-1.28737363]]

Problem 03

We are going to classify some fashion items by FDA (LDA). Let's load the dataset.

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

data = np.load('/content/drive/MyDrive/ML_Colab/ML_data/fashion_data.npy')
data_labels = np.load('/content/drive/MyDrive/ML_Colab/ML_data/fashion_data_labels.npy')
In [ ]:
target_dict = {
    0: 'T-shirt/top',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle boot',
}
In [ ]:
plt.figure(figsize = (8, 8))

for i in range(0, 20):
    plt.subplot(5, 5, i+1)
    plt.imshow(data[i], 'gray')
    plt.title(target_dict[(data_labels[i])], fontsize = 10)
    plt.xticks([])
    plt.yticks([])
  1. Choose the T-shirt/top class and the Trouser class from the 10 classes.
In [ ]:
## fill out the blank
#
T_shirt_top_data = data[data_labels == 0]
Trouser_data = data[data_labels == 1]
  2. Select 1000 samples from each of the two selected classes.
In [ ]:
## Write your code here
#
T_shirt_top_data = T_shirt_top_data[:1000]
Trouser_data = Trouser_data[:1000]
In [ ]:
print(Trouser_data.shape)
(1000, 28, 28)
  3. Plot random images from the two selected classes.
In [ ]:
## Write your code here
#
plt.figure(figsize = (8, 8))

for i in range(0, 20):
    plt.subplot(5,5, i+1)
    plt.imshow(T_shirt_top_data[i], 'gray')
    plt.title(target_dict[0])
    plt.xticks([])
    plt.yticks([])
In [ ]:
## Write your code here
#
plt.figure(figsize = (8, 8))
for i in range(0, 20):
    plt.subplot(5,5, i+1)
    plt.imshow(Trouser_data[i], 'gray')
    plt.title(target_dict[1])
    plt.xticks([])
    plt.yticks([])
  4. Now we must select our own 'features' from the image data to classify T_shirt_top and Trouser. The following two features are recommended:
  • (feature 1) The variance of the pixel values over the entire image.

  • (feature 2) The average of the pixel values at the center of the image (img[15:25, 13:15]).

In [ ]:
## fill out the blank
#
feature_T_shirt_top_var = np.var(T_shirt_top_data,  axis = (1, 2))
feature_Trouser_var = np.var(Trouser_data, axis = (1, 2))
feature_T_shirt_top_mean = np.mean(T_shirt_top_data[:,15:25,13:15],  axis = (1, 2))
feature_Trouser_mean = np.mean(Trouser_data[:,15:25,13:15], axis = (1, 2))
  5. The shape of each feature should be changed to (1000,1).
In [ ]:
## Write your code here
#
feature_T_shirt_top_var = feature_T_shirt_top_var.reshape(len(T_shirt_top_data),-1)
feature_Trouser_var = feature_Trouser_var.reshape(len(Trouser_data),-1)
In [ ]:
feature_T_shirt_top_mean = feature_T_shirt_top_mean.reshape(len(T_shirt_top_data),-1)
feature_Trouser_mean = feature_Trouser_mean.reshape(len(Trouser_data),-1)
  6. Plot all the data in feature space.
In [ ]:
## Write your code here
#
plt.figure(figsize = (6, 4))
plt.title('Feature Space', fontsize = 12)
plt.plot(feature_T_shirt_top_var, feature_T_shirt_top_mean, '.', label = 'T_shirt_top', color = 'b')
plt.plot(feature_Trouser_var, feature_Trouser_mean, '.', label = 'Trouser', color = 'Hotpink')
plt.plot()
plt.grid(alpha = 0.3)
plt.legend(fontsize = 12)
plt.xlabel('Feature 1', fontsize = 12)
plt.ylabel('Feature 2', fontsize = 12)
plt.show()
  7. Solve the problem with FDA (LDA) and visualize the histogram of the data on the projected line by class.
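For reference, the projection direction computed in the cell below is the usual Fisher discriminant direction (up to scale), written in terms of the class means $\mu_0, \mu_1$, the per-class scatter (covariance) matrices $S_0, S_1$, and the class sizes $n_0, n_1$:

$$\omega \;\propto\; \left(n_0 S_0 + n_1 S_1\right)^{-1}\left(\mu_0 - \mu_1\right)$$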
In [ ]:
from sklearn import discriminant_analysis

## Write your code here
#
Theta0 = np.hstack([feature_T_shirt_top_var, feature_T_shirt_top_mean])
Theta1 = np.hstack([feature_Trouser_var, feature_Trouser_mean])

X = np.vstack([Theta0, Theta1])
y = np.vstack([np.ones([Theta0.shape[0], 1]), np.zeros([Theta1.shape[0], 1])])

# projection line

n0 = len(T_shirt_top_data)
n1 = len(Trouser_data)

mu0 = np.mean(Theta0.T, axis = 1)
mu1 = np.mean(Theta1.T, axis = 1)

S0 = 1/(n0 - 1)*np.matrix(Theta0.T - mu0.reshape(2,1))*np.matrix(Theta0.T - mu0.reshape(2,1)).T
S1 = 1/(n1 - 1)*np.matrix(Theta1.T - mu1.reshape(2,1))*np.matrix(Theta1.T - mu1.reshape(2,1)).T

prj_w = (n0*S0 + n1*S1).I*(mu0.reshape(2,1) - mu1.reshape(2,1))

y1 = Theta0*prj_w
y2 = Theta1*prj_w

plt.figure(figsize = (6, 4))
plt.title('Histogram', fontsize = 12)
plt.hist(y1, 50, label  = 'T_shirt_top', color = 'b', rwidth = 0.6)
plt.hist(y2, 50, label  = 'Trouser', color = 'Hotpink', rwidth = 0.6)
plt.legend()
plt.show()
  8. Plot the projection line and the classification boundary in feature space.
In [ ]:
## Write your code here
#
clf = discriminant_analysis.LinearDiscriminantAnalysis()
clf.fit(X, np.ravel(y))

# classifier boundary, not a projection line

clf_w = clf.coef_[0]
clf_w0 = clf.intercept_[0]
print(clf_w)
print(clf_w0)

xp = np.arange(-0.5, 1, 0.1)
prj_yp = prj_w[1,0]/prj_w[0,0]*xp
clf_yp = - clf_w[0]/clf_w[1]*xp - clf_w0/clf_w[1]
[-57.86655  18.79981]
-0.85458654
In [ ]:
plt.figure(figsize = (6, 6))
plt.plot(Theta0[:,0], Theta0[:,1], '.', color = 'b', label = 'T_shirt_top')
plt.plot(Theta1[:,0], Theta1[:,1], '.', color = 'hotpink', label ='Trouser')
plt.plot(xp, prj_yp, 'k', label = 'FDA(LDA) projection line', linewidth = 2.5)
plt.plot(xp, clf_yp, 'g', label = 'FDA(LDA) classification boundary', linewidth = 2.5)
plt.legend(fontsize = 10)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.ylim([-0.2, 1])
plt.xlim([-0.05, 0.24])
plt.xlabel('Feature 1', fontsize = 12)
plt.ylabel('Feature 2', fontsize = 12)
plt.show()

Problem 04

Let's consider a situation where Sam has a dataset (denoted as $Z$ below) that consists of two types of battery data: lithium battery data (denoted as $X$ below) and aluminum battery data (denoted as $Y$ below), where $Z = X \cup Y$.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

X = np.random.uniform(low = -1, high = 1, size = (100, 2))
X[:, 1] = 5*X[:, 0]**2 + np.random.normal(loc = 0, scale = 1, size = 100) + 5

Y = np.random.uniform(low = -1, high = 1, size = (100, 2))
Y[:, 1] = -8*(Y[:, 0]-0.4)**2 + np.random.normal(loc = 0, scale = 1, size = 100)

Z = np.concatenate([X, Y])

# X: lithium battery data, Y: aluminum battery data, Z: battery dataset
# X.shape = (100,2)
# Y.shape = (100,2)
# Z.shape = (200,2)
  1. Initially, Sam thought that building a single regression model for the dataset $Z$ would be appropriate, given that the dataset is all about batteries. Using a non-linear regression algorithm with polynomial basis functions of degree 5, find and plot an approximate curve for the dataset $Z$. Was Sam right to build a single regression model for the dataset? (Is the regression satisfactory?)

$$\hat y = \theta_0 + \theta_1 x + \cdots + \theta_5 x^5$$
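The cells below fit the polynomial by gradient descent on a (lightly) ridge-regularized least-squares objective rather than by a closed-form solve; with design matrix $X$ built from the polynomial basis (and, in the code, $\lambda = 1$, $\alpha = 10^{-3}$), the quantities used are

$$J(\theta) = \lVert X\theta - y\rVert_2^2 + \lambda\lVert\theta\rVert_2^2, \qquad \nabla J(\theta) = 2X^T\left(X\theta - y\right) + 2\lambda\theta, \qquad \theta \leftarrow \theta - \alpha\,\nabla J(\theta)$$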
In [ ]:
## your code here for non-linear regression algorithm
#
X_values = Z[:, 0]
Y_values = Z[:, 1]
X_values = np.reshape(X_values,(-1,1))
Y_values = np.reshape(Y_values,(-1,1))
m = np.shape(X_values)[0]
X_poly = np.hstack([np.ones([m,1]), X_values, X_values**2, X_values**3, X_values**4, X_values**5])
y = np.asmatrix(Y_values)
X_poly = np.asmatrix(X_poly)
theta = np.ones([6, 1])

lamda = 1
alpha = 1e-3
n_iter = 100000

for i in range(n_iter):
    grad = 2*X_poly.T*X_poly*theta - 2*X_poly.T*y + 2*lamda*theta
    theta = theta - alpha*grad

print(theta)
[[ 2.01678766]
 [ 3.31687851]
 [ 0.82357946]
 [ 1.69506669]
 [-4.34818312]
 [-2.65157812]]
In [ ]:
## your code here for the plot
#
xp = np.linspace(-1, 1, 1000)
yp = theta[0,0] + theta[1,0]*xp + theta[2,0]*xp**2 + theta[3,0]*xp**3 + theta[4,0]*xp**4 + theta[5,0]*xp**5

plt.figure(figsize = (6, 4))
plt.plot(X_values, Y_values, 'o')
plt.plot(xp,yp, 'r')
plt.xlabel('feature 0', fontsize = 12)
plt.ylabel('feature 1', fontsize = 12)
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.show()
  2. Sam was not satisfied with the results above, so he decided to classify the dataset $Z$ into two separate groups: lithium battery data $X$ and aluminum battery data $Y$. Although he knew which data points corresponded to each group, he wanted to build a classification model that could classify new data points as either $X$ or $Y$. To accomplish this, build a classification model using any algorithm you want, and classify $X$ and $Y$ accordingly. Please plot the decision boundary of the classification model.

Note that our decision boundary is defined as follows:
$$ \omega_1 x_1 + \omega_2 x_2 + \omega_0 = 0$$
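The solution below uses a soft-margin SVM formulated in CVXPY (any linear classifier would satisfy the problem statement). With $X_0, X_1$ the class data matrices augmented with a leading column of ones, slack variables $u, v \ge 0$, and penalty weight $g$ (here $g = 1$), it solves

$$\min_{\omega,\, u,\, v}\; \lVert \omega \rVert_2 + g\left(\mathbf{1}^T u + \mathbf{1}^T v\right) \qquad \text{subject to} \qquad X_0\,\omega \ge 1-u, \quad X_1\,\omega \le -(1-v)$$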
In [ ]:
## your code here for the classification algorithm
#
X0 = np.hstack([np.ones([X.shape[0],1]), X])
X1 = np.hstack([np.ones([Y.shape[0],1]), Y])
T = np.vstack([X0, X1])
T = np.asmatrix(T)

import cvxpy as cvx

m = 100
d = 2
n = 100
g = 1

w = cvx.Variable((d+1,1))
u = cvx.Variable((m, 1))
v = cvx.Variable((m, 1))
obj = cvx.Minimize(cvx.norm(w, 2) + g*(np.ones([1, 100])@u + np.ones([1, 100])@v))
const = [X0@w >= 1-u, X1@w <= -(1-v), u >= 0, v >= 0 ]
prob = cvx.Problem(obj, const).solve()

w = w.value
print(w)
[[-1.66532441]
 [-0.88410126]
 [ 0.69825469]]
In [ ]:
## your code here for the plot
#
xp = np.linspace(-1, 1)
yp = - w[1,0]/w[2,0]*xp - w[0,0]/w[2,0]

plt.figure(figsize = (6, 4))
plt.plot(X[:,0], X[:,1], '.', label = 'Lithium')
plt.plot(Y[:,0], Y[:,1], '.', label = 'Aluminum')
plt.plot(xp,yp, 'r')
plt.xlabel('feature 0', fontsize = 12)
plt.ylabel('feature 1', fontsize = 12)
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.legend(fontsize = 12)
plt.show()
  3. Now, Sam recognizes that it is necessary to build two separate regression algorithms for two separate classes of batteries. Using a non-linear regression algorithm with polynomial basis functions of degree 5, find and plot an approximate curve for each class of data. (i.e., Build two regression models for $X$ and $Y$, separately).
In [ ]:
## your code here for non-linear regression algorithm on the lithium battery data
#
X_values1 = X[:, 0]
Y_values1 = X[:, 1]
X_values1 = np.reshape(X_values1,(-1, 1))
Y_values1 = np.reshape(Y_values1,(-1, 1))

m = np.shape(X_values1)[0]

X_poly1 = np.hstack([np.ones([m,1]), X_values1, X_values1**2, X_values1**3, X_values1**4, X_values1**5])
y1 = np.asmatrix(Y_values1)
X_poly1 = np.asmatrix(X_poly1)
theta1 = np.ones([6, 1])

lamda = 1
alpha = 1e-3
n_iter = 100000

for i in range(n_iter):
    grad = 2*X_poly1.T*X_poly1*theta1 - 2*X_poly1.T*y1 + 2*lamda*theta1
    theta1 = theta1 - alpha*grad

print(theta1)
[[ 5.34685326]
 [ 0.18253667]
 [ 3.05649877]
 [-0.28724971]
 [ 1.88837892]
 [ 0.37228213]]
In [ ]:
## your code here for the plot
#
xp = np.linspace(-1, 1, 1000)
yp1 = theta1[0,0] + theta1[1,0]*xp + theta1[2,0]*xp**2 + theta1[3,0]*xp**3 + theta1[4,0]*xp**4 + theta1[5,0]*xp**5

plt.figure(figsize = (6, 4))
plt.plot(X_values1, Y_values1, 'o', label = 'Lithium')
plt.plot(xp,yp1, 'r')
plt.xlabel('feature 0', fontsize = 12)
plt.ylabel('feature 1', fontsize = 12)
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
## your code here for non-linear regression algorithm on the aluminum battery data
#
X_values2 = Y[:, 0]
Y_values2 = Y[:, 1]
X_values2 = np.reshape(X_values2,(-1,1))
Y_values2 = np.reshape(Y_values2,(-1,1))

m = np.shape(X_values2)[0]

X_poly2 = np.hstack([np.ones([m,1]), X_values2, X_values2**2, X_values2**3, X_values2**4, X_values2**5])
y2 = np.asmatrix(Y_values2)
X_poly2 = np.asmatrix(X_poly2)
theta2 = np.ones([6, 1])

lamda = 1
alpha = 1e-3
n_iter = 100000

for i in range(n_iter):
    grad = 2*X_poly2.T*X_poly2*theta2 - 2*X_poly2.T*y2 + 2*lamda*theta2
    theta2 = theta2 - alpha*grad

print(theta2)
[[-1.64313707]
 [ 5.8345409 ]
 [-4.53444328]
 [ 0.74265421]
 [-3.75276524]
 [-0.22339565]]
In [ ]:
## your code here for the plot
#
xp = np.linspace(-1, 1, 1000)
yp2 = theta2[0,0] + theta2[1,0]*xp + theta2[2,0]*xp**2 + theta2[3,0]*xp**3 + \
     theta2[4,0]*xp**4 + theta2[5,0]*xp**5

plt.figure(figsize = (6, 4))
plt.plot(X_values2, Y_values2, 'o', label = 'Aluminum')
plt.plot(xp,yp2, 'r')
plt.xlabel('feature 0', fontsize = 15)
plt.ylabel('feature 1', fontsize = 15)
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
## your code here for the aggregated plot
#
plt.figure(figsize = (6, 4))
plt.plot(X_values1, Y_values1, 'o', label = 'Lithium')
plt.plot(X_values2, Y_values2, 'o', label = 'Aluminum')
plt.plot(xp,yp1, 'r', linewidth = 3)
plt.plot(xp,yp2, 'g', linewidth = 3)
plt.xlabel('feature 0', fontsize = 12)
plt.ylabel('feature 1', fontsize = 12)
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.legend(fontsize = 12)
plt.show()

Take home message: When you receive data for regression analysis, it is essential to examine the data thoroughly. Check whether the data can be separated into different classes. A single regression model fit to an entire dataset that actually contains multiple classes will perform poorly, while separate regression models fit to each class will perform well.

Problem 05

We would like to continue the above problem, where Sam has a dataset (denoted as $Z$ below) that consists of two types of battery data: lithium battery data (denoted as $X$ below) and aluminum battery data (denoted as $Y$ below), where $Z = X \cup Y$. However, let's consider a situation where Sam does not know the label of each data point; that is, Sam does not know whether a given sample belongs to lithium batteries or aluminum batteries. In other words, Sam is given only the data $Z$, without the $X$ and $Y$ information. We will first use K-means to assign labels (class 0 & class 1) and then employ supervised regression algorithms on each cluster.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

X = np.random.uniform(low = -1, high = 1, size = (100, 2))
X[:, 1] = 5*X[:, 0]**2 + np.random.normal(loc = 0, scale = 1, size = 100) + 5

Y = np.random.uniform(low = -1, high = 1, size = (100, 2))
Y[:, 1] = -8*(Y[:, 0]-0.4)**2 + np.random.normal(loc = 0, scale = 1, size = 100)

Z = np.concatenate([X, Y])

# X: lithium battery data, Y: aluminum battery data, Z: battery dataset
# X.shape = (100,2)
# Y.shape = (100,2)
# Z.shape = (200,2)
  1. Initially, Sam thought that building a single regression model for the dataset $Z$ would be appropriate, given that the dataset is all about batteries. Was Sam right to build a single regression model for the dataset? Use the K-Nearest Neighbor (KNN) regression algorithm. (Is the regression satisfactory?)
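As a reminder of what the model below computes: a $k$-nearest-neighbor regressor predicts the average target value of the $k$ training points closest to the query $x$,

$$\hat y(x) = \frac{1}{k}\sum_{i \in \mathcal{N}_k(x)} y_i$$

where $\mathcal{N}_k(x)$ is the index set of the $k$ nearest neighbors (here $k = 30$).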
In [ ]:
## your code here for regression algorithm
#
X_values = Z[:, 0]
Y_values = Z[:, 1]
X_values = np.reshape(X_values,(-1,1))
Y_values = np.reshape(Y_values,(-1,1))

from sklearn import neighbors

reg = neighbors.KNeighborsRegressor(n_neighbors = 30)
reg.fit(X_values, Y_values)
Out[ ]:
KNeighborsRegressor(n_neighbors=30)
In [ ]:
## your code here for the plot
#
xp = np.linspace(-1, 1, 100).reshape(-1, 1)
yp = reg.predict(xp)

plt.figure(figsize = (6, 4))
plt.title('k-Nearest Neighbor Regression', fontsize = 12)
plt.plot(X_values,Y_values, '.', label = 'Original Data')
plt.plot(xp, yp, label = 'kNN')
plt.legend(fontsize = 12)
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.show()
  2. Sam was not satisfied with the results above, so he decided to classify the dataset $Z$ into two separate groups by using K-means Clustering! Group the dataset $Z$ into two classes (class 0 : C0 & class 1 : C1) with K-means clustering. (You must NOT use $X$ and $Y$ information: This problem assumes the unsupervised situation)
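K-means, as used below, partitions $Z$ into two clusters $C_0, C_1$ with centroids $\mu_0, \mu_1$ by (locally) minimizing the within-cluster sum of squares

$$\min_{C_0, C_1,\, \mu_0, \mu_1}\; \sum_{k=0}^{1} \sum_{z_i \in C_k} \lVert z_i - \mu_k \rVert_2^2$$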
In [ ]:
## your code here for K-means clustering algorithm
#
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 2, random_state = 0, n_init = 10)
kmeans.fit(Z)
Out[ ]:
KMeans(n_clusters=2, random_state=0)
In [ ]:
plt.figure(figsize = (6, 4))
plt.plot(Z[kmeans.labels_ == 0, 0], Z[kmeans.labels_ == 0, 1], 'g.', label = 'C0')
plt.plot(Z[kmeans.labels_ == 1, 0], Z[kmeans.labels_ == 1, 1], 'r.', label = 'C1')
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.legend(fontsize = 12)
plt.show()
  3. Now, Sam recognizes that it is necessary to build two separate regression algorithms for two separate classes of batteries. Using the K-Nearest Neighbor (KNN) regression algorithm, find and plot an approximate curve for each class of data. (Build two regression models for C0 and C1, separately).
In [ ]:
## your code here for regression algorithm on the data with K-means label of C0
#
X_values1 = Z[kmeans.labels_ == 0, 0]
Y_values1 = Z[kmeans.labels_ == 0, 1]
X_values1 = np.reshape(X_values1,(-1, 1))
Y_values1 = np.reshape(Y_values1,(-1, 1))

from sklearn import neighbors

reg1 = neighbors.KNeighborsRegressor(n_neighbors = 8)
reg1.fit(X_values1, Y_values1)
Out[ ]:
KNeighborsRegressor(n_neighbors=8)
In [ ]:
## your code here for the plot
#
xp = np.linspace(-1, 1, 100).reshape(-1, 1)
yp = reg1.predict(xp)

plt.figure(figsize = (6, 4))
plt.title('k-Nearest Neighbor Regression 1', fontsize = 12)
plt.plot(X_values1,Y_values1, 'g.', label = 'C0 Data')
plt.plot(xp, yp, label = 'kNN')
plt.legend(fontsize = 12)
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
## your code here for regression algorithm on the data with K-means label of C1
#
X_values2 = Z[kmeans.labels_ == 1, 0]
Y_values2 = Z[kmeans.labels_ == 1, 1]
X_values2 = np.reshape(X_values2,(-1, 1))
Y_values2 = np.reshape(Y_values2,(-1, 1))

from sklearn import neighbors

reg2 = neighbors.KNeighborsRegressor(n_neighbors = 8)
reg2.fit(X_values2, Y_values2)
Out[ ]:
KNeighborsRegressor(n_neighbors=8)
In [ ]:
## your code here for the plot
#
xp = np.linspace(-1, 1, 100).reshape(-1, 1)
yp = reg2.predict(xp)

plt.figure(figsize = (6, 4))
plt.title('k-Nearest Neighbor Regression 2', fontsize = 12)
plt.plot(X_values2,Y_values2, 'r.', label = 'C1 Data')
plt.plot(xp, yp, label = 'kNN')
plt.legend(fontsize = 12)
plt.xlim([-1.3, 1.3])
plt.ylim([-20, 13])
plt.grid(alpha = 0.3)
plt.show()

Problem 06

In this problem, we consider a binary classification task that we attempt to solve with the simple linear logistic regression model:


$$ P(y = +1 \mid x,\,\omega_1,\, \omega_2) = \sigma(\omega_1 x_1 + \omega_2 x_2) = \frac{1}{1+\exp(-\omega_1 x_1 - \omega_2 x_2)}$$

(For simplicity, we do not use the bias parameter $\omega_0$; equivalently, we take $\omega_0 = 0$.)

In [ ]:
import numpy as np
import cvxpy as cvx
import matplotlib.pyplot as plt
%matplotlib inline

x = np.load("/content/drive/MyDrive/ML_Colab/ML_data/logistic_reg_x.npy")
y = np.load("/content/drive/MyDrive/ML_Colab/ML_data/logistic_reg_y.npy")

plt.figure(figsize = (6, 4))
plt.plot(x[y==1, 0], x[y==1, 1], 'or', label = 'Class:1')
plt.plot(x[y==0, 0], x[y==0, 1], 'ob', label = 'Class:0')
plt.legend(fontsize = 12)
plt.axis('equal')
plt.xlim([-5, 5])
plt.ylim([-5,5])
plt.grid(alpha = 0.3)
plt.show()
  1. Plot the classification boundary obtained by logistic regression on the given data points. (Note that $\omega_0 = 0$.)

$$ \max \sum_{i=1}^{m} \log P\left( y_i \mid x_i, \, \omega_1, \, \omega_2\right)$$
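With labels $y_i \in \{0, 1\}$, each log-likelihood term expands as $\log P(y_i \mid x_i, \omega) = y_i\, x_i^T\omega - \log\left(1 + \exp(x_i^T\omega)\right)$, so the objective maximized in the cell below can be written as

$$y^T X\omega \;-\; \sum_{i=1}^{m}\log\left(1 + \exp\left(x_i^T\omega\right)\right)$$

which is exactly what the CVXPY expression with cvx.logistic implements.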
In [ ]:
y = np.reshape(y, (-1,1))

x = np.asmatrix(x)
y = np.asmatrix(y)

w = cvx.Variable([2,1])
obj = cvx.Maximize(y.T@x@w - cvx.sum(cvx.logistic(x@w)))
prob = cvx.Problem(obj).solve()

w = w.value

xp = np.linspace(-6,6,100)
yp = -w[0,0]/w[1,0]*xp

x = np.array(x)
y = np.array(y)

plt.figure(figsize = (6, 4))
plt.plot(x[y[:,0]==1, 0], x[y[:,0]==1, 1], 'or', label = 'Class:1')
plt.plot(x[y[:,0]==0, 0], x[y[:,0]==0, 1], 'ob', label = 'Class:0')
plt.plot(xp,yp, 'k', label = 'Logistic Regression')
plt.legend(fontsize = 12)
plt.axis('equal')
plt.xlim([-5, 5])
plt.ylim([-5,5])
plt.grid(alpha = 0.3)
plt.show()
  2. Consider a regularization approach where we want to minimize



$$ \sum_{i=1}^{m} - \log P\left( y_i \mid x_i, \, \omega_1, \, \omega_2\right) + \lambda \,\omega_2^2 $$


This is logistic regression with regularization. Plot the classification boundary on the given data points. (Note that only $\omega_2$ is penalized, and set $\lambda = 100$.) Then explain why the boundary has changed as a result of this regularization.
In [ ]:
x = np.asmatrix(x)
y = np.asmatrix(y)

lamda = 100

w = cvx.Variable([2, 1])
obj = cvx.Minimize(- y.T@x@w + cvx.sum(cvx.logistic(x@w)) + lamda*w[1,0]**2)
prob = cvx.Problem(obj).solve()

w = w.value

xp = np.linspace(-6,6,100)
yp = -w[0,0]/w[1,0]*xp

x = np.array(x)
y = np.array(y)

plt.figure(figsize = (6, 4))
plt.plot(x[y[:,0]==1,0], x[y[:,0]==1,1], 'or', label='Class:1')
plt.plot(x[y[:,0]==0,0], x[y[:,0]==0,1], 'ob', label='Class:0')
plt.plot(xp,yp, 'k', label = 'Logistic Regression')
plt.legend(fontsize = 12)
plt.axis('equal')
plt.xlim([-5, 5])
plt.ylim([-5,5])
plt.grid(alpha = 0.3)
plt.show()

This regularization term penalizes $\omega_2^2$, driving $\omega_2$ toward zero. Since the boundary is $x_2 = -\frac{\omega_1}{\omega_2}\, x_1$, a very small $\omega_2$ makes the slope very steep: the boundary becomes nearly vertical, and the classification relies almost entirely on $x_1$.

  3. If we change the form of the regularization to the L1 norm,



$$ \min \sum_{i=1}^{m} - \log P\left( y_i \mid x_i,\omega_1, \omega_2\right) + \lambda \,\left( \lvert \omega_1 \rvert + \lvert \omega_2 \rvert \right)$$


This is logistic regression with L1 regularization. Plot the classification boundary on the given data points. (Note that the L1 norm of $\omega$ is penalized, and set $\lambda = 0.1$.) Then explain why the boundary has changed as a result of this regularization.
In [ ]:
x = np.asmatrix(x)
y = np.asmatrix(y)

lamda = 0.1

w = cvx.Variable([2, 1])
obj = cvx.Minimize(- y.T@x@w + cvx.sum(cvx.logistic(x@w)) + lamda*cvx.norm(w,1))
prob = cvx.Problem(obj).solve()

w = w.value

xp = np.linspace(-6,6,100)
yp = -w[0,0]/w[1,0]*xp

x = np.array(x)
y = np.array(y)

plt.figure(figsize = (6, 4))
plt.plot(x[y[:,0]==1,0], x[y[:,0]==1,1], 'or', label='Class:1')
plt.plot(x[y[:,0]==0,0], x[y[:,0]==0,1], 'ob', label='Class:0')
plt.plot(xp,yp, 'k', label ='Logistic Regression')
plt.legend(fontsize = 12)
plt.axis('equal')
plt.xlim([-5, 5])
plt.ylim([-5,5])
plt.grid(alpha = 0.3)
plt.show()

This regularization term penalizes $\omega$ with the L1 norm, which makes the weights both small and sparse. Here it results in $\omega_1 = 0$, so the boundary $\omega_1 x_1 + \omega_2 x_2 = 0$ reduces to $x_2 = 0$: a horizontal boundary that depends only on $x_2$.