Machine Learning for Mechanical Engineering

Regression


Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
  • For your handwritten solutions, please scan or take a picture of them. Alternatively, you can write them in markdown if you prefer.

  • Only .ipynb files will be graded for your code.

    • Include your NAME and student ID in the name of your .ipynb file. ex) SeungchulLee_20241234_HW01.ipynb
  • Compress all the files into a single .zip file.

    • In the .zip file's name, include your NAME and student ID.

    ex) SeungchulLee_20241234_HW01.zip

    • Submit this .zip file on KLMS.
  • Do not submit a printed version of your code, as it will not be graded.

Problem 1

Suppose you receive a binary signal that is corrupted by noise while being transmitted through a channel. We want to remove the noise and recover the original signal through $L_1$ optimization. The mathematical problem statement is given below:


$$ \begin{align*} y &= x + \omega \\ x &\in \{ 0, 1 \} \\ \omega &\text{ is noise} \end{align*} \qquad \Longrightarrow \qquad \text{Recover the original signal } x \text{ from the corrupted signal } y $$

Step 1. Data generation

(a) First, generate an original signal of 200 data points as shown in the figure below. This can be done simply with the ones and zeros functions in NumPy.

In [ ]:
# Write your code here
#
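
For reference, a minimal sketch of one possible signal, assuming NumPy and a simple block pattern (the exact pattern in the figure may differ):

import numpy as np
import matplotlib.pyplot as plt

# a 200-point binary signal built from blocks of ones and zeros (the block layout is an assumption)
x = np.concatenate([np.zeros(50), np.ones(50), np.zeros(50), np.ones(50)]).reshape(-1, 1)

plt.figure(figsize = (10, 4))
plt.plot(x)
plt.show()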

(b) Next, corrupt the original signal with Gaussian noise. This can be done using the randn function in NumPy. One realization of the corrupted signal is shown below:

In [ ]:
# Write your code here
#
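
A minimal sketch, assuming the clean signal x from (a); the noise level sigma is an assumption:

sigma = 0.1                                  # assumed noise standard deviation
y = x + sigma * np.random.randn(*x.shape)    # one realization of the corrupted signal

plt.figure(figsize = (10, 4))
plt.plot(y)
plt.show()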

Step 2. $L_1$ optimization

Note that the signal is sparse (in the frequency-domain sense). Therefore, we can apply $L_1$ optimization. We will optimize the $L_1$ loss function with the $L_2$ constraints.

(c) Plot the reconstructed signal. In addition, by changing $\beta$ from the lecture slides, explain how $\beta$ affects the optimization process. One reconstructed signal with a proper value of $\beta$ is shown below:

In [ ]:
# Write your code here
#
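
A minimal sketch of the reconstruction step, assuming cvxpy, the corrupted signal y from (b), and a first-order difference matrix D; where exactly $\beta$ enters should follow the lecture formulation, so treat this only as a template:

import cvxpy as cvx

n = 200
beta = 0.5                                   # assumed value; vary it and observe the effect
D = np.zeros((n - 1, n))                     # first-order difference operator
D[:, :-1] -= np.eye(n - 1)
D[:, 1:] += np.eye(n - 1)

xhat = cvx.Variable((n, 1))
obj = cvx.Minimize(cvx.sum_squares(xhat - y) + beta * cvx.norm(D @ xhat, 1))
cvx.Problem(obj).solve()

plt.figure(figsize = (10, 4))
plt.plot(xhat.value)
plt.show()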

Problem 2

The regularized least-squares problem has the form


$$ \min_{\theta} \;\lVert A\theta -y\rVert_2^2 + \lambda \lVert \theta \rVert_2^2$$

(a) Show that the solution is given by

$$ \hat{\theta} = \left( A^T A + \lambda I_n \right)^{-1} A^T y $$

  • Do not use the method of Lagrange multipliers.



(b) Write down a gradient descent algorithm for the given optimization problem.


  • Hint: Note that $$ \;\lVert A\theta -y\rVert_2^2 = (A\theta - y)^T(A\theta - y)$$
Then, you can differentiate this expression to compute the gradient. Likewise, you can compute the gradient of the regularizer.
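
For reference, one consistent form of the resulting gradient and update (a sketch only; the derivation itself is what part (b) asks for):

$$ g = \nabla_\theta \left( \lVert A\theta - y \rVert_2^2 + \lambda \lVert \theta \rVert_2^2 \right) = 2A^T(A\theta - y) + 2\lambda\theta, \qquad \theta \leftarrow \theta - \alpha\, g $$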

(c) Based on the result of (b), describe the role of the regularization term.

  • Hint: Gradient $g$ is computed by $ g = g_{\text{projection}} + g_{\text{regularizer}} $.

(d) Explain why the results of (a) and (b) have the same meaning.


(e) Find and draw an approximate curve for the given data points below in Python using your gradient descent algorithm.


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# 10 data points
n = 10
x = np.linspace(-4.5, 4.5, 10).reshape(-1, 1)
y = np.array([0.9819, 0.7973, 1.9737, 0.1838, 1.3180, -0.8361, -0.6591, -2.4701, -2.8122, -6.2512]).reshape(-1, 1)

plt.figure(figsize = (10, 8))
plt.plot(x, y, 'o', label = 'Data')
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
# Write your code here
#
[[ 3.48274701e-01]
 [-2.58951123e+00]
 [-4.55286474e-01]
 [ 1.85022226e+00]
 [ 1.06250369e-01]
 [-4.43328786e-01]
 [-9.25753472e-03]
 [ 3.63088178e-02]
 [ 2.35143849e-04]
 [-9.24099978e-04]]
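
For reference, a quick closed-form sanity check of the printed coefficients using the result of (a), assuming degree-9 polynomial features and a ridge weight lam (both assumptions, so the values need not match the printout exactly); the problem itself asks for your own gradient descent implementation:

X = np.hstack([x**i for i in range(10)])                     # assumed degree-9 polynomial basis
lam = 0.1                                                    # assumed regularization weight
theta_cf = np.linalg.inv(X.T @ X + lam*np.eye(10)) @ X.T @ y
print(theta_cf)

xp = np.linspace(-4.5, 4.5, 200).reshape(-1, 1)
plt.plot(x, y, 'o', xp, np.hstack([xp**i for i in range(10)]) @ theta_cf, '-')
plt.show()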

Problem 3: Image Panorama with Regression

We want to demonstrate an image panorama as an example of linear regression. A panorama is any wide-angle view or representation of a physical space.



In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/86rnwu3ZFbE?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [ ]:
# import library
import numpy as np
import matplotlib.pyplot as plt
import cv2
In [ ]:
# load images
imag1 = cv2.imread('./data_files/1.jpg')
imag1 = cv2.cvtColor(imag1, cv2.COLOR_BGR2RGB)
imag2 = cv2.imread('./data_files/2.jpg')
imag2 = cv2.cvtColor(imag2, cv2.COLOR_BGR2RGB)
imag3 = cv2.imread('./data_files/3.jpg')
imag3 = cv2.cvtColor(imag3, cv2.COLOR_BGR2RGB)
In [ ]:
## your code here
#
In [ ]:
## your code here
#
In [ ]:
## your code here
#
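
If the empty cells above are meant for inspecting the loaded images (an assumption about their intent), a minimal sketch:

plt.figure(figsize = (15, 5))
for i, img in enumerate([imag1, imag2, imag3]):
    plt.subplot(1, 3, i + 1)
    plt.imshow(img)
    plt.axis('off')
plt.show()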

Panorama

Here, we are explaining the basic concept of homography (i.e., perspective transformation).

  • Any wide-angle view or representation of a physical space

  • images with horizontally elongated fields of view

  • idea: projecting images onto a common plane



  • Camera rotating about its center

  • Two image planes are related by a homography $H$

Do not worry about the homography transformation itself (it is outside the scope of this course).


$$ \begin{bmatrix} x'\\y'\\1 \end{bmatrix} \sim \begin{bmatrix} \omega x'\\\omega y'\\\omega \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x\\y\\1 \end{bmatrix} $$


  • For the advanced learner, watch the following online lecture by Prof. Aaron Bobick
In [ ]:
%%html
<center><iframe src="https://www.youtube.com/embed/pU4NorC7lb0?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

Find key points between two images

  • Suppose these matching points are given.
    • We have manually found the matching points for you, although there is a technique to do this automatically.
  • pos1 and pos2 are matching points between imag1 and imag2
  • pos3 and pos4 are matching points between imag2 and imag3
In [ ]:
pos1 = np.array([[2121, 2117, 2749, 3095, 3032, 3375, 3677, 3876],
                 [1431, 2034, 2033, 1885, 2017, 2037, 1885, 2279]], dtype=np.int64)
pos2 = np.array([[188, 58, 828, 1203, 1121, 1437, 1717, 1817],
                 [1217, 1909, 1952, 1827, 1952, 1991, 1870, 2226]], dtype=np.int64)
pos3 = np.array([[2338, 2379, 2658, 2899, 2977, 3272, 2716, 2786],
                 [1948, 1874, 2000, 1837, 1964, 1966, 2143, 2317]], dtype=np.int64)
pos4 = np.array([[109, 178, 497, 795, 851, 1144, 534, 580],
                 [1907, 1828, 1988, 1834, 1971, 1993, 2145, 2333]], dtype=np.int64)

(a) Visualization of key points

In [ ]:
## your code here
## Write down your own code to mark the key points (red dots) on the locations of the given images
#
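
A minimal sketch for one image pair, assuming the red-dot convention mentioned above (repeat for the other images and point sets):

plt.figure(figsize = (15, 6))
plt.subplot(1, 2, 1)
plt.imshow(imag1)
plt.plot(pos1[0], pos1[1], 'ro', markersize = 10)
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(imag2)
plt.plot(pos2[0], pos2[1], 'ro', markersize = 10)
plt.axis('off')
plt.show()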

Estimation of homography H

$$ X' = HX $$

where $X$ and $X'$ are position vectors of the key points, and $H$ is a perspective transformation


Goal: estimate the homography $H$ from matching points between two images


$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim \begin{bmatrix} \omega x' \\ \omega y' \\ \omega \end{bmatrix} = \begin{bmatrix} \theta_{1} & \theta_{2} & \theta_{3} \\ \theta_{4} & \theta_{5} & \theta_{6} \\ \theta_{7} & \theta_{8} & 1 \end{bmatrix} \begin{bmatrix} x\\ y\\ 1 \end{bmatrix}$$
(b) Show that the following equations follow from the above homography $H$:



$$ \begin{align*} x' &= \frac{\theta_1 x+\theta_2 y+\theta_3}{\theta_7 x+\theta_8 y+1} \\ y' &= \frac{\theta_4 x+\theta_5 y+\theta_6}{\theta_7 x+\theta_8 y+1} \end{align*} $$


$$ \begin{align*} \theta_1 x+\theta_2 y+\theta_3 -\theta_7 x'x-\theta_8 x'y-x' &= 0 \\ \theta_4 x+\theta_5 y+\theta_6 -\theta_7 y'x-\theta_8 y'y-y' &= 0 \end{align*} $$
(c) For $m$ pairs of matching points, show that a feature matrix $\Phi$ can be expressed as follows:
  • $ \Phi $ is a feature matrix

$$ \Phi = \begin{bmatrix} x_{1} & y_{1} & 1 & 0 & 0 & 0 & -x'_{1}x_{1} & -x'_{1}y_{1}\\ 0 & 0 & 0 & x_{1} & y_{1} & 1 & -y'_{1}x_{1} & -y'_{1}y_{1}\\ \vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots &\vdots\\ x_{m} & y_{m} & 1 & 0 & 0 & 0 & -x'_{m}x_{m} & -x'_{m}y_{m}\\ 0 & 0 & 0 & x_{m} & y_{m} & 1 & -y'_{m}x_{m} & -y'_{m}y_{m}\end{bmatrix} $$
  • $ \theta $ is a column vector for unknown parameters in a perspective transformation $H$

$$ \theta = \begin{bmatrix} \theta_{1} \\ \theta_{2} \\ \theta_{3} \\ \theta_{4} \\ \theta_{5} \\ \theta_{6} \\ \theta_{7} \\ \theta_{8} \end{bmatrix} $$
  • $b$ is a column vector for corresponding positions in the base image

$$ b = \begin{bmatrix} x'_{1} \\ y'_{1} \\ x'_{2} \\ y'_{2} \\ \vdots \\ x'_{m} \\ y'_{m} \end{bmatrix} $$
  • It ends up becoming a linear regression problem
$$ \min_{\theta} \; \lVert \Phi\theta - b \rVert_2^2 $$

$$ \theta^* = \left(\Phi^T\Phi\right)^{-1}\Phi^T b $$

(d) Perspective homography for image 1 and image 2

In [ ]:
## your code here
## Construct feature matrix using homography H, and a vector having entries of matching points in image 2




## your code here
## Define perspective_theta using linear regression




perspective_theta = np.vstack([perspective_theta, [1]])
perspective_theta = perspective_theta.reshape(3, 3)
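
A minimal sketch of one way to build Phi and b from pos1 and pos2 following the formulation in (c); treat it as a template, since your own construction may differ. The two lines already given at the end of the cell above then append $\theta_9 = 1$ and reshape $\theta$ into the 3 x 3 matrix $H$.

m = pos1.shape[1]
Phi = np.zeros((2*m, 8))
b = np.zeros((2*m, 1))

for i in range(m):
    x, y = pos1[0, i], pos1[1, i]            # point in image 1
    xp, yp = pos2[0, i], pos2[1, i]          # matching point in image 2 (base image)
    Phi[2*i, :]     = [x, y, 1, 0, 0, 0, -xp*x, -xp*y]
    Phi[2*i + 1, :] = [0, 0, 0, x, y, 1, -yp*x, -yp*y]
    b[2*i, 0], b[2*i + 1, 0] = xp, yp

perspective_theta = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ b   # least-squares estimate of theta1..theta8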

(e) Perspective homography for image 2 and image 3

In [ ]:
## your code here
## Construct feature matrix using homography H, and a vector having entries of matching points in image 2




## your code here
## Define perspective_theta using linear regression




perspective_theta3 = np.vstack([perspective_theta3, [1]])
perspective_theta3 = perspective_theta3.reshape(3, 3)

Image warping

  • Again, do not worry about the details of image warping (outside this lecture's scope)
In [ ]:
cv2.warpPerspective?
In [ ]:
## Apply image warping on image 1 & image 3 using cv2.warpPerspective function

## Translate so that the warped images fit onto the (18000, 6500) screen
translation = np.matrix([[1, 0, 6000],
                         [0, 1, 2500],
                         [0, 0, 1]])

warpedImage = cv2.warpPerspective(imag1, translation*perspective_theta, (18000, 6500))
warpedImage3 = cv2.warpPerspective(imag3, translation*perspective_theta3, (18000, 6500))
In [ ]:
screen = warpedImage.copy()
screen[screen==0] = warpedImage3[screen==0]
screen[2500:3024+2500,6000:4032+6000] = imag2

## Visualize panorama image
plt.figure(figsize=(20, 12))
plt.imshow(screen)
plt.show()

Problem 4

We saw many optimization problems in a machine learning context and witnessed that $L_1$ norm regularization provides not only small but also sparse decision variables. In this problem we will use linear programming to solve $L_1$ norm optimization problems instead of simply using cvx.

(a) Show that


$$ \lvert y_i \rvert = \max \left(-y_i,y_i \right) $$

(b) Show that, by introducing a slack variable $t_i$, the absolute-value optimization problem can be converted to


$$\min \,\lvert y_i \rvert = \min \left\{\max \left(-y_i,y_i \right) \right\} \quad \iff \quad \begin{align*} \min \quad & t_i \\ \text{subject to} \quad & y_i \leq t_i \\ - & y_i \leq t_i \end{align*} $$

(c) Suppose $A\theta-b = r$ and $t = \begin{bmatrix} t_1 & t_2 & \cdots & t_m \end{bmatrix}^T$. Then show that the $L_1$ norm fitting/approximation problem, i.e., the sum of absolute residuals, can be converted to


$$\min \,\lVert A\theta-b \rVert_{1} \quad \iff \quad \begin{align*} \min \quad & t_1 + \cdots + t_m \\ \text{subject to} \quad & A\theta - b \leq t \\ - & (A\theta - b) \leq t \end{align*} $$

(d) Convert the above optimization problem into a linear program (LP) in matrix form, where $\mathbb{1} = \begin{bmatrix} 1& \cdots & 1\end{bmatrix}^T$ and $\mathbb{0} = \begin{bmatrix} 0& \cdots & 0\end{bmatrix}^T$


$$ \begin{align*} \min \quad & \begin{bmatrix} \mathbb{0}^T & \mathbb{1}^T \end{bmatrix} \begin{bmatrix} \theta \\ t \end{bmatrix} \\ \text{subject to} \quad & \begin{bmatrix} A & -I \\ -A & -I \end{bmatrix}\begin{bmatrix} \theta \\ t \end{bmatrix} \leq \begin{bmatrix} b \\ -b \end{bmatrix} \end{align*} $$

(e) Use cvxpy in Python to find and plot a linear regression for the data points below with the above formulation.


In [ ]:
import numpy as np
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9])
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9])
In [ ]:
## your code here
#
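
A minimal sketch of the LP form in (d) with cvxpy, assuming an affine model $\hat y = \theta_1 + \theta_2 x$ (the choice of a two-parameter model is an assumption):

import matplotlib.pyplot as plt
import cvxpy as cvx

m = x.shape[0]
A = np.vstack([np.ones(m), x]).T             # affine model: theta1 + theta2*x
b = y.reshape(-1, 1)

theta = cvx.Variable((2, 1))
t = cvx.Variable((m, 1))
obj = cvx.Minimize(cvx.sum(t))
constraints = [A @ theta - b <= t, -(A @ theta - b) <= t]
cvx.Problem(obj, constraints).solve()

xp = np.linspace(0, 5, 100)
plt.plot(x, y, 'o', xp, theta.value[0, 0] + theta.value[1, 0]*xp, '-')
plt.show()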

Problem 5

We have learned about total variation with the following image example.

Download data

In [ ]:
import cv2
import matplotlib.pyplot as plt
imbw = cv2.imread('./image_files/dog.jpg', 0)

row = 150
col = 150
resized_imbw = cv2.resize(imbw, (row, col))

plt.figure(figsize = (8,8))
plt.imshow(resized_imbw, 'gray')
plt.axis('off')
plt.show()

(a) Apply $L_1$ norm regularization to the image by solving


$$ \min\; \left\{ \lVert x-x_{cor}\rVert_2^2 + \mu \lVert Dx \rVert_1 \right\}$$
In [ ]:
n = row*col
imbws = resized_imbw.reshape(-1, 1)

mu = 100

## your code here
#
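
A minimal sketch, assuming cvxpy and a sparse first-order difference matrix D over the flattened signal (D's exact form should follow the lecture; scipy.sparse keeps it tractable at this size, and the solve may still take a while):

import cvxpy as cvx
from scipy.sparse import diags

D = diags([-1, 1], [0, 1], shape = (n - 1, n))   # first-order differences of the flattened image

x = cvx.Variable((n, 1))
obj = cvx.Minimize(cvx.sum_squares(x - imbws) + mu*cvx.norm(D @ x, 1))
cvx.Problem(obj).solve()

plt.figure(figsize = (8, 8))
plt.imshow(x.value.reshape(row, col), 'gray')
plt.axis('off')
plt.show()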

(b) However, the output image is not exactly what we want, since the 2D image (or matrix) was flattened into a 1D column vector (or signal). We now treat the image in 2D. Apply a method similar to (a) by solving the following objective function.



$$ \text{minimize} \quad \sum_{i=1}^{n} \sum_{j=1}^{n} \left( x_{i,j} - x_{i,j}^{cor} \right)^2 + \mu \left[ \sum_{i=1}^{n-1} \sum_{j=1}^{n} \lvert x_{i+1,j}-x_{i,j} \rvert + \sum_{i=1}^{n}\sum_{j=1}^{n-1} \lvert x_{i,j+1}-x_{i,j} \rvert \right]$$

  • Hint: Use a $\text{col} \times \text{row}$ cvxpy Variable to solve this problem
In [ ]:
import cvxpy as cvx

mu = 1500

x = cvx.Variable([row,col])

## your code here
#
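
A minimal sketch of the 2D objective with the given Variable, writing the two total-variation sums with array slicing (one possible encoding):

obj = cvx.Minimize(cvx.sum_squares(x - resized_imbw)
                   + mu*(cvx.sum(cvx.abs(x[1:, :] - x[:-1, :]))
                         + cvx.sum(cvx.abs(x[:, 1:] - x[:, :-1]))))
cvx.Problem(obj).solve()

plt.figure(figsize = (8, 8))
plt.imshow(x.value, 'gray')
plt.axis('off')
plt.show()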

Problem 6

Consider the elastic-net regression problem:


$$\min_{\theta} \; \lVert y - A \theta \rVert_2^2 + \lambda \left[ \alpha \lVert \theta \rVert^2_2 + (1-\alpha)\lVert \theta \rVert_1 \right]$$

[6pt] Show how one can turn this into a Lasso problem, using an augmented version of $A$ and $y$.


Problem 7

Suppose we have a linear regression model with two weights and no bias term:


$$y = \omega_1 x_1 + \omega_2 x_2$$

and the usual loss function $\ell(y,\hat y) = \frac{1}{2}(y− \hat y)^2$ and cost $\mathcal{C}(\omega_1, \omega_2) = \frac{1}{m} \sum_{i}\ell(y^{(i)},\hat y ^{(i)})$. Suppose we have a training set consisting of $m = 3$ examples:

  • $x^{(1)} = [1,0]^T,\; y^{(1)} = 2$

  • $x^{(2)} = [0,1]^T,\; y^{(2)} = 2$

  • $x^{(3)} = [\sqrt{3},0]^T,\; y^{(3)} = 0$


Let's sketch one of the contours in weight space.


(a) Write the cost in the form below using $\ell(y,\hat y) = \frac{1}{2}(y− \hat y)^2$ , $\mathcal{C}(\omega_1, \omega_2) = \frac{1}{m} \sum_{i}\ell(y^{(i)},\hat y ^{(i)})$ and find $c_1,d_1,c_2,d_2,c_0$


$$\mathcal{C} = c_1 (\omega_1 - d_1)^2 + c_2 (\omega_2 - d_2)^2 + c_0$$

(b) Since $c_1, c_2 > 0$, this corresponds to an axis-aligned ellipse in the $(\omega_1, \omega_2)$ plane. Sketch the ellipse by hand for $\mathcal{C} = 4$. Label the center and radii of the ellipse.

Problem 8

This problem is motivated by the response surface methodology. The response surface methodology (RSM) is a collection of statistical and mathematical techniques used for the purpose of

  • Setting up a series of experiments (design) for adequate predictions of a response y.

  • Fitting a hypothesized (empirical) model to data obtained under the chosen design.

  • Determining optimum conditions on the model's input (control) variables that lead to maximum or minimum response within a region of interest.


(a) Find and plot an approximate curve for the given data points below in Python using your non-linear regression algorithm. Use polynomial basis functions of degree 9. Your curve should be a non-convex function with several local minima. Do not worry about overfitting.


$$\hat y = \theta_0 + \theta_1 x + \cdots + \theta_9 x^9$$
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
'''
Given data x,y
'''
y = np.array([10, 8.5, 7, 8, 6.5, 3, 2, 4, 4.5, 3, 4, 5.2, 4.4, 7, 9])
x = np.linspace(-4.5, 4.5, y.shape[0])
'''
plot x and y
'''
plt.figure(figsize = (10, 8))
plt.plot(x, y, 'o', label = 'Data')
plt.xlabel('X', fontsize = 15)
plt.ylabel('Y', fontsize = 15)
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
# Write your code here
'''
gradient descent method
Please solve only below #
do not modify except #

'''
m = np.shape(x)[0]
x = np.reshape(x,(-1,1))
y = np.reshape(y,(-1,1))

X = np.hstack([x**i for i in range(10)])
X = np.asmatrix(X)
y = np.asmatrix(y)

theta = np.ones([10,1])
## solve theta
theta =

print(theta)
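
For reference, one way to fill in the blank above is the closed-form least-squares solution; this is only a sketch, since the template's docstring mentions gradient descent, and a gradient descent loop with scaled features would be the faithful alternative:

theta = np.linalg.inv(X.T @ X) @ X.T @ y     # closed-form least squares on the degree-9 features
print(theta)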

(b) [6pt] Then we would like to find the optimal point $x^*$ on the estimated response surface. Find the minimum point using the gradient descent algorithm for this non-convex problem. Remember that several local minima may exist.
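
As a reference for the update inside the template below (a sketch of the quantities the blanks ask for), the derivative of the fitted polynomial with respect to $x$ and the corresponding descent step are

$$ \frac{d\hat y}{dx} = \sum_{i=1}^{9} i\,\theta_i x^{i-1}, \qquad x \leftarrow x - \alpha \frac{d\hat y}{dx} $$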



In [ ]:
def output(x, theta):
    y = theta[0,0] + theta[1,0]*x + theta[2,0]*x**2 + theta[3,0]*x**3 + \
     theta[4,0]*x**4 + theta[5,0]*x**5 + theta[6,0]*x**6 + theta[7,0]*x**7 + theta[8,0]*x**8 + theta[9,0]*x**9
    return y
In [ ]:
# Write your code here
#
'''
gradient descent method
Please solve only below #
do not modify except #

'''
alpha = 0.001
y = 20

for i in range(10):
    x = -4.5 + 9*np.random.rand()
    for i in range(1000):
        gradient = 0
        for i in range(9):
            # Update your gradient using i and theta
            gradient  +=
        ## Update your x using alpha and gradient
        x =

    result = output(x, theta)
    if (result < y):
        y = result

print('y: ')
print(y)
y: 
2.502637535181355

Problem 9

Load lasso_input_data.pkl and lasso_target_data.pkl using the following command.

In [ ]:
from six.moves import cPickle
x = cPickle.load(open('./data_files/lasso_input_data.pkl', 'rb'))
y = cPickle.load(open('./data_files/lasso_target_data.pkl', 'rb'))

# x.shape
# y.shape

In this problem, you are asked to predict a target value $\hat y$ from 13 features (input data). Assume that your model has the following relationship ($\omega_0 = 0$):


$$\omega_1 x_1 + \omega_2 x_2 + \cdots + \omega_{13} x_{13} = \hat y$$

(a) Build the linear regression model using the Lasso regularizer.

(Hint: You can use the cvxpy module to solve the above optimization problem.)

In [ ]:
# Write your code here
#
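
A minimal sketch with cvxpy, assuming x has shape (number of samples, 13) and a regularization weight lam chosen by you (both are assumptions):

import numpy as np
import cvxpy as cvx

X = np.asarray(x)                            # assumed shape (n_samples, 13)
Y = np.asarray(y).reshape(-1, 1)

lam = 1.0                                    # assumed regularization weight
w = cvx.Variable((13, 1))
obj = cvx.Minimize(cvx.sum_squares(X @ w - Y) + lam*cvx.norm(w, 1))
cvx.Problem(obj).solve()
print(w.value)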

(b) After applying Lasso, we can select meaningful features. Plot the magnitudes of all $\omega$'s and tell which features are important.

In [ ]:
# Write your code here
#
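
A minimal sketch, assuming the weight variable w from the sketch in (a):

import matplotlib.pyplot as plt

plt.figure(figsize = (10, 6))
plt.stem(np.abs(w.value).flatten())
plt.xlabel('feature index')
plt.ylabel(r'$|\omega|$')
plt.show()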

(c) Update the linear regression model with the selected features only.

In [ ]:
# Write your code here
#