Deep Learning for Mechanical Engineering

Homework 03

Due Wednesday, 10/04/2023, 4:00 PM

Instructor: Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
• For your handwritten solution, write it in markdown.

• Compress all files into a single .zip file.

• Submit the .zip file to KLMS.
• Do not submit a printed version of your code. It will not be graded.

In this problem, we solve a linear regression problem with the batch and stochastic gradient descent methods and compare the loss of each method. Because the problem is simple and convex, the loss decreases rapidly, so iterate no more than 100 times. You do not have to use TensorFlow.

In [1]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import sklearn

m = 1000
x = 2*np.random.rand(m, 1)
y = 4 + 3*x + 2*np.random.randn(m,1)
A = np.hstack((np.ones((m,1)), x))

x = np.asmatrix(x)
y = np.asmatrix(y)
A = np.asmatrix(A)

plt.figure(figsize = (8, 6))
plt.plot(x, y, '.')
plt.show()


(1) Find the fitted line with the batch gradient descent method and plot the loss.

Note: the loss function for the regression problem is the mean squared error (MSE).

In [2]:
## Solution

LR = 0.1
n_iter = 100

# initialization
theta = np.random.randn(2,1)
Loss = []

for i in range(n_iter):
    # TODO: compute the MSE gradient on all samples, update theta, and append the loss to Loss
    pass

plt.figure(figsize = (8, 6))
plt.plot(Loss)
plt.show()

In [3]:
xp = np.arange(0, 2, 0.01).reshape(-1,1)
yp = theta[1,0]*xp + theta[0,0]

plt.figure(figsize = (8, 6))
plt.plot(x, y, '.')
plt.plot(xp, yp, 'k--')

plt.show()
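
The batch update described in (1) can be sketched as follows. This is one possible way to fill in the loop, not the reference solution. It assumes plain NumPy arrays instead of the `np.asmatrix` objects from the first cell, and the seeded generator `rng` is an addition for repeatability. The loss is recorded before each update, so `Loss[0]` is the loss at the random initialization.

```python
import numpy as np

# Data generated as in the first cell (the seed is an assumption, added for repeatability)
rng = np.random.default_rng(0)
m = 1000
x = 2 * rng.random((m, 1))
y = 4 + 3 * x + 2 * rng.standard_normal((m, 1))
A = np.hstack((np.ones((m, 1)), x))

LR = 0.1
n_iter = 100

theta = rng.standard_normal((2, 1))
Loss = []

for i in range(n_iter):
    residual = A @ theta - y                          # (m, 1) prediction errors
    Loss.append(float(np.mean(np.square(residual))))  # MSE before the update
    grad = (2 / m) * A.T @ residual                   # gradient of the MSE w.r.t. theta
    theta = theta - LR * grad                         # one step using all m samples

print(theta.ravel(), Loss[-1])
```

`Loss` can then be passed to `plt.plot(Loss)` exactly as in the cell above.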


(2) Find the fitted line with the stochastic gradient descent method and plot the loss.

In [4]:
## Solution

LR = 0.1

# initialization
theta = np.random.randn(2,1)
Loss = []

for epoch in range(n_iter):
    # TODO: update theta one sample at a time, then append the loss to Loss
    pass

plt.figure(figsize = (8, 6))
plt.plot(Loss)
plt.show()

In [5]:
xp = np.arange(0, 2, 0.01).reshape(-1,1)
yp = theta[1,0]*xp + theta[0,0]

plt.figure(figsize = (8, 6))
plt.plot(x, y, '.')
plt.plot(xp, yp, 'k--')

plt.show()
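
A minimal sketch of the stochastic variant in (2), under a few assumptions that the problem statement leaves open: each epoch visits every sample once in a random order, each update uses a single sample, and the loss stored per epoch is the MSE over the full data set. Plain NumPy arrays and the seeded generator `rng` are also assumptions.

```python
import numpy as np

# Data generated as in the first cell (the seed is an assumption, added for repeatability)
rng = np.random.default_rng(0)
m = 1000
x = 2 * rng.random((m, 1))
y = 4 + 3 * x + 2 * rng.standard_normal((m, 1))
A = np.hstack((np.ones((m, 1)), x))

LR = 0.1
n_iter = 100

theta = rng.standard_normal((2, 1))
Loss = []

for epoch in range(n_iter):
    # Full-data MSE at the start of the epoch
    Loss.append(float(np.mean(np.square(A @ theta - y))))
    for i in rng.permutation(m):            # visit the samples in random order
        a_i = A[i:i + 1, :]                 # (1, 2) row for one sample
        err = a_i @ theta - y[i:i + 1, :]   # (1, 1) error on that sample
        grad = 2 * a_i.T @ err              # single-sample gradient estimate
        theta = theta - LR * grad

print(theta.ravel(), Loss[-1])
```

With a constant learning rate, the per-epoch loss typically keeps fluctuating around its minimum instead of settling, which is the behavior to compare against the batch curve.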


(3) Add a simple decaying learning rate to the stochastic gradient descent method: start with a large learning rate and reduce it gradually according to

$$\alpha(k) = \frac{1}{k + 10}$$
In [6]:
## Solution

def LearningRate(k):
    return 1/(k + 10)

# initialization
theta = np.random.randn(2,1)
Loss = []

for epoch in range(n_iter):
    # TODO: run stochastic updates with the decayed learning rate, then append the loss to Loss
    pass

plt.figure(figsize = (8, 6))
plt.plot(Loss)
# plt.ylim([0, 35])
plt.show()

In [7]:
xp = np.arange(0, 2, 0.01).reshape(-1,1)
yp = theta[1,0]*xp + theta[0,0]

plt.figure(figsize = (8, 6))
plt.plot(x, y, '.')
plt.plot(xp, yp, 'k--')

plt.show()
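
The schedule in (3) could be plugged into the stochastic loop as sketched below. The main assumption is that $k$ counts epochs, so the learning rate starts at $1/10$ and is held fixed within each epoch; the problem statement does not say whether $k$ is an epoch or an update index. Plain NumPy arrays and the seeded generator `rng` are assumptions as well.

```python
import numpy as np

def LearningRate(k):
    return 1 / (k + 10)

# Data generated as in the first cell (the seed is an assumption, added for repeatability)
rng = np.random.default_rng(0)
m = 1000
x = 2 * rng.random((m, 1))
y = 4 + 3 * x + 2 * rng.standard_normal((m, 1))
A = np.hstack((np.ones((m, 1)), x))

n_iter = 100
theta = rng.standard_normal((2, 1))
Loss = []

for epoch in range(n_iter):
    # Full-data MSE at the start of the epoch
    Loss.append(float(np.mean(np.square(A @ theta - y))))
    lr = LearningRate(epoch)                # assumption: k is the epoch index
    for i in rng.permutation(m):            # visit the samples in random order
        a_i = A[i:i + 1, :]
        grad = 2 * a_i.T @ (a_i @ theta - y[i:i + 1, :])
        theta = theta - lr * grad

print(theta.ravel(), Loss[-1])
```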


# Problem 2: Regularization

The regularized least-squares problem has the form

$$\min_{\theta} \;\lVert A\theta -y\rVert_2^2 + \lambda \lVert \theta \rVert_2^2$$

(a) Show that the solution is given by

$$\hat{\theta} = \left( A^T A + \lambda I_n \right)^{-1} A^T y$$
• Do not use the method of Lagrange multipliers.
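
Before writing the proof, the claimed solution can be checked numerically. The sketch below uses hypothetical sizes `m`, `n`, a hypothetical value `lam` for $\lambda$, and random data. It evaluates the gradient of the objective at $\hat{\theta}$, which should vanish up to round-off. It is only a sanity check and does not replace the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                 # hypothetical problem size
A = rng.standard_normal((m, n))
y = rng.standard_normal((m, 1))
lam = 0.5                    # hypothetical regularization weight

# Candidate solution; solve() avoids forming the explicit inverse
theta_hat = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Gradient of ||A theta - y||^2 + lam * ||theta||^2 evaluated at theta_hat
grad = 2 * A.T @ (A @ theta_hat - y) + 2 * lam * theta_hat
print(np.linalg.norm(grad))
```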

(b) Write down a gradient descent algorithm for the given optimization problem.

$$\min_{\theta} \;\lVert A\theta -y\rVert_2^2 + \lambda \lVert \theta \rVert_2^2$$

• Hint: Note that $$\;\lVert A\theta -y\rVert_2^2 = (A\theta - y)^T(A\theta - y)$$

Then, you can differentiate the above equation to compute the gradient. Likewise, you can compute the gradient of the regularizer.
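
A minimal sketch of such an algorithm follows. The data, the sizes `m` and `n`, the value `lam`, the iteration count, and the step-size rule are all assumptions chosen for illustration. The step size is set to $1/L$ with $L = 2(\lambda_{\max}(A^TA) + \lambda)$, one conservative choice that keeps the iteration stable. The names `g_projection` and `g_regularizer` follow the hint given in (c).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                 # hypothetical problem size
A = rng.standard_normal((m, n))
y = rng.standard_normal((m, 1))
lam = 0.5                    # hypothetical regularization weight

# Conservative step size: 1 / (largest curvature of the objective)
lipschitz = 2 * (np.linalg.eigvalsh(A.T @ A).max() + lam)
alpha = 1 / lipschitz

theta = np.zeros((n, 1))
for k in range(500):
    g_projection = 2 * A.T @ (A @ theta - y)    # gradient of ||A theta - y||^2
    g_regularizer = 2 * lam * theta             # gradient of lam * ||theta||^2
    theta = theta - alpha * (g_projection + g_regularizer)

# Closed-form solution from (a), for comparison
theta_closed = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
print(np.abs(theta - theta_closed).max())
```

Comparing `theta` with `theta_closed` is also a quick way to see the connection asked about in (d).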

(c) Based on the result of (b), describe the role of the regularizer term.

• Hint: Gradient $g$ is computed by $g = g_{\text{projection}} + g_{\text{regularizer}}$.
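
One way to read the hint, sketched under the assumption of a constant step size $\alpha$ (a symbol the problem statement does not fix for this part): substituting the two gradient terms into the update gives

$$g_{\text{projection}} = 2A^T(A\theta - y), \qquad g_{\text{regularizer}} = 2\lambda\theta$$

$$\theta \leftarrow \theta - \alpha\left(g_{\text{projection}} + g_{\text{regularizer}}\right) = (1 - 2\alpha\lambda)\,\theta - 2\alpha A^T(A\theta - y)$$

Written this way, each step first scales $\theta$ by the factor $(1 - 2\alpha\lambda)$ and then applies the ordinary least-squares step.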

(d) Explain why the results of (a) and (b) have the same meaning.