Instructor: Prof. Seungchul Lee

http://iailab.kaist.ac.kr/

Industrial AI Lab at KAIST

Write your handwritten solutions in markdown.

For your code, only the .ipynb file will be graded.

- Please include your NAME and student ID in the .ipynb file name, e.g., IljeokKim_20202467_HW02.ipynb

Please compress all the files into a single .zip file.

- Please include your NAME and student ID in the .zip file name, e.g., DogyeomPark_20202467_HW02.zip
- Submit it to KLMS

Do not submit a printed version of your code. It will not be graded.

In this problem, we solve a regression problem with the batch and stochastic gradient descent methods. The objective is to compare the loss of each method. Since the problem is simple and convex, the loss should decrease rapidly, so iterate no more than 100 times. You do not have to use TensorFlow.

In [1]:

```
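import numpy as np
import tensorflow as tf  # optional: TensorFlow is not required for this problem
import matplotlib.pyplot as plt
import sklearn

# generate m noisy samples around the line y = 4 + 3x
m = 1000
x = 2*np.random.rand(m, 1)
y = 4 + 3*x + 2*np.random.randn(m,1)

# design matrix: a column of ones (intercept) next to x
A = np.hstack((np.ones((m,1)), x))

x = np.asmatrix(x)
y = np.asmatrix(y)
A = np.asmatrix(A)

# scatter plot of the data
plt.figure(figsize = (8, 6))
plt.plot(x, y, '.')
plt.show()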
```

(1) Find the fitted line with the **batch gradient descent method** and plot the loss.

Note: the loss function for the regression problem is MSE.
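To make the MSE loss and the batch update concrete, here is a minimal sketch on a small synthetic data set. It is one possible way to organize the computation, not the expected solution. The names `mse_loss`, `mse_gradient`, `A_demo`, `y_demo`, `theta_demo`, and `step` are hypothetical and do not appear in the assignment, and the demo data is generated separately from the assignment data.

```python
import numpy as np

def mse_loss(A, y, theta):
    """Mean squared error of the linear model A @ theta against y."""
    residual = A @ theta - y
    return float(np.mean(residual ** 2))

def mse_gradient(A, y, theta):
    """Gradient of the MSE with respect to theta: (2/m) * A^T (A theta - y)."""
    m = A.shape[0]
    return (2.0 / m) * A.T @ (A @ theta - y)

# Hypothetical demo data (assumption: same line as the assignment, less noise)
rng = np.random.default_rng(0)
m_demo = 200
x_demo = 2 * rng.random((m_demo, 1))
y_demo = 4 + 3 * x_demo + 0.5 * rng.standard_normal((m_demo, 1))
A_demo = np.hstack((np.ones((m_demo, 1)), x_demo))

theta_demo = np.zeros((2, 1))
step = 0.1  # assumed learning rate, matching LR in the template
loss_history = []
for _ in range(100):
    # record the loss, then take one step using ALL samples at once
    loss_history.append(mse_loss(A_demo, y_demo, theta_demo))
    theta_demo = theta_demo - step * mse_gradient(A_demo, y_demo, theta_demo)
```

Each iteration uses the full data set to compute one gradient, which is what distinguishes the batch method from the stochastic one.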

In [2]:

```
## Solution
LR = 0.1
n_iter = 100
# initialization
theta = np.random.randn(2,1)
Loss = []
# gradient descent
for i in range(n_iter):
    ## your code here
    pass  # placeholder; replace with your update
plt.figure(figsize = (8, 6))
plt.plot(Loss)
plt.title('Batch Gradient Descent')
plt.show()
```

In [3]:

```
xp = np.arange(0, 2, 0.01).reshape(-1,1)
yp = theta[1,0]*xp + theta[0,0]
plt.figure(figsize = (8, 6))
plt.plot(x, y, '.')
plt.plot(xp, yp, 'k--')
plt.show()
```

(2) Find the fitted line with the **stochastic gradient descent method** and plot the loss.
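For contrast with the batch method, the sketch below updates the parameters from one sample at a time. It is an illustration under stated assumptions, not the expected solution: the helper `sgd_epoch`, the demo data, the step size of 0.01, and the choice to shuffle the samples every epoch are all assumptions made for this example.

```python
import numpy as np

# Hypothetical demo data, generated separately from the assignment data
rng = np.random.default_rng(1)
m_demo = 200
x_demo = 2 * rng.random((m_demo, 1))
y_demo = 4 + 3 * x_demo + 0.5 * rng.standard_normal((m_demo, 1))
A_demo = np.hstack((np.ones((m_demo, 1)), x_demo))

def sgd_epoch(A, y, theta, step, rng):
    """One pass over the data, updating theta from a single sample each time."""
    for idx in rng.permutation(A.shape[0]):
        a_i = A[idx:idx + 1, :]   # shape (1, 2); slicing keeps it 2-D
        y_i = y[idx:idx + 1, :]   # shape (1, 1)
        grad_i = 2.0 * a_i.T @ (a_i @ theta - y_i)  # gradient of one squared error
        theta = theta - step * grad_i
    return theta

theta_sgd = np.zeros((2, 1))
sgd_loss = []
for epoch in range(20):
    theta_sgd = sgd_epoch(A_demo, y_demo, theta_sgd, step=0.01, rng=rng)
    # track the MSE over the whole data set after each epoch
    sgd_loss.append(float(np.mean((A_demo @ theta_sgd - y_demo) ** 2)))
```

Because each update sees only one sample, the loss curve is typically noisier than the batch curve, which is the behavior the comparison in this problem is meant to expose.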

In [4]:

```
## Solution
LR = 0.1
# initialization
theta = np.random.randn(2,1)
Loss = []
for epoch in range(n_iter):
    ## your code here
    pass  # placeholder; replace with your update
plt.figure(figsize = (8, 6))
plt.plot(Loss)
plt.title('Stochastic Gradient Descent')
plt.show()
```

In [5]:

```
xp = np.arange(0, 2, 0.01).reshape(-1,1)
yp = theta[1,0]*xp + theta[0,0]
plt.figure(figsize = (8, 6))
plt.plot(x, y, '.')
plt.plot(xp, yp, 'k--')
plt.show()
```

(3) In the stochastic gradient descent method, we will add a simple adaptive learning rate. We are going to reduce the learning rate gradually. In other words, use a large learning rate at first and decrease it gradually.
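As a quick illustration of a decaying schedule, the sketch below tabulates the rate $1/(k + 10)$ over 100 steps. The function name `learning_rate`, the `offset` parameter, and the choice of letting `k` count iterations from 0 are assumptions made for this example; whether `k` counts epochs or individual updates is left to your implementation.

```python
def learning_rate(k, offset=10):
    """Decaying step size: large at first, shrinking as k grows (assumed schedule)."""
    return 1.0 / (k + offset)

# Tabulate the schedule for k = 0, 1, ..., 99 (assumption: k starts at 0)
rates = [learning_rate(k) for k in range(100)]

# The rate never increases from one step to the next
is_decreasing = all(a > b for a, b in zip(rates, rates[1:]))
```

With this schedule the first step uses a rate of 0.1 and later steps shrink toward zero, so early updates move quickly while later updates damp the noise of single-sample gradients.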

In [6]:

```
## Solution
def LearningRate(k):
    return 1/(k + 10)
# initialization
theta = np.random.randn(2,1)
Loss = []
for epoch in range(n_iter):
    ## your code here
    pass  # placeholder; replace with your update
plt.figure(figsize = (8, 6))
plt.plot(Loss)
plt.title('Adaptive Stochastic Gradient Descent')
# plt.ylim([0, 35])
plt.show()
```

In [7]:

```
xp = np.arange(0, 2, 0.01).reshape(-1,1)
yp = theta[1,0]*xp + theta[0,0]
plt.figure(figsize = (8, 6))
plt.plot(x, y, '.')
plt.plot(xp, yp, 'k--')
plt.show()
```

The regularized least-squares problem has the form

$$ \min_{\theta} \; \lVert A\theta - y \rVert_2^2 + \lambda \lVert \theta \rVert_2^2 $$

(a) Show that the solution is given by

$$ \hat{\theta} = \left(A^TA + \lambda I\right)^{-1}A^Ty $$

- Do not use the method of Lagrange multipliers.

(b) Write down a gradient descent algorithm for the given optimization problem.

- Hint: Note that $$\lVert A\theta - y\rVert_2^2 = (A\theta - y)^T(A\theta - y)$$

Then, you can differentiate the above equation to compute the gradient. Likewise, you can compute the gradient of the regularizer.
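One way to sanity-check a hand-derived gradient is to compare it against finite differences. The sketch below does this for the squared residual term only, using the standard result $\nabla_\theta \lVert A\theta - y\rVert_2^2 = 2A^T(A\theta - y)$. The function names and the random test data are hypothetical, and the regularizer is deliberately left out.

```python
import numpy as np

def squared_residual(A, y, theta):
    """(A theta - y)^T (A theta - y), returned as a Python float."""
    r = A @ theta - y
    return (r.T @ r).item()

def analytic_gradient(A, y, theta):
    """Gradient of the squared residual: 2 A^T (A theta - y)."""
    return 2.0 * A.T @ (A @ theta - y)

def numeric_gradient(f, theta, eps=1e-6):
    """Central-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        bump = np.zeros_like(theta)
        bump[j, 0] = eps
        grad[j, 0] = (f(theta + bump) - f(theta - bump)) / (2 * eps)
    return grad

# Hypothetical random test problem (not the assignment data)
rng = np.random.default_rng(2)
A_chk = rng.standard_normal((50, 3))
y_chk = rng.standard_normal((50, 1))
theta_chk = rng.standard_normal((3, 1))

g_analytic = analytic_gradient(A_chk, y_chk, theta_chk)
g_numeric = numeric_gradient(lambda t: squared_residual(A_chk, y_chk, t), theta_chk)
max_gap = float(np.max(np.abs(g_analytic - g_numeric)))
```

A small `max_gap` indicates that the analytic expression matches the numerical estimate. The same check can be applied to whatever gradient you derive for the regularizer.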

(c) Based on the result of (b), describe the role of the regularizer term.

- Hint: The gradient $g$ is computed by $g = g_{\text{projection}} + g_{\text{regularizer}}$.

(d) Explain why the results of (a) and (b) have the same meaning.