Instructor: Prof. Seungchul Lee

http://iailab.kaist.ac.kr/

Industrial AI Lab at KAIST

For your handwritten solutions, scan them or take a picture (you may also write them in Markdown).

For your code, only the .ipynb file will be graded.

- Please write your NAME and student ID in the .ipynb file name, e.g., IljeokKim_20202467_HW02.ipynb

Please compress all the files into a single .zip file.

- Please write your NAME and student ID in the .zip file name, e.g., DogyeomPark_20202467_HW02.zip
- Submit it to KLMS.

Do not submit a printed version of your code. It will not be graded.

You will solve the following system of linear equations using TensorFlow:

$$ \begin{align*} x + 2y - 3z = 4\\ x - y - z = 0\\ 2x + y - 2z = 6 \end{align*} $$

The objective of the following questions is to find the vector $(x, y, z)$.
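As a quick cross-check before turning to TensorFlow, the same system can be solved with NumPy. This is only a sanity-check sketch, not the TensorFlow solution the questions ask for; the names `A`, `b`, and `solution` are chosen here for illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system above
A = np.array([[1.0,  2.0, -3.0],
              [1.0, -1.0, -1.0],
              [2.0,  1.0, -2.0]])
b = np.array([4.0, 0.0, 6.0])

# Solve A @ [x, y, z] = b directly
solution = np.linalg.solve(A, b)
print(solution)

# Check that the solution satisfies all three equations
print(np.allclose(A @ solution, b))
```

The result should be close to $(3, 2, 1)$, which gives a reference value for comparing the TensorFlow answers in (a) and (b).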

(0) Start with importing modules.

- The lectures use TensorFlow 1, but TensorFlow 2 is now widely used, so you may solve the problems with either version.
- If you run `!pip install tensorflow`, TensorFlow 2 is installed. To use the TensorFlow 1 API, execute the second cell below, which imports `tensorflow.compat.v1`.

In [2]:

```
import numpy as np
# if you want to use tensorflow 2
import tensorflow as tf
```

In [2]:

```
import numpy as np
# if you want to use tensorflow 1
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
```

(a) Use `tf.constant` to find $x$.

In [3]:

```
A_unshaped = tf.constant([1, 2, -3, 1, -1, -1, 2, 1, -2], dtype = tf.float32)
b_unshaped = tf.constant([4, 0, 6], dtype = tf.float32)
```

In [4]:

```
# Your code here (TensorFlow 1)
A = tf.reshape( )
b = tf.reshape( )
A_inv = tf.linalg.inv(A)
x =
with tf.Session() as sess:
    result =
print('x:\n', result)
```

In [4]:

```
# Your code here (TensorFlow 2)
A = tf.reshape( )
b = tf.reshape( )
A_inv = tf.linalg.inv(A)
x =
print('x:\n', x.numpy())
```

(b) You will design an objective function and optimize it to find $x$. Use `tf.Variable`.

You can choose between TensorFlow 1 and 2 to solve it.

- hint:
  - TensorFlow 1: `tf.reduce_mean`, `tf.math.square`, `tf.train.GradientDescentOptimizer().minimize()`
  - TensorFlow 2: `tf.reduce_mean`, `tf.math.square`, `tf.keras.optimizers.SGD()`

In [6]:

```
# Your code here (Tensorflow 1)
x = tf.Variable([[0], [0], [0]], dtype = tf.float32)
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
LearningRate = 0.1
cost =
optm =
init =
sess = tf.Session()
sess.run( )
for _ in range(500):
    sess.run( )
print('x:\n', sess.run(x))
```

In [6]:

```
# Your code here (Tensorflow 2)
x = tf.Variable([[0], [0], [0]], dtype = tf.float32)
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
LearningRate = 0.1
optm =
for _ in range(500):
    with tf.GradientTape() as tape:
        tape.watch(x)
        cost =
    gradients = tape.gradient(cost, [x])
    optm.apply_gradients(zip(gradients, [x]))
print('x:\n', x.numpy())
```
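The idea behind part (b) can be sanity-checked without TensorFlow. The sketch below assumes the objective is the mean squared residual of $Ax - b$ (one reasonable reading of the `tf.reduce_mean` and `tf.math.square` hints, not necessarily the intended one) and runs plain gradient descent in NumPy with the same starting point, learning rate, and iteration count as the templates.

```python
import numpy as np

A = np.array([[1.0,  2.0, -3.0],
              [1.0, -1.0, -1.0],
              [2.0,  1.0, -2.0]])
b = np.array([[4.0], [0.0], [6.0]])

x = np.zeros((3, 1))     # same initial guess as the templates
learning_rate = 0.1      # same value as LearningRate in the templates

for _ in range(500):
    residual = A @ x - b                            # shape (3, 1)
    # Assumed cost: mean(residual**2); its gradient with respect to x
    grad = (2.0 / residual.size) * (A.T @ residual)
    x = x - learning_rate * grad

cost = np.mean((A @ x - b) ** 2)
print('x:\n', x)
print('cost:', cost)
```

With these settings the iterate should approach the solution of the linear system and the cost should shrink toward zero; a much larger learning rate would make the iteration diverge for this matrix.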

In this problem, we will try to classify handwritten digits. For the sake of simplicity, we simplify this digit classification problem into a binary classification between digit 0 and digit 1.

Data | Data description |
---|---|
0 | 1000 images (28×28 pixels) of handwritten digit 0 |
1 | 1000 images (28×28 pixels) of handwritten digit 1 |

- Download the data file.

To read the files in Python, use the following code:

In [7]:

```
import pickle
import matplotlib.pyplot as plt
import numpy as np

# Load the pickled dictionary that holds both digit classes
with open('./data_files/data.pkl', 'rb') as f:
    data = pickle.load(f)

data0 = data['0']  # images of digit 0
data1 = data['1']  # images of digit 1
print('digit 0: ', data0.shape)
print('digit 1: ', data1.shape)
```

Here, we loaded 1000 images each of digit ‘0’ and digit ‘1’. To display an image, use `plt.imshow(img)`.

Note: make sure you are reading the files correctly. Check by displaying a few random images from each class.

Note: calculations are more manageable if you go through and convert each pixel of the $28\times28$ matrix to a binary value first.

(1) Display a few random images from each class.

In [8]:

```
# Your code here
```

(2) Convert each pixel of the 28×28 matrix to a binary value.

In [9]:

```
# Your code here
```
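One common way to binarize an image is to threshold it. The sketch below uses a synthetic array in place of the real data, and the threshold (the midpoint of the pixel range) is an assumption made for illustration; the appropriate value depends on how the pixels in `data.pkl` are scaled.

```python
import numpy as np

# Stand-in for a stack of grayscale images; the real arrays come from data.pkl
rng = np.random.default_rng(0)
images = rng.random((5, 28, 28))      # hypothetical pixel values in [0, 1)

# Hypothetical threshold: halfway between the smallest and largest pixel value
threshold = 0.5 * (images.min() + images.max())

# Pixels above the threshold become 1, all others become 0
binary_images = (images > threshold).astype(np.float32)

print(np.unique(binary_images))
```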

Now we must select ‘features’ from the image data to distinguish digit 0 from digit 1. Two features, plus a bias term, are recommended:

$\;\;$ (a) The average pixel value in the center of the image (`img[10:20,10:20]`).

$\;\;$ (b) The average pixel value over the entire image.

$\;\;$ (c) Include ones as the bias term.

$$\Phi(x) = \begin{bmatrix} 1 \\ \text{feature1}\\ \text{feature2} \end{bmatrix}$$

You should end up with a $2000\times3$ input matrix in which the first $1000$ rows correspond to the features of ‘`data0`’ and the last $1000$ rows correspond to the features of ‘`data1`’.
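The feature construction described above can be sketched as follows. The helper `make_features` and the synthetic arrays `fake0` and `fake1` are hypothetical stand-ins; the sketch assumes each class can be reshaped to `(n, 28, 28)`, which may differ from the actual layout of `data0` and `data1`.

```python
import numpy as np

def make_features(images):
    """Return an (n, 3) matrix [1, center mean, full mean] for (n, 28, 28) images."""
    images = np.asarray(images, dtype=float).reshape(-1, 28, 28)
    center_mean = images[:, 10:20, 10:20].mean(axis=(1, 2))   # feature (a)
    full_mean = images.mean(axis=(1, 2))                      # feature (b)
    ones = np.ones(len(images))                               # bias term (c)
    return np.column_stack([ones, center_mean, full_mean])

# Synthetic stand-ins for the two classes (1000 binary images each)
rng = np.random.default_rng(0)
fake0 = (rng.random((1000, 28, 28)) > 0.5).astype(float)
fake1 = (rng.random((1000, 28, 28)) > 0.5).astype(float)

# Stack class 0 first, then class 1, and build matching labels
Phi = np.vstack([make_features(fake0), make_features(fake1)])
labels = np.concatenate([np.zeros(1000), np.ones(1000)])
print(Phi.shape, labels.shape)
```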

In [10]:

```
# Your code here
```

Plot the data to see if classes are separable. The expected plot is the following:

In [11]:

```
# Your code here
```

In [12]:

```
from sklearn import linear_model
```

(1) Use the logistic regression algorithm to classify the digit 0 and digit 1 data sets.

$\;\;\;$ Note: you don't need to implement the full algorithm yourself. Just use the sklearn library, which we learned in class.

$\;\;\;\;\;\;\;\;\;\;$ Use `LogisticRegression(solver='liblinear')`.

In [13]:

```
# Your code here
```

(2) Plot the classifier (decision boundary).

$\;\;\;$ Note that the decision boundary is given by:

$$\omega_1 x_1 + \omega_2x_2 + \omega_0 = 0$$
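To draw this line, it helps to solve the boundary equation for $x_2$. The sketch below does that for hypothetical weights; `boundary_x2` is an illustrative helper, not part of sklearn.

```python
import numpy as np

def boundary_x2(x1, w0, w1, w2):
    """Solve w1*x1 + w2*x2 + w0 = 0 for x2 (requires w2 != 0)."""
    return -(w0 + w1 * np.asarray(x1, dtype=float)) / w2

# Hypothetical weights, for illustration only
w0, w1, w2 = -1.0, 2.0, 4.0
x1 = np.linspace(0.0, 1.0, 5)
x2 = boundary_x2(x1, w0, w1, w2)

# Every (x1, x2) pair lies on the boundary, so this prints True
print(np.allclose(w1 * x1 + w2 * x2 + w0, 0.0))
```

With a fitted sklearn model, the weights would come from its `coef_` and `intercept_` attributes. If the feature matrix already contains a column of ones and the model also fits its own intercept, the bias is split between the two, so both parts need to be added to obtain $\omega_0$.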

In [14]:

```
# Your code here
```

In this problem, we will examine how probability arises in logistic regression.

Note: you need to write the full code for this problem yourself. Do not just use the sklearn library.

**Linear Regression vs. Logistic Regression**

For categorical (0 or 1) data, linear regression cannot properly represent the relationship between input and output. As shown in the figure on the right, logistic regression adds a non-linearity (the sigmoid function) to linear regression and fits the relationship between input and output with an S-shaped curve.

Because the sigmoid function maps any real input to a value between 0 and 1, the S-curve can be interpreted as a probability.
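The properties that make the sigmoid usable as a probability can be checked numerically. This is a minimal sketch; the function name `sigmoid` and the sample inputs are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
p = sigmoid(z)

print(p)                                    # values strictly between 0 and 1
print(sigmoid(0.0))                         # 0.5: the midpoint of the curve
print(np.allclose(sigmoid(-z), 1.0 - p))    # symmetry: sigma(-z) = 1 - sigma(z)
```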

In [15]:

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

We are going to examine the relationship between basketball shooting accuracy and the number of shots. More specifically, we want a model that takes in the number of shots and outputs the probability that you will make the shot.

Here are the shapes of $\omega$ and $x$.

$$ \begin{align*} \omega &= \begin{bmatrix} \omega_0 \\ \omega_1 \end{bmatrix}, \qquad x = \begin{bmatrix} 1 \\ x_1 \end{bmatrix}\\ \\ X &= \begin{bmatrix} \left(x^{(1)}\right)^T \\ \left(x^{(2)}\right)^T \\ \left(x^{(3)}\right)^T \\ \vdots\end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} \\ 1 & x_1^{(2)} \\ 1 & x_1^{(3)} \\ \vdots & \vdots \\\end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)}\\ y^{(2)} \\y^{(3)} \\ \vdots \end{bmatrix} \end{align*} $$

(1) Generate the data set by the following description.

Data description:

Number of Shot (X) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Made (Y) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |

(2) After generating the data, plot it to check that it is correct.

In [16]:

```
# Your code here
```
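One possible way to hold the table in NumPy arrays is sketched below. The variable names `shots` and `made` are illustrative; the design matrix `X` follows the layout given earlier, with a leading column of ones.

```python
import numpy as np

# Number of shot (1 to 30) and whether each shot was made, copied from the table
shots = np.arange(1, 31)
made = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                 0, 1, 1, 0, 0, 1, 1, 1, 0, 1,
                 1, 1, 1, 1, 0, 1, 1, 1, 1, 1])

# Design matrix with a leading column of ones, and labels as a column vector
X = np.column_stack([np.ones(len(shots)), shots])
y = made.reshape(-1, 1)

print(X.shape, y.shape)
```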

(1) Define the sigmoid (logistic) function.

$$h_{\omega}(x) = h(x\,;\omega) = \sigma \left(\omega^T x\right) = \frac{1}{1+e^{-\omega^T x}}$$

In [17]:

```
# Your code here
def h():
    return
```

(2) [hand-written] Write the gradient of the log-likelihood.

(3) [hand-written] Write the gradient descent algorithm for this problem.

Note: this problem is a maximization problem.

(4) Initialize $\omega$ and update it by the gradient descent method.

In [18]:

```
# Your code here
```
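A minimal NumPy sketch of the update is given below, assuming the average log-likelihood $\ell(\omega) = \frac{1}{m}\log L(\omega)$. Under that definition the gradient works out to $\frac{1}{m}X^T\left(y - h_\omega(X)\right)$, and because the problem is a maximization, each step moves along the gradient (ascent). The step size and iteration count are assumptions picked for this unscaled data, not values from the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    """Average log-likelihood of labels y under the logistic model."""
    h = sigmoid(X @ w)
    return np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

shots = np.arange(1, 31)
made = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                 0, 1, 1, 0, 0, 1, 1, 1, 0, 1,
                 1, 1, 1, 1, 0, 1, 1, 1, 1, 1])
X = np.column_stack([np.ones(30), shots]).astype(float)
y = made.reshape(-1, 1).astype(float)

w = np.zeros((2, 1))
step = 0.01                        # assumed step size
start = log_likelihood(w, X, y)    # equals -log(2) at w = 0

for _ in range(20000):
    h = sigmoid(X @ w)
    grad = X.T @ (y - h) / len(y)  # gradient of the average log-likelihood
    w = w + step * grad            # ascent: move along the gradient

print('w:\n', w)
print('log-likelihood:', start, '->', log_likelihood(w, X, y))
```

The step size has to be small here because the shot counts range up to 30; rescaling the input would allow a larger step and faster convergence.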

(1) [hand-written] Below is the objective function of linear regression.

$$ \min_{\theta} ~ \lVert \hat y - y \rVert_2^2 = \min\limits_\theta {\lVert X\theta - y \rVert}_2^2$$

Write the gradient of the least-squares objective.

(2) Initialize $\theta$ and update it by the gradient descent method.

Note: this problem is a minimization problem.

In [19]:

```
# Your code here
```
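For the least-squares objective above, the gradient is $2X^T(X\theta - y)$. The sketch below applies gradient descent with that gradient and compares the result with NumPy's closed-form least-squares solver. The step size and iteration count are assumptions chosen so that the iteration is stable on the unscaled shot counts.

```python
import numpy as np

shots = np.arange(1, 31)
made = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                 0, 1, 1, 0, 0, 1, 1, 1, 0, 1,
                 1, 1, 1, 1, 0, 1, 1, 1, 1, 1])
X = np.column_stack([np.ones(30), shots]).astype(float)
y = made.reshape(-1, 1).astype(float)

theta = np.zeros((2, 1))
step = 5e-5    # assumed; must be small because the inputs are unscaled

for _ in range(20000):
    grad = 2.0 * (X.T @ (X @ theta - y))   # gradient of ||X theta - y||_2^2
    theta = theta - step * grad            # descent: move against the gradient

# Closed-form least-squares solution for comparison
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print('gradient descent:\n', theta)
print('lstsq:\n', theta_ls)
```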

(1) Define the prediction model and make predictions using the trained models.

- Linear regression: $y = \theta_0 + \theta_1 X$
- Logistic regression: $y = \sigma(\omega_0 + \omega_1 X)$

In [20]:

```
# Your code here
def get_predictions(w, theta):
    return np.array(logistic_predictions), np.array(linear_predictions)
```

(2) Plot the predicted values.

In [21]:

```
# Your code here
```

The cost function summarizes how well the model is behaving. In other words, we use the cost function to measure how close the model’s predictions are to the actual outputs.

**Log likelihood**
$$\ell(\omega) = \frac{1}{m} \log L(\omega) = \frac{1}{m}\sum\limits_{i=1}^m \left\{ y^{(i)} \log(h_\omega (x^{(i)})) + (1-y^{(i)}) \log(1-h_\omega(x^{(i)})) \right\}$$

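Thus, we can define the cost for the two cases separately:

$$ \text{cost}(h_{\omega}(x),y) = \begin{cases} -\log (h_{\omega}(x)) & \text{if } y = 1 \\ -\log (1-h_{\omega}(x)) & \text{if } y = 0 \end{cases} $$

These two cases combine into a single expression:

$$ \text{cost}(h_{\omega}(x),y) = -y\log (h_{\omega}(x)) - (1-y)\log (1-h_{\omega}(x)) $$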

Plot the cost function using the predicted values from logistic regression.

In [22]:

```
# Your code here
```
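The case-wise cost defined above can be evaluated numerically as in the sketch below. The helper `cross_entropy_cost` and the sample probabilities are hypothetical; in the actual exercise the probabilities would be the logistic regression predictions.

```python
import numpy as np

def cross_entropy_cost(h, y):
    """Per-sample cost: -log(h) when y = 1, -log(1 - h) when y = 0."""
    h = np.asarray(h, dtype=float)
    y = np.asarray(y, dtype=float)
    return -(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical predicted probabilities, for illustration only
h = np.array([0.1, 0.5, 0.9])

print(cross_entropy_cost(h, 1))   # cost grows as h moves away from 1
print(cross_entropy_cost(h, 0))   # cost grows as h moves away from 0
```

A confident wrong prediction is penalized heavily, while a confident correct prediction costs close to zero, which is the behavior the plot is meant to show.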