Deep Learning for Mechanical Engineering

Homework 02

Due Monday, 09/25/2023, 4:00 PM

Instructor: Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
  • For your handwritten solutions, scan them or take a picture (you may write them in markdown if you prefer).

  • For your code, only the .ipynb file will be graded.

    • Please write your NAME and student ID on your .ipynb file. ex) IljeokKim_20202467_HW02.ipynb
  • Please compress all the files into a single .zip file.

    • Please write your NAME and student ID on your .zip file. ex) DogyeomPark_20202467_HW02.zip
    • Submit it to KLMS.
  • Do not submit a printed version of your code. It will not be graded.

Problem 1: Linear System with TensorFlow

You will solve the following system of linear equations using several TensorFlow methods:


$$ \begin{align*} x + 2y - 3z &= 4\\ x - y - z &= 0\\ 2x + y - 2z &= 6 \end{align*} $$

The objective of the following questions is to find the vector $(x, y, z)$.
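
In matrix form, the system reads

$$\begin{bmatrix} 1 & 2 & -3 \\ 1 & -1 & -1 \\ 2 & 1 & -2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \\ 6 \end{bmatrix},$$

which is exactly what the A and b tensors below encode.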

(0) Start with importing modules.

  • The lectures were conducted using TensorFlow 1, but TensorFlow 2 is now widely used, so you may choose either TensorFlow 1 or 2 to solve the problems.
  • Running !pip install tensorflow installs TensorFlow 2. To use the TensorFlow 1 API instead, execute the second cell below.
In [2]:
import numpy as np

# if you want to use tensorflow 2
import tensorflow as tf
In [2]:
import numpy as np

# if you want to use tensorflow 1
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

(a) Use tf.constant to find $x$.

In [3]:
A_unshaped = tf.constant([1, 2, -3, 1, -1, -1, 2, 1, -2], dtype = tf.float32)
b_unshaped = tf.constant([4, 0, 6], dtype = tf.float32)
In [4]:
# Your code here (TensorFlow 1)
# One possible completion: solve Ax = b via the matrix inverse
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
A_inv = tf.linalg.inv(A)

x = tf.matmul(A_inv, b)

with tf.Session() as sess:
    result = sess.run(x)
    print('x:\n', result)
x:
 [[3.]
 [2.]
 [1.]]
In [4]:
# Your code here (TensorFlow 2)
# One possible completion: TensorFlow 2 executes eagerly, so no session is needed
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
A_inv = tf.linalg.inv(A)

x = tf.matmul(A_inv, b)

print('x:\n', x.numpy())
x:
 [[3.]
 [2.]
 [1.]]

(b) You will design an objective function and optimize it to find $x$. Use tf.Variable.

You may solve it with either TensorFlow 1 or TensorFlow 2.

  • hint:
    • TensorFlow 1: tf.reduce_mean, tf.math.square, tf.train.GradientDescentOptimizer().minimize()
    • TensorFlow 2: tf.reduce_mean, tf.math.square, tf.keras.optimizers.SGD()
In [6]:
# Your code here (TensorFlow 1)
# One possible completion

x = tf.Variable([[0], [0], [0]], dtype = tf.float32)
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
LearningRate = 0.1

cost = tf.reduce_mean(tf.math.square(tf.matmul(A, x) - b))   # mean squared residual of Ax = b
optm = tf.train.GradientDescentOptimizer(LearningRate).minimize(cost)
init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)

for _ in range(500):
    sess.run(optm)
    
print('x:\n', sess.run(x))
x:
 [[2.9999988 ]
 [1.9999994 ]
 [0.99999905]]
In [6]:
# Your code here (TensorFlow 2)
# One possible completion

x = tf.Variable([[0], [0], [0]], dtype = tf.float32)
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
LearningRate = 0.1

optm = tf.keras.optimizers.SGD(learning_rate = LearningRate)

for _ in range(500):
    with tf.GradientTape() as tape:
        tape.watch(x)
        cost = tf.reduce_mean(tf.math.square(tf.matmul(A, x) - b))
    gradients = tape.gradient(cost, [x])
    optm.apply_gradients(zip(gradients, [x]))

print('x:\n', x.numpy())
x:
 [[2.9999988 ]
 [1.9999994 ]
 [0.99999905]]

Problem 2: Digit Classification with Scikit-Learn

In this problem, we will try to classify handwritten digits. For the sake of simplicity, we simplify this digit classification problem into a binary classification between digit 0 and digit 1.

Step 1. Load the Data

Data    Description
0       1000 images (28×28 pixels) of handwritten digit 0
1       1000 images (28×28 pixels) of handwritten digit 1

To read the files in Python, use the following code:

In [7]:
import matplotlib.pyplot as plt
from six.moves import cPickle
import numpy as np

data = cPickle.load(open('./data_files/data.pkl', 'rb'))
data0 = data['0']
data1 = data['1']

print('digit 0: ', data0.shape)
print('digit 1: ', data1.shape)
digit 0:  (1000, 28, 28)
digit 1:  (1000, 28, 28)

Here, we loaded 1,000 images each of digit '0' and digit '1'. To display an image, use plt.imshow(img).

  • Note: make sure you are reading the files correctly. Check by displaying a few random images from each class.

  • Note: calculations are more manageable if you first convert each pixel of the $28\times28$ matrix to a binary value.

(1) Display a few random images from each class.

In [8]:
# Your code here
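
A minimal sketch of one way to do this (showing three samples per class and using a gray colormap are arbitrary choices):

idx = np.random.randint(0, 1000, 3)
fig, axes = plt.subplots(2, 3, figsize = (9, 6))
for j, i in enumerate(idx):
    # top row: digit 0, bottom row: digit 1
    axes[0, j].imshow(data0[i], cmap = 'gray'); axes[0, j].axis('off')
    axes[1, j].imshow(data1[i], cmap = 'gray'); axes[1, j].axis('off')
plt.show()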

(2) Convert each pixel in the 28×28 matrices to a binary value.

In [9]:
# Your code here
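
One possible approach, assuming the pixel values range from 0 to 255 (the threshold of 125 is an arbitrary choice):

data0_bin = (data0 > 125).astype(float)
data1_bin = (data1 > 125).astype(float)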

Step 2. Extract Features

Now we must select 'features' from the image data to detect digit 0 and digit 1. Two features are recommended:

$\;\;$ (a) The average value of the pixels at the center of the image (img[10:20,10:20]).

$\;\;$ (b) The average value of the pixels over the entire image.

$\;\;$ (c) Include the ones as our bias term.

$$\Phi(x) = \begin{bmatrix} 1 \\ \text{feature1}\\ \text{feature2} \end{bmatrix}$$

You should end up with a $2000\times3$ input matrix, where the first $1000$ rows contain the two features for all of 'data0' and the second $1000$ rows contain the two features for all of 'data1'.

In [10]:
# Your code here
(2000, 3)
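
A sketch of one possible construction, assuming the binarized arrays data0_bin and data1_bin from Step 1 (2):

def extract_features(imgs):
    feature1 = imgs[:, 10:20, 10:20].mean(axis = (1, 2))   # average over the center patch
    feature2 = imgs.mean(axis = (1, 2))                    # average over the entire image
    return np.stack([np.ones(len(imgs)), feature1, feature2], axis = 1)

X = np.vstack([extract_features(data0_bin), extract_features(data1_bin)])
y = np.hstack([np.zeros(1000), np.ones(1000)])   # labels: 0 for data0, 1 for data1
print(X.shape)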

Step 3. Plot the Data

Plot the data to see if classes are separable. The expected plot is the following:



In [11]:
# Your code here
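
One way to produce such a plot, using the X and y built in Step 2 (figure size and transparency are arbitrary):

plt.figure(figsize = (8, 6))
plt.scatter(X[:1000, 1], X[:1000, 2], alpha = 0.4, label = 'digit 0')
plt.scatter(X[1000:, 1], X[1000:, 2], alpha = 0.4, label = 'digit 1')
plt.xlabel('feature 1 (center average)')
plt.ylabel('feature 2 (global average)')
plt.legend()
plt.show()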

Step 4. Digit Classification with Logistic Regression

In [12]:
from sklearn import linear_model

(1) Use the logistic regression algorithm to classify the digit 0 and digit 1 data sets.

$\;\;\;$ Note: you don't need to implement the full algorithm yourself. Just use the sklearn library which we have learned in class.

$\;\;\;\;\;\;\;\;\;\;$ Use LogisticRegression(solver = 'liblinear')

In [13]:
# Your code here
Out[13]:
LogisticRegression(solver='liblinear')
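
A minimal sketch: fit on the two feature columns and let sklearn estimate the bias itself (dropping the ones column is a modeling choice, since LogisticRegression fits an intercept by default):

clf = linear_model.LogisticRegression(solver = 'liblinear')
clf.fit(X[:, 1:], y)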

(2) Plot the classifier (decision boundary).

$\;\;\;$ Note that the decision boundary is given by:


$$\omega_1 x_1 + \omega_2x_2 + \omega_0 = 0$$
In [14]:
# Your code here
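
A sketch that rearranges the boundary equation as $x_2 = -(\omega_1 x_1 + \omega_0)/\omega_2$ and overlays it on the scatter plot (it assumes the clf fitted above):

w0 = clf.intercept_[0]
w1, w2 = clf.coef_[0]

x1_plot = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
x2_plot = -(w1*x1_plot + w0)/w2   # from w1*x1 + w2*x2 + w0 = 0

plt.figure(figsize = (8, 6))
plt.scatter(X[:1000, 1], X[:1000, 2], alpha = 0.4, label = 'digit 0')
plt.scatter(X[1000:, 1], X[1000:, 2], alpha = 0.4, label = 'digit 1')
plt.plot(x1_plot, x2_plot, 'k', label = 'decision boundary')
plt.legend()
plt.show()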

Problem 3: Understanding Probability in Logistic Regression

In this problem, we are going to understand probability in the logistic regression.

Note: you need to write down full codes for this problem by yourself. Do not just use sklearn library.

Linear Regression vs. Logistic Regression

In the case of categorical (0 or 1) data, linear regression cannot properly represent the relationship between input and output. Logistic regression adds a nonlinear characteristic (the sigmoid function) to linear regression and fits the input-output relationship with an S-shaped curve.

Because the sigmoid's output always lies between 0 and 1, we can interpret the S-curve as a probability.



In [15]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Step 1. Data Generation

We are going to examine the relationship between basketball shooting accuracy and the number of shots. More specifically, we want a model that takes in "the number of shots" and outputs the probability of making the shot.

Here are the shapes of $\omega$ and $x$:


$$ \begin{align*} \omega &= \begin{bmatrix} \omega_0 \\ \omega_1 \end{bmatrix}, \qquad x = \begin{bmatrix} 1 \\ x_1 \end{bmatrix}\\ \\ X &= \begin{bmatrix} \left(x^{(1)}\right)^T \\ \left(x^{(2)}\right)^T \\ \left(x^{(3)}\right)^T \\ \vdots\end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} \\ 1 & x_1^{(2)} \\ 1 & x_1^{(3)} \\ \vdots & \vdots \\\end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)}\\ y^{(2)} \\y^{(3)} \\ \vdots \end{bmatrix} \end{align*} $$

(1) Generate the data set by the following description.

Data description
Number of Shots (X) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Made (Y) 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1

(2) After generating all data, plot them to check it.

In [16]:
# Your code here
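
A sketch of the data generation and the check plot, following the design-matrix definition above:

x1 = np.arange(1, 31).reshape(-1, 1)
X = np.hstack([np.ones_like(x1), x1])   # design matrix with the bias column
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0,
              1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]).reshape(-1, 1)

plt.figure(figsize = (8, 4))
plt.plot(x1, y, 'o', alpha = 0.6)
plt.xlabel('number of shots')
plt.ylabel('made (0 or 1)')
plt.show()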

Step 2. Logistic Regression with Gradient Descent

(1) Define the sigmoid (logistic) function.

$$h_{\omega}(x) = h(x\,;\omega) = \sigma \left(\omega^T x\right) = \frac{1}{1+e^{-\omega^T x}}$$
In [17]:
# Your code here
# A minimal completion; x is assumed to include the bias column

def h(x, w):
    return 1/(1 + np.exp(-x @ w))

(2) [hand-written] Write the gradient of log likelihood.

(3) [hand-written] Write the gradient descent algorithm for this problem.

Note: this problem is a maximization problem.
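
For reference, a sketch of the result you should arrive at (not a substitute for your own hand-written derivation): differentiating the log likelihood $\ell(\omega)$ gives

$$\nabla_\omega \ell(\omega) = \frac{1}{m} X^T \left(y - \sigma(X\omega)\right),$$

and since this is a maximization problem, the update is gradient ascent: $\omega \leftarrow \omega + \alpha \nabla_\omega \ell(\omega)$.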

(4) Initialize $\omega$ and update it by the gradient descent method.

In [18]:
# Your code here
[[-2.58701913]
 [ 0.19500517]]
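
A minimal sketch of the update loop (the learning rate and iteration count are assumed values; it uses the X, y, and h defined above):

w = np.zeros([2, 1])
alpha = 0.01

for _ in range(100000):
    grad = X.T @ (y - h(X, w))/len(y)   # gradient of the log likelihood
    w = w + alpha*grad                  # ascent step: maximize

print(w)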

Step 3. Linear Regression with Gradient Descent

(1) [hand-written] Below is the objective function of linear regression.


$$ \min_{\theta} ~ \lVert \hat y - y \rVert_2^2 = \min\limits_\theta {\lVert X\theta - y \rVert}_2^2$$

Write the gradient of the least squares objective.
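
For reference (again, not a substitute for your derivation), the gradient has the form

$$\nabla_\theta \, \lVert X\theta - y \rVert_2^2 = 2X^T(X\theta - y).$$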

(2) Initialize $\theta$ and update it by the gradient descent method.

Note: this problem is a minimization problem.

In [19]:
# Your code here
[[0.02528796]
 [0.03492767]]
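
A minimal sketch, reusing X and y from Step 1 (the learning rate and iteration count are assumed values; dividing by len(y) is a normalization choice):

theta = np.zeros([2, 1])
alpha = 0.001

for _ in range(100000):
    grad = 2*X.T @ (X @ theta - y)/len(y)   # gradient of the mean squared error
    theta = theta - alpha*grad              # descent step: minimize

print(theta)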

Step 4. Compare Linear Regression and Logistic Regression Models

(1) Define the prediction models and make predictions using the trained parameters:

  • Linear regression: $y = \theta_0 + \theta_1 X$
  • Logistic regression: $y = \sigma(\omega_0 + \omega_1 X)$
In [20]:
# Your code here
# A minimal completion; the evaluation grid x_plot is an assumption

x_plot = np.linspace(1, 30, 100).reshape(-1, 1)
X_plot = np.hstack([np.ones_like(x_plot), x_plot])

def get_predictions(w, theta):
    logistic_predictions = h(X_plot, w)     # sigma(w0 + w1*x)
    linear_predictions = X_plot @ theta     # theta0 + theta1*x
    return np.array(logistic_predictions), np.array(linear_predictions)

(2) Plot the predicted values.

In [21]:
# Your code here
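
One way to plot them, assuming get_predictions and the grid x_plot from the previous cell:

logistic_pred, linear_pred = get_predictions(w, theta)

plt.figure(figsize = (8, 4))
plt.plot(x1, y, 'o', alpha = 0.6, label = 'data')
plt.plot(x_plot, logistic_pred, label = 'logistic regression')
plt.plot(x_plot, linear_pred, label = 'linear regression')
plt.xlabel('number of shots')
plt.ylabel('prediction')
plt.legend()
plt.show()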

Step 5. Cost Function of Logistic Regression

The cost function summarizes how well the model is behaving. In other words, we use the cost function to measure how close the model’s predictions are to the actual outputs.

Log likelihood $$\ell(\omega) = \frac{1}{m} \log L(\omega) = \frac{1}{m}\sum\limits_{i=1}^m \left[\, y^{(i)} \log\left(h_\omega (x^{(i)})\right) + \left(1-y^{(i)}\right) \log\left(1-h_\omega(x^{(i)})\right)\right] $$

Thus, we can define the cost for two cases separately:

$$ \text{cost}(h_{\omega}(x),y) = \begin{cases} -\log (h_{\omega}(x)) & \text{if } y = 1 \\ -\log (1-h_{\omega}(x)) & \text{if } y = 0 \end{cases} $$

These two cases can be combined into the single expression:

$$\text{cost}(h_{\omega}(x),y) = -y\log (h_{\omega}(x)) - (1-y)\log (1-h_{\omega}(x))$$

Plot the cost function using the predicted values from logistic regression.

In [22]:
# Your code here
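
One possible visualization of the two cost branches, evaluated on the logistic predictions from Step 4:

logistic_pred, _ = get_predictions(w, theta)

plt.figure(figsize = (8, 4))
plt.plot(x_plot, -np.log(logistic_pred), label = 'cost when y = 1')
plt.plot(x_plot, -np.log(1 - logistic_pred), label = 'cost when y = 0')
plt.xlabel('number of shots')
plt.ylabel('cost')
plt.legend()
plt.show()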