Deep Learning for Mechanical Engineering

Problem Set 02


Instructor: Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
  • For your handwritten solutions, please scan or take a picture of them. Alternatively, you can write them in markdown if you prefer.

  • Only .ipynb files will be graded for your code.

    • Ensure that your NAME and student ID are included in your .ipynb files. For example, 'SeungchulLee_20231234_HW01.ipynb'
  • Compress all the files into a single .zip file.

    • In the .zip file's name, include your NAME and student ID. For example, 'SeungchulLee_20231234_HW01.zip'
    • Submit this .zip file on KLMS.
  • Do not submit a printed version of your code, as it will not be graded.

Problem 1: Linear System with TensorFlow

You will solve the following system of linear equations using TensorFlow:


$$ \begin{align*} x + 2y - 3z &= 4\\ x - y - z &= 0\\ 2x + y - 2z &= 6 \end{align*} $$

The objective of the following questions is to find the vector $(x, y, z)$.

(0) Start with importing modules.

  • The lectures use TensorFlow 1, but TensorFlow 2 is now much more common, so you may choose either TensorFlow 1 or TensorFlow 2 to solve the problems.
  • Running !pip install tensorflow installs TensorFlow 2. To use TensorFlow 1, execute the corresponding cell below.
In [ ]:
import numpy as np

# if you want to use tensorflow 2
import tensorflow as tf
In [ ]:
import numpy as np

# if you want to use tensorflow 1
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

(a) Use tf.constant to find $x$.

In [ ]:
A_unshaped = tf.constant([1, 2, -3, 1, -1, -1, 2, 1, -2], dtype = tf.float32)
b_unshaped = tf.constant([4, 0, 6], dtype = tf.float32)
In [ ]:
# Your code here (TensorFlow 1)
A = tf.reshape( )
b = tf.reshape( )
A_inv = tf.linalg.inv(A)

x =

with tf.Session() as sess:
    result =
    print('x:\n', result)
x:
 [[3.]
 [2.]
 [1.]]
In [ ]:
# Your code here (TensorFlow 2)
A = tf.reshape( )
b = tf.reshape( )
A_inv = tf.linalg.inv(A)

x =

print('x:\n', x.numpy())
x:
 [[3.]
 [2.]
 [1.]]
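As an optional sanity check (separate from the required TensorFlow solution), you can verify the expected answer with plain NumPy; the sketch below assumes nothing beyond the system given above.

In [ ]:
# Optional NumPy sanity check (not graded): solve the same 3x3 system directly
A_np = np.array([[1, 2, -3],
                 [1, -1, -1],
                 [2, 1, -2]], dtype = np.float32)
b_np = np.array([[4], [0], [6]], dtype = np.float32)

x_np = np.linalg.solve(A_np, b_np)
print('x:\n', x_np)      # expected: approximately [[3.], [2.], [1.]]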

(b) You will design an objective function and optimize it to find $x$. Use tf.Variable.

You can choose between TensorFlow 1 and 2, and then solve it.

  • Hint:
    • TensorFlow 1: tf.reduce_mean, tf.math.square, tf.train.GradientDescentOptimizer().minimize()
    • TensorFlow 2: tf.reduce_mean, tf.math.square, tf.keras.optimizers.SGD()
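For reference, one common choice of objective (an assumption here, not the only valid one) is the mean squared residual of the linear system, which matches the hints above:


$$\text{cost}(x) = \frac{1}{3}\sum_{i=1}^{3} \left( (Ax - b)_i \right)^2$$

In code, this corresponds to something like tf.reduce_mean(tf.math.square(A @ x - b)).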
In [ ]:
# Your code here (Tensorflow 1)

x = tf.Variable([[0], [0], [0]], dtype = tf.float32)
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
LearningRate = 0.1

cost =
optm =
init =

sess = tf.Session()
sess.run( )

for _ in range(500):
    sess.run( )

print('x:\n', sess.run(x))
x:
 [[2.9999988 ]
 [1.9999994 ]
 [0.99999905]]
In [ ]:
# Your code here (Tensorflow 2)

x = tf.Variable([[0], [0], [0]], dtype = tf.float32)
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
LearningRate = 0.1

optm =

for _ in range(500):
    with tf.GradientTape() as tape:
        tape.watch(x)
        cost =
    gradients = tape.gradient(cost, [x])
    optm.apply_gradients(zip(gradients, [x]))

print('x:\n', x.numpy())
x:
 [[2.9999988 ]
 [1.9999994 ]
 [0.99999905]]

Problem 2: Digit Classification with Scikit-Learn

In this problem, we will try to classify handwritten digits. For the sake of simplicity, we simplify this digit classification problem into a binary classification between digit 0 and digit 1.



Step 1. Load the data


| Data | Description |
| --- | --- |
| 0 | 1000 images (28×28 pixels) of handwritten digit 0 |
| 1 | 1000 images (28×28 pixels) of handwritten digit 1 |

Click to download data


To read the files in Python, use the following code:

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from six.moves import cPickle

## change the file path if needed
data = cPickle.load(open('/content/drive/MyDrive/ML_Colab/ML_data/data_zero_one.pkl', 'rb'))
data0 = data['0']
data1 = data['1']

Here, we will load 1,000 images each for ‘0’ and ‘1’. To display an image, use plt.imshow(img).

  • Note: make sure you are reading the files correctly. Check by displaying the first few and the last few images in each class.

  • Note: calculations are more manageable if you go through and convert each of the pixels in the $28\times28$ matrix to a binary value first.

Display a few random images in each class.

In [ ]:
## you do not need to change anything

plt.figure(figsize = (8, 4))
plt.subplot(2,4,1), plt.imshow(data0[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,2), plt.imshow(data0[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,3), plt.imshow(data0[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,4), plt.imshow(data0[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,5), plt.imshow(data1[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,6), plt.imshow(data1[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,7), plt.imshow(data1[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,8), plt.imshow(data1[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.show()

Convert each of the pixels in the $28\times28$ matrix to a binary value.

In [ ]:
## you do not need to change anything

data0 = data0 > 125
data1 = data1 > 125

Step 2. Extract features

In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed. Choosing informative, discriminating and independent features from raw data (or input) is a crucial step for effective algorithms in pattern recognition, classification and regression.



Now we must select our own ‘features’ from the image data to detect digit 0 and digit 1. Two features are recommended, as follows:

(a) Feature 1: the average of the pixels located at the center of the image (img[10:20, 10:20]).

In [ ]:
## your code here
#

(b) Feature 2: the average of the pixels over the entire image.

In [ ]:
## your code here
#

(c) Include a constant 1 in each feature vector as the bias term.


$$\phi(x) = \begin{bmatrix} 1 \\ \text{feature1}\\ \text{feature2} \end{bmatrix} \quad \implies \quad X \; (\text{or } \Phi) = \begin{bmatrix} \phi_1^T\\ \vdots \\ \phi_{1000}^T \\ \phi_{1001}^T \\ \vdots \\ \phi_{2000}^T \end{bmatrix}$$

You should end up with a $2000\times3$ input matrix, where the first $1000$ rows correspond to ‘data0’ and the second $1000$ rows correspond to ‘data1’. This is the matrix $X$ (or $\Phi$) that we learned about in class.


In [ ]:
## your code here
#
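For reference, here is a minimal sketch of one way to assemble this matrix (an illustration only; it assumes data0 and data1 are the binarized arrays of shape (1000, 28, 28) from Step 1, and the helper name extract_features is hypothetical):

In [ ]:
# A minimal sketch (assumes data0 and data1 are boolean arrays of shape (1000, 28, 28))

def extract_features(img):
    feature1 = np.mean(img[10:20, 10:20])    # average pixel value at the image center
    feature2 = np.mean(img)                   # average pixel value over the entire image
    return [1, feature1, feature2]            # prepend 1 as the bias term

X = np.array([extract_features(img) for img in data0] +
              [extract_features(img) for img in data1])
print(X.shape)    # expected: (2000, 3)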

Step 3. Plot the Data

Plot the data to check if the classes are separable. The expected plot should look like this:



In [ ]:
# Your code here
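For reference, one possible way to produce such a plot (a sketch; it assumes the feature matrix X built in Step 2, with rows 0 to 999 for digit 0 and rows 1000 to 1999 for digit 1):

In [ ]:
# A minimal sketch (assumes X from Step 2: first 1000 rows are digit 0, last 1000 rows are digit 1)

plt.figure(figsize = (6, 6))
plt.plot(X[:1000, 1], X[:1000, 2], 'b.', label = 'digit 0')
plt.plot(X[1000:, 1], X[1000:, 2], 'r.', label = 'digit 1')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.legend()
plt.show()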

Problem 2-1: Digit Classification with Perceptron

We would like to use the perceptron algorithm to classify digit 0 and digit 1 in the training set.

Step 1. Initialization

First initialize $\omega$. We usually set $\omega$ to a zero vector as an initial guess.

In [ ]:
## your code here
#

Step 2. Update $\omega$

We update $\omega$ when the prediction is wrong. The update rule is the following:


$$ \omega \leftarrow \omega + y \cdot x$$

We will repeat the same update over randomly sampled data points. Here is the pseudocode:

For k = 1:100 {
    For j = 1:100 {
        i = random integer selection between 1 and 2000
        compute yhat(i)
        if yhat(i) is wrong {
            w = w + y(i)x(i)
        }
    }
}

For every $k^{th}$ iteration, count and store how many predictions are wrong to see if your classifier converges.
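For reference, here is a minimal sketch of this loop (an illustration only; it assumes X is the $2000\times3$ feature matrix from Problem 2 and y is a $2000\times1$ label vector with $-1$ for digit 0 and $+1$ for digit 1; your variable names and label convention may differ):

In [ ]:
# A minimal sketch (assumed names: X is the 2000x3 feature matrix,
# y is a 2000x1 label vector with -1 for digit 0 and +1 for digit 1)

w = np.zeros([3, 1])
n_wrong = []                                         # wrong predictions per outer iteration k

for k in range(100):
    wrong = 0
    for j in range(100):
        i = np.random.randint(2000)                  # random index into the data
        yhat = np.sign(X[i, :].reshape(1, -1) @ w)   # predicted label
        if yhat != y[i]:                             # misclassified: apply the update rule
            w = w + y[i] * X[i, :].reshape(-1, 1)
            wrong += 1
    n_wrong.append(wrong)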

In [ ]:
## your code here
#

Step 3. Plot the result

You are asked to plot two graphs. First, plot the number of wrong predictions as a function of the iteration index $k$.

Second, plot the classifier (decision boundary). Note that the decision boundary is given by:


$$\omega_0 + \omega_1 x_1 + \omega_2x_2 = 0$$
In [ ]:
## your code here
#

Problem 2-2: Digit Classification with Logistic Regression

In [ ]:
from sklearn import linear_model

(a) Use the logistic regression algorithm to classify the digit 0 and digit 1 data set.

  • Note: you don't need to implement the full algorithm yourself. Just use the sklearn library we learned in class, i.e., LogisticRegression(solver = 'liblinear'). A minimal usage sketch is shown below.
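The sketch is an illustration only; it assumes X is the $2000\times3$ feature matrix and y is a length-2000 label vector from the previous steps.

In [ ]:
# A minimal usage sketch (assumes X is the 2000x3 feature matrix and y is a length-2000 label vector)

clf = linear_model.LogisticRegression(solver = 'liblinear')
clf.fit(X, np.ravel(y))                     # fit on the extracted features and labels
print(clf.coef_, clf.intercept_)            # learned weights for the decision boundary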
In [ ]:
# Your code here
Out[ ]:
LogisticRegression(solver='liblinear')

(b) Plot the classifier (decision boundary).

  • Note that the decision boundary is given by:

$$\omega_0 + \omega_1 x_1 + \omega_2x_2 = 0$$
In [ ]:
# Your code here

Problem 3: Understanding Probability in Logistic Regression

In this problem, we will explore the concept of probability in logistic regression.

Note: You are required to write the full code for this problem yourself. Do not rely on the sklearn library.

Linear Regression vs. Logistic Regression

In the case of categorical (0 or 1) data, linear regression cannot properly represent the relationship between the input and output data. As shown in the figure on the right, logistic regression adds a non-linear characteristic (the sigmoid function) to linear regression and fits the relationship between input and output with a curve.

Thanks to the characteristics of the sigmoid function, we can interpret this S-curve as a probability.



In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Step 1. Data Generation

We are going to examine the relationship between basketball shooting accuracy and the number of shots. More specifically, we want a model that takes in the number of shots and outputs the probability of making the shot.

Here are the shapes of $\omega$ and $x$:


$$ \begin{align*} \omega &= \begin{bmatrix} \omega_0 \\ \omega_1 \end{bmatrix}, \qquad x = \begin{bmatrix} 1 \\ x_1 \end{bmatrix}\\ \\ X &= \begin{bmatrix} \left(x^{(1)}\right)^T \\ \left(x^{(2)}\right)^T \\ \left(x^{(3)}\right)^T \\ \vdots\end{bmatrix} = \begin{bmatrix} 1 & x_1^{(1)} \\ 1 & x_1^{(2)} \\ 1 & x_1^{(3)} \\ \vdots & \vdots \\\end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)}\\ y^{(2)} \\y^{(3)} \\ \vdots \end{bmatrix} \end{align*} $$

(1) Generate the data set according to the following description.

Data description

  • Number of Shots (X): 1, 2, 3, ..., 30
  • Made (Y): 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1
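For reference, the table above can be entered directly as arrays (a sketch; the names shots and made are illustrative, and the design matrix follows the column-of-ones convention defined above):

In [ ]:
# A minimal sketch: the data from the table above (variable names are illustrative)

shots = np.arange(1, 31).reshape(-1, 1)                 # number of shots, 1 through 30
made = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                 0, 1, 1, 0, 0, 1, 1, 1, 0, 1,
                 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]).reshape(-1, 1)

X = np.hstack([np.ones([30, 1]), shots])                # design matrix with the leading column of ones
y = made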

(2) After generating all of the data, plot them to check.

In [ ]:
# Your code here

Step 2. Logistic Regression with Gradient Descent

(1) Define the sigmoid (logistic) function.


$$h_{\omega}(x) = h(x\,;\omega) = \sigma \left(\omega^T x\right) = \frac{1}{1+e^{-\omega^T x}}$$
In [ ]:
# Your code here

def h():

    return
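For reference, a direct transcription of the formula above could look like the following sketch (the argument convention, e.g. whether h takes a single x or the whole design matrix X, is up to you):

In [ ]:
# A minimal sketch: sigmoid of the linear score
# (assumes x is a column vector or a design matrix and w is a column vector)
def h(x, w):
    return 1 / (1 + np.exp(-x @ w))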

(2) [hand-written] Write the gradient of the log likelihood.

(3) [hand-written] Write the gradient descent algorithm for this problem.

  • Note: this problem is a maximization problem.

(4) Initialize $\omega$ and update it by the gradient descent method.

In [ ]:
# Your code here
[[-2.58701913]
 [ 0.19500517]]

Step 3. Linear Regression with Gradient Descent

(1) [hand-written] Below is the objective function of linear regression.


$$ \min_{\theta} ~ \lVert \hat y - y \rVert_2^2 = \min\limits_\theta {\lVert X\theta - y \rVert}_2^2$$

Write the gradient of the least-squares objective.

(2) Initialize $\theta$ and update it by the gradient descent method.

  • Note: this problem is a minimization problem.
In [ ]:
# Your code here
[[0.02528796]
 [0.03492767]]

Step 4. Compare Linear Regression and Logistic Regression Models

(1) Define the prediction models and make predictions using the trained parameters.

  • Linear regression: $y = \theta_0 + \theta_1 X$
  • Logistic regression: $y = \sigma(\omega_0 + \omega_1 X)$
In [ ]:
# Your code here

def get_predictions(w, theta):

    return np.array(logistic_predictions), np.array(linear_predictions)

(2) Plot the predicted values.

In [ ]:
# Your code here

Step 5. Cost Function of Logistic Regression

The cost function summarizes how well the model is behaving. In other words, we use the cost function to measure how close the model’s predictions are to the actual outputs.

Log likelihood


$$\ell(\omega) = \frac{1}{m} \log L(\omega) = \frac{1}{m}\sum\limits_{i=1}^m \left[\, y^{(i)} \log\left(h_\omega (x^{(i)})\right) + \left(1-y^{(i)}\right) \log\left(1-h_\omega(x^{(i)})\right) \right] $$

Thus, we can define the cost for two cases separately:


$$ \text{cost}(h_{\omega}(x),y) = \begin{cases} -\log (h_{\omega}(x)) & \text{if } y = 1 \\ -\log (1-h_{\omega}(x)) & \text{if } y = 0 \end{cases} $$

Which then results in the single expression (combining the two cases above):


$$ \text{cost}(h_{\omega}(x),y) = -y\log\left(h_{\omega}(x)\right) - (1-y)\log\left(1-h_{\omega}(x)\right) $$

Plot the cost function using the predicted values from logistic regression.

In [ ]:
# Your code here