For your handwritten solutions, please scan or take a picture of them. Alternatively, you can write them in markdown if you prefer.
Only .ipynb files will be graded for your code.
Compress all the files into a single .zip file.
Do not submit a printed version of your code, as it will not be graded.
You will solve the following system of linear equations using TensorFlow:

$$\begin{aligned}
x + 2y - 3z &= 4 \\
x - y - z &= 0 \\
2x + y - 2z &= 6
\end{aligned}$$

The objective of the following questions is to find the vector $(x, y, z)$.
(0) Start with importing modules.
!pip install tensorflow
If you run `!pip install tensorflow`, then TensorFlow 2 is installed. To run TensorFlow 1 instead, execute the corresponding cell below.
# if you want to use tensorflow 2
import tensorflow as tf
import numpy as np
# if you want to use tensorflow 1
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
(a) Use `tf.constant` to find $x$.
A_unshaped = tf.constant([1, 2, -3, 1, -1, -1, 2, 1, -2], dtype = tf.float32)
b_unshaped = tf.constant([4, 0, 6], dtype = tf.float32)
# Your code here (TensorFlow 1)
A = tf.reshape( )
b = tf.reshape( )
A_inv = tf.linalg.inv(A)
x =
with tf.Session() as sess:
    result =
print('x:\n', result)
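One possible completion of the TensorFlow 1 cell above (a sketch, not the only valid answer; it solves the system as $x = A^{-1}b$):

```python
# One possible completion (TensorFlow 1): x = A^{-1} b
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
A_inv = tf.linalg.inv(A)
x = tf.matmul(A_inv, b)

with tf.Session() as sess:
    result = sess.run(x)
print('x:\n', result)   # the system's solution is (3, 2, 1)
```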
# Your code here (TensorFlow 2)
A = tf.reshape( )
b = tf.reshape( )
A_inv = tf.linalg.inv(A)
x =
print('x:\n', x.numpy())
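And a matching completion for TensorFlow 2, where tensors evaluate eagerly:

```python
# One possible completion (TensorFlow 2): x = A^{-1} b
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
A_inv = tf.linalg.inv(A)
x = tf.matmul(A_inv, b)
print('x:\n', x.numpy())   # the system's solution is (3, 2, 1)
```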
(b) You will design an objective function and optimize it to find $x$. Use `tf.Variable`. A natural objective is the mean squared residual $\frac{1}{3}\lVert Ax - b \rVert^2$, which is minimized exactly when $Ax = b$. You can choose between TensorFlow 1 and 2; the following functions are recommended:

- TensorFlow 1: `tf.reduce_mean`, `tf.math.square`, `tf.train.GradientDescentOptimizer().minimize()`
- TensorFlow 2: `tf.reduce_mean`, `tf.math.square`, `tf.keras.optimizers.SGD()`
# Your code here (Tensorflow 1)
x = tf.Variable([[0], [0], [0]], dtype = tf.float32)
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
LearningRate = 0.1
cost =
optm =
init =
sess = tf.Session()
sess.run( )
for _ in range(500):
    sess.run( )
print('x:\n', sess.run(x))
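One possible completion of the TensorFlow 1 cell above (a sketch; it reuses `x`, `A`, `b`, and `LearningRate` from the skeleton, and the cost is the mean squared residual suggested earlier):

```python
# One possible completion (TensorFlow 1)
cost = tf.reduce_mean(tf.math.square(tf.matmul(A, x) - b))
optm = tf.train.GradientDescentOptimizer(LearningRate).minimize(cost)
init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)
for _ in range(500):
    sess.run(optm)
print('x:\n', sess.run(x))   # should approach (3, 2, 1)
```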
# Your code here (Tensorflow 2)
x = tf.Variable([[0], [0], [0]], dtype = tf.float32)
A = tf.reshape(A_unshaped, [3, 3])
b = tf.reshape(b_unshaped, [3, 1])
LearningRate = 0.1
optm =
for _ in range(500):
    with tf.GradientTape() as tape:
        tape.watch(x)
        cost =
    gradients = tape.gradient(cost, [x])
    optm.apply_gradients(zip(gradients, [x]))
print('x:\n', x.numpy())
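And a possible completion for TensorFlow 2 (the same cost, optimized with `tf.keras.optimizers.SGD`; it reuses `x`, `A`, `b`, and `LearningRate` from the skeleton):

```python
# One possible completion (TensorFlow 2)
optm = tf.keras.optimizers.SGD(learning_rate=LearningRate)
for _ in range(500):
    with tf.GradientTape() as tape:
        cost = tf.reduce_mean(tf.math.square(tf.matmul(A, x) - b))
    gradients = tape.gradient(cost, [x])
    optm.apply_gradients(zip(gradients, [x]))
print('x:\n', x.numpy())   # should approach (3, 2, 1)
```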
In this problem, we will try to classify handwritten digits. For simplicity, we reduce the task to a binary classification between digit 0 and digit 1.
| Data | Description |
|---|---|
| 0 | 1000 images (28×28 pixels) of handwritten digit 0 |
| 1 | 1000 images (28×28 pixels) of handwritten digit 1 |
To read the files in Python, use the following code:
from google.colab import drive
drive.mount('/content/drive')
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from six.moves import cPickle
## change the file path if needed
data = cPickle.load(open('/content/drive/MyDrive/ML_Colab/ML_data/data_zero_one.pkl', 'rb'))
data0 = data['0']
data1 = data['1']
Here, we load 1,000 images each for '0' and '1'. To display an image, use `plt.imshow(img)`.
Note: make sure you are reading the files correctly. Check by displaying the first few and the last few images in each class.
Note: calculations are more manageable if you go through and convert each of the pixels in the $28\times28$ matrix to a binary value first.
Display a few random images in each class.
## you do not need to change anything
plt.figure(figsize = (8, 4))
plt.subplot(2,4,1), plt.imshow(data0[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,2), plt.imshow(data0[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,3), plt.imshow(data0[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,4), plt.imshow(data0[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,5), plt.imshow(data1[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,6), plt.imshow(data1[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,7), plt.imshow(data1[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.subplot(2,4,8), plt.imshow(data1[np.random.randint(1000)], 'gray'), plt.axis('off')
plt.show()
Convert each of the pixels in the $28\times28$ matrix to a binary value.
## you do not need to change anything
data0 = data0 > 125
data1 = data1 > 125
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed. Choosing informative, discriminating and independent features from raw data (or input) is a crucial step for effective algorithms in pattern recognition, classification and regression.
Now we must select our own 'features' from the image data to detect digit 0 and digit 1. Two features are recommended:
(a) Feature 1: the average of the pixels at the center of the image (`img[10:20, 10:20]`).
## your code here
#
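For instance (a sketch; the array names are assumptions, and since the pixels are boolean after thresholding, the mean is the fraction of on-pixels):

```python
# feature 1: mean of the 10x10 center patch, per image
feat1_0 = np.array([img[10:20, 10:20].mean() for img in data0])
feat1_1 = np.array([img[10:20, 10:20].mean() for img in data1])
```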
(b) Feature 2: the average of the pixels over the entire image.
## your code here
#
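A matching sketch for the second feature:

```python
# feature 2: mean over the whole 28x28 image, per image
feat2_0 = np.array([img.mean() for img in data0])
feat2_1 = np.array([img.mean() for img in data1])
```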
(c) Include a column of ones as the bias term.

You should end up with a $2000\times3$ input matrix whose first $1000$ rows correspond to all of the '`data0`' images and whose second $1000$ rows correspond to all of the '`data1`' images. This is the matrix $X$ (or $\Phi$) that we learned in class.
## your code here
#
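A possible construction from the feature arrays above; the first column of ones is the bias term:

```python
# stack [1, feature1, feature2] per image; digit-0 rows first
X = np.vstack([
    np.column_stack([np.ones(1000), feat1_0, feat2_0]),
    np.column_stack([np.ones(1000), feat1_1, feat2_1]),
])
print(X.shape)   # (2000, 3)
```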
Plot the data to check whether the two classes are separable.
# Your code here
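A possible sketch, plotting feature 1 against feature 2 for each class:

```python
plt.figure(figsize=(8, 6))
plt.plot(feat1_0, feat2_0, 'b.', alpha=0.5, label='digit 0')
plt.plot(feat1_1, feat2_1, 'r.', alpha=0.5, label='digit 1')
plt.xlabel('feature 1 (center average)')
plt.ylabel('feature 2 (overall average)')
plt.legend()
plt.show()
```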
We would like to use the perceptron algorithm to classify digit 0 and digit 1 in the training set.
First initialize $\omega$. We usually set $\omega$ to a zero vector as an initial guess.
## your code here
#
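For instance, with the three columns of $X$ (bias plus two features):

```python
# zero vector as the initial guess (bias + 2 feature weights)
w = np.zeros([3, 1])
```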
We update $\omega$ whenever the prediction is wrong. The update rule is the following:

$$\omega \leftarrow \omega + y_i x_i \quad \text{when } \operatorname{sign}(\omega^T x_i) \neq y_i, \qquad y_i \in \{-1, +1\}$$
We will repeat this update over many randomly chosen data points. Here is the pseudocode:
For k = 1:100 {
    For j = 1:100 {
        i = random integer selection between 1 and 2000
        compute yhat(i)
        if yhat(i) is wrong {
            w = w + y(i)x(i)
        }
    }
}
For every $k^{\text{th}}$ (outer) iteration, count and store how many predictions are wrong, to see whether your classifier converges.
## your code here
#
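A possible implementation of the pseudocode (a sketch; it assumes `X` from part (c), `w` initialized above, and the arbitrary sign convention $y = +1$ for digit 0 and $y = -1$ for digit 1):

```python
# A sketch of the perceptron iterations above
y = np.vstack([np.ones([1000, 1]), -np.ones([1000, 1])])  # +1: digit 0, -1: digit 1

n_wrong = []                      # wrong predictions per outer iteration k
for k in range(100):
    wrong = 0
    for j in range(100):
        i = np.random.randint(2000)
        yhat = np.sign(X[i, :] @ w)        # current prediction
        if yhat != y[i]:
            w = w + y[i] * X[i:i+1, :].T   # perceptron update
            wrong += 1
    n_wrong.append(wrong)
```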
You are asked to plot two graphs. First, plot the number of wrong predictions against the iteration index $k$.

Second, plot the classifier (decision boundary). Note that the decision boundary is given by

$$\omega_0 + \omega_1 x_1 + \omega_2 x_2 = 0 \quad\Longrightarrow\quad x_2 = -\frac{\omega_0 + \omega_1 x_1}{\omega_2}$$
## your code here
#
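A possible sketch of the two plots, reusing `n_wrong`, `w`, and the feature arrays from the sketches above:

```python
# convergence: wrong predictions per outer iteration
plt.figure(figsize=(8, 4))
plt.plot(n_wrong)
plt.xlabel('iteration k')
plt.ylabel('number of wrong predictions')
plt.show()

# decision boundary: w0 + w1*x1 + w2*x2 = 0
x1 = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
x2 = -(w[0, 0] + w[1, 0] * x1) / w[2, 0]

plt.figure(figsize=(8, 6))
plt.plot(feat1_0, feat2_0, 'b.', alpha=0.5, label='digit 0')
plt.plot(feat1_1, feat2_1, 'r.', alpha=0.5, label='digit 1')
plt.plot(x1, x2, 'k', label='decision boundary')
plt.legend()
plt.show()
```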
from sklearn import linear_model
(a) Use the logistic regression algorithm to classify the digit 0 and digit 1 data set: `LogisticRegression(solver='liblinear')`.

# Your code here
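A possible sketch (the label encoding, and dropping the bias column because sklearn fits its own intercept, are assumptions):

```python
# fit sklearn's logistic regression on the two feature columns of X
y_label = np.concatenate([np.zeros(1000), np.ones(1000)])  # 0: digit 0, 1: digit 1
clf = linear_model.LogisticRegression(solver='liblinear')
clf.fit(X[:, 1:], y_label)
print('train accuracy:', clf.score(X[:, 1:], y_label))
```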
(b) Plot the classifier (decision boundary).
# Your code here
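A possible sketch; the boundary is where `clf.intercept_ + clf.coef_ @ [x1, x2]` equals zero:

```python
x1 = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
x2 = -(clf.intercept_[0] + clf.coef_[0, 0] * x1) / clf.coef_[0, 1]

plt.figure(figsize=(8, 6))
plt.plot(feat1_0, feat2_0, 'b.', alpha=0.5, label='digit 0')
plt.plot(feat1_1, feat2_1, 'r.', alpha=0.5, label='digit 1')
plt.plot(x1, x2, 'k', label='decision boundary')
plt.legend()
plt.show()
```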
In this problem, we will explore the concept of probability in logistic regression.
Note: You are required to write the full code for this problem yourself. Do not rely on the sklearn library.
Linear Regression vs. Logistic Regression
In the case of categorical (0 or 1) data, linear regression cannot properly represent the relationship between input and output. As shown in the figure on the right, logistic regression adds a non-linear characteristic (the sigmoid function) to linear regression and fits the relationship between input and output with a curve.
Because the sigmoid output always lies in $(0, 1)$, we can interpret the S-curve as a probability:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
We are going to examine the relationship between basketball shooting accuracy and the number of shots taken. More specifically, we want a model that takes in the number of shots and outputs the probability that you will make the shot.
Here are the shapes of $\omega$ and $x$:

$$\omega = \begin{bmatrix} \omega_0 \\ \omega_1 \end{bmatrix}, \qquad x = \begin{bmatrix} 1 \\ x_1 \end{bmatrix}$$

where the leading $1$ in $x$ is the bias term.
(1) Generate the data set according to the following description.

| Number of shots (X) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Made (Y) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
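A possible way to generate the data from the table (the array names `X` and `Y` are assumptions):

```python
# the data set from the table above
X = np.arange(1, 31)
Y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
              0, 1, 1, 0, 0, 1, 1, 1, 0, 1,
              1, 1, 1, 1, 0, 1, 1, 1, 1, 1])
```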
(2) After generating the data, plot it as a check.
# Your code here
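A possible sketch:

```python
plt.figure(figsize=(8, 4))
plt.plot(X, Y, 'o')
plt.xlabel('number of shots')
plt.ylabel('made (1) / missed (0)')
plt.show()
```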
(1) Define the sigmoid (logistic) function.
# Your code here
def h():
return
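One possible completion (a sketch; the signature `h(x, w)` taking a matrix of inputs is an assumption, so pick whatever interface matches your code):

```python
# h(x, w) = sigmoid(x @ w)
def h(x, w):
    # x: (n, 2) rows of [1, x_i]; w: (2, 1) parameter vector
    return 1.0 / (1.0 + np.exp(-x @ w))
```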
(2) [hand-written] Write the gradient of the log likelihood.
(3) [hand-written] Write the gradient descent algorithm for this problem.
(4) Initialize $\omega$ and update it by the gradient descent method.
# Your code here
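A possible sketch of the update, assuming gradient ascent on the log likelihood with gradient $\sum_i (y_i - h(x_i))\,x_i$; the learning rate and iteration count are assumptions:

```python
# gradient ascent on the log likelihood
Xb = np.column_stack([np.ones(30), X]).astype(float)  # rows [1, x_i]
Yb = Y.reshape(-1, 1).astype(float)

w = np.zeros([2, 1])
alpha = 0.01                      # assumed learning rate
for _ in range(50000):
    grad = Xb.T @ (Yb - h(Xb, w)) / len(Yb)   # mean gradient
    w = w + alpha * grad
print('w:\n', w)
```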
(1) [hand-written] Below is the objective function of linear regression:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T x_i - y_i \right)^2$$

Write the gradient of this least-squares objective.
(2) Initialize $\theta$ and update it by the gradient descent method.
# Your code here
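A possible sketch, reusing `Xb` and `Yb` from above; the learning rate and iteration count are again assumptions:

```python
# gradient descent on the least-squares objective
theta = np.zeros([2, 1])
alpha = 0.001                     # assumed learning rate
for _ in range(50000):
    grad = 2 * Xb.T @ (Xb @ theta - Yb) / len(Yb)
    theta = theta - alpha * grad
print('theta:\n', theta)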
(1) Define the prediction model and make predictions using the trained models.
# Your code here
def get_predictions(w, theta):
return np.array(logistic_predictions), np.array(linear_predictions)
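A possible completion, assuming the trained `w` (logistic) and `theta` (linear) from the sketches above:

```python
def get_predictions(w, theta):
    logistic_predictions = h(Xb, w)        # probabilities in (0, 1)
    linear_predictions = Xb @ theta        # unbounded linear fit
    return np.array(logistic_predictions), np.array(linear_predictions)

pred_log, pred_lin = get_predictions(w, theta)
```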
(2) Plot the predicted values.
# Your code here
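A possible sketch:

```python
plt.figure(figsize=(8, 4))
plt.plot(X, Y, 'ko', label='data')
plt.plot(X, pred_log, label='logistic regression')
plt.plot(X, pred_lin, label='linear regression')
plt.xlabel('number of shots')
plt.ylabel('probability of making the shot')
plt.legend()
plt.show()
```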
The cost function summarizes how well the model is behaving. In other words, we use the cost function to measure how close the model’s predictions are to the actual outputs.
Log likelihood
Thus, we can define the cost for the two cases separately:

$$\mathrm{cost}(h(x), y) = \begin{cases} -\log(h(x)) & \text{if } y = 1 \\ -\log(1 - h(x)) & \text{if } y = 0 \end{cases}$$

which then results in:

$$\mathrm{cost}(h(x), y) = -y \log(h(x)) - (1 - y)\log(1 - h(x))$$
Plot the cost function using the predicted values from logistic regression.
# Your code here
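A possible sketch using the combined cost above; `pred_log` comes from the prediction sketch, and the small `eps` is an assumption to guard against $\log(0)$:

```python
# per-example cross-entropy cost
eps = 1e-9
cost = -Yb * np.log(pred_log + eps) - (1 - Yb) * np.log(1 - pred_log + eps)

plt.figure(figsize=(8, 4))
plt.plot(X, cost, 'o')
plt.xlabel('number of shots')
plt.ylabel('cost')
plt.show()
```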