Regression

Table of Contents

1. Linear Regression
2. Multivariate Linear Regression

Consider a linear regression problem.


$\text{Given} \; \begin{cases} x_{i} \; \text{: inputs} \\ y_{i} \; \text{: outputs} \end{cases}$, find $\theta_{0}$ and $\theta_{1}$


$$x= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{m} \end{bmatrix}, \qquad y= \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{bmatrix}, \qquad y_{i} \approx \hat{y}_{i} = \theta_{0} + \theta_{1}x_{i}$$


  • $ \hat{y}_{i} $ : predicted output

  • $ \theta = \begin{bmatrix} \theta_{0} \\ \theta_{1} \end{bmatrix} $ : model parameters


$$ \hat{y}_{i} = f(x_{i}\,; \theta) \; \text{ in general}$$


  • in many cases, a linear model is used to predict $y_{i}$

$$ \hat{y}_{i} = \theta_{0} + \theta_{1}x_{i} \; \quad \text{ such that }\quad \min\limits_{\theta_{0}, \theta_{1}}\sum\limits_{i = 1}^{m} (\hat{y}_{i} - y_{i})^2$$



1.1. Re-cast the Problem as Least Squares

  • For convenience, we define a function that maps inputs to feature vectors, $\phi$

$$\begin{array}{Icr}\begin{align*} \hat{y}_{i} & = \theta_0 + x_i \theta_1 = 1 \cdot \theta_0 + x_i \theta_1 \\ \\ & = \begin{bmatrix}1 & x_{i}\end{bmatrix}\begin{bmatrix}\theta_{0} \\ \theta_{1}\end{bmatrix} \\\\ & =\begin{bmatrix}1 \\ x_{i} \end{bmatrix}^{T}\begin{bmatrix}\theta_{0} \\ \theta_{1}\end{bmatrix} \\\\ & =\phi^{T}(x_{i})\theta \end{align*}\end{array} \begin{array}{Icr} \quad \quad \text{feature vector} \; \phi(x_{i}) = \begin{bmatrix}1 \\ x_{i}\end{bmatrix} \end{array}$$


$$\Phi = \begin{bmatrix}1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\1 & x_{m} \end{bmatrix}=\begin{bmatrix}\phi^T(x_{1}) \\\phi^T(x_{2}) \\\vdots \\\phi^T(x_{m}) \end{bmatrix} \quad \implies \quad \hat{y} = \begin{bmatrix}\hat{y}_{1} \\\hat{y}_{2} \\\vdots \\\hat{y}_{m}\end{bmatrix}=\Phi\theta$$


  • Optimization problem

$$\min\limits_{\theta_{0}, \theta_{1}}\sum\limits_{i = 1}^{m} (\hat{y}_{i} - y_{i})^2 =\min\limits_{\theta}\lVert\Phi\theta-y\rVert^2_2 \qquad \qquad \left(\text{same as} \; \min_{x} \lVert Ax-b \rVert_2^2 \right)$$



$$ \text{solution:} \quad \Phi^{T}\Phi\,\theta^* = \Phi^{T}y \quad \implies \quad \theta^* = (\Phi^{T}\Phi)^{-1}\Phi^{T} y $$

Note




$$\begin{array}{Icr} \text{input} \\ x_{i} \end{array} \quad \rightarrow \quad \begin{array}{Icr} \text{feature} \\ \begin{bmatrix} 1 \\ x_{i} \end{bmatrix} \end{array} \quad \rightarrow \quad \begin{array}{Icr} \text{predicted output} \\ \hat{y}_{i} \end{array}$$


$$\begin{array}{Icr} \begin{bmatrix} 1 & x_{1} \\1 & x_{2}\\\vdots & \vdots\\1 & x_{m}\end{bmatrix}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}=\begin{bmatrix}y_{1} \\y_{2} \\\vdots \\y_{m}\end{bmatrix} \\ \begin{array}{Icr} \uparrow \\ \vec{A}_1 \end{array} \;\; \begin{array}{Icr} \uparrow \\ \vec{A}_2 \end{array} \quad \begin{array}{Icr} \uparrow \\ \vec{x} \end{array} \quad\quad \;\; \begin{array}{Icr} \uparrow \\ \vec{B} \end{array} \end{array} \quad \begin{array}{Icr} \quad \text{over-determined system} \\ \quad \text{(solved by projection)} \end{array}$$


$$A(= \Phi) = \left[ \vec{A}_1 \;\vec{A}_2 \right]$$

1.2. Solve the Optimization Problem in Linear Regression

1.2.1. Use Linear Algebra

  • known as least squares

$$ \theta = (A^TA)^{-1}A^T y $$

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
# data points as column vectors (inputs x, outputs y)
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko')
plt.title('Data', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.ylim([0, 4.5])
plt.show()
In [ ]:
m = y.shape[0]

# design matrix A = [1, x]: a column of ones (intercept) next to the inputs
#A = np.hstack([np.ones([m, 1]), x])
A = np.hstack([x**0, x])
A = np.asmatrix(A)

# least-squares solution via the normal equations
theta = (A.T*A).I*A.T*y

print('theta:\n', theta)
theta:
 [[0.65306531]
 [0.67129519]]
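
The same solution can also be obtained with NumPy's built-in least-squares routine, np.linalg.lstsq, which avoids forming $A^TA$ explicitly (a minimal sketch, reusing A and y from the cell above):

In [ ]:
# least squares via np.linalg.lstsq (no explicit inverse of A^T A)
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(np.asarray(A), y, rcond = None)
print('theta (lstsq):\n', theta_lstsq)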
In [ ]:
# to plot
plt.figure(figsize = (6, 4))
plt.title('Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'ko', label = "data")

# to plot a straight line (fitted line)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = theta[0,0] + theta[1,0]*xp

plt.plot(xp, yp, 'r', linewidth = 2, label = "regression")
plt.legend()
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.ylim([0, 4.5])
plt.show()

1.2.2. Use Gradient Descent


$$\min_{\theta} ~ \lVert \hat y - y \rVert_2^2 = \min_{\theta} ~ \lVert A\theta - y \rVert_2^2$$


$$ \begin{align*} f &= (A\theta-y)^T(A\theta-y) = (\theta^TA^T-y^T)(A\theta-y) \\ &= \theta^TA^TA\theta - \theta^TA^Ty - y^TA\theta + y^Ty \\\\ \nabla f &= A^TA\theta + A^TA\theta - A^Ty - A^Ty = 2(A^TA\theta - A^Ty) \end{align*} $$


$$ \theta \leftarrow \theta - \alpha \nabla f $$

In [ ]:
# gradient descent on f = ||A*theta - y||^2
theta = np.random.randn(2, 1)
theta = np.asmatrix(theta)

alpha = 0.001        # step size

for _ in range(1000):
    df = 2*(A.T*A*theta - A.T*y)    # gradient of f
    theta = theta - alpha*df        # gradient descent update

print(theta)
[[0.65351016]
 [0.67116077]]
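
The step size $\alpha$ and the number of iterations were chosen by hand. A simple sanity check (a minimal sketch, reusing A, y, and alpha from above) is to record the objective value at every iteration and confirm that it levels off:

In [ ]:
# track the objective ||A*theta - y||^2 over the gradient descent iterations
theta_gd = np.asmatrix(np.random.randn(2, 1))
loss = []

for _ in range(1000):
    df = 2*(A.T*A*theta_gd - A.T*y)
    theta_gd = theta_gd - alpha*df
    loss.append(((A*theta_gd - y).T*(A*theta_gd - y))[0, 0])

plt.figure(figsize = (6, 4))
plt.plot(loss, linewidth = 2)
plt.xlabel('iteration', fontsize = 12)
plt.ylabel('objective', fontsize = 12)
plt.grid(alpha = 0.3)
plt.show()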
In [ ]:
# to plot
plt.figure(figsize = (6, 4))
plt.title('Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'ko', label = "data")

# to plot a straight line (fitted line)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = theta[0,0] + theta[1,0]*xp

plt.plot(xp, yp, 'r', linewidth = 2, label = "regression")
plt.legend()
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.show()

1.2.3. Use CVXPY Optimization


$$\min_{\theta} ~ \lVert \hat y - y \rVert_2 = \min_{\theta} ~ \lVert A\theta - y \rVert_2$$


In [ ]:
import cvxpy as cvx

theta2 = cvx.Variable([2, 1])
obj = cvx.Minimize(cvx.norm(A@theta2 - y, 2))
cvx.Problem(obj).solve()

print('theta:\n', theta2.value)
theta:
 [[0.65306531]
 [0.67129519]]
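
Note that the objective here is the norm itself rather than its square; since squaring is monotone on nonnegative values, both formulations share the same minimizer. The squared form could be written, for instance, as follows (a minimal sketch):

In [ ]:
# equivalent squared-norm formulation: minimize ||A*theta - y||_2^2
theta_sq = cvx.Variable([2, 1])
obj_sq = cvx.Minimize(cvx.sum_squares(A@theta_sq - y))
cvx.Problem(obj_sq).solve()

print('theta:\n', theta_sq.value)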

By the way, do we have to use only $L_2$ norm? No.

  • Let's use $L_1$ norm

$$ \min_{\theta} ~ \lVert \hat y - y \rVert_1 = \min_{\theta} ~ \lVert A\theta - y \rVert_1 $$


In [ ]:
theta1 = cvx.Variable([2, 1])
obj = cvx.Minimize(cvx.norm(A@theta1 - y, 1))
cvx.Problem(obj).solve()

print('theta:\n', theta1.value)
theta:
 [[0.6258404 ]
 [0.68539899]]
In [ ]:
# to plot data
plt.figure(figsize = (6, 4))
plt.title('$L_1$ and $L_2$ Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'ko', label = 'data')

# to plot straight lines (fitted lines)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp1 = theta1.value[0,0] + theta1.value[1,0]*xp
yp2 = theta2.value[0,0] + theta2.value[1,0]*xp

plt.plot(xp, yp1, 'b', linewidth = 2, label = '$L_1$')
plt.plot(xp, yp2, 'r', linewidth = 2, label = '$L_2$')
plt.legend()
plt.axis('equal')
plt.xlim([0, 5])
plt.grid(alpha = 0.3)
plt.show()

$L_1$ norm also provides a decent linear approximation.

What if outliers exist?

  • Fit the data with different norms
  • Discuss the results
  • It is important to understand what makes them different.
In [ ]:
# add outliers
x = np.vstack([x, np.array([0.5, 3.8]).reshape(-1, 1)])
y = np.vstack([y, np.array([3.9, 0.3]).reshape(-1, 1)])

A = np.hstack([x**0, x])
A = np.asmatrix(A)
In [ ]:
A.shape
Out[ ]:
(15, 2)
In [ ]:
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko', label = 'data')
plt.axis('equal')
plt.xlim([0, 5])
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
theta2 = cvx.Variable([2, 1])
obj2 = cvx.Minimize(cvx.norm(A@theta2 - y, 2))
prob2 = cvx.Problem(obj2).solve()
In [ ]:
# to plot straight lines (fitted lines)
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko', label = 'data')
xp = np.arange(0, 5, 0.01).reshape(-1,1)
yp2 = theta2.value[0,0] + theta2.value[1,0]*xp

plt.plot(xp, yp2, 'r', linewidth = 2, label = '$L_2$')
plt.axis('equal')
plt.xlim([0, 5])
plt.legend(loc = 5)
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
theta1 = cvx.Variable([2, 1])
obj1 = cvx.Minimize(cvx.norm(A@theta1 - y, 1))
prob1 = cvx.Problem(obj1).solve()
In [ ]:
# to plot straight lines (fitted lines)
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko', label = 'data')
xp = np.arange(0, 5, 0.01).reshape(-1,1)
yp1 = theta1.value[0,0] + theta1.value[1,0]*xp

plt.plot(xp, yp1, 'b', linewidth = 2, label = '$L_1$')
plt.axis('equal')
plt.xlim([0, 5])
plt.legend(loc = 5)
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
# to plot data
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko', label = 'data')
plt.title('$L_1$ and $L_2$ Regression w/ Outliers', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)

# to plot straight lines (fitted lines)
xp = np.arange(0, 5, 0.01).reshape(-1,1)
yp1 = theta1.value[0,0] + theta1.value[1,0]*xp
yp2 = theta2.value[0,0] + theta2.value[1,0]*xp

plt.plot(xp, yp1, 'b', linewidth = 2, label = '$L_1$')
plt.plot(xp, yp2, 'r', linewidth = 2, label = '$L_2$')
plt.axis('scaled')
plt.xlim([0, 5])
plt.legend(loc = 5)
plt.grid(alpha = 0.3)
plt.show()

Think about what makes them different.

1.2.4. Scikit-Learn


  • Machine Learning in Python
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
  • https://scikit-learn.org/stable/index.html#
In [ ]:
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)
In [ ]:
from sklearn import linear_model
In [ ]:
reg = linear_model.LinearRegression()
reg.fit(x, y)
Out[ ]:
LinearRegression()
In [ ]:
reg.coef_
Out[ ]:
array([[0.67129519]])
In [ ]:
reg.intercept_
Out[ ]:
array([0.65306531])
In [ ]:
# to plot
plt.figure(figsize = (6, 4))
plt.title('Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'ko', label = "data")

# to plot a straight line (fitted line)
plt.plot(xp, reg.predict(xp), 'r', linewidth = 2, label = "regression")
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.show()

2. Multivariate Linear Regression

(= linear regression for multivariate data)


$$ \hat{y} = \theta_0 + \theta_{1}x_1 + \theta_{2}x_2 $$

$$\phi \left(x^{(i)}\right) = \begin{bmatrix}1\\x^{(i)}_{1}\\x^{(i)}_{2} \end{bmatrix}$$
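
Following the same recipe as in 1.1, the feature matrix stacks these rows, and the least-squares solution keeps the same form:

$$\Phi = \begin{bmatrix} 1 & x^{(1)}_{1} & x^{(1)}_{2} \\ 1 & x^{(2)}_{1} & x^{(2)}_{2} \\ \vdots & \vdots & \vdots \\ 1 & x^{(m)}_{1} & x^{(m)}_{2} \end{bmatrix}, \qquad \theta^{*} = (\Phi^{T}\Phi)^{-1}\Phi^{T}y$$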


In [ ]:
# for 3D plot
from mpl_toolkits.mplot3d import Axes3D
In [ ]:
# y = theta0 + theta1*x1 + theta2*x2 + noise

n = 200
x1 = np.random.randn(n, 1)
x2 = np.random.randn(n, 1)
noise = 0.5*np.random.randn(n, 1)

y = 2 + 1*x1 + 3*x2 + noise

fig = plt.figure(figsize = (8, 6))
ax = fig.add_subplot(1, 1, 1, projection = '3d')
ax.set_title('Generated Data', fontsize = 12)
ax.set_xlabel('$X_1$', fontsize = 12)
ax.set_ylabel('$X_2$', fontsize = 12)
ax.set_zlabel('Y', fontsize = 12)
ax.scatter(x1, x2, y, marker = '.', label = 'Data')
#ax.view_init(30,30)
plt.legend(fontsize = 12)
plt.show()
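
As a quick check that these parameters can be recovered from the generated data, the least-squares solution from 1.2.1 applies directly once the two inputs are stacked into the feature matrix (a minimal sketch, reusing x1, x2, y, and n from the cell above):

In [ ]:
# fit y = theta0 + theta1*x1 + theta2*x2 by least squares
A = np.hstack([np.ones([n, 1]), x1, x2])
A = np.asmatrix(A)

theta = (A.T*A).I*A.T*y

# theta should be close to the true values [2, 1, 3] used to generate the data
print('theta:\n', theta)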