Regression

Table of Contents

1. Linear Regression
2. Multivariate Linear Regression
3. Nonlinear Regression
4. Overfitting
5. De-noising Signal
6. Total Variation Image Reconstruction

1. Linear Regression

Consider a linear regression problem.


$\text{Given} \; \begin{cases} x_{i} \; \text{: inputs} \\ y_{i} \; \text{: outputs} \end{cases}$ , Find $\theta_{0}$ and $\theta_{1}$


$$x= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{m} \end{bmatrix}, \qquad y= \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{bmatrix}, \qquad y_{i} \approx \hat{y}_{i} = \theta_{0} + \theta_{1}x_{i}$$


  • $ \hat{y}_{i} $ : predicted output

  • $ \theta = \begin{bmatrix} \theta_{0} \\ \theta_{1} \\ \end{bmatrix} $ : Model parameters


$$ \hat{y}_{i} = f(x_{i}\,; \theta) \; \text{ in general}$$


  • in many cases, a linear model is used to predict $y_{i}$

$$ \hat{y}_{i} = \theta_{0} + \theta_{1}x_{i} \; \quad \text{ such that }\quad \min\limits_{\theta_{0}, \theta_{1}}\sum\limits_{i = 1}^{m} (\hat{y}_{i} - y_{i})^2$$



1.1. Re-cast the Problem as Least Squares

  • For convenience, we define a function that maps inputs to feature vectors, $\phi$

$$\begin{array}{Icr}\begin{align*} \hat{y}_{i} & = \theta_0 + x_i \theta_1 = 1 \cdot \theta_0 + x_i \theta_1 \\ \\ & = \begin{bmatrix}1 & x_{i}\end{bmatrix}\begin{bmatrix}\theta_{0} \\ \theta_{1}\end{bmatrix} \\\\ & =\begin{bmatrix}1 \\ x_{i} \end{bmatrix}^{T}\begin{bmatrix}\theta_{0} \\ \theta_{1}\end{bmatrix} \\\\ & =\phi^{T}(x_{i})\theta \end{align*}\end{array} \begin{array}{Icr} \quad \quad \text{feature vector} \; \phi(x_{i}) = \begin{bmatrix}1 \\ x_{i}\end{bmatrix} \end{array}$$


$$\Phi = \begin{bmatrix}1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\1 & x_{m} \end{bmatrix}=\begin{bmatrix}\phi^T(x_{1}) \\\phi^T(x_{2}) \\\vdots \\\phi^T(x_{m}) \end{bmatrix} \quad \implies \quad \hat{y} = \begin{bmatrix}\hat{y}_{1} \\\hat{y}_{2} \\\vdots \\\hat{y}_{m}\end{bmatrix}=\Phi\theta$$


  • Optimization problem

$$\min\limits_{\theta_{0}, \theta_{1}}\sum\limits_{i = 1}^{m} (\hat{y}_{i} - y_{i})^2 =\min\limits_{\theta}\lVert\Phi\theta-y\rVert^2_2 \qquad \qquad \left(\text{same as} \; \min_{x} \lVert Ax-b \rVert_2^2 \right)$$



$$ \text{solution} \; \theta^* = (\Phi^{T}\Phi)^{-1}\Phi^{T} y $$

Note




$$\begin{array}{Icr} \text{input} \\ x_{i} \end{array} \quad \rightarrow \quad \begin{array}{Icr} \text{feature} \\ \begin{bmatrix} 1 \\ x_{i} \end{bmatrix} \end{array} \quad \rightarrow \quad \begin{array}{Icr} \text{predicted output} \\ \hat{y}_{i} \end{array}$$


$$\begin{array}{Icr} \begin{bmatrix} 1 & x_{1} \\1 & x_{2}\\\vdots & \vdots\\1 & x_{m}\end{bmatrix}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}=\begin{bmatrix}y_{1} \\y_{2} \\\vdots \\y_{m}\end{bmatrix} \\ \begin{array}{Icr} \uparrow \\ \vec{A}_1 \end{array} \;\; \begin{array}{Icr} \uparrow \\ \vec{A}_2 \end{array} \quad \begin{array}{Icr} \uparrow \\ \vec{x} \end{array} \quad\quad \;\; \begin{array}{Icr} \uparrow \\ \vec{B} \end{array} \end{array} \quad \begin{array}{Icr} \quad \text{over-determined or} \\ \quad \text{projection} \end{array}$$


$$A(= \Phi) = \left[ \vec{A}_1 \;\vec{A}_2 \right]$$
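
As a concrete illustration (a small sketch, not part of the original notes; the toy inputs are made up), the feature map $\phi$ and the matrix $\Phi$ can be built directly in NumPy:

In [ ]:
# Sketch: build the feature matrix Phi row-by-row from phi(x) = [1, x]^T.
# The inputs x_toy are made up for illustration only.
import numpy as np

def phi(xi):
    return np.array([1.0, xi])              # feature vector [1, x_i]

x_toy = np.array([0.1, 0.4, 0.7])           # hypothetical inputs
Phi = np.vstack([phi(xi) for xi in x_toy])  # each row is phi(x_i)^T

print(Phi)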

1.2. Solve Optimization in Linear Regression

1.2.1. Use Linear Algebra

  • known as least squares

$$ \theta = (A^TA)^{-1}A^T y $$

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
# data points in column vector [input, output]
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko')
plt.title('Data', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.ylim([0, 4.5])
plt.show()
In [ ]:
m = y.shape[0]
#A = np.hstack([np.ones([m, 1]), x])
A = np.hstack([x**0, x])
A = np.asmatrix(A)

theta = (A.T*A).I*A.T*y

print('theta:\n', theta)
theta:
 [[0.65306531]
 [0.67129519]]
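As a side note (a sketch, not in the original), the same least-squares solution can be computed with np.linalg.lstsq, which avoids forming $(A^TA)^{-1}$ explicitly and is numerically more stable; it reuses A and y from the cell above:

In [ ]:
# Sketch: least squares via np.linalg.lstsq instead of the normal equations.
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(np.asarray(A), y, rcond = None)
print('theta (lstsq):\n', theta_lstsq)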
In [ ]:
# to plot
plt.figure(figsize = (6, 4))
plt.title('Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'ko', label = "data")

# to plot a straight line (fitted line)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = theta[0,0] + theta[1,0]*xp

plt.plot(xp, yp, 'r', linewidth = 2, label = "regression")
plt.legend()
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.ylim([0, 4.5])
plt.show()

1.2.2. Use Gradient Descent


$$\min_{\theta} ~ \lVert \hat y - y \rVert_2^2 = \min_{\theta} ~ \lVert A\theta - y \rVert_2^2$$


$$ \begin{align*} f &= (A\theta-y)^T(A\theta-y) = (\theta^TA^T-y^T)(A\theta-y) \\ &= \theta^TA^TA\theta - \theta^TA^Ty - y^TA\theta + y^Ty \\\\ \nabla f &= A^TA\theta + A^TA\theta - A^Ty - A^Ty = 2(A^TA\theta - A^Ty) \end{align*} $$


$$ \theta \leftarrow \theta - \alpha \nabla f $$

In [ ]:
theta = np.random.randn(2,1)
theta = np.asmatrix(theta)

alpha = 0.001

for _ in range(1000):
    df = 2*(A.T*A*theta - A.T*y)
    theta = theta - alpha*df

print (theta)
[[0.65351016]
 [0.67116077]]
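As a variation (a sketch, not part of the original), the same iteration can be stopped by a tolerance on the gradient norm rather than a fixed number of steps; it reuses A and y from the cells above:

In [ ]:
# Sketch: gradient descent with a stopping criterion on the gradient norm.
theta_gd = np.asmatrix(np.random.randn(2, 1))
alpha = 0.001

for i in range(10000):
    df = 2*(A.T*A*theta_gd - A.T*y)
    if np.linalg.norm(df) < 1e-6:            # stop when the gradient is (nearly) zero
        break
    theta_gd = theta_gd - alpha*df

print('iterations:', i)
print(theta_gd)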
In [ ]:
# to plot
plt.figure(figsize = (6, 4))
plt.title('Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'ko', label = "data")

# to plot a straight line (fitted line)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp = theta[0,0] + theta[1,0]*xp

plt.plot(xp, yp, 'r', linewidth = 2, label = "regression")
plt.legend()
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.show()

1.2.3. Use CVXPY Optimization


$$\min_{\theta} ~ \lVert \hat y - y \rVert_2 = \min_{\theta} ~ \lVert A\theta - y \rVert_2$$


In [ ]:
import cvxpy as cvx

theta2 = cvx.Variable([2, 1])
obj = cvx.Minimize(cvx.norm(A@theta2 - y, 2))
cvx.Problem(obj,[]).solve()

print('theta:\n', theta2.value)
theta:
 [[0.65306531]
 [0.67129519]]

By the way, do we have to use only the $L_2$ norm? No.

  • Let's use $L_1$ norm

$$ \min_{\theta} ~ \lVert \hat y - y \rVert_1 = \min_{\theta} ~ \lVert A\theta - y \rVert_1 $$


In [ ]:
theta1 = cvx.Variable([2, 1])
obj = cvx.Minimize(cvx.norm(A@theta1 - y, 1))
cvx.Problem(obj).solve()

print('theta:\n', theta1.value)
theta:
 [[0.6258404 ]
 [0.68539899]]
In [ ]:
# to plot data
plt.figure(figsize = (6, 4))
plt.title('$L_1$ and $L_2$ Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'ko', label = 'data')

# to plot straight lines (fitted lines)
xp = np.arange(0, 5, 0.01).reshape(-1, 1)
yp1 = theta1.value[0,0] + theta1.value[1,0]*xp
yp2 = theta2.value[0,0] + theta2.value[1,0]*xp

plt.plot(xp, yp1, 'b', linewidth = 2, label = '$L_1$')
plt.plot(xp, yp2, 'r', linewidth = 2, label = '$L_2$')
plt.legend()
plt.axis('equal')
plt.xlim([0, 5])
plt.grid(alpha = 0.3)
plt.show()

$L_1$ norm also provides a decent linear approximation.

What if outliers exist?

  • Fit with the different norms
  • Discuss the result
  • It is important to understand what makes them different.
In [ ]:
# add outliers
x = np.vstack([x, np.array([0.5, 3.8]).reshape(-1, 1)])
y = np.vstack([y, np.array([3.9, 0.3]).reshape(-1, 1)])

A = np.hstack([x**0, x])
A = np.asmatrix(A)
In [ ]:
A.shape
Out[ ]:
(15, 2)
In [ ]:
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko', label = 'data')
plt.axis('equal')
plt.xlim([0, 5])
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
theta2 = cvx.Variable([2, 1])
obj2 = cvx.Minimize(cvx.norm(A@theta2 - y, 2))
prob2 = cvx.Problem(obj2).solve()
In [ ]:
# to plot straight lines (fitted lines)
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko', label = 'data')
xp = np.arange(0, 5, 0.01).reshape(-1,1)
yp2 = theta2.value[0,0] + theta2.value[1,0]*xp

plt.plot(xp, yp2, 'r', linewidth = 2, label = '$L_2$')
plt.axis('equal')
plt.xlim([0, 5])
plt.legend(loc = 5)
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
theta1 = cvx.Variable([2, 1])
obj1 = cvx.Minimize(cvx.norm(A@theta1 - y, 1))
prob1 = cvx.Problem(obj1).solve()
In [ ]:
# to plot straight lines (fitted lines)
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko', label = 'data')
xp = np.arange(0, 5, 0.01).reshape(-1,1)
yp1 = theta1.value[0,0] + theta1.value[1,0]*xp

plt.plot(xp, yp1, 'b', linewidth = 2, label = '$L_1$')
plt.axis('equal')
plt.xlim([0, 5])
plt.legend(loc = 5)
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
# to plot data
plt.figure(figsize = (6, 4))
plt.plot(x, y, 'ko', label = 'data')
plt.title('$L_1$ and $L_2$ Regression w/ Outliers', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)

# to plot straight lines (fitted lines)
xp = np.arange(0, 5, 0.01).reshape(-1,1)
yp1 = theta1.value[0,0] + theta1.value[1,0]*xp
yp2 = theta2.value[0,0] + theta2.value[1,0]*xp

plt.plot(xp, yp1, 'b', linewidth = 2, label = '$L_1$')
plt.plot(xp, yp2, 'r', linewidth = 2, label = '$L_2$')
plt.axis('scaled')
plt.xlim([0, 5])
plt.legend(loc = 5)
plt.grid(alpha = 0.3)
plt.show()

Think about what makes them different.
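
To aid that discussion (a small sketch, not in the original), it can help to print the residuals of the two fits on the outlier-contaminated data and see how each norm treats the two large ones; A, y, theta1 and theta2 are reused from the cells above:

In [ ]:
# Sketch: residuals of the L1 and L2 fits on the data with outliers.
res1 = np.asarray(A@theta1.value - y).ravel()
res2 = np.asarray(A@theta2.value - y).ravel()

print('L1 fit residuals:\n', np.round(res1, 2))
print('L2 fit residuals:\n', np.round(res2, 2))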

1.2.4. Scikit-Learn


  • Machine Learning in Python
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
  • https://scikit-learn.org/stable/index.html#
In [ ]:
x = np.array([0.1, 0.4, 0.7, 1.2, 1.3, 1.7, 2.2, 2.8, 3.0, 4.0, 4.3, 4.4, 4.9]).reshape(-1, 1)
y = np.array([0.5, 0.9, 1.1, 1.5, 1.5, 2.0, 2.2, 2.8, 2.7, 3.0, 3.5, 3.7, 3.9]).reshape(-1, 1)
In [ ]:
from sklearn import linear_model
In [ ]:
reg = linear_model.LinearRegression()
reg.fit(x, y)
Out[ ]:
LinearRegression()
In [ ]:
reg.coef_
Out[ ]:
array([[0.67129519]])
In [ ]:
reg.intercept_
Out[ ]:
array([0.65306531])
In [ ]:
# to plot
plt.figure(figsize = (6, 4))
plt.title('Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'ko', label = "data")

# to plot a straight line (fitted line)
plt.plot(xp, reg.predict(xp), 'r', linewidth = 2, label = "regression")
plt.legend(fontsize = 12)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.xlim([0, 5])
plt.show()

2. Multivariate Linear Regression

(= linear regression for multivariate data)


$$ \hat{y} = \theta_0 + \theta_{1}x_1 + \theta_{2}x_2 $$

$$\phi \left(x^{(i)}\right) = \begin{bmatrix}1\\x^{(i)}_{1}\\x^{(i)}_{2} \end{bmatrix}$$


In [ ]:
# for 3D plot
from mpl_toolkits.mplot3d import Axes3D
In [ ]:
# y = theta0 + theta1*x1 + theta2*x2 + noise

n = 200
x1 = np.random.randn(n, 1)
x2 = np.random.randn(n, 1)
noise = 0.5*np.random.randn(n, 1);

y = 2 + 1*x1 + 3*x2 + noise

fig = plt.figure(figsize = (8, 6))
ax = fig.add_subplot(1, 1, 1, projection = '3d')
ax.set_title('Generated Data', fontsize = 12)
ax.set_xlabel('$X_1$', fontsize = 12)
ax.set_ylabel('$X_2$', fontsize = 12)
ax.set_zlabel('Y', fontsize = 12)
ax.scatter(x1, x2, y, marker = '.', label = 'Data')
#ax.view_init(30,30)
plt.legend(fontsize = 12)
plt.show()



$$\Phi = \begin{bmatrix}1 & x_{1}^{(1)} & x_{2}^{(1)}\\1 & x_{1}^{(2)} & x_{2}^{(2)}\\ \vdots & \vdots & \vdots \\1 & x_{1}^{(m)} & x_{2}^{(m)} \end{bmatrix} \quad \implies \quad \hat{y} = \begin{bmatrix}\hat{y}^{(1)} \\\hat{y}^{(2)} \\\vdots \\\hat{y}^{(m)}\end{bmatrix}=\Phi\theta$$



$$\implies \theta^{*} = (\Phi^T \Phi)^{-1} \Phi^T y$$


In [ ]:
A = np.hstack([np.ones((n, 1)), x1, x2])
A = np.asmatrix(A)
theta = (A.T*A).I*A.T*y

X1, X2 = np.meshgrid(np.arange(np.min(x1), np.max(x1), 0.5),
                     np.arange(np.min(x2), np.max(x2), 0.5))
YP = theta[0,0] + theta[1,0]*X1 + theta[2,0]*X2

fig = plt.figure(figsize = (8, 6))
ax = fig.add_subplot(1, 1, 1, projection = '3d')
ax.set_title('Regression', fontsize = 12)
ax.set_xlabel('$X_1$', fontsize = 12)
ax.set_ylabel('$X_2$', fontsize = 12)
ax.set_zlabel('Y', fontsize = 12)
ax.scatter(x1, x2, y, marker = '.', label = 'Data')
ax.plot_wireframe(X1, X2, YP, color = 'k', alpha = 0.3, label = 'Regression Plane')
#ax.view_init(30,30)
plt.legend(fontsize = 12)
plt.show()
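As a cross-check (a sketch, not in the original), the same multivariate fit can be reproduced with scikit-learn by stacking the two inputs column-wise; x1, x2 and y are reused from the cells above:

In [ ]:
# Sketch: multivariate linear regression with scikit-learn.
from sklearn import linear_model

X = np.hstack([x1, x2])                  # shape (n, 2): one column per input
reg = linear_model.LinearRegression()
reg.fit(X, y)

print('intercept:', reg.intercept_)      # should be close to theta[0]
print('coefficients:', reg.coef_)        # should be close to theta[1], theta[2]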

3. Nonlinear Regression

(= Linear Regression for Non-linear Data)


  • Same as linear regression, just with non-linear features

  • Method 1: constructing explicit feature vectors

    • Polynomial features
    • Radial basis function (RBF) features
  • Method 2: implicit feature vectors, kernel trick (optional; a brief sketch follows below)
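
A minimal sketch of Method 2 (not part of the original notes): kernel ridge regression with an RBF kernel fits a nonlinear function without ever constructing feature vectors explicitly. KernelRidge and the toy data below are illustrative choices.

In [ ]:
# Sketch: the kernel trick via scikit-learn's KernelRidge (RBF kernel).
# The data xk, yk are made up for illustration only.
from sklearn.kernel_ridge import KernelRidge

xk = np.linspace(-3, 3, 30).reshape(-1, 1)
yk = np.sin(xk) + 0.1*np.random.randn(30, 1)

krr = KernelRidge(alpha = 0.1, kernel = 'rbf', gamma = 0.5)
krr.fit(xk, yk)

xkp = np.linspace(-3, 3, 200).reshape(-1, 1)

plt.figure(figsize = (6, 4))
plt.plot(xk, yk, 'ko', label = 'data')
plt.plot(xkp, krr.predict(xkp), 'r', linewidth = 2, label = 'kernel ridge')
plt.title('Kernel Ridge Regression (sketch)', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.show()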

3.1. Nonlinear Regression with Polynomial Features

(here, a quadratic polynomial is used as an example)


$$ \begin{align*} y &= \theta_0 + \theta_1 x + \theta_2 x^2 + \text{noise} \end{align*} $$



$$\phi(x_{i}) = \begin{bmatrix}1\\x_{i}\\x_{i}^2 \end{bmatrix}$$

$$\Phi = \begin{bmatrix}1 & x_{1} & x_{1}^2 \\ 1 & x_{2} & x_{2}^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{m} & x_{m}^2\end{bmatrix} \quad \implies \quad \hat{y} = \begin{bmatrix}\hat{y}_1 \\\hat{y}_2 \\\vdots \\\hat{y}_m\end{bmatrix}=\Phi\theta$$



$$\implies \theta^{*} = (\Phi^T \Phi)^{-1} \Phi^T y$$

In [ ]:
# y = theta0 + theta1*x + theta2*x^2 + noise

n = 100
x = -5 + 15*np.random.rand(n, 1)
noise = 10*np.random.randn(n, 1)

y = 10 + 1*x + 2*x**2 + noise

plt.figure(figsize = (6, 4))
plt.title('True x and y', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'o', markersize = 4, label = 'actual')
plt.xlim([np.min(x), np.max(x)])
plt.grid(alpha = 0.3)
plt.legend(fontsize = 12)
plt.show()
In [ ]:
A = np.hstack([x**0, x, x**2])
A = np.asmatrix(A)

theta = (A.T*A).I*A.T*y
print('theta:\n', theta)
theta:
 [[10.77438394]
 [ 0.3957224 ]
 [ 2.04054824]]
In [ ]:
xp = np.linspace(np.min(x), np.max(x))
yp = theta[0,0] + theta[1,0]*xp + theta[2,0]*xp**2

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', markersize = 4, label = 'actual')
plt.plot(xp, yp, 'r', linewidth = 2, label = 'estimated')

plt.title('Nonlinear regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.xlim([np.min(x), np.max(x)])
plt.grid(alpha = 0.3)
plt.legend(fontsize = 12)
plt.show()
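The same polynomial features can also be generated with scikit-learn (a sketch, not part of the original notes), which avoids building the powers of $x$ by hand; x and y are reused from the cells above:

In [ ]:
# Sketch: polynomial regression via PolynomialFeatures + LinearRegression.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree = 2)            # generates [1, x, x^2]
X = poly.fit_transform(x)

reg = LinearRegression(fit_intercept = False)    # the bias column is already in X
reg.fit(X, y)

print('coefficients:', reg.coef_)                # should be close to theta above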

4. Overfitting

This is very important code that you might want to fully understand, or even memorize.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# 10 data points
n = 10
x = np.linspace(-4.5, 4.5, 10).reshape(-1, 1)
y = np.array([0.9819, 0.7973, 1.9737, 0.1838, 1.3180, -0.8361, -0.6591, -2.4701, -2.8122, -6.2512]).reshape(-1, 1)

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()
In [ ]:
A = np.hstack([x**0, x])
A = np.asmatrix(A)

theta = (A.T*A).I*A.T*y
print(theta)
[[-0.7774    ]
 [-0.71070424]]
In [ ]:
# to plot
xp = np.arange(-4.5, 4.5, 0.01).reshape(-1, 1)
yp = theta[0,0] + theta[1,0]*xp

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp[:,0], yp[:,0], linewidth = 2, label = 'Linear')
plt.title('Linear Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()
In [ ]:
A = np.hstack([x**0, x, x**2])
A = np.asmatrix(A)

theta = (A.T*A).I*A.T*y
print(theta)
[[ 0.33669062]
 [-0.71070424]
 [-0.13504129]]
In [ ]:
# to plot
xp = np.arange(-4.5, 4.5, 0.01).reshape(-1, 1)
yp = theta[0,0] + theta[1,0]*xp + theta[2,0]*xp**2

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp[:,0], yp[:,0], linewidth = 2, label = '2nd degree')
plt.title('Nonlinear Regression with Polynomial Functions', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()
In [ ]:
A = np.hstack([x**i for i in range(10)])
A = np.asmatrix(A)

theta = (A.T*A).I*A.T*y
print(theta)

# to plot
xp = np.arange(-4.5, 4.5, 0.01).reshape(-1, 1)

polybasis = np.hstack([xp**i for i in range(10)])
polybasis = np.asmatrix(polybasis)

yp = polybasis*theta

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp[:,0], yp[:,0], linewidth = 2, label = '9th degree')
plt.title('Nonlinear Regression with Polynomial Functions', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()
[[ 3.48274701e-01]
 [-2.58951123e+00]
 [-4.55286474e-01]
 [ 1.85022226e+00]
 [ 1.06250369e-01]
 [-4.43328786e-01]
 [-9.25753472e-03]
 [ 3.63088178e-02]
 [ 2.35143849e-04]
 [-9.24099978e-04]]
In [ ]:
d = [1, 3, 5, 9]
RSS = []

plt.figure(figsize = (8, 6))
plt.suptitle('Nonlinear Regression', fontsize = 12)

for k in range(4):
    A = np.hstack([x**i for i in range(d[k]+1)])
    polybasis = np.hstack([xp**i for i in range(d[k]+1)])

    A = np.asmatrix(A)
    polybasis = np.asmatrix(polybasis)

    theta = (A.T*A).I*A.T*y
    yp = polybasis*theta

    RSS.append(np.linalg.norm(y - A*theta, 2)**2)

    plt.subplot(2, 2, k+1)
    plt.plot(x, y, 'o')
    plt.plot(xp, yp)
    plt.axis([-5, 5, -12, 6])
    plt.title('degree = {}'.format(d[k]))
    plt.grid(alpha=0.3)

plt.show()
In [ ]:
plt.figure(figsize = (6, 4))
plt.stem(d, RSS, label = 'RSS')
plt.title('Residual Sum of Squares', fontsize = 12)
plt.xlabel('degree', fontsize = 12)
plt.ylabel('RSS', fontsize = 12)
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.show()
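The RSS above is measured on the training points only, so it can only decrease as the degree grows. A small sketch (not in the original) of leave-one-out cross-validation makes the overfitting visible in held-out error; x, y and n are reused from the cells above:

In [ ]:
# Sketch: leave-one-out (LOO) error vs. polynomial degree. np.linalg.lstsq is
# used so that the rank-deficient case (degree 9 with 9 training points) still solves.
d = [1, 3, 5, 9]
loo_err = []

for deg in d:
    err = 0
    for j in range(n):
        idx = np.arange(n) != j                  # leave sample j out
        A_tr = np.hstack([x[idx]**i for i in range(deg + 1)])
        a_te = np.hstack([x[j:j+1]**i for i in range(deg + 1)])
        th, _, _, _ = np.linalg.lstsq(A_tr, y[idx], rcond = None)
        err += ((a_te @ th - y[j:j+1])**2).item()
    loo_err.append(err)

for deg, err in zip(d, loo_err):
    print('degree {}: LOO error = {:.2f}'.format(deg, err))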

4.1. Linear Basis Function Models

  • Construct explicit feature vectors

  • Consider linear combinations of fixed nonlinear functions of the input variables, of the form


$$ \begin{bmatrix} 1 & x_{1} & x_1^2\\1 & x_{2} & x_2^2\\\vdots & \vdots & \vdots\\1 & x_{m} & x_m^2 \end{bmatrix} \begin{bmatrix}\theta_0\\\theta_1 \\ \theta_2 \end{bmatrix} \quad \Rightarrow \quad \begin{bmatrix} \mid & \mid & \mid \\ b_0(x) & b_1(x) & b_2(x)\\ \mid & \mid & \mid \end{bmatrix} \begin{bmatrix}\theta_0\\\theta_1 \\ \theta_2 \end{bmatrix} $$


$$ \hat{y}=\sum_{i=0}^d{\theta_i b_i(x)} = \Phi \theta$$


1) Polynomial functions


$$b_i(x) = x^i, \quad i = 0,\cdots,d$$

In [ ]:
xp = np.arange(-1, 1, 0.01).reshape(-1, 1)
polybasis = np.hstack([xp**i for i in range(6)])

plt.figure(figsize = (6, 4))

for i in range(6):
    plt.plot(xp, polybasis[:,i], label = '$x^{}$'.format(i))

plt.title('Polynomial Basis', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.axis([-1, 1, -1.1, 1.1])
plt.grid(alpha = 0.3)
plt.legend(fontsize = 12)
plt.show()

2) RBF functions

With bandwidth $\sigma$ and $k$ RBF centers $\mu_i \in \mathbb{R}^n$


$$ b_i(x) = \exp \left( - \frac{\lVert x-\mu_i \rVert^2}{2\sigma^2}\right) $$


In [ ]:
d = 9

u = np.linspace(-1, 1, d)
sigma = 0.2

rbfbasis = np.hstack([np.exp(-(xp-u[i])**2/(2*sigma**2)) for i in range(d)])

plt.figure(figsize = (6, 4))

for i in range(d):
    plt.plot(xp, rbfbasis[:,i], label='$\mu = {}$'.format(u[i]))

plt.title('RBF basis', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.axis([-1, 1, -0.1, 1.1])
plt.legend(loc = 'lower right', fontsize = 10)
plt.grid(alpha = 0.3)
plt.show()
  • With many features, our prediction function becomes very expressive

  • Can lead to overfitting (low error on input data points, but high error nearby)

In [ ]:
xp = np.arange(-4.5, 4.5, 0.01).reshape(-1, 1)

d = 10
u = np.linspace(-4.5, 4.5, d)
sigma = 0.2

A = np.hstack([np.exp(-(x-u[i])**2/(2*sigma**2)) for i in range(d)])
rbfbasis = np.hstack([np.exp(-(xp-u[i])**2/(2*sigma**2)) for i in range(d)])

A = np.asmatrix(A)
rbfbasis = np.asmatrix(rbfbasis)

theta = (A.T*A).I*A.T*y
yp = rbfbasis*theta

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp, yp, label = 'Overfitted')
plt.title('(Overfitted) Regression with RBF basis', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.grid(alpha = 0.3)
plt.legend(fontsize = 12)
plt.axis('equal')
plt.show()
In [ ]:
d = [2, 4, 6, 10]
sigma = 1

plt.figure(figsize = (10, 8))

for k in range(4):
    u = np.linspace(-4.5, 4.5, d[k])

    A = np.hstack([np.exp(-(x-u[i])**2/(2*sigma**2)) for i in range(d[k])])
    rbfbasis = np.hstack([np.exp(-(xp-u[i])**2/(2*sigma**2)) for i in range(d[k])])

    A = np.asmatrix(A)
    rbfbasis = np.asmatrix(rbfbasis)

    theta = (A.T*A).I*A.T*y
    yp = rbfbasis*theta

    plt.subplot(2, 2, k+1)
    plt.plot(x, y, 'o')
    plt.plot(xp, yp)
    plt.axis([-5, 5, -12, 6])
    plt.title('num RBFs = {}'.format(d[k]), fontsize = 10)
    plt.grid(alpha = 0.3)

plt.suptitle('Nonlinear Regression with RBF Functions', fontsize = 12)
plt.show()

4.2. Regularization (Shrinkage Methods)

Often, overfitting is associated with very large estimated parameters $\theta$.

We want to balance

  • how well function fits data

  • magnitude of coefficients

    $$ \begin{align*} \text{Total cost } = \;&\underbrace{\text{measure of fit}}_{RSS(\theta)} + \;\lambda \cdot \underbrace{\text{measure of magnitude of coefficients}}_{\lambda \cdot \lVert \theta \rVert_2^2} \\ \\ \implies &\min\; \lVert \Phi \theta - y \rVert_2^2 + \lambda \lVert \theta \rVert_2^2 \end{align*} $$
    where $RSS(\theta) = \lVert \Phi\theta - y \rVert^2_2$ (the Residual Sum of Squares) and $\lambda$ is a tuning parameter to be determined separately


  • the second term, $\lambda \cdot \lVert \theta \rVert_2^2$, called a shrinkage penalty, is small when $\theta_1, \cdots,\theta_d$ are close to zero, and so it has the effect of shrinking the estimates of $\theta_j$ towards zero

  • The tuning parameter $\lambda$ serves to control the relative impact of these two terms on the regression coefficient estimates

  • known as ridge regression

  • Example: start from a rich representation.

$$\min\; \lVert \Phi \theta - y \rVert_2^2$$


In [ ]:
# CVXPY code

import cvxpy as cvx

d = 10
u = np.linspace(-4.5, 4.5, d)

sigma = 1

A = np.hstack([np.exp(-(x-u[i])**2/(2*sigma**2)) for i in range(d)])
rbfbasis = np.hstack([np.exp(-(xp-u[i])**2/(2*sigma**2)) for i in range(d)])

A = np.asmatrix(A)
rbfbasis = np.asmatrix(rbfbasis)

theta = cvx.Variable([d, 1])
obj = cvx.Minimize(cvx.sum_squares(A@theta - y))
prob = cvx.Problem(obj).solve()

yp = rbfbasis*theta.value

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp, yp, label = 'Overfitted')
plt.title('(Overfitted) Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.axis([-5, 5, -12, 6])
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()
  • Start from a rich representation. Then, regularize the coefficients $\theta$

$$\min\; \lVert \Phi \theta - y \rVert_2^2 + \lambda \lVert \theta \rVert_2^2$$


In [ ]:
# ridge regression

lamb = 0.1
theta = cvx.Variable([d, 1])
obj = cvx.Minimize(cvx.sum_squares(A@theta - y) + lamb*cvx.sum_squares(theta))
prob = cvx.Problem(obj).solve()

yp = rbfbasis*theta.value

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp, yp, label = 'Ridge')
plt.title('Ridge Regularization (L2)', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.axis([-5, 5, -12, 6])
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()
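As a cross-check (a sketch; the closed-form ridge solution $\theta^{*} = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T y$ is a standard result, not derived in the original), the regularized normal equations should reproduce the CVXPY coefficients; A, y, d and lamb are reused from the cells above:

In [ ]:
# Sketch: closed-form ridge solution vs. the CVXPY solution above.
theta_ridge = np.linalg.solve(A.T@A + lamb*np.eye(d), A.T@y)

print('closed form:\n', np.round(theta_ridge, 4))
print('CVXPY:\n', np.round(theta.value, 4))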
In [ ]:
# Regularization (= ridge nonlinear regression) encourages small weights, but not exactly 0

plt.figure(figsize = (6, 4))
plt.title(r'Ridge: magnitude of $\theta$', fontsize = 12)
plt.xlabel(r'$\theta$', fontsize = 12)
plt.ylabel('magnitude', fontsize = 12)
plt.stem(np.linspace(1, 10, 10).reshape(-1, 1), theta.value)
plt.xlim([0.5, 10.5])
plt.ylim([-5, 5])
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
lamb = np.arange(0,3,0.01)

theta_record = []
for k in lamb:
    theta = cvx.Variable([d, 1])
    obj = cvx.Minimize(cvx.sum_squares(A@theta - y) + k*cvx.sum_squares(theta))
    prob = cvx.Problem(obj).solve()
    theta_record.append(np.ravel(theta.value))

plt.figure(figsize = (6, 4))
plt.plot(lamb, theta_record, linewidth = 1)
plt.title('Ridge coefficients as a function of regularization', fontsize = 12)
plt.xlabel('$\lambda$', fontsize = 12)
plt.ylabel(r'weight $\theta$', fontsize = 12)
plt.show()

4.3. Sparsity for Feature Selection using LASSO

  • Least Squares with a penalty on the $l_1$ norm of the parameters

  • Start with full model (all possible features)

  • 'Shrink' some coefficients exactly to 0

    • i.e., knock out certain features
    • the $l_1$ penalty has the effect of forcing some of the coefficient estimates to be exactly equal to zero
  • Non-zero coefficients indicate 'selected' features


Try this cost instead of ridge...


$$ \begin{align*} \text{Total cost } = \;&\underbrace{\text{measure of fit}}_{RSS(\theta)} + \;\lambda \cdot \underbrace{\text{measure of magnitude of coefficients}}_{\lambda \cdot \lVert \theta \rVert_1} \\ \\ \implies &\min\; \lVert \Phi \theta - y \rVert_2^2 + \lambda \lVert \theta \rVert_1 \end{align*}$$

  • $\lambda$ is a tuning parameter = balance of fit and sparsity

  • Other equivalent forms of the optimization (a sketch of the constrained form follows below)

$$ \begin{array}{rl} \begin{align*} \min_{\theta} \quad & \lVert \Phi \theta - y \rVert_2^2 \\ \text{subject to} \quad & \lVert \theta \rVert_1 \leq s_1 \end{align*} \end{array} \quad\quad\quad\quad \begin{array}{rl} \min_{\theta} \; & \lVert \Phi \theta - y \rVert_2^2 \\ \text{subject to} \; & \lVert \theta \rVert_2 \leq s_2 \end{array} $$
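
As a sketch (not in the original), the constrained form on the left can be posed directly in CVXPY with a hypothetical budget s1; A, y and d are reused from the cells above:

In [ ]:
# Sketch: the budget (constrained) form of LASSO. s1 is a made-up budget.
s1 = 5
theta_c = cvx.Variable([d, 1])
obj = cvx.Minimize(cvx.sum_squares(A@theta_c - y))
const = [cvx.norm(theta_c, 1) <= s1]
cvx.Problem(obj, const).solve()

print('L1 norm of solution:', np.round(np.abs(theta_c.value).sum(), 4))
print('theta:\n', np.round(theta_c.value, 3))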





$$\min\; \lVert \Phi \theta - y \rVert_2^2 + \lambda \lVert \theta \rVert_1$$


In [ ]:
# LASSO regression

lamb = 2
theta = cvx.Variable([d, 1])
obj = cvx.Minimize(cvx.sum_squares(A@theta - y) + lamb*cvx.norm(theta, 1))
prob = cvx.Problem(obj).solve()

yp = rbfbasis*theta.value

plt.figure(figsize = (6, 4))
plt.title('LASSO Regularization (L1)', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp, yp, label = 'LASSO')
plt.axis([-5, 5, -12, 6])
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()
In [ ]:
# Regularization (= LASSO nonlinear regression) encourages zero weights

plt.figure(figsize = (6, 4))
plt.title(r'LASSO: magnitude of $\theta$', fontsize = 12)
plt.xlabel(r'$\theta$', fontsize = 12)
plt.ylabel('magnitude', fontsize = 12)
plt.stem(np.arange(1,11), theta.value)
plt.xlim([0.5, 10.5])
plt.ylim([-5,1])
plt.grid(alpha = 0.3)
plt.show()
In [ ]:
lamb = np.arange(0,3,0.01)

theta_record = []
for k in lamb:
    theta = cvx.Variable([d, 1])
    obj = cvx.Minimize(cvx.sum_squares(A@theta - y) + k*cvx.norm(theta, 1))
    prob = cvx.Problem(obj).solve()
    theta_record.append(np.ravel(theta.value))

plt.figure(figsize = (6, 4))
plt.plot(lamb, theta_record, linewidth = 1)
plt.title('LASSO coefficients as a function of regularization', fontsize = 12)
plt.xlabel('$\lambda$', fontsize = 12)
plt.ylabel(r'weight $\theta$', fontsize = 12)
plt.show()
In [ ]:
# reduced order model
# we will use only theta 2, 3, 8, 10

d = 4
u = np.array([-3.5, -2.5, 2.5, 4.5])

sigma = 1

A = np.hstack([np.exp(-(x-u[i])**2/(2*sigma**2)) for i in range(d)])
rbfbasis = np.hstack([np.exp(-(xp-u[i])**2/(2*sigma**2)) for i in range(d)])

A = np.asmatrix(A)
rbfbasis = np.asmatrix(rbfbasis)

theta = cvx.Variable([d, 1])
obj = cvx.Minimize(cvx.norm(A@theta - y, 2))
prob = cvx.Problem(obj).solve()

yp = rbfbasis*theta.value

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = 'Data')
plt.plot(xp, yp, label = 'Reduced')
plt.title('Reduced-Order Model Regression', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.axis([-5, 5, -12, 6])
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()

4.4. L2 and L1 Regularizers


$$\min \;(x-1)^2 + \frac{1}{6}(x-1)^3$$

4.4.1. L2 Penalty


$$\min \; (x-1)^2 + \frac{1}{6}(x-1)^3 + kx^2, \quad k=0,1,2,\cdots$$


In [ ]:
x = np.arange(-4,4,0.1)

k = 4
y = (x-1)**2 + 1/6*(x-1)**3 + k*x**2

x_star = x[np.argmin(y)]
print(x_star)
0.20000000000000373
In [ ]:
plt.plot(x, y, 'g', linewidth = 2.5)
plt.axvline(x = x_star, color = 'k', linewidth = 1, linestyle = '--')
plt.ylim([0,10])
plt.show()
In [ ]:
for k in [0,1,2,4]:
    y = (x-1)**2 + 1/6*(x-1)**3 + k*x**2
    x_star = x[np.argmin(y)]

    plt.plot(x,y, 'g', linewidth = 2.5)
    plt.axvline(x = x_star, color = 'k', linewidth = 1, linestyle = '--')
    plt.ylim([0,10])
    plt.title('Ridge: k = {}'.format(k))
    plt.show()

4.4.2. L1 Penalty


$$\min \; (x-1)^2 + \frac{1}{6}(x-1)^3 + k\lvert x \rvert, \quad k=0,1,2,\cdots$$


In [ ]:
x = np.arange(-4,4,0.1)
k = 2
y = (x-1)**2 + 1/6*(x-1)**3 + k*abs(x)

x_star = x[np.argmin(y)]
print(x_star)
3.552713678800501e-15
In [ ]:
plt.plot(x, y, 'g', linewidth = 2.5)
plt.axvline(x = x_star, color = 'k', linewidth = 1, linestyle = '--')
plt.ylim([0,10])
plt.show()
In [ ]:
for k in [0,1,2]:
    y = (x-1)**2 + 1/6*(x-1)**3 + k*abs(x)
    x_star = x[np.argmin(y)]

    plt.plot(x,y, 'g', linewidth = 2.5)
    plt.axvline(x = x_star, color = 'k', linewidth = 1, linestyle = '--')
    plt.ylim([0,10])
    plt.title('LASSO: k = {}'.format(k))
    plt.show()

4.5. Scikit-Learn

  • RBF functions $b_i(x)$ as basis

$$ \begin{align*} \Phi &= \begin{bmatrix} \mid & \mid & & \mid \\ b_0(x) & b_1(x) & \cdots & b_d(x) \\ \mid & \mid & & \mid \end{bmatrix} = \begin{bmatrix} 1 & \mid & & \mid \\ \vdots & b_1(x) & \cdots & b_d(x) \\ 1 & \mid & & \mid \end{bmatrix}\\\\ \implies \Phi \theta &= \sum_{i=0}^d \theta_i b_i(x) : \; \text{a linearly-combined function of RBF basis with coefficients of } \theta \end{align*} $$


In [ ]:
# 10 data points

n = 10
x = np.linspace(-4.5, 4.5, 10).reshape(-1, 1)
y = np.array([0.9819, 0.7973, 1.9737, 0.1838, 1.3180, -0.8361, -0.6591, -2.4701, -2.8122, -6.2512]).reshape(-1, 1)

d = 10
u = np.linspace(-4.5, 4.5, d)

sigma = 1
A = np.hstack([np.exp(-(x-u[i])**2/(2*sigma**2)) for i in range(d)])
In [ ]:
from sklearn import linear_model

reg = linear_model.Ridge(alpha = 0.1)
reg.fit(A,y)
Out[ ]:
Ridge(alpha=0.1)
In [ ]:
# to plot

xp = np.arange(-4.5, 4.5, 0.01).reshape(-1, 1)
Ap = np.hstack([np.exp(-(xp-u[i])**2/(2*sigma**2)) for i in range(d)])

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = "data")
plt.plot(xp, reg.predict(Ap), linewidth = 2, label = "Ridge")
plt.title('Ridge Regression (L2)', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()
In [ ]:
reg = linear_model.Lasso(alpha = 0.005)
reg.fit(A,y)
Out[ ]:
Lasso(alpha=0.005)
In [ ]:
# to plot

xp = np.arange(-4.5, 4.5, 0.01).reshape(-1, 1)
Ap = np.hstack([np.exp(-(xp-u[i])**2/(2*sigma**2)) for i in range(d)])

plt.figure(figsize = (6, 4))
plt.plot(x, y, 'o', label = "data")
plt.plot(xp, reg.predict(Ap), linewidth = 2, label = "LASSO")
plt.title('LASSO Regression (L1)', fontsize = 12)
plt.xlabel('X', fontsize = 12)
plt.ylabel('Y', fontsize = 12)
plt.legend(fontsize = 12)
plt.grid(alpha = 0.3)
plt.axis('equal')
plt.show()

5. De-noising Signal

We start with a signal represented by a vector $x \in \mathbb{R}^n$. The coefficients $x_i$ correspond to the value of some function of time, evaluated (or sampled, in the language of signal processing) at evenly spaced points. It is usually assumed that the signal does not vary too rapidly, which means that usually, we have $x_i \approx x_{i+1}$.

Suppose we have a signal $x$, which does not vary too rapidly and that $x$ is corrupted by some small, rapidly varying noise $\varepsilon$, i.e. $x_{cor} = x + \varepsilon$.

Then if we want to reconstruct $x$ from $x_{cor}$ we should solve (with $\hat{x}$ as the parameter)


$$ \text{minimize} \quad \lVert \hat{x} - x_{cor}\rVert_2^2 + \mu\sum_{i=1}^{n-1}(\hat{x}_{i+1}-\hat{x}_i)^2 $$


where the parameter $\mu$ controls the "smoothness" of $\hat{x}$.


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import cvxpy as cvx

%matplotlib inline
In [ ]:
n = 200
t = np.arange(n).reshape(-1,1)
x = 0.5 * np.sin((2*np.pi/n)*t) * (np.sin(0.01*t))
x_cor = x + 0.05*np.random.randn(n,1)

plt.figure(figsize = (6, 4))
plt.subplot(2,1,1)
plt.plot(t,x,'-', linewidth = 2)
plt.axis([0, n, -0.6, 0.6])
plt.xticks([])
plt.title('original signal' , fontsize = 15)
plt.ylabel('$x_{original}$', fontsize = 15)

plt.subplot(2,1,2)
plt.plot(t, x_cor,'-', linewidth = 1)
plt.axis([0, n, -0.6, 0.6])
plt.title('corrupted signal', fontsize = 15)
plt.xlabel('n', fontsize = 15)
plt.ylabel('$x_{corrupted}$', fontsize = 15)
plt.show()

5.1. Transform de-noising in time into an optimization problem


$$X = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix} $$


$$\large \min\limits_{X}\;\left\{\underbrace{\lVert(X - X_{cor})\rVert^2_{2}}_{\text{how much } X \text{ deviates from }X_{cor}} + \mu \underbrace{\sum_{k = 1}^{n-1}(x_{k+1}-x_{k})^2}_{\text{penalize rapid changes of } X}\right\}$$



$\quad$1) $\min\limits_{X}\;\lVert(X - X_{cor})\rVert^2_{2}$: $\quad$ How much $X$ deviates from $X_{cor}$

$\quad$2) $\mu\sum\limits_{k = 1}^{n-1}(x_{k+1}-x_{k})^2$: $\quad$ penalize rapid changes of $X$

$\quad$3) $\mu$ : to adjust the relative weight of 1) & 2)


  • In a vector form

$\quad$1) $X - X_{cor} = I_n X - X_{cor}$

$\quad$2) $\sum\;(x_{k+1}-x_{k})^2 \implies$


$$(x_{2} - x_{1}) - 0 = \begin{bmatrix} -1,&1,&0,&\cdots&0 \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}-0 $$

$$(x_{3} - x_{2}) - 0 = \begin{bmatrix} 0,&-1,&1,&\cdots&0 \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}-0 $$


$$ \vdots $$


$$\implies \left \Arrowvert \; \begin{bmatrix} -1&1&0&\cdots&0&0 \\ 0&-1&1&\cdots&0&0 \\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\cdots&-1&1 \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \; \right\Arrowvert^2_{2} $$


$$\hspace{3.2cm}D \quad\quad\quad\quad\quad\quad\qquad X \quad\; - \; \; 0 $$




$$\begin{align*} \left\Vert I_n X-X_{cor}\big\Vert^2_{2} + \mu\big\Vert DX-0\right\Vert^2_{2} & = \big\Vert Ax-b\big\Vert^2_{2} \\ \\ & = \Bigg\Vert \begin{bmatrix} I_n\\ \sqrt{\mu}D \end{bmatrix} X- \begin{bmatrix} X_{cor}\\ 0 \end{bmatrix} \Bigg\Vert^2_{2} \end{align*}$$


$$ \text{where} \; A = \begin{bmatrix} I_n\\ \sqrt{\mu}D \end{bmatrix},\quad b = \begin{bmatrix} X_{cor}\\ 0 \end{bmatrix} $$


  • Then, plug $A$ and $b$ into $(A^TA)^{-1}A^Tb$
In [ ]:
mu = 100

D = np.zeros([n-1, n])
D[:,0:n-1] -= np.eye(n-1)

D[:,1:n] += np.eye(n-1)
A = np.vstack([np.eye(n), np.sqrt(mu)*D])

b = np.vstack([x_cor, np.zeros([n-1,1])])

A = np.asmatrix(A)
b = np.asmatrix(b)

x_reconst = (A.T*A).I*A.T*b

plt.figure(figsize = (6, 2))
plt.plot(t, x_cor, '-', linewidth = 1, alpha = 0.5, label = 'corrupted');
plt.plot(t, x_reconst, 'r', linewidth = 2, label = 'reconstructed')
plt.title('$\mu = {}$'.format(mu), fontsize = 15)
plt.legend(fontsize = 12)
plt.show()

5.1.1. With Different $\mu$'s (see how $\mu$ affects smoothing results)

In [ ]:
plt.figure(figsize = (6, 6))

mu = [0, 10, 100];

for i in range(len(mu)):
    A = np.vstack([np.eye(n), np.sqrt(mu[i])*D])
    b = np.vstack([x_cor, np.zeros([n-1,1])])

    A = np.asmatrix(A)
    b = np.asmatrix(b)

    x_reconst = (A.T*A).I*A.T*b

    plt.subplot(3,1,i+1)
    plt.plot(t, x_cor, '-', linewidth = 1, alpha = 0.5)
    plt.plot(t, x_reconst, 'r', linewidth = 2)
    plt.ylabel('$\mu = {}$'.format(mu[i]), fontsize = 15)

plt.show()

5.1.2. CVXPY Implementation

  • cvxpy toolbox to numerically solve

$$ \min\; \left\{ \lVert x-x_{cor}\rVert_2^2 + \mu \lVert Dx \rVert_2^2 \right\}$$


In [ ]:
mu = 100

x_reconst = cvx.Variable([n,1])
#obj = cvx.Minimize(cvx.sum_squares(x_reconst - x_cor) + mu*cvx.sum_squares(x_reconst[1:n]-x_reconst[0:n-1]))
obj = cvx.Minimize(cvx.sum_squares(x_reconst - x_cor) + mu*cvx.sum_squares(D@x_reconst))

prob = cvx.Problem(obj).solve()

plt.figure(figsize = (6, 2))
plt.plot(t, x_cor, '-', linewidth = 1, alpha = 0.5, label = 'corrupted');
plt.plot(t, x_reconst.value, 'r', linewidth = 2, label = 'reconstructed')
plt.title('$\mu = {}$'.format(mu), fontsize = 15)
plt.legend(fontsize = 12)
plt.show()
In [ ]:
plt.figure(figsize = (6, 6))

mu = [0, 10, 100]

for i in range(len(mu)):
    x_reconst = cvx.Variable([n,1])
    obj = cvx.Minimize(cvx.sum_squares(x_reconst - x_cor) + mu[i]*cvx.sum_squares(D @ x_reconst))
    prob = cvx.Problem(obj).solve()

    plt.subplot(3,1,i+1)
    plt.plot(t,x_cor,'-', linewidth = 1, alpha = 0.5)
    plt.plot(t,x_reconst.value, 'r', linewidth = 2)
    plt.ylabel('$\mu = {}$'.format(int(mu[i])), fontsize = 15)

plt.show()

5.1.3. L2 Norm

  • CVXPY strongly encourages eliminating quadratic forms, that is, functions like sum_squares, sum(square(.)) or quad_form

  • Whenever possible, construct equivalent models using norm instead


$$ \min \; \left\{ \lVert x-x_{cor}\rVert_2 + \gamma \lVert Dx \rVert_2 \right \}$$


In [ ]:
plt.figure(figsize = (6, 6))

gammas = [0, 2, 7]

for i in range(len(gammas)):
    x_reconst = cvx.Variable([n,1])
    obj = cvx.Minimize(cvx.norm(x_reconst-x_cor, 2) + gammas[i]*(cvx.norm(D @ x_reconst, 2)))
    prob = cvx.Problem(obj).solve()

    plt.subplot(3,1,i+1)
    plt.plot(t,x_cor,'-', linewidth = 1, alpha = 0.5)
    plt.plot(t,x_reconst.value, 'r', linewidth = 2)
    plt.ylabel('$ \gamma = {}$'.format(gammas[i]), fontsize = 15)

plt.show()

5.1.4. L2 Norm with a Constraint


$$\min \; \lVert Dx \rVert_2$$

$$ s.t. \quad \lVert x-x_{cor} \rVert_2 \leq \beta $$


In [ ]:
beta = 0.8

x_reconst = cvx.Variable([n,1])
obj = cvx.Minimize(cvx.norm(D @ x_reconst, 2))
const = [cvx.norm(x_reconst-x_cor, 2) <= beta]
prob = cvx.Problem(obj, const).solve()

plt.figure(figsize = (6, 2))
plt.plot(t,x_cor,'-', linewidth = 1, alpha = 0.5)
plt.plot(t,x_reconst.value, 'r', linewidth = 2)
plt.ylabel(r'$\beta = {}$'.format(beta), fontsize = 15)
plt.show()
In [ ]:
plt.figure(figsize = (6, 6))

beta = [0.1, 0.6, 0.8]

for i in range(len(beta)):

    x_reconst = cvx.Variable([n,1])
    obj = cvx.Minimize(cvx.norm(x_reconst[1:n] - x_reconst[0:n-1], 2))
    const = [cvx.norm(x_reconst-x_cor, 2) <= beta[i]]
    prob = cvx.Problem(obj, const).solve()

    plt.subplot(len(beta),1,i+1)
    plt.plot(t,x_cor,'-', linewidth = 1, alpha = 0.5)
    plt.plot(t,x_reconst.value, 'r', linewidth = 2)
    plt.ylabel(r'$\beta = {}$'.format(beta[i]), fontsize = 15)

plt.show()

5.2. Signal with Sharp Transition + Noise

Suppose we have a signal $x$, which is mostly smooth, but has several rapid variations (or jumps). If we apply quadratic smoothing to this signal, then in removing the noise we will also smooth away the signal's sharp transitions.

  • First, apply the same method that we used for smoothing signals before
In [ ]:
n = 200
t = np.arange(n).reshape(-1,1)

exact = np.vstack([np.ones([50,1]), -np.ones([50,1]), np.ones([50,1]), -np.ones([50,1])])
x = exact + 0.5*np.sin((2*np.pi/n)*t)
x_cor = x + 0.1*np.random.randn(n,1)

plt.figure(figsize = (6, 4))
plt.subplot(2,1,1)
plt.plot(t, x)
plt.ylim([-2.0,2.0])
plt.ylabel('signal', fontsize = 15)
plt.subplot(2,1,2)
plt.plot(t, x_cor, linewidth = 1)
plt.ylabel('corrupted signal', fontsize = 15)
plt.xlabel('x', fontsize = 15)
plt.show()

5.2.1. L2 Norm (Quadratic Smoothing)

In [ ]:
plt.figure(figsize = (6, 6))

beta = [0.5, 2, 4]

for i in range(len(beta)):
    x_reconst = cvx.Variable([n,1])
    obj = cvx.Minimize(cvx.norm(x_reconst[1:n] - x_reconst[0:n-1], 2))
    const = [cvx.norm(x_reconst - x_cor, 2) <= beta[i]]
    prob = cvx.Problem(obj, const).solve()

    plt.subplot(len(beta), 1, i+1)
    plt.plot(t, x_cor, linewidth = 1, alpha = 0.5)
    plt.plot(t, x_reconst.value, 'r', linewidth = 2)
    plt.ylabel(r'$\beta = {}$'.format(beta[i]), fontsize = 15)

plt.show()
  • Quadratic smoothing smooths out noise and sharp transitions in the signal, but this is not what we want

  • Any ideas?

5.2.2. L1 Norm

We can instead apply total variation reconstruction on the signal by solving


$$\min \; \lVert x - x_{cor} \rVert_2 + \lambda \sum_{i=1}^{n-1} \;\lvert x_{i+1}-x_i \rvert $$

where the parameter $\lambda$ controls the "smoothness" of $x$.


$$\min \; \lVert Dx \rVert_1$$

$$ s.t. \quad \lVert x-x_{cor} \rVert_2 \leq \beta $$


In [ ]:
plt.figure(figsize = (6, 6))

beta = [0.5, 2, 4]

for i in range(len(beta)):
    x_reconst = cvx.Variable([n,1])
    obj = cvx.Minimize(cvx.norm(x_reconst[1:n] - x_reconst[0:n-1], 1))
    const = [cvx.norm(x_reconst-x_cor, 2) <= beta[i]]
    prob = cvx.Problem(obj, const).solve()

    plt.subplot(len(beta), 1, i+1)
    plt.plot(t, x_cor, linewidth = 1, alpha = 0.5)
    plt.plot(t, x_reconst.value, 'r', linewidth = 2)
    plt.ylabel(r'$\beta = {}$'.format(beta[i]), fontsize = 15)

plt.show()
  • Total Variation (TV) smoothing preserves sharp transitions in the signal, which is what we want

  • Note how TV reconstruction does a better job of preserving the sharp transitions in the signal while removing the noise.

6. Total Variation Image Reconstruction

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
import cv2

imbw = cv2.imread('/content/drive/MyDrive/ML_Colab/ML_data/dog.jpg', 0)

row = 150
col = 150
resized_imbw = cv2.resize(imbw, (row, col))

plt.figure(figsize = (6, 6))
plt.imshow(resized_imbw, 'gray')
plt.axis('off')
plt.show()
  • Question: apply the $L_1$ norm to the image, and guess what kind of image will be produced?

$$\min \; \lVert Dx \rVert_1$$

$$ s.t. \quad \lVert x-x_{cor} \rVert_2 \leq \beta $$


In [ ]:
n = row*col
imbws = resized_imbw.reshape(-1, 1)

beta = 1500

x = cvx.Variable([n,1])
obj = cvx.Minimize(cvx.norm(x[1:n] - x[0:n-1],1))
const = [cvx.norm(x - imbws,2) <= beta]
prob = cvx.Problem(obj, const).solve()

imbwr = x.value.reshape(row, col)

plt.figure(figsize = (6, 6))
plt.imshow(imbwr,'gray')
plt.axis('off')
plt.show()
In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')