Unsupervised Learning: Dimension Reduction
Table of Contents
Motivation: Can we describe high-dimensional data in a "simpler" way?
$\quad \rightarrow$ Dimension reduction without losing too much information
$\quad \rightarrow$ Find a low-dimensional, yet useful representation of the data
Each example $x$ has 2 features $\{x_1,x_2\}$
Consider ignoring the feature $x_2$ for each example
Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{x_1\}$
Are we losing much information by throwing away $x_2$ ?
Yes, the data has substantial variance along both features (i.e. both axes)
Now consider a change of axes
Each example $x$ has 2 features $\{u_1,u_2\}$
Consider ignoring the feature $u_2$ for each example
Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{u_1\}$
Are we losing much information by throwing away $u_2$ ?
No. Most of the data spread is along $u_1$ (very little variance along $u_2$)
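As a quick numerical check of this picture, here is a minimal numpy sketch; the correlated 2-D data and the rotation angle below are made up purely for illustration (the notebook's own demo data is generated later).
import numpy as np

# synthetic, strongly correlated 2-D data (illustration only)
rng = np.random.default_rng(0)
x1 = rng.normal(0, 2, 1000)
x2 = 0.5*x1 + rng.normal(0, 0.2, 1000)
X = np.column_stack([x1, x2])

# variance along the original axes: neither x1 nor x2 can be dropped cheaply
print(X.var(axis = 0))

# rotate the axes so that u1 points (approximately) along the dominant direction
theta = np.arctan(0.5)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # columns are u1 and u2
print((X @ R).var(axis = 0))                      # almost all of the spread lies along u1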
Find the unit vector $u$ that maximizes the variance of the projections $u^Tx^{(i)}$ (assume the data is zero-mean; otherwise subtract the mean first)
Equivalently, minimize the reconstruction error: for a unit vector $u$, $\;\lVert x^{(i)} - (u^Tx^{(i)})\,u \rVert^2 = \lVert x^{(i)} \rVert^2 - (u^Tx^{(i)})^2$
Note: $\frac{1}{m} \approx \frac{1}{m-1}$ for large $m$, so the matrix below is (approximately) the sample covariance $S$
$$\begin{align*} &\text{minimize} \; \underbrace{\frac {1}{m} \sum\limits_{i=1}^{m} \lVert x^{(i)} \rVert ^2}_{\text{constant given }x^{(i)}} -
\underbrace{u^T\left(\frac{1}{m}\sum\limits_{i=1}^{m}x^{(i)}{x^{(i)}}^T\right)u}_{\text{maximize}} \\ \\
\implies &\text{maximize} \;
u^T\left(\frac{1}{m}\sum\limits_{i=1}^{m}x^{(i)}{x^{(i)}}^T\right)u = u^T S u\end{align*}$$
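A small numerical check of this claim (not part of the original demo): among unit vectors $u$, the top eigenvector of $S$ attains the largest value of $u^TSu$. The Gaussian data below is synthetic, chosen only to mimic the demo that follows.
import numpy as np

# synthetic zero-mean 2-D Gaussian data, similar to the demo below
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size = 5000)
m = X.shape[0]

S = X.T @ X / m                          # (1/m) sum_i x_i x_i^T (data is ~zero-mean)

D, U = np.linalg.eigh(S)                 # eigh: S is symmetric
u_star = U[:, np.argmax(D)]              # eigenvector of the largest eigenvalue
print(u_star @ S @ u_star)               # equals the largest eigenvalue

# no randomly drawn unit vector does better
angles = rng.uniform(0, np.pi, 1000)
us = np.column_stack([np.cos(angles), np.sin(angles)])
print(np.einsum('ij,jk,ik->i', us, S, us).max())    # <= largest eigenvalue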
Choose top $k$ (orthonormal) eigenvectors, $U = [u_1, u_2, \cdots, u_k]$
Project $x^{(i)}$ onto span $\{ u_1, u_2, \cdots, u_k\}$
$$z^{(i)} = \begin{bmatrix}
u_1^Tx^{(i)}\\
u_2^Tx^{(i)}\\
\vdots \\
u_k^Tx^{(i)}\\
\end{bmatrix} \;\text{ or }\; z = U^{T}x
$$
Projection of $x^{(i)}$ onto the unit vector $u$: the scalar $u^Tx^{(i)}$ is the (signed) distance of the projected point from the origin along $u$
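To make the shapes concrete, here is a minimal sketch of $z = U^Tx$ together with the corresponding reconstruction; the $3\times 3$ matrix $S$ and the point $x$ below are arbitrary and used only for illustration.
import numpy as np

# arbitrary symmetric positive semi-definite S in 3-D (illustration only)
rng = np.random.default_rng(2)
A = rng.normal(size = (3, 3))
S = A @ A.T

D, U = np.linalg.eigh(S)
idx = np.argsort(-D)                 # eigenvalues in decreasing order
U = U[:, idx]

k = 2
Uk = U[:, :k]                        # top-k orthonormal eigenvectors
x = rng.normal(size = (3, 1))        # a single (made-up) example

z = Uk.T @ x                         # k-dimensional representation, z = U^T x
x_hat = Uk @ z                       # orthogonal projection of x onto span{u_1, ..., u_k}
print(z.ravel())
print(x_hat.ravel())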
$\lambda_1, \lambda_2$ are the variances of the data along the corresponding eigenvectors.
The larger the eigenvalue, the more dominant the corresponding direction (eigenvector).
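As a quick check of this interpretation (again on synthetic Gaussian data, separate from the demo below): the variance of the projections onto each eigenvector reproduces the corresponding eigenvalue.
import numpy as np

# synthetic 2-D Gaussian data, only to verify the eigenvalue = variance claim
rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size = 100000)

S = np.cov(X.T)                      # sample covariance
D, U = np.linalg.eigh(S)

for lam, u in zip(D, U.T):
    z = X @ u                        # projection of every example onto u
    print(lam, z.var(ddof = 1))      # eigenvalue vs. variance of the projections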
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# data generation
m = 5000
mu = np.array([0, 0])
sigma = np.array([[3, 1.5],
                  [1.5, 1]])
X = np.random.multivariate_normal(mu, sigma, m)
X = np.asmatrix(X)
fig = plt.figure(figsize = (10, 8))
plt.plot(X[:,0], X[:,1], 'k.', alpha = 0.3)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.show()
S = 1/(m-1)*X.T*X         # sample covariance (data generated with zero mean, so no centering needed)
D, U = np.linalg.eig(S)   # eigendecomposition of S
idx = np.argsort(-D)      # sort eigenvalues (and eigenvectors) in decreasing order
D = D[idx]
U = U[:,idx]
print(D, '\n')
print(U)
h = U[1,0]/U[0,0]         # slope of the first principal direction
xp = np.arange(-6, 6, 0.1)
yp = h*xp
fig = plt.figure(figsize = (10, 8))
plt.plot(X[:,0], X[:,1], 'k.', alpha = 0.3)
plt.plot(xp, yp, 'r', linewidth = 3)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.show()
Z = X*U[:,0]              # projection onto the first principal component
plt.figure(figsize = (10, 8))
plt.hist(Z, 51)
plt.show()
Z = X*U[:,1]              # projection onto the second principal component
plt.figure(figsize = (10, 8))
plt.hist(Z, 51)
plt.show()
from sklearn.decomposition import PCA
pca = PCA(n_components = 1)
pca.fit(np.asarray(X))    # np.asarray: newer scikit-learn versions reject np.matrix input
pca.components_
pca.explained_variance_
pca.explained_variance_ratio_
u = pca.transform(np.asarray(X))
plt.figure(figsize = (10, 8))
plt.hist(u, 51)
plt.show()
h = pca.components_[0][1] / pca.components_[0][0]
xp = np.arange(-6, 6, 0.1)
yp = h*xp
fig = plt.figure(figsize = (10, 8))
plt.plot(X[:,0], X[:,1], 'k.', alpha = 0.3)
plt.plot(xp, yp, 'r', linewidth = 3)
plt.axis('equal')
plt.grid(alpha = 0.3)
plt.show()
%%html
<center><iframe src="https://www.youtube.com/embed/HdZQmDlEPu4"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/lt9Z4d_Z0jg"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/E-1ItDu-2EQ"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
Download files
import pickle
X = pickle.load(open('./data_files/pca_spring.pkl', 'rb'))
X = np.asmatrix(X.T)
print(X.shape)
m = X.shape[0]
plt.figure(figsize = (12, 6))
plt.subplot(1,3,1)
plt.plot(X[:, 0], -X[:, 1], 'r')
plt.axis('equal')
plt.title('Camera 1')
plt.subplot(1,3,2)
plt.plot(X[:, 2], -X[:, 3], 'b')
plt.axis('equal')
plt.title('Camera 2')
plt.subplot(1,3,3)
plt.plot(X[:, 4], -X[:, 5], 'k')
plt.axis('equal')
plt.title('Camera 3')
plt.show()
X = X - np.mean(X, axis = 0)    # center the data
S = 1/(m-1)*X.T*X               # sample covariance
D, U = np.linalg.eig(S)
idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]
print(D, '\n')
print(U)
plt.figure(figsize = (10,8))
plt.stem(np.sqrt(D))
plt.grid(alpha = 0.3)
plt.show()
# relative magnitudes of the principal components
Z = X*U
xp = np.arange(0, m)/24    # frame rate of 24 fps -> time in seconds
plt.figure(figsize = (10, 8))
plt.plot(xp, Z)
plt.yticks([])
plt.show()
## projected onto the first principal component
# 6 dim -> 1 dim (dimension reduction)
# relative magnitude of the first principal component
Z = X*U[:,0]
plt.figure(figsize = (10, 8))
plt.plot(Z)
plt.yticks([])
plt.show()
Reference: John P Cunningham & Byron M Yu, Dimensionality reduction for large-scale neural recordings, Nature Neuroscience 17, 1500–1509 (2014)
%%html
<center><iframe src="https://www.youtube.com/embed/OpYwMXvBGLo?rel=0"
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/hgA3to0KNBY?rel=0"
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/DmaiB5Kr36M?rel=0"
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/NXIr6eleakI?rel=0"
width="420" height="315" frameborder="0" allowfullscreen></iframe></center>
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')