Singular Value Decomposition (SVD)
Table of Contents
Matrix $A$ (or linear transformation) = rotate + stretch/compress
the numbers $\sigma_1, \cdots, \sigma_n$ are called the singular values of $A$; by convention, $\sigma_i > 0$
the vectors $u_1, \cdots, u_n$: these are unit vectors along the principal semiaxes of $AS$, the image of the unit sphere $S$ under $A$
the vectors $v_1, \cdots, v_n$: these are the preimages of the principal semiaxes, defined so that $Av_i = \sigma_i u_i$
We have $A \in \mathbb{R}^{m \times n}$, skinny and full rank (i.e., $r = n$), and let

$$Av_i = \sigma_i u_i, \qquad 1 \leq i \leq n$$

Collecting $V = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$, $\hat U = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}$, and $\hat \Sigma = \text{diag}(\sigma_1, \cdots, \sigma_n)$, the above equation in matrix form is $AV = \hat U \hat \Sigma$, and since $V$ is orthogonal,

$$A = \hat U \hat \Sigma V^T$$

called the thin (or reduced) SVD of $A$
We can add extra orthonormal columns to $\hat U$ to make it an orthogonal matrix $U$ $(m \geq n)$

We also add extra rows of zeros to $\hat \Sigma$ to form $\Sigma$, so that

$$A = U \Sigma V^T$$
This is the (full) singular value decomposition of $A$
Every matrix $A$ has a singular value decomposition.
If $A$ is not full rank, then some of the diagonal entries of $\Sigma$ will be zero
The SVD decomposes the linear map into
rotate by $V^T$
diagonal scaling by $\sigma_i$
rotate by $U$
Note that, unlike the eigen-decomposition, input and output directions are different
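To see why the eigendecompositions computed below recover the singular vectors, substitute $A = U \Sigma V^T$:

$$A^TA = V \Sigma^T U^T U \Sigma V^T = V\,\text{diag}(\sigma_1^2, \cdots, \sigma_n^2)\,V^T, \qquad AA^T = U \Sigma \Sigma^T U^T$$

so the eigenvectors of $A^TA$ are the right singular vectors, the eigenvectors of $AA^T$ are the left singular vectors, and the nonzero eigenvalues of both are $\sigma_i^2$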
1) $A^TA$: (small) positive definite & symmetric
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
A = np.array([[4, 4],
              [-3, 3]])

A = np.asmatrix(A)

# columns of V are eigenvectors of A^T A (right singular vectors); D holds sigma_i^2
[D, V] = np.linalg.eig(A.T*A)

print(V, '\n')
print(D)
2) $AA^T$: (large) positive semidefinite & symmetric
# columns of U are eigenvectors of A A^T (left singular vectors); D holds sigma_i^2
[D, U] = np.linalg.eig(A*A.T)

print(U, '\n')
print(D)
[U, S, VT] = np.linalg.svd(A, full_matrices = True)
print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)
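A quick consistency check (a sketch using the arrays computed above): the singular values returned by np.linalg.svd are the square roots of the eigenvalues of $AA^T$ (equivalently $A^TA$), and the three factors reconstruct $A$:

# singular values vs. square roots of the eigenvalues computed above
print(S, '\n')
print(np.sqrt(np.sort(D)[::-1]), '\n')

# the factors reconstruct A
print(U*np.diag(S)*VT)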
%%html
<center><iframe src="https://www.youtube.com/embed/Nx0lRBaXoz4?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
A = np.array([[11.08, 6.82, 1.76, -6.82],
              [2.50, -1.01, -2.60, 1.19],
              [-4.88, -5.07, -3.21, 5.20],
              [-0.49, 1.52, 2.07, -1.66],
              [-14.04, -12.40, -6.66, 12.65],
              [0.27, -8.51, -10.19, 9.15],
              [9.53, -9.84, -17.00, 11.00],
              [-12.01, 3.64, 11.10, -4.48]])
[U, S, VT] = np.linalg.svd(A, full_matrices = True)
U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)
print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)
[U, S, VT] = np.linalg.svd(A, full_matrices = False)
U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)
print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)
Question
How do we determine a good $k$ so that we can efficiently compress $A$ without losing too much of the information in $A$?
sig_val = np.asarray(S).diagonal()   # 1-D array of singular values (S is an np.matrix here)
plt.figure(figsize = (8,6))
plt.stem(range(1,5), sig_val)
plt.xticks(range(1,5))
plt.title('Singular Values')
plt.show()
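One common heuristic for choosing $k$ (a sketch, not part of the original notes) is to keep the smallest $k$ whose singular values capture most of the total "energy" $\sum_i \sigma_i^2$; the 99% threshold below is an illustrative choice:

# cumulative energy ratio of the singular values
ratio = np.cumsum(sig_val**2)/np.sum(sig_val**2)
print(ratio)

# smallest k capturing at least 99% of the energy
k = int(np.argmax(ratio >= 0.99)) + 1
print(k)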
k = 2
low_U = U[:, :k]
low_S = S[:k, :k]
low_VT = VT[:k, :]
A_approx = low_U*low_S*low_VT
print(low_U.shape, low_S.shape, low_VT.shape, '\n')
print('A:\n', A ,'\n')
print('A_approx:\n', A_approx, '\n')
print('low_U:\n', low_U, '\n')
print('low_VT:\n', low_VT)
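By the Eckart-Young theorem, this truncation is the best rank-$k$ approximation of $A$ in both the spectral and Frobenius norms, and the spectral-norm error equals the first discarded singular value $\sigma_{k+1}$. A quick check using the variables above:

# spectral-norm error of the rank-k approximation equals sigma_{k+1}
print(np.linalg.norm(A - A_approx, 2))
print(sig_val[k])   # sigma_{k+1} (0-based indexing)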
Download the image file
import cv2
img = cv2.imread('./data_files/postech.jpg', 0)
print(img.shape)
plt.figure(figsize = (8,8))
plt.imshow(img, 'gray')
plt.axis('off')
plt.show()
A = img
[U, S, VT] = np.linalg.svd(A)
U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)
print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)
sig_val = np.asarray(S).diagonal()   # 1-D array of singular values
plt.figure(figsize = (8,6))
plt.stem(sig_val)
plt.title('Singular Values')
plt.show()
Approximated image when $k = 20$
k = 20
low_U = U[:, :k]
low_S = S[:k, :k]
low_VT = VT[:k, :]
A_hat = low_U*low_S*low_VT
plt.figure(figsize = (16, 8))
plt.subplot(1,2,1), plt.imshow(A, 'gray'), plt.axis('off'), plt.title('Original image')
plt.subplot(1,2,2), plt.imshow(A_hat, 'gray'), plt.axis('off'), plt.title('Approximated image w/ rank = 20')
plt.show()
Approximated images with $k$ varied
K = [1, 2, 3, 4, 10, 20, 30, 600]
plt.figure(figsize = (16, 8))
for i in range(len(K)):
    low_U = U[:, :K[i]]
    low_S = S[:K[i], :K[i]]
    low_VT = VT[:K[i], :]

    A_hat = low_U*low_S*low_VT

    plt.subplot(2,4,i+1), plt.imshow(A_hat, 'gray'), plt.axis('off')
    plt.title('rank {}'.format(K[i]))

plt.show()
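It is also worth counting what the truncation buys: the rank-$k$ factors take $k(m + n + 1)$ numbers to store, versus $mn$ for the full image. A small sketch of the ratio for this image:

# storage cost of the rank-k factors vs. the full m x n image
m, n = img.shape
for k in [1, 10, 20, 30]:
    cost = k*(m + n + 1)
    print('k = {}: {} numbers vs. {} ({:.1%})'.format(k, cost, m*n, cost/(m*n)))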
Principal Components and Proper Orthogonal Modes (POM)
Mathematically, we can consider a given function $f(x,t)$ over a domain of interest. In most applications, the variables $x$ and $t$ will refer to the standard space-time variables we are familiar with. The idea is to then expand this function in some basis representation so that

$$f(x,t) \approx \sum_{i=1}^{k} c_i(t) \phi_i(x)$$

where $k$ is the total number of modes, $\phi_i(x)$, to be used. The remaining function $c_i(t)$ determines the weights of the spatial modes.
This is certainly not a new idea to us. Indeed, we have been using this concept extensively already with Fourier transforms, for instance; Fourier modes are among the more common expansion bases used in practice.
Interestingly enough, the basis functions are often chosen for simplicity and/or their intuitive meaning for the system. For instance, the Fourier transform very clearly highlights the fact that the given function has a representation in terms of frequency components.
In contrast to selecting basis functions for simplicity or intuitive meaning, a broader question can be asked: what criteria should be used for selecting the functions $\phi_i(x)$? This is an interesting question given that any complete basis can approximate the function $f(x,t)$ to any desired accuracy for $k$ large enough. What is desired, however, is the basis for which the desired accuracy is achieved with $k$ as small as possible.
The goal is then the following: find a sequence of orthonormal basis functions $\phi_i(x)$ so that the first two terms give the best two-term approximation to $f(x,t)$, the first five terms give the best five-term approximation to $f(x,t)$, and so on.
These special, ordered, orthonormal functions are called the proper orthogonal modes (POM) for the function $f(x,t)$.
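The connection to the SVD: if we sample $f(x,t)$ on grids in $x$ and $t$ and stack the snapshots as columns of a matrix $X$ (so $X_{ij} = f(x_i, t_j)$), the left singular vectors of $X$ are exactly these ordered, orthonormal modes. A minimal sketch with a synthetic two-mode signal (the grids, frequencies, and noise level are illustrative choices, not from the original notes):

# synthetic snapshot matrix built from two spatial modes with time-varying weights
x = np.linspace(-np.pi, np.pi, 100)
t = np.linspace(0, 10, 80)
X = np.outer(np.sin(x), np.cos(2*t)) + 0.5*np.outer(np.sin(2*x), np.sin(t))
X = X + 0.01*np.random.randn(100, 80)   # small measurement noise

# left singular vectors are the proper orthogonal modes, ordered by singular value
[U_pod, S_pod, VT_pod] = np.linalg.svd(X, full_matrices = False)
print(S_pod[:4])   # the two dominant singular values stand out

plt.figure(figsize = (8,6))
plt.plot(x, U_pod[:, 0], label = 'mode 1')
plt.plot(x, U_pod[:, 1], label = 'mode 2')
plt.legend()
plt.show()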
Further edited by Tim Chartier, Davidson College 2007
Best approximation of $x$ using the eigenfaces (the columns of $U$)
12 USA Presidents
Download the face file
from scipy import io
faces = io.loadmat('./data_files/faces.mat')
face_mat = faces['faces']
row, col, m = face_mat.shape
plt.figure(figsize = (12, 16))
for i in range(m):
    plt.subplot(4, 3, i+1), plt.axis('off')
    plt.imshow(face_mat[:, :, i], 'gray')

plt.show()
SVD
A = np.reshape(face_mat, [row*col, m])
[U, S, VT] = np.linalg.svd(A, full_matrices = False)
U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)
print(U.shape, S.shape, VT.shape, '\n')
Eigenface
plt.figure(figsize = (12, 16))
for i in range(m):
    plt.subplot(4, 3, i+1), plt.axis('off')
    plt.imshow(U[:, i].reshape(row, col), 'gray')

plt.show()
Singular values
sig_val = np.asarray(S).diagonal()   # 1-D array of singular values
plt.figure(figsize = (8,6))
plt.stem(sig_val)
plt.title('Singular Values')
plt.show()
Transmit the compressed information
obama = face_mat[:,:,11]
plt.figure(figsize = (6,8))
plt.imshow(obama, 'gray')
plt.axis('off')
plt.show()
# sender
x = np.reshape(obama, [row*col, 1])
eigenface_decom = U.T*x
print(obama.shape, eigenface_decom.shape)
# receiver
reconst = U*eigenface_decom
plt.figure(figsize = (6,8))
plt.imshow(reconst.reshape(row, col), 'gray')
plt.axis('off')
plt.show()
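Note how little data is actually transmitted here: $U$ has only $m = 12$ orthonormal columns, so the sender ships just 12 coefficients $U^Tx$ rather than all row $\times$ col pixel values, and the receiver forms the projection $UU^Tx$; for a face from the training set this reproduces the image, since $x$ lies in the span of the columns of $U$. A quick count using the variables above:

# transmitted coefficients vs. raw pixel count
print('coefficients sent:', eigenface_decom.shape[0])
print('raw pixels sent otherwise:', row*col)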
Acquire a new image
jfk = cv2.imread('./data_files/jfk.png', 0)
plt.figure(figsize = (6,8))
plt.imshow(jfk, 'gray')
plt.axis('off')
plt.show()
Projection onto $U$
x = np.reshape(jfk, [row*col, 1])
eigenface_decom = U.T*x
reconst = U*eigenface_decom
plt.figure(figsize = (6,8))
plt.imshow(reconst.reshape(row, col), 'gray')
plt.axis('off')
plt.show()
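For a new image the projection $UU^Tx$ is no longer exact; the relative residual $\|x - UU^Tx\| / \|x\|$ measures how far the new face lies from the span of the 12 eigenfaces. A quick check using the variables above:

# relative residual of the projection onto the span of U
residual = np.linalg.norm(x - reconst)/np.linalg.norm(x)
print(residual)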
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')