Singular Value Decomposition (SVD)

By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Geometry of Linear Maps¶

Matrix $A$ (or linear transformation) = rotate + stretch/compress

an extremely important fact:
every matrix $A \in \mathbb{R}^{m \times n}$ maps the unit ball in $\mathbb{R}^n$ to an ellipsoid in $\mathbb{R}^m$

$$ \begin{align*} S &= \{ x \in \mathbb{R}^n \mid \; \Vert x \Vert \leq 1 \}\\ AS &= \{Ax \mid x\in \mathbf{S}\} \end{align*} $$

2. Singular Values and Singular Vectors¶

the numbers $\sigma_1, \cdots, \sigma_n$ are called the singular values of $A$ by convention, $\sigma_i > 0$
the vectors $u_1, \cdots, u_n$ these are unit vectors along the principal semiaxes of $AS$
the vectors $v_1, \cdots, v_n$ these are the preimages of the principal semiaxes, defined so that

$$ A v_i = \sigma_i u_i$$

2.1. Graphical Explanation¶

$$\begin{align*} A\begin{bmatrix} v_1& v_2 & \cdots & v_r \end{bmatrix} & =\begin{bmatrix} A v_1 & A v_2 & \cdots & A v_r \end{bmatrix} \\ & =\begin{bmatrix} \sigma_1u_1& \sigma_2u_2 & \cdots & \sigma_ru_r \end{bmatrix} \\ & =\begin{bmatrix} u_1& u_2 & \cdots & u_r \end{bmatrix}\begin{bmatrix} \sigma_1& & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{bmatrix} \\ \\ \therefore \; AV & = U\Sigma \quad (r \leq m, n) \end{align*}$$

2.2. Thin Singular Value Decomposition¶

We have $A \in \mathbb{R}^{m \times n}$, skinny and full rank (i.e., $r = n$), and

$$A v_i = \sigma_i u_i \quad \text{for } 1 \leq i \leq n$$

let

$$ \begin{align*} \hat U &= \begin{bmatrix}u_1 & u_2 & \cdots & u_n \end{bmatrix}\\ \\ \quad \hat \Sigma &= \begin{bmatrix}\sigma_1& & & \\ & \sigma_2 &&\\ &&\ddots&\\ &&&\sigma_n \end{bmatrix}\\ \\ \quad V &= \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \end{align*} $$

in matrix form, the above equation is $AV = \hat U \hat \Sigma$ and since $V$ is orthogonal

$$ A = \hat U \hat \Sigma V^T $$

called the thin (or reduced) SVD of $A$

2.3. Full Singular Value Decomposition¶

We can add extra orthonormal columns to $U$ to make an orthogonal matrix $(m \geq n)$

$$ U = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}$$

We also add extra rows of zeros to $\hat \Sigma$, so

$$ A = U\Sigma V^T$$

This is the (full) singular value decomposition of $A$

Every matrix $A$ has a singular value decomposition.
If $A$ is not full rank, then some of the diagonal entries of $\Sigma$ will be zero

2.4. Interpretation of SVD¶

The SVD decomposes the linear map into

rotate by $V^T$
diagonal scaling by $\sigma_i$
rotate by $U$

Note that, unlike the eigen-decomposition, input and output directions are different

2.5. SVD: Matrix factorization¶

for any matrix $A$

$$ A = U\Sigma V^T $$

for symmetric and positive definite matrix $A$

$$A = S\Lambda S^{-1} = S\Lambda S^T\quad (S: \text{eigenvectors})$$

2.6. Example¶

$$\begin{array}{Icr} A = \begin{bmatrix} 4 & 4 \\ -3 & 3 \end{bmatrix} \end{array} \quad \quad \begin{array}{Icr} \begin{align*} &\textbf{orthonormal basis} \\ &v_1, v_2 \text{ in row space } \mathbb{R}^2 \\ &u_1, u_2 \text{ in column space } \mathbb{R}^2 \\ &\sigma_1 > 0 \\ &\sigma_2 > 0 \end{align*} \end{array} $$

$$ \begin{array}{lcr} A v_1 = \sigma_1 u_1\\ A v_2 = \sigma_2 u_2 \end{array} \; \implies \; \begin{array}{lcr} AV = U\Sigma \\ A = U\Sigma V^{-1} = U\Sigma V^T \end{array}$$

1) $A^TA$: (small) positive definite & symmetric

$$A^TA = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T = V\begin{bmatrix} \sigma_1^2& & \\ & \sigma_2^2 & \\ & & \ddots \\ \end{bmatrix}V^T$$

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

A = np.array([[4, 4], 
              [-3, 3]])

A = np.asmatrix(A)

[D, V] = np.linalg.eig(A.T*A)

print(V, '\n')
print(D)

[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]] 

[32. 18.]

2) $AA^T$: (large) positive definite & symmetric

$$AA^T = U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^TU^T = U\begin{bmatrix} \sigma_1^2& & \\ & \sigma_2^2 & \\ & & \ddots \\ \end{bmatrix}U^T$$

[D, U] = np.linalg.eig(A*A.T)

print(U, '\n')
print(D)

[[1. 0.]
 [0. 1.]] 

[32. 18.]

$$ A = U\Sigma V^T $$

[U, S, VT] = np.linalg.svd(A, full_matrices = True)

print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)

(2, 2) (2,) (2, 2) 

[[-1.  0.]
 [ 0.  1.]] 

[5.65685425 4.24264069] 

[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]

3. PCA and SVD¶

Any real symmetric and positive definite matrix $B$ has a eigen decomposition

$$B = S\Lambda S^T$$

A real matrix $(m\times n)$ $A$, where $m>n$, has the decompsition

$$A = U\Sigma V^T$$

From $A$ (skinny and full rank) we can construct two positive-definite symmetric matrices, $AA^T$ and $A^TA$

$$\begin{align*} AA^T &= U\Sigma V^T V \Sigma^T U^T = U \Sigma \Sigma^T U^T \quad (m \times m, \,n\, \text{eigenvalues and }\, m-n \; \text{zeros},\; U\; \text{eigenvectors})\\ A^TA &= V \Sigma^T U^T U\Sigma V^T = V \Sigma^T \Sigma V^T \quad (n \times n, \,n \;\text{eigenvalues},\; V\; \text{eigenvectors}) \end{align*}$$

PCA by SVD

$$B = A^TA = V \Sigma^T \Sigma V^T = V \Lambda V^T$$

$V$ is eigenvectors of $B = A^TA$
$\sigma_i^2 = \lambda_i$
Note that in PCA, $$ VS V^T \quad \left(\text{where }S = \frac{1}{m}X^TX = \left(\frac{X}{\sqrt{m}}\right)^T \frac{X}{\sqrt{m}} = A^TA \right) $$

%%html
<center><iframe src="https://www.youtube.com/embed/Nx0lRBaXoz4?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

4. Low Rank Approximation: Dimension Reduction¶

4.1. Low Rank Approximation¶

in 2D case

$$ \begin{align*} x & = c_1 v_1 + c_2 v_2 \\ & = \langle x, v_1 \rangle v_1 + \langle x, v_2 \rangle v_2\\ & = ( v_1^{T} x) v_1 + ( v_2^T x) v_2 \\ \\ Ax & = c_1 A v_1 + c_2 A v_2 \\ & = c_1\sigma_1 u_1 + c_2 \sigma_2 u_2 \\ & = u_1 \sigma_1 v_1^Tx + u_2\sigma_2 v_2^T x \\ \\ & = \begin{bmatrix} u_1 & u_2 \end{bmatrix}\begin{bmatrix}\sigma_1 & 0\\ 0 & \sigma_2 \end{bmatrix}\begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix} x \\ \\ & = U \Sigma V^Tx \\\\ & \approx u_1\sigma_1 v_1^Tx \quad (\text{if } \sigma_1 \gg \sigma_2) \end{align*}$$

Low rank approximation of matrix $A$ in general

$$\begin{align*} A &= U_r \Sigma_r V_r^T = \sum_{i=1}^{r}\sigma_i u_i v_i^T\\ \\ \tilde A &= U_k \Sigma_k V_k^T = \sum_{i=1}^{k}\sigma_i u_i v_i^T \qquad (k \leq r) \end{align*}$$

4.2. Example: Matrix¶

A = np.array([[11.08, 6.82, 1.76, -6.82],
              [2.50, -1.01, -2.60, 1.19],
              [-4.88, -5.07, -3.21, 5.20],
              [-0.49, 1.52, 2.07, -1.66],
              [-14.04, -12.40, -6.66, 12.65],
              [0.27, -8.51, -10.19, 9.15],
              [9.53, -9.84, -17.00, 11.00],
              [-12.01, 3.64, 11.10, -4.48]])

[U, S, VT] = np.linalg.svd(A, full_matrices = True)

U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)

print(U.shape, S.shape, V.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)

(8, 8) (4, 4) (2, 2) 

[[-0.24642369  0.44536234  0.6205749   0.32739909  0.45915182  0.05410587
  -0.18650735  0.00956801]
 [ 0.07432183  0.10750102  0.28337342 -0.77803389 -0.10352762  0.32547077
  -0.42245245  0.04655432]
 [ 0.21369502 -0.1903207   0.49492174  0.11043549 -0.4736735  -0.61498898
  -0.24132713 -0.01233476]
 [-0.08223154 -0.02472167  0.20057294  0.06152361 -0.27047805  0.29819169
   0.20131309 -0.86371786]
 [ 0.50375686 -0.55378628  0.13739629 -0.02351069  0.61002316  0.01682912
  -0.07593043 -0.20479838]
 [ 0.43719294  0.03496575 -0.05496539  0.50172005 -0.29664823  0.55199001
  -0.3572128   0.18055856]
 [ 0.59019451  0.42657675  0.21078205 -0.13667122 -0.03298713 -0.00394525
   0.6243326   0.1252985 ]
 [-0.2967926  -0.51321848  0.42787997  0.0231769  -0.14042827  0.34499254
   0.40598717  0.40166774]] 

[[3.68258450e+01 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 2.62368811e+01 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 2.20213307e-02 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 5.10429562e-03]] 

[[-0.03564832 -0.5381828  -0.61426544  0.5759917 ]
 [ 0.92076022  0.16617467 -0.32597333 -0.13538086]
 [-0.13920696 -0.49222525 -0.31155848 -0.80079151]
 [-0.36269992  0.66367127 -0.6475729  -0.09294376]]

[U, S, VT] = np.linalg.svd(A, full_matrices = False)

U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)

print(U.shape, S.shape, V.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)

(8, 4) (4, 4) (2, 2) 

[[-0.24642369  0.44536234  0.6205749   0.32739909]
 [ 0.07432183  0.10750102  0.28337342 -0.77803389]
 [ 0.21369502 -0.1903207   0.49492174  0.11043549]
 [-0.08223154 -0.02472167  0.20057294  0.06152361]
 [ 0.50375686 -0.55378628  0.13739629 -0.02351069]
 [ 0.43719294  0.03496575 -0.05496539  0.50172005]
 [ 0.59019451  0.42657675  0.21078205 -0.13667122]
 [-0.2967926  -0.51321848  0.42787997  0.0231769 ]] 

[[3.68258450e+01 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 2.62368811e+01 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 2.20213307e-02 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 5.10429562e-03]] 

[[-0.03564832 -0.5381828  -0.61426544  0.5759917 ]
 [ 0.92076022  0.16617467 -0.32597333 -0.13538086]
 [-0.13920696 -0.49222525 -0.31155848 -0.80079151]
 [-0.36269992  0.66367127 -0.6475729  -0.09294376]]

Question

How to determine a good $k$ so that we can efficiently compress $A$ without losing too much information of $A$?

sig_val = np.diag(S)

plt.figure(figsize = (8,6))
plt.stem(range(1,5), sig_val)
plt.xticks(range(1,5))
plt.title('Singular Values')
plt.show()

k = 2

low_U = U[:, :k]
low_S = S[:k, :k]
low_VT = VT[:k, :]

A_approx = low_U*low_S*low_VT

print(low_U.shape, low_S.shape, low_VT.shape, '\n')
print('A:\n', A ,'\n')
print('A_approx:\n', A_approx, '\n')
print('low_U:\n', low_U, '\n')
print('low_VT:\n', low_VT)

(8, 2) (2, 2) (2, 4) 

A:
 [[ 11.08   6.82   1.76  -6.82]
 [  2.5   -1.01  -2.6    1.19]
 [ -4.88  -5.07  -3.21   5.2 ]
 [ -0.49   1.52   2.07  -1.66]
 [-14.04 -12.4   -6.66  12.65]
 [  0.27  -8.51 -10.19   9.15]
 [  9.53  -9.84 -17.    11.  ]
 [-12.01   3.64  11.1   -4.48]] 

A_approx:
 [[ 11.08250851   6.8256176    1.76533991  -6.80890115]
 [  2.49942829  -1.00429274  -2.60062751   1.19462804]
 [ -4.87827835  -5.06500943  -3.20623934   5.20878009]
 [ -0.48927124   1.52196569   2.07157948  -1.65643381]
 [-14.03962233 -12.39843105  -6.65913505  12.65241176]
 [  0.27076035  -8.51229541 -10.18871873   9.14926874]
 [  9.53039313  -9.83725225 -16.99900559  11.0036522 ]
 [-12.00864542   3.64455947  11.10301226  -4.47244356]] 

low_U:
 [[-0.24642369  0.44536234]
 [ 0.07432183  0.10750102]
 [ 0.21369502 -0.1903207 ]
 [-0.08223154 -0.02472167]
 [ 0.50375686 -0.55378628]
 [ 0.43719294  0.03496575]
 [ 0.59019451  0.42657675]
 [-0.2967926  -0.51321848]] 

low_VT:
 [[-0.03564832 -0.5381828  -0.61426544  0.5759917 ]
 [ 0.92076022  0.16617467 -0.32597333 -0.13538086]]

4.3. Example: Image Approximation¶

Learn how to reduce the amount of storage needed for an image using the singular value decomposition.

Approximation of $A$

$$\tilde A = U_k \Sigma_k V_k^T = \sum_{i=1}^{k}\sigma_i u_i v_i^T \qquad (k \leq r)$$

Downlaod the image file

import cv2

img = cv2.imread('./data_files/postech.jpg', 0)
print(img.shape)

(642, 642)

plt.figure(figsize = (8,8))
plt.imshow(img, 'gray')
plt.axis('off')
plt.show()

A = img

[U, S, VT] = np.linalg.svd(A)

U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)

print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)

(642, 642) (642, 642) (642, 642) 

[[-0.05056924 -0.04981795  0.02191623 ... -0.06332669  0.02607014
  -0.01198101]
 [-0.05058442 -0.04975868  0.02183727 ...  0.06690459 -0.03290307
  -0.00625734]
 [-0.05060117 -0.04961593  0.02185365 ... -0.04061354  0.12324206
   0.02296389]
 ...
 [-0.04781458 -0.03290448  0.01364876 ...  0.00269351 -0.01030524
  -0.0110923 ]
 [-0.04799566 -0.03305308  0.01349296 ... -0.00713524  0.01524358
   0.01091434]
 [-0.04790926 -0.03471368  0.01449735 ...  0.00852267 -0.00585473
  -0.01242731]] 

[[1.16316669e+05 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 1.80615470e+04 0.00000000e+00 ... 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 8.13201766e+03 ... 0.00000000e+00
  0.00000000e+00 0.00000000e+00]
 ...
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 5.45275026e-02
  0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
  2.24722778e-02 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
  0.00000000e+00 3.53870387e-03]] 

[[-0.02951407 -0.02949865 -0.02913627 ... -0.03058476 -0.03086415
  -0.03085373]
 [-0.05960193 -0.06201721 -0.06635743 ... -0.06074852 -0.05992027
  -0.05637177]
 [ 0.07513872  0.06844283  0.06366888 ... -0.00464834  0.00745156
   0.01806998]
 ...
 [-0.00435315 -0.0084308   0.0005949  ... -0.00380025  0.00096562
  -0.00079913]
 [-0.01208917  0.00819502 -0.01608132 ... -0.00778557  0.00111754
   0.00128831]
 [-0.01294559 -0.00190322  0.003923   ...  0.01304349 -0.00929396
  -0.01072873]]

sig_val = np.diag(S)

plt.figure(figsize = (8,6))
plt.stem(sig_val)
plt.title('Singular Values')
plt.show()

Approximated image when k = 20

k = 20

low_U = U[:, :k]
low_S = S[:k, :k]
low_VT = VT[:k, :]

A_hat = low_U*low_S*low_VT

plt.figure(figsize = (16, 8))
plt.subplot(1,2,1), plt.imshow(A, 'gray'), plt.axis('off'), plt.title('Original image')
plt.subplot(1,2,2), plt.imshow(A_hat, 'gray'), plt.axis('off'), plt.title('Approximated image w/ rank = 20')
plt.show()

Approximated images with k varied

K = [1, 2, 3, 4, 10, 20, 30, 600]

plt.figure(figsize = (16, 8))

for i in range(len(K)):
    low_U = U[:, :K[i]]
    low_S = S[:K[i], :K[i]]
    low_VT = VT[:K[i], :]

    A_hat = low_U*low_S*low_VT
    
    plt.subplot(2,4,i+1), plt.imshow(A_hat, 'gray'), plt.axis('off')    
    plt.title('rank {}'.format(K[i]))
    
plt.show()

5. Proper Orthogonal Modes (POM)¶

Principal Components and Proper Orthogonal Modes (POM)

the principal component analysis seems to suggest that we are simply expanding our solution in another orthonormal basis, one which can always diagonalize the underlying system.

Mathematically, we can consider a given function $f(x,t)$ over a domain of interest. In most applications, the variables $x$ and $t$ will refer to the standard space-time variables we are familiar with. The idea is to then expand this function in some basis representation so that

$$f(x,t) \approx \sum_{i=1}^{k}c_i(t) \phi_i(x)$$

where $k$ is the total number of modes, $\phi_i(x)$, to be used. The remaining function $c_i(t)$ determines the weights of the spatial modes.

This is certainly not a new idea to us. Indeed, we have been using this concept extensively already with Fourier transforms, for instance. Speciﬁcally, here are some of the more common expansion bases used in practice:

$$ \begin{align*} \phi_i(x) &= (x-x_0)^i &\text{Taylor expansion}\\ \phi_i(x) &= e^{ix} &\text{Fourier transform}\\ \phi_i(x) &= \psi_{a,b}(x) &\text{Wavelet transform}\\ \phi_i(x) &= \phi_{\lambda_i}(x) &\text{Eigenfunction expansion}\\ \end{align*} $$

Interestingly enough, the basis functions selected are chosen often for simplicity and/or their intuitive meaning for the system. For instance, the Fourier transform very clearly highlights the fact that the given function has a representation in terms of frequency components.

In contrast to selecting basis functions for simplicity or intuitive meaning, the broader question can be asked: what criteria should be used for selecting the functions $\phi_i(x)$. This is an interesting question given that any complete basis can approximate the function $f(x,t)$ to any desired accuracy given $k$ large enough. But what is desired is the best basis functions such that, with $k$ as small as possible, we achieve the desired accuracy.

The goal is then the following: ﬁnd a sequence of orthonormal basis functions $\phi_i(x)$ so that ﬁrst two terms gives the best two-term approximation to $f(x,t)$, or the ﬁrst ﬁve terms gives the best ﬁve-term approximation to $f(x,t)$, for instance.

These special, ordered, orthonormal functions are called the proper orthogonal modes (POD) for the function $f(x,t)$.

5.1. Eigenface¶

http://en.wikipedia.org/wiki/Eigenface
Further editted by Tim Chartier, Davidson College 2007
Best approximation of $x$ with $U$ eigenface (eigenvectors)

$$ \tilde{x} = \sum_{i=1}^{k} \langle x,\hat{u}_i\rangle \,\hat{u}_i $$

12 USA Presidents

Download the face file

from scipy import io

faces = io.loadmat('./data_files/faces.mat')
face_mat = faces['faces']

row, col, m = face_mat.shape

plt.figure(figsize = (12, 16))

for i in range(m):
    plt.subplot(4, 3, i+1), plt.axis('off')
    plt.imshow(face_mat[:, :, i], 'gray')
    
plt.show()

SVD

A = np.reshape(face_mat, [row*col, m])

[U, S, VT] = np.linalg.svd(A, full_matrices = False)

U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)

print(U.shape, S.shape, VT.shape, '\n')

(50000, 12) (12, 12) (12, 12)

Eigenface

plt.figure(figsize = (12, 16))

for i in range(m):
    plt.subplot(4, 3, i+1), plt.axis('off')
    plt.imshow(U[:, i].reshape(row, col), 'gray')
    
plt.show()

Singular values

sig_val = np.diag(S)

plt.figure(figsize = (8,6))
plt.stem(sig_val)
plt.title('Singular Values')
plt.show()

Transmit the compressed information

Assum that both parties have eigenfunctions
Only thing to transmit is the corresponding weights of the eigenfunctions

obama = face_mat[:,:,11]

plt.figure(figsize = (6,8))
plt.imshow(obama, 'gray')
plt.axis('off')
plt.show()

# sender
x = np.reshape(obama, [row*col, 1])
eigenface_decom = U.T*x
print(obama.shape, eigenface_decom.shape)

(250, 200) (12, 1)

# reciever
reconst = U*eigenface_decom

plt.figure(figsize = (6,8))
plt.imshow(reconst.reshape(row, col), 'gray')
plt.axis('off')
plt.show()

5.2. Seeing through a Disguise with SVD¶

Acquire new image

original image disguised
download the image file

jfk = cv2.imread('./data_files/jfk.png', 0)

plt.figure(figsize = (6,8))
plt.imshow(jfk, 'gray')
plt.axis('off')
plt.show()

Projection onto $U$

x = np.reshape(jfk, [row*col, 1])

eigenface_decom = U.T*x
reconst = U*eigenface_decom

plt.figure(figsize = (6,8))
plt.imshow(reconst.reshape(row, col), 'gray')
plt.axis('off')
plt.show()

%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')