Singular Value Decomposition (SVD)

By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

# 1. Geometry of Linear Maps

Matrix $A$ (or linear transformation) = rotate + stretch/compress

An extremely important fact:
every matrix $A \in \mathbb{R}^{m \times n}$ maps the unit ball in $\mathbb{R}^n$ to an ellipsoid in $\mathbb{R}^m$

\begin{align*} S &= \{ x \in \mathbb{R}^n \mid \Vert x \Vert \leq 1 \}\\ AS &= \{Ax \mid x \in S\} \end{align*}
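As a quick numerical check (using the $2\times 2$ matrix that reappears in Section 2.6), we can sample the unit circle and confirm that its image under $A$ is an ellipse whose semiaxis lengths are the singular values of $A$:

```python
import numpy as np

# the 2x2 matrix also used in Section 2.6
A = np.array([[4.0, 4.0],
              [-3.0, 3.0]])

# sample points on the unit circle S
theta = np.linspace(0, 2*np.pi, 200)
S = np.vstack([np.cos(theta), np.sin(theta)])   # shape (2, 200)

AS = A @ S                                      # image of the unit circle under A

# the image is an ellipse whose semiaxes have lengths sigma_1 >= sigma_2
sigma = np.linalg.svd(A, compute_uv=False)
radii = np.linalg.norm(AS, axis=0)

print(radii.max(), sigma[0])   # longest sampled radius ~ sigma_1
print(radii.min(), sigma[1])   # shortest sampled radius ~ sigma_2
```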

# 2. Singular Values and Singular Vectors

• the numbers $\sigma_1, \cdots, \sigma_n$ are called the singular values of $A$; by convention, $\sigma_i > 0$

• the vectors $u_1, \cdots, u_n$ are unit vectors along the principal semiaxes of $AS$

• the vectors $v_1, \cdots, v_n$ are the preimages of the principal semiaxes, defined so that

$$A v_i = \sigma_i u_i$$

## 2.1. Graphical Explanation

\begin{align*} A\begin{bmatrix} v_1& v_2 & \cdots & v_r \end{bmatrix} & =\begin{bmatrix} A v_1 & A v_2 & \cdots & A v_r \end{bmatrix} \\ & =\begin{bmatrix} \sigma_1u_1& \sigma_2u_2 & \cdots & \sigma_ru_r \end{bmatrix} \\ & =\begin{bmatrix} u_1& u_2 & \cdots & u_r \end{bmatrix}\begin{bmatrix} \sigma_1& & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{bmatrix} \\ \\ \therefore \; AV & = U\Sigma \quad (r \leq m, n) \end{align*}

## 2.2. Thin Singular Value Decomposition

We have $A \in \mathbb{R}^{m \times n}$, skinny and full rank (i.e., $r = n$), and

$$A v_i = \sigma_i u_i \quad \text{for } 1 \leq i \leq n$$

let

\begin{align*} \hat U &= \begin{bmatrix}u_1 & u_2 & \cdots & u_n \end{bmatrix}\\ \\ \quad \hat \Sigma &= \begin{bmatrix}\sigma_1& & & \\ & \sigma_2 &&\\ &&\ddots&\\ &&&\sigma_n \end{bmatrix}\\ \\ \quad V &= \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \end{align*}

In matrix form, the above equations give $AV = \hat U \hat \Sigma$, and since $V$ is orthogonal

$$A = \hat U \hat \Sigma V^T$$

This is called the thin (or reduced) SVD of $A$
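A small sanity check of the thin SVD, with an arbitrary skinny full-rank matrix (chosen only for illustration): `np.linalg.svd` with `full_matrices=False` returns exactly the factors $\hat U$, $\hat\Sigma$, $V^T$ above.

```python
import numpy as np

# a skinny (m > n) full-rank matrix, chosen arbitrarily for illustration
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U_hat, s, VT = np.linalg.svd(A, full_matrices=False)   # thin SVD
Sigma_hat = np.diag(s)

print(U_hat.shape, Sigma_hat.shape, VT.shape)          # (3, 2) (2, 2) (2, 2)
print(np.allclose(A, U_hat @ Sigma_hat @ VT))          # A = U_hat Sigma_hat V^T
print(np.allclose(U_hat.T @ U_hat, np.eye(2)))         # columns of U_hat orthonormal
print(np.allclose(VT @ VT.T, np.eye(2)))               # V orthogonal
```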

## 2.3. Full Singular Value Decomposition

We can add extra orthonormal columns to $\hat U$ to make it an orthogonal $m \times m$ matrix $(m \geq n)$

$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}$$

We also add extra rows of zeros to $\hat \Sigma$, so

$$A = U\Sigma V^T$$

This is the (full) singular value decomposition of $A$

• Every matrix $A$ has a singular value decomposition.

• If $A$ is not full rank, then some of the diagonal entries of $\Sigma$ will be zero
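The rank-deficient case is easy to see numerically; in this illustrative rank-1 matrix the second row is a multiple of the first, so one singular value is (numerically) zero:

```python
import numpy as np

# rank-1 matrix: second row is twice the first (illustrative example)
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

s = np.linalg.svd(A, compute_uv=False)
print(s)                          # one positive singular value, one ~0
print(np.linalg.matrix_rank(A))   # rank 1
```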

## 2.4. Interpretation of SVD

The SVD decomposes the linear map into

• rotate by $V^T$

• diagonal scaling by $\sigma_i$

• rotate by $U$

Note that, unlike the eigen-decomposition, input and output directions are different

## 2.5. SVD: Matrix Factorization

• for any matrix $A$

$$A = U\Sigma V^T$$

• for symmetric and positive definite matrix $A$

$$A = S\Lambda S^{-1} = S\Lambda S^T\quad (S: \text{eigenvectors})$$
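For a symmetric positive definite matrix the two factorizations coincide: the eigenvector matrix $S$ is orthogonal (so $S^{-1} = S^T$) and the singular values equal the eigenvalues. A minimal check with an illustrative $2\times 2$ matrix:

```python
import numpy as np

# symmetric positive definite example matrix (eigenvalues 3 and 1)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, S = np.linalg.eig(A)                     # eigen-decomposition A = S Lambda S^{-1}
sigma = np.linalg.svd(A, compute_uv=False)    # singular values, descending

print(np.sort(lam)[::-1])                     # eigenvalues, descending
print(sigma)                                  # equal to the eigenvalues here
print(np.allclose(A, S @ np.diag(lam) @ S.T)) # S^{-1} = S^T for symmetric A
```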

## 2.6. Example

\begin{array}{Icr} A = \begin{bmatrix} 4 & 4 \\ -3 & 3 \end{bmatrix} \end{array} \quad \quad \begin{array}{Icr} \begin{align*} &\textbf{orthonormal basis} \\ &v_1, v_2 \text{ in row space } \mathbb{R}^2 \\ &u_1, u_2 \text{ in column space } \mathbb{R}^2 \\ &\sigma_1 > 0 \\ &\sigma_2 > 0 \end{align*} \end{array}

$$\begin{array}{lcr} A v_1 = \sigma_1 u_1\\ A v_2 = \sigma_2 u_2 \end{array} \; \implies \; \begin{array}{lcr} AV = U\Sigma \\ A = U\Sigma V^{-1} = U\Sigma V^T \end{array}$$

1) $A^TA$: (small) positive definite & symmetric

$$A^TA = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T = V\begin{bmatrix} \sigma_1^2& & \\ & \sigma_2^2 & \\ & & \ddots \\ \end{bmatrix}V^T$$

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
A = np.array([[4, 4],
              [-3, 3]])

A = np.asmatrix(A)

[D, V] = np.linalg.eig(A.T*A)

print(V, '\n')
print(D)

[[ 0.70710678 -0.70710678]
[ 0.70710678  0.70710678]]

[32. 18.]


2) $AA^T$: (large) positive definite & symmetric

$$AA^T = U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^TU^T = U\begin{bmatrix} \sigma_1^2& & \\ & \sigma_2^2 & \\ & & \ddots \\ \end{bmatrix}U^T$$

In [3]:
[D, U] = np.linalg.eig(A*A.T)

print(U, '\n')
print(D)

[[1. 0.]
[0. 1.]]

[32. 18.]


$$A = U\Sigma V^T$$

In [4]:
[U, S, VT] = np.linalg.svd(A, full_matrices = True)

print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)

(2, 2) (2,) (2, 2)

[[-1.  0.]
[ 0.  1.]]

[5.65685425 4.24264069]

[[-0.70710678 -0.70710678]
[-0.70710678  0.70710678]]


# 3. PCA and SVD

• Any real symmetric and positive definite matrix $B$ has an eigen-decomposition

$$B = S\Lambda S^T$$

• A real matrix $(m\times n)$ $A$, where $m>n$, has the decomposition

$$A = U\Sigma V^T$$

• From $A$ (skinny and full rank) we can construct two positive-definite symmetric matrices, $AA^T$ and $A^TA$

\begin{align*} AA^T &= U\Sigma V^T V \Sigma^T U^T = U \Sigma \Sigma^T U^T \quad (m \times m, \,n\, \text{eigenvalues and }\, m-n \; \text{zeros},\; U\; \text{eigenvectors})\\ A^TA &= V \Sigma^T U^T U\Sigma V^T = V \Sigma^T \Sigma V^T \quad (n \times n, \,n \;\text{eigenvalues},\; V\; \text{eigenvectors}) \end{align*}

• PCA by SVD

$$B = A^TA = V \Sigma^T \Sigma V^T = V \Lambda V^T$$
• $V$ is eigenvectors of $B = A^TA$
• $\sigma_i^2 = \lambda_i$
• Note that in PCA we eigen-decompose the sample covariance as $V\Lambda V^T$, $$\text{where the covariance is } \frac{1}{m}X^TX = \left(\frac{X}{\sqrt{m}}\right)^T \frac{X}{\sqrt{m}} = A^TA$$
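The relation $\sigma_i^2 = \lambda_i$ can be verified numerically; the data matrix below is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))   # hypothetical data matrix (m samples, n features)
m = X.shape[0]
A = X / np.sqrt(m)                  # so that A^T A = (1/m) X^T X

sigma = np.linalg.svd(A, compute_uv=False)  # singular values of A, descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]     # eigenvalues of A^T A, descending

print(sigma**2)
print(lam)                          # matches sigma**2: sigma_i^2 = lambda_i
```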

In [5]:
%%html
<center><iframe src="https://www.youtube.com/embed/Nx0lRBaXoz4?rel=0"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>


# 4. Low Rank Approximation: Dimension Reduction

## 4.1. Low Rank Approximation

• in 2D case

\begin{align*} x & = c_1 v_1 + c_2 v_2 \\ & = \langle x, v_1 \rangle v_1 + \langle x, v_2 \rangle v_2\\ & = ( v_1^{T} x) v_1 + ( v_2^T x) v_2 \\ \\ Ax & = c_1 A v_1 + c_2 A v_2 \\ & = c_1\sigma_1 u_1 + c_2 \sigma_2 u_2 \\ & = u_1 \sigma_1 v_1^Tx + u_2\sigma_2 v_2^T x \\ \\ & = \begin{bmatrix} u_1 & u_2 \end{bmatrix}\begin{bmatrix}\sigma_1 & 0\\ 0 & \sigma_2 \end{bmatrix}\begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix} x \\ \\ & = U \Sigma V^Tx \\\\ & \approx u_1\sigma_1 v_1^Tx \quad (\text{if } \sigma_1 \gg \sigma_2) \end{align*}

• Low rank approximation of matrix $A$ in general

\begin{align*} A &= U_r \Sigma_r V_r^T = \sum_{i=1}^{r}\sigma_i u_i v_i^T\\ \\ \tilde A &= U_k \Sigma_k V_k^T = \sum_{i=1}^{k}\sigma_i u_i v_i^T \qquad (k \leq r) \end{align*}

## 4.2. Example: Matrix

In [6]:
A = np.array([[11.08, 6.82, 1.76, -6.82],
              [2.50, -1.01, -2.60, 1.19],
              [-4.88, -5.07, -3.21, 5.20],
              [-0.49, 1.52, 2.07, -1.66],
              [-14.04, -12.40, -6.66, 12.65],
              [0.27, -8.51, -10.19, 9.15],
              [9.53, -9.84, -17.00, 11.00],
              [-12.01, 3.64, 11.10, -4.48]])


In [7]:
[U, S, VT] = np.linalg.svd(A, full_matrices = True)

U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)

print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)

(8, 8) (4, 4) (4, 4)

[[-0.24642369  0.44536234  0.6205749   0.32739909  0.45915182  0.05410587
-0.18650735  0.00956801]
[ 0.07432183  0.10750102  0.28337342 -0.77803389 -0.10352762  0.32547077
-0.42245245  0.04655432]
[ 0.21369502 -0.1903207   0.49492174  0.11043549 -0.4736735  -0.61498898
-0.24132713 -0.01233476]
[-0.08223154 -0.02472167  0.20057294  0.06152361 -0.27047805  0.29819169
0.20131309 -0.86371786]
[ 0.50375686 -0.55378628  0.13739629 -0.02351069  0.61002316  0.01682912
-0.07593043 -0.20479838]
[ 0.43719294  0.03496575 -0.05496539  0.50172005 -0.29664823  0.55199001
-0.3572128   0.18055856]
[ 0.59019451  0.42657675  0.21078205 -0.13667122 -0.03298713 -0.00394525
0.6243326   0.1252985 ]
[-0.2967926  -0.51321848  0.42787997  0.0231769  -0.14042827  0.34499254
0.40598717  0.40166774]]

[[3.68258450e+01 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 2.62368811e+01 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 2.20213307e-02 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 5.10429562e-03]]

[[-0.03564832 -0.5381828  -0.61426544  0.5759917 ]
[ 0.92076022  0.16617467 -0.32597333 -0.13538086]
[-0.13920696 -0.49222525 -0.31155848 -0.80079151]
[-0.36269992  0.66367127 -0.6475729  -0.09294376]]


In [8]:
[U, S, VT] = np.linalg.svd(A, full_matrices = False)

U = np.asmatrix(U)
S = np.asmatrix(np.diag(S))
VT = np.asmatrix(VT)

print(U.shape, S.shape, VT.shape, '\n')
print(U, '\n')
print(S, '\n')
print(VT)

(8, 4) (4, 4) (4, 4)

[[-0.24642369  0.44536234  0.6205749   0.32739909]
[ 0.07432183  0.10750102  0.28337342 -0.77803389]
[ 0.21369502 -0.1903207   0.49492174  0.11043549]
[-0.08223154 -0.02472167  0.20057294  0.06152361]
[ 0.50375686 -0.55378628  0.13739629 -0.02351069]
[ 0.43719294  0.03496575 -0.05496539  0.50172005]
[ 0.59019451  0.42657675  0.21078205 -0.13667122]
[-0.2967926  -0.51321848  0.42787997  0.0231769 ]]

[[3.68258450e+01 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 2.62368811e+01 0.00000000e+00 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 2.20213307e-02 0.00000000e+00]
[0.00000000e+00 0.00000000e+00 0.00000000e+00 5.10429562e-03]]

[[-0.03564832 -0.5381828  -0.61426544  0.5759917 ]
[ 0.92076022  0.16617467 -0.32597333 -0.13538086]
[-0.13920696 -0.49222525 -0.31155848 -0.80079151]
[-0.36269992  0.66367127 -0.6475729  -0.09294376]]


Question

How to determine a good $k$ so that we can efficiently compress $A$ without losing too much information of $A$?

In [9]:
sig_val = np.diag(S)

plt.figure(figsize = (8,6))
plt.stem(range(1,5), sig_val)
plt.xticks(range(1,5))
plt.title('Singular Values')
plt.show()

In [10]:
k = 2

low_U = U[:, :k]
low_S = S[:k, :k]
low_VT = VT[:k, :]

A_approx = low_U*low_S*low_VT

print(low_U.shape, low_S.shape, low_VT.shape, '\n')
print('A:\n', A ,'\n')
print('A_approx:\n', A_approx, '\n')
print('low_U:\n', low_U, '\n')
print('low_VT:\n', low_VT)

(8, 2) (2, 2) (2, 4)

A:
[[ 11.08   6.82   1.76  -6.82]
[  2.5   -1.01  -2.6    1.19]
[ -4.88  -5.07  -3.21   5.2 ]
[ -0.49   1.52   2.07  -1.66]
[-14.04 -12.4   -6.66  12.65]
[  0.27  -8.51 -10.19   9.15]
[  9.53  -9.84 -17.    11.  ]
[-12.01   3.64  11.1   -4.48]]

A_approx:
[[ 11.08250851   6.8256176    1.76533991  -6.80890115]
[  2.49942829  -1.00429274  -2.60062751   1.19462804]
[ -4.87827835  -5.06500943  -3.20623934   5.20878009]
[ -0.48927124   1.52196569   2.07157948  -1.65643381]
[-14.03962233 -12.39843105  -6.65913505  12.65241176]
[  0.27076035  -8.51229541 -10.18871873   9.14926874]
[  9.53039313  -9.83725225 -16.99900559  11.0036522 ]
[-12.00864542   3.64455947  11.10301226  -4.47244356]]

low_U:
[[-0.24642369  0.44536234]
[ 0.07432183  0.10750102]
[ 0.21369502 -0.1903207 ]
[-0.08223154 -0.02472167]
[ 0.50375686 -0.55378628]
[ 0.43719294  0.03496575]
[ 0.59019451  0.42657675]
[-0.2967926  -0.51321848]]

low_VT:
[[-0.03564832 -0.5381828  -0.61426544  0.5759917 ]
[ 0.92076022  0.16617467 -0.32597333 -0.13538086]]


## 4.3. Example: Image Approximation

• Learn how to reduce the amount of storage needed for an image using the singular value decomposition.
• Approximation of $A$

$$\tilde A = U_k \Sigma_k V_k^T = \sum_{i=1}^{k}\sigma_i u_i v_i^T \qquad (k \leq r)$$

Download the image file

In [11]:
import cv2

img = cv2.imread('./data_files/postech.jpg', 0)
print(img.shape)

(642, 642)

In [12]:
plt.figure(figsize = (8,8))
plt.imshow(img, 'gray')
plt.axis('off')
plt.show()
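The rank-$k$ reconstruction of the image follows the same recipe as Section 4.2; below is a self-contained sketch, substituting a synthetic array for postech.jpg and an arbitrary choice $k = 20$. Storing $U_k$, $\Sigma_k$, $V_k^T$ takes $k(m+n+1)$ numbers instead of $mn$.

```python
import numpy as np

# synthetic grayscale "image" standing in for postech.jpg
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64)).astype(float)

U, S, VT = np.linalg.svd(img, full_matrices=False)

k = 20                                            # arbitrary illustrative rank
img_approx = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# storage cost: k*(m + n + 1) numbers instead of m*n
m, n = img.shape
print(k * (m + n + 1), 'numbers vs', m * n)

# relative reconstruction error of the rank-k approximation
err = np.linalg.norm(img - img_approx) / np.linalg.norm(img)
print('relative error:', err)
```

With a real photograph the singular values decay quickly, so a small $k$ already gives a visually faithful reconstruction; the random array here compresses poorly but exercises the same mechanics.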