Linear Algebra


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Linear Equations

  • Set of linear equations (two equations, two unknowns)
$$ \begin{align*} 4x_{1} − 5x_{2} &= −13\\ −2x_{1} + 3x_{2} &= 9 \end{align*} $$

1.1. Solving Linear Equations

  • Two linear equations
$$ \begin{align*} 4x_{1} − 5x_{2} &= −13\\ −2x_{1} + 3x_{2} &= 9 \end{align*} $$
  • In vector form, $Ax = b$, with
$$A = \begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix} , \quad x = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} , \quad b = \begin{bmatrix} -13 \\ 9 \end{bmatrix} $$
  • Solution using inverse
$$ \begin{align*} Ax &= b \\ A^{-1}Ax &= A^{-1}b \\ x &= A^{-1}b \end{align*} $$
  • We won’t worry here about how to compute the inverse, but it’s very similar to the standard method for solving linear equations

  • We will use numpy to compute it

In [1]:
import numpy as np
In [2]:
A = np.array([[4, -5],[-2, 3]])
print(A)
[[ 4 -5]
 [-2  3]]
In [3]:
b = np.array([[-13],[9]])
print(b)
[[-13]
 [  9]]

$A^{-1} b$

In [4]:
x = np.linalg.inv(A).dot(b)
print(x)
[[ 3.]
 [ 5.]]
In [5]:
A = np.asmatrix(A)
b = np.asmatrix(b)
In [6]:
x = A.I*b
print(x)
[[ 3.]
 [ 5.]]
In [7]:
A = np.array([[4, -5],
              [-2, 3]])
b = np.array([[-13],
              [9]])

x = np.linalg.inv(A).dot(b)
print(x)
[[ 3.]
 [ 5.]]
In [8]:
A = np.asmatrix(A)
b = np.asmatrix(b)

x = A.I*b
print(x)
[[ 3.]
 [ 5.]]

1.2. System of Linear Equations

  • Consider system of linear equations
$$ \begin{align*} y_1 &= a_{11}x_{1} + a_{12}x_{2} + \cdots + a_{1n}x_{n} \\ y_2 &= a_{21}x_{1} + a_{22}x_{2} + \cdots + a_{2n}x_{n} \\ &\, \vdots \\ y_m &= a_{m1}x_{1} + a_{m2}x_{2} + \cdots + a_{mn}x_{n} \end{align*} $$
  • Can be written in a matrix form as $y = Ax$, where


$$ y= \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{bmatrix} \qquad A = \begin{bmatrix} a_{11}&a_{12}&\cdots&a_{1n} \\ a_{21}&a_{22}&\cdots&a_{2n} \\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&\cdots&a_{mn} \\ \end{bmatrix} \qquad x= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix} $$

1.3. Elements of a Matrix

  • Can write a matrix in terms of its columns
$$A = \begin{bmatrix} \mid&\mid&&\mid\\ a_{1} & a_{2} & \cdots & a_{n}\\ \mid&\mid&&\mid\\ \end{bmatrix} $$


  • Careful, $a_{i}$ here corresponds to an entire vector $a_{i} \in \mathbb{R}^{m}$, not an element of a vector
  • Can write a matrix in terms of rows
$$A = \begin{bmatrix} - & b_{1}^T& - \\ - & b_{2}^T& - \\ &\vdots& \\ - & b_{m}^T& - \end{bmatrix} $$
  • $b_{i} \in \mathbb{R}^{n}$
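A quick numpy sketch of this column/row view, using a small example matrix (the values here are arbitrary):

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # A in R^(2x3)

# i-th column a_i is a vector in R^m
print(A[:, 0])               # [1 4]

# i-th row b_i^T is a vector in R^n
print(A[0, :])               # [1 2 3]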

1.4. Vector-Vector Products

  • Inner product: $x, y \in \mathbb{R}^{n}$
$$x^{T}y = \sum\limits_{i=1}^{n}x_{i}\,y_{i} \quad \in \mathbb{R} $$
In [9]:
x = np.array([[1],
              [1]])

y = np.array([[2],
              [3]])

print(x.T.dot(y))
[[5]]
In [10]:
x = np.asmatrix(x)
y = np.asmatrix(y)

print(x.T*y)
[[5]]
In [11]:
z = x.T*y

print(z.A)
[[5]]

1.5. Matrix-Vector Products

  • $A \in \mathbb{R}^{m \times n}, x \in \mathbb{R}^{n} \Longleftrightarrow Ax \in \mathbb{R}^{m}$
  • Writing $A$ by rows, each entry of $Ax$ is an inner product between $x$ and a row of $A$

$$A = \begin{bmatrix} - &b_{1}^{T} & - \\ -& b_{2}^{T}&- \\ &\vdots& \\ -& b_{m}^{T}&- \end{bmatrix} ,\qquad Ax \in \mathbb{R}^{m} = \begin{bmatrix} b_{1}^{T}x \\ b_{2}^{T}x \\ \vdots \\ b_{m}^{T}x \end{bmatrix} $$


  • Writing $A$ by columns, $Ax$ is a linear combination of the columns of $A$, with coefficients given by $x$

$$A = \begin{bmatrix} \mid&\mid&&\mid\\ a_{1} & a_{2} & \cdots & a_{n}\\ \mid&\mid&&\mid\\ \end{bmatrix} ,\qquad Ax \in \mathbb{R}^{m} = \sum\limits_{i=1}^{n}a_{i}x_{i}$$
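Both views give the same product; a minimal numpy check with arbitrary example values:

import numpy as np

A = np.array([[4, -5],
              [-2, 3]])
x = np.array([[1],
              [2]])

# row view: each entry of Ax is an inner product b_i^T x
print(A.dot(x))

# column view: Ax is a linear combination of the columns of A
print(x[0, 0]*A[:, [0]] + x[1, 0]*A[:, [1]])    # same result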

2. Norms (strength or distance in a linear space)

  • A vector norm is any function $f : \mathbb{R}^{n} \rightarrow \mathbb{R}$ with

    1. $f(x) \geq 0 \;$ and $\;f(x) = 0 \quad \Longleftrightarrow \quad x = 0$
    2. $f(ax) = \lvert a \rvert f(x) \;$ for $\; a \in \mathbb{R}$
    3. $f(x + y) \leq f(x) + f(y)$


  • $l_{2}$ norm
$$\left\lVert x \right\rVert _{2} = \sqrt{\sum\limits_{i=1}^{n}x_{i}^2}$$
  • $l_{1}$ norm
$$\left\lVert x \right\rVert _{1} = \sum\limits_{i=1}^{n} \left\lvert x_{i} \right\rvert$$
  • $\lVert x\rVert$ measures the length of a vector (from the origin)
In [12]:
x = np.array([[4],
              [3]])

np.linalg.norm(x, 2)
Out[12]:
5.0
In [13]:
np.linalg.norm(x, 1)
Out[13]:
7.0

2.1. Orthogonality

  • Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if

    $$x^Ty = 0$$

  • They are orthonormal if, in addition,

    $$\lVert x \rVert _{2} = \lVert y \rVert _{2} = 1 $$

In [14]:
x = np.matrix([[1],[2]])
y = np.matrix([[2],[-1]])
In [15]:
x.T*y
Out[15]:
matrix([[0]])

2.2. Angle between Vectors

  • For any $x, y \in \mathbb{R}^n, \lvert x^Ty \rvert \leq \lVert x \rVert \, \lVert y \rVert$
  • (Unsigned) angle between vectors in $\mathbb{R}^n$ defined as
$$ \begin{align*} \theta &= \angle(x,y) = \cos^{-1}\frac{x^Ty}{\lVert x \rVert \lVert y \rVert}\\ \\ \text{thus}\; x^Ty &= \lVert x \rVert \lVert y\rVert \cos\theta \end{align*} $$



  • $\{ x \mid x^Ty \leq 0\} $ defines a halfspace with outward normal vector $y$, and boundary passing through 0
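A minimal sketch of the angle formula above in numpy (the two vectors below are arbitrary examples; the angle between them is 45 degrees):

import numpy as np

x = np.array([[1],
              [0]])
y = np.array([[1],
              [1]])

cos_theta = x.T.dot(y)[0, 0] / (np.linalg.norm(x)*np.linalg.norm(y))
theta = np.arccos(cos_theta)

print(np.degrees(theta))    # expected: 45.0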



3. Matrix and Linear Transformation

  • Vector
$$ \vec x = \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3} \end{bmatrix} $$



  • Matrix and Transformation
$$ M= \begin{bmatrix} m_{11} & m_{12} & m_{13}\\ m_{21} & m_{22} & m_{23}\\ m_{31} & m_{32} & m_{33}\\ \end{bmatrix} $$


$$\begin{array}\ \vec y& = &M \vec x\\ \begin{bmatrix}\space \\ \space \\ \space \end{bmatrix} & = &\begin{bmatrix} & & \\ & & \\ & &\end{bmatrix}\begin{bmatrix} \space \\ \space \\ \space \end{bmatrix} \end{array} $$


$$\begin{array}\ \qquad \quad \text{Given} & & \qquad \text{Interpret}\\ \text{linear transformation} & \longrightarrow & \text{matrix}\\ \text{matrix} & \longrightarrow & \text{linear transformation}\\ \end{array} $$


$$\begin{array}{c}\ \vec x\\ \text{input} \end{array} \begin{array}{c}\ \quad \text{linear transformation}\\ \implies \end{array} \quad \begin{array}{l} \vec y\\ \text{output} \end{array}$$


$$\text{transformation} =\text{rotate + stretch/compress}$$

3.1. Linear Transformation

  • Superposition


$$T(x_1+x_2) = T(x_1)+T(x_2)$$
  • Homogeneity



$$T(kx) = kT(x)$$
  • Linear vs. Non-linear
$$\begin{array}{c}\\ \text{linear}& & \text{non-linear}\\ &\\ f(x) = 0 & & f(x) = x + c\\ f(x) = kx & & f(x) = x^2\\ f(x(t)) = \frac{dx(t)}{dt} & & f(x) = \sin x\\ f(x(t)) = \int_{a}^{b} x(t)dt & & \end{array}$$
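Any matrix map $T(x) = Ax$ satisfies both properties; a quick numerical check with arbitrarily chosen values:

import numpy as np

A = np.array([[2, 1],
              [0, 3]])
T = lambda x: A.dot(x)       # linear transformation T(x) = Ax

x1 = np.array([[1], [2]])
x2 = np.array([[3], [-1]])
k = 5

print(np.allclose(T(x1 + x2), T(x1) + T(x2)))    # superposition: True
print(np.allclose(T(k*x1), k*T(x1)))             # homogeneity: True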

3.2. Rotation

  • Rotation : $R(\theta)$



$$ \vec y = R(\theta) \vec x$$
  • Find matrix $R(\theta)$
$$ \begin{bmatrix} \cos(\theta)\\ \sin(\theta) \end{bmatrix}= R(\theta) \begin{bmatrix} 1\\ 0 \end{bmatrix}$$
$$ \begin{bmatrix} -\sin(\theta)\\ \cos(\theta) \end{bmatrix}= R(\theta) \begin{bmatrix} 0\\ 1 \end{bmatrix} $$



$$\Longrightarrow \quad \begin{bmatrix} \cos(\theta)& -\sin(\theta)\\ \sin(\theta)& \cos(\theta) \end{bmatrix} = R(\theta) \begin{bmatrix} 1& 0\\ 0& 1 \end{bmatrix} = R(\theta)$$

$$M\vec{x}_1 = \vec{y}_1,\;\; M\vec{x}_2 = \vec{y}_2 \quad \Longleftrightarrow \quad M \begin{bmatrix} \vec{x}_1 & \vec{x}_2 \end{bmatrix} = \begin{bmatrix} \vec{y}_1 & \vec{y}_2 \end{bmatrix}$$
In [16]:
import numpy as np

theta = 90/180*np.pi
R = np.matrix([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta), np.cos(theta)]])
x = np.matrix([[1],[0]])

y = R*x
print(y)
[[  6.12323400e-17]
 [  1.00000000e+00]]

3.3. Stretch/Compress

  • Stretch/Compress (keep the direction)



$$\begin{align*} \vec y &= k\vec x \qquad \leftarrow \; k \text{ is a scalar (not a matrix)}\\ \\ \vec y &= k I \vec x \qquad \text{where } I = \text{identity matrix}\\ \\ \vec y &= \begin{bmatrix}k&0\\0&k\end{bmatrix}\vec x \end{align*}$$

Example

$T$: stretch $a$ along $\hat x$-direction & stretch $b$ along $\hat y$-direction

Compute the corresponding matrix $A$

$$\vec y = A \vec x$$



$$\begin{array}\\ \begin{bmatrix}ax_1\\ bx_2\end{bmatrix}& = A\begin{bmatrix}x_1\\ x_2\end{bmatrix} \Longrightarrow A = \,?\\\\ & = \begin{bmatrix}a & 0\\ 0 & b\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} \end{array}$$



$$\begin{array}\\ A\begin{bmatrix}1\\0\end{bmatrix} & = \begin{bmatrix}a\\0\end{bmatrix} \\ A\begin{bmatrix}0\\1\end{bmatrix} & = \begin{bmatrix}0\\b\end{bmatrix} \\\\ A\begin{bmatrix}1 & 0\\ 0 &1\end{bmatrix} & = A = \begin{bmatrix}a & 0\\0 & b\end{bmatrix} \\ \end{array}$$

More importantly, by looking at $A = \begin{bmatrix}a & 0\\0 & b\end{bmatrix}$, can you think of transformation $T$?
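A sketch of this stretch along the two axes in numpy (the factors a and b below are arbitrary):

import numpy as np

a, b = 2, 3                  # stretch a along x-direction, b along y-direction
A = np.array([[a, 0],
              [0, b]])

x = np.array([[1],
              [1]])

print(A.dot(x))              # [[2] [3]]: x1 scaled by a, x2 scaled by b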

3.4. Projection

  • $P$: Projection onto $\hat x$ - axis
$$\begin{array}{c}\\ & P & \\ \begin{bmatrix}x_1\\x_2\end{bmatrix} & \implies & \begin{bmatrix}x_1\\ 0\end{bmatrix}\\ \vec x & & \vec y \end{array}$$





$$\vec y = P\vec x = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \end{bmatrix}$$



$$ \begin{array}\\ P \begin{bmatrix} 1 \\ 0 \end{bmatrix} & = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\\ P \begin{bmatrix} 0 \\ 1 \end{bmatrix} & = \begin{bmatrix} 0 \\ 0 \end{bmatrix}\\\\ P \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} & = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \end{array} $$

In [17]:
import numpy as np

P = np.matrix([[1, 0],
               [0, 0]])
x = np.matrix([[1],[1]])

y = P*x
print(y)
[[1]
 [0]]

3.5. Linear Transformation

  • If $\vec {v}_1$ and $\vec {v}_2$ form a basis, and we know $T(\vec {v}_1) = \vec {\omega}_1$ and $T(\vec {v}_2) = \vec {\omega}_2$, then for any $\vec x$


$$\begin{array}{l} \vec x & = a_1\vec v_1 + a_2\vec v_2 & (a_1 \;\text{and } a_2 \;\text{unique})\\ \\ T(\vec x) & = T(a_1\vec v_1 + a_2\vec v_2) \\ & = a_1T(\vec v_1) + a_2T(\vec v_2)\\ & = a_1\vec {\omega}_1 + a_2\vec {\omega}_2\\ \end{array}$$


  • This is why a linear system makes our life much easier

  • The only thing we need is to observe how the basis vectors are linearly transformed, as sketched below
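A sketch of this decomposition in numpy, using the standard basis of $\mathbb{R}^2$ and an arbitrarily chosen matrix for $T$:

import numpy as np

A = np.array([[2, 1],
              [1, 3]])        # matrix representing T

v1 = np.array([[1], [0]])     # basis vectors
v2 = np.array([[0], [1]])
w1 = A.dot(v1)                # T(v1)
w2 = A.dot(v2)                # T(v2)

a1, a2 = 4, 5                 # x = a1*v1 + a2*v2
x = a1*v1 + a2*v2

print(A.dot(x))               # T(x) computed directly
print(a1*w1 + a2*w2)          # same result from the transformed basis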

4. Eigenvalue and Eigenvector



$$ A \vec v = \lambda \vec v$$


$$ \begin{array}\\ \lambda & = &\begin{cases} \text{positive}\\ 0\\ \text{negative} \end{cases}\\ \lambda \vec v & : & \text{stretched vector}\\ &&\text{(same direction with } \vec v)\\ A \vec v & : &\text{linearly-transformed vector}\\ &&(\text{generally rotate + stretch}) \end{array}$$


  • Intuitive interpretation of eigen-analysis
$$A \vec v \text{ parallel to } \vec v$$


  • If $\vec {v}_1$ and $\vec {v}_2$ form a basis and are eigenvectors, and we know $T(\vec {v}_1) = \vec {\omega}_1 = \lambda_1 \vec{v}_1$ and $T(\vec {v}_2) = \vec {\omega}_2 = \lambda_2 \vec{v}_2$, then for any $\vec x$


$$\begin{array}{l} \vec x & = a_1\vec v_1 + a_2\vec v_2 & (a_1 \;\text{and } a_2 \;\text{unique})\\ \\ T(\vec x) & = T(a_1\vec v_1 + a_2\vec v_2) \\ & = a_1T(\vec v_1) + a_2T(\vec v_2)\\ & = a_1 \lambda_1\vec {v}_1 + a_2 \lambda_2 \vec {v}_2\\ & = \lambda_1 a_1 \vec {v}_1 + \lambda_2 a_2 \vec {v}_2\\ \end{array}$$


  • The only thing we need is to observe how each basis vector is independently scaled, as sketched below
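A sketch of the same idea in an eigenvector basis, using np.linalg.eig on an arbitrarily chosen symmetric matrix:

import numpy as np

A = np.array([[2, 1],
              [1, 2]])

D, V = np.linalg.eig(A)       # columns of V are eigenvectors, D holds eigenvalues
v1 = V[:, [0]]
v2 = V[:, [1]]

x = np.array([[1], [3]])
a = np.linalg.solve(V, x)     # coordinates of x in the eigenvector basis

print(A.dot(x))                               # T(x) computed directly
print(D[0]*a[0, 0]*v1 + D[1]*a[1, 0]*v2)      # lambda_1 a_1 v_1 + lambda_2 a_2 v_2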

4.1. How to Compute Eigenvalue & Eigenvector



$$ A \vec{v} = \lambda \vec v = \lambda I \vec v$$

$$ \begin{align*} \implies \;& A\vec v - \lambda I \vec v = (A - \lambda I)\vec v = 0\\ \\ \implies \;& \text{either } \vec v = 0 \text{ (trivial) or } (A - \lambda I) \text{ is not invertible}\\ \\ \implies \;& \det(A - \lambda I) = 0 \end{align*} $$
  • We can use this definition directly for eigen-analysis

Example

$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ : projection onto $\hat x$- axis

Find eigenvalues and eigenvectors.



$$ \vec y = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = A \vec x = 0 \cdot \vec x$$ $$\lambda_1 = 0 \space \;\text{and}\; \space \vec {v}_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$


$$ \vec y = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = A \vec x = 1 \cdot \vec x$$ $$\lambda_2 = 1 \space \;\text{and}\; \space \vec {v}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
In [18]:
import numpy as np

A = np.array([[1, 0],
              [0, 0]])
D, V = np.linalg.eig(A)

print('D :', D)
print('V :', V)
D : [ 1.  0.]
V : [[ 1.  0.]
 [ 0.  1.]]

Example

Projection onto the plane. Find eigenvalues and eigenvectors.



For any $\vec x$ in the plane, $P\vec x = \vec x \Rightarrow \lambda = 1$

For any $\vec x$ perpendicular to the plane, $P\vec x = \vec 0 \Rightarrow \lambda = 0$
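For instance, projection onto the x-y plane in $\mathbb{R}^3$ has the matrix $P = \text{diag}(1, 1, 0)$; a quick eigen-analysis sketch (this particular $P$ is an illustrative choice):

import numpy as np

P = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 0]])     # projection onto the x-y plane

D, V = np.linalg.eig(P)
print(D)                      # expected eigenvalues: 1, 1, 0
print(V)                      # lambda = 1 pairs with in-plane vectors, lambda = 0 with the normal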

5. System of Linear Equations

  1. well-determined linear systems
  2. under-determined linear systems
  3. over-determined linear systems

5.1. Well-Determined Linear Systems

  • System of linear equations
$$\begin{array}{c}\ 2x_1 + 3x_2 & = 7\\ x_1 + 4x_2 & = 6 \end{array} \; \implies \; \begin{array}{l}\begin{align*} x_1^{*} & = 2\\ x_2^{*} & = 1 \end{align*}\end{array}$$
  • Geometric point of view
  • Matrix form
$$\begin{array}{c}\ a_{11}x_1 + a_{12}x_2 = b_1\\ a_{21}x_1 + a_{22}x_2 = b_2 \end{array} \begin{array}{c}\ \quad \text{Matrix form}\\ \implies \end{array} \quad \begin{array}{l} \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2} \end{bmatrix} \end{array}$$


$$\large AX = B \qquad \therefore \; X^{*} = A^{-1} B \quad \text{if } A^{-1} \text{ exists}$$
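A sketch of solving the 2x2 example above with numpy; np.linalg.solve is equivalent to forming $A^{-1}B$ when $A$ is invertible:

import numpy as np

A = np.array([[2, 3],
              [1, 4]])
B = np.array([[7],
              [6]])

X = np.linalg.solve(A, B)     # same as np.linalg.inv(A).dot(B)
print(X)                      # expected: x1 = 2, x2 = 1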

5.2. Under-Determined Linear Systems

  • System of linear equations
$$\begin{array}{c}\ 2x_1 + 3x_2 = 7 \end{array} \; \implies \; \begin{array}{l}\begin{align} \text{Many solutions} \end{align}\end{array}$$
  • Geometric point of view


  • Matrix form


$$\begin{array}{c}\ a_{11}x_1 + a_{12}x_2 = b_1 \end{array} \begin{array}{c}\ \quad \text{Matrix form}\\ \implies \end{array} \quad \begin{array}{l} \begin{bmatrix} a_{11} & a_{12} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = b_{1}\\ \end{array}$$


$$\large AX = B \qquad \therefore \; \text{Many solutions when } A \text{ is fat}$$

5.3. Over-Determined Linear Systems

  • System of linear equations
$$\begin{array}{c}\ 2x_1 + 3x_2 & = 7\\ x_1 + 4x_2 & = 6\\ x_1 + x_2 & = 4 \end{array} \; \implies \; \begin{array}{l}\begin{align} \text{No solutions} \end{align}\end{array}$$
  • Geometric point of view


  • Matrix form
$$\begin{array}{c}\ a_{11}x_1 + a_{12}x_2 = b_1\\ a_{21}x_1 + a_{22}x_2 = b_2\\ a_{31}x_1 + a_{32}x_2 = b_3\\ \end{array} \begin{array}{c}\ \quad \text{Matrix form}\\ \implies \end{array} \quad \begin{array}{l} \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3} \end{bmatrix} \end{array}$$


$$\large AX = B \qquad \therefore \; \text{No solutions when } A \text{ is skinny}$$

5.4. Summary of Linear Systems


$$\large AX = B$$


  • Square: Well-determined Linear Systems


$$ \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ \end{bmatrix} $$

  • Fat: Under-determined Linear Systems


$$ \begin{bmatrix} a_{11} & a_{12}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}=b_1 $$

  • Skinny: Over-determined Linear Systems


$$ \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \end{bmatrix} $$

6. Optimization Point of View

6.1. Least-Norm Solution

  • For under-determined linear system

    $$ \begin{bmatrix} a_{11}&a_{12}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = b_1 \quad\text{ or }\quad AX = B$$
    Find the solution of $AX = B$ that minimizes $\lVert X \rVert$ or $\lVert X \rVert^2$

    i.e., optimization problem
$$\begin{align*} \min \; & \; \lVert X \rVert ^2\\ \text{s. t.} \; & AX = B \end{align*}$$
  • Geometric point of view



  • Select one solution among many solutions
$$X^{*} = A^T \left( AA^T \right)^{-1}B \quad \text{Least norm solution}$$
  • Often control problem

6.2. Least-Square Solution

  • For over-determined linear system

    $$ \begin{align*} \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} &\neq \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3} \end{bmatrix} \quad\text{ or }\quad AX \neq B \\ \\ x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \end{bmatrix} &\neq \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3} \end{bmatrix} \end{align*} $$
    Find $X$ that minimizes the error $\lVert E \rVert$ or $\lVert E \rVert^2$, where $E = AX - B$

    i.e. optimization problem

$$ \begin{align*} \min\limits_{X}{\lVert E\rVert}^2 & = \min\limits_{X}{\lVert AX - B\rVert}^2\\ X^{*} & = \left( A^TA \right)^{-1}A^TB\\ B^{*} = AX^{*} & = A\left( A^TA \right)^{-1}A^TB \end{align*} $$
  • Geometric point of view





  • Often arises in estimation problems
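A sketch of the least-squares formula for the 3x2 example from Section 5.3; np.linalg.lstsq solves the same minimization:

import numpy as np

A = np.array([[2, 3],
              [1, 4],
              [1, 1]])         # skinny A: three equations, two unknowns
B = np.array([[7],
              [6],
              [4]])

X = np.linalg.inv(A.T.dot(A)).dot(A.T).dot(B)    # (A^T A)^(-1) A^T B
print(X)

Xls, *_ = np.linalg.lstsq(A, B, rcond=None)      # same least-squares solution
print(Xls)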

6.3. Geometric Point of View: Projection

6.3.1. Vector Projection

  • The vector projection of a vector $X$ on (or onto) a nonzero vector $Y$ is the orthogonal projection of $X$ onto a straight line parallel to $Y$





$$ \begin{align*} W & = \omega\hat{Y}= \omega \frac{Y}{\lVert Y \rVert}, \;\text{where } \omega = \lVert W \rVert\\ \omega & = \lVert X \rVert \cos \theta = \lVert X \rVert \frac{X \cdot Y}{\lVert X \rVert \lVert Y \rVert} = \frac{X \cdot Y}{\lVert Y \rVert}\\ W & = \omega \hat{Y} = \frac{X \cdot Y}{\lVert Y \rVert}\frac{Y}{\lVert Y \rVert} = \frac{X \cdot Y}{\lVert Y \rVert \lVert Y \rVert}Y = \frac{X^T Y}{Y^T Y}Y = \frac{\langle X, Y \rangle}{\langle Y, Y \rangle}Y\\ & = Y\frac{X^T Y}{Y^T Y} = Y\frac{Y^T X}{Y^T Y} = \frac{YY^T}{Y^T Y}X = PX \end{align*} $$
  • Another way of computing $\omega$ and $W$


$$ \begin{align*} Y & \perp \left( X - W \right)\\ \implies & Y^T \left( X - W \right) = Y^T \left( X - \omega \frac{Y}{\lVert Y \rVert} \right) = 0\\ \implies & \omega = \frac{Y^T X}{Y^T Y}\lVert Y \rVert\\ & W = \omega \frac{Y}{\lVert Y \rVert} = \frac{Y^TX}{Y^TY}Y = \frac{\langle X, Y \rangle}{\langle Y, Y \rangle}Y \end{align*} $$

In [19]:
import numpy as  np

X = np.matrix([[1],[1]])
Y = np.matrix([[2],[0]])

print(X)
print(Y)
[[1]
 [1]]
[[2]
 [0]]
In [20]:
print(Y.T*Y)
[[4]]
In [21]:
omega = (X.T*Y)/(Y.T*Y)

print(float(omega))
0.5
In [22]:
omega = float(omega)
W = omega*Y
print(W)
[[ 1.]
 [ 0.]]

6.3.2. Orthogonal Projection onto a Subspace

  • Projection of $B$ onto the subspace $U$ spanned by $A_1$ and $A_2$ (the columns of $A$)

  • Orthogonality


$$A \perp \left( AX^{*}-B\right)$$$$ \begin{align*} A^T \left(AX^{*} - B \right) & = 0\\ A^TAX^{*} & = A^TB\\ X^{*} & = \left( A^TA \right)^{-1}A^TB\\ B^{*} = AX^{*} & = A\left( A^TA \right)^{-1}A^TB \end{align*} $$




In [23]:
import numpy as  np

A = np.matrix([[1,0],[0,1],[0,0]])
B = np.matrix([[1],[1],[1]])

X = (A.T*A).I*A.T*B
print(X)

Bstar = A*X
print(Bstar)
[[ 1.]
 [ 1.]]
[[ 1.]
 [ 1.]
 [ 0.]]

6.3.3. Towards Minimization Problems

  • Suppose $U$ is a subspace of $W$ and $\omega \in W$. Then
$$\lVert \omega - P_U\omega \rVert \leq \lVert \omega - u \rVert$$

$\quad$ for every $u \in U$. Furthermore, if $u \in U$ and the inequality above is an equality, then $u = P_U\omega$
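A numerical illustration of this inequality, reusing the subspace projection from In [23] (the coefficient vectors below are arbitrary choices of $u \in U$):

import numpy as np

A = np.matrix([[1, 0], [0, 1], [0, 0]])      # columns of A span the subspace U
w = np.matrix([[1], [1], [1]])

P_U_w = A*(A.T*A).I*A.T*w                    # projection of w onto U

for c in [np.matrix([[0.5], [2]]), np.matrix([[-1], [3]])]:
    u = A*c                                  # an arbitrary element of U
    print(np.linalg.norm(w - P_U_w) <= np.linalg.norm(w - u))    # expected: True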



