Linear Algebra
1. Linear Equations¶
from IPython.display import YouTubeVideo
YouTubeVideo('ub4oe2im8T8', width = "560", height = "315")
1.1. Solving Linear Equations¶
- Two linear equations (two equations, two unknowns)
$$ \begin{align*} 4x_{1} - 5x_{2} &= -13\\ -2x_{1} + 3x_{2} &= 9 \end{align*} $$
- In vector form, $Ax = b$, with
$$A = \begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix} , \quad x = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} , \quad b = \begin{bmatrix} -13 \\ 9 \end{bmatrix} $$
- Solution using matrix inverse
$$ \begin{align*} Ax &= b \\ A^{-1}Ax &= A^{-1}b \\ x &= A^{-1}b \end{align*} $$
Don't worry about how to compute the inverse manually. Instead, we will use numpy to compute it.
import numpy as np
A = np.array([[4, -5], [-2, 3]])
print(A)
b = np.array([[-13], [9]])
print(b)
$A^{-1} b$
x = np.linalg.inv(A).dot(b)
print(x)
If you are not familiar with linalg, convert the data into matrix form using asmatrix.
A = np.asmatrix(A)
b = np.asmatrix(b)
x = A.I*b
print(x)
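As a practical note, numpy also provides np.linalg.solve, which solves $Ax = b$ directly without explicitly forming the inverse and is generally more accurate and efficient. A minimal sketch:
A = np.array([[4, -5], [-2, 3]])
b = np.array([[-13], [9]])

# Solve Ax = b directly (no explicit inverse)
x = np.linalg.solve(A, b)
print(x)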
1.2. System of Linear Equations¶
- Consider system of linear equations
$$ \begin{align*} y_1 &= a_{11}x_{1} + a_{12}x_{2} + \cdots + a_{1n}x_{n} \\ y_2 &= a_{21}x_{1} + a_{22}x_{2} + \cdots + a_{2n}x_{n} \\ &\, \vdots \\ y_m &= a_{m1}x_{1} + a_{m2}x_{2} + \cdots + a_{mn}x_{n} \end{align*} $$
- Can be written in a matrix form as $y = Ax$, where
$$ y= \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{bmatrix} \qquad A = \begin{bmatrix} a_{11}&a_{12}&\cdots&a_{1n} \\ a_{21}&a_{22}&\cdots&a_{2n} \\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&\cdots&a_{mn} \\ \end{bmatrix} \qquad x= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix} $$
1.3. Elements of a Matrix¶
A matrix is not just an array of numbers. There are two ways to look at a matrix.
(1) As a collection of columns
- Can write a matrix in terms of its columns
$$A = \begin{bmatrix} \mid&\mid&&\mid\\ a_{1} & a_{2} & \cdots & a_{n}\\ \mid&\mid&&\mid\\ \end{bmatrix} $$
- Be careful: $a_{i}$ here denotes a column vector $a_{i} \in \mathbb{R}^{m}$, not a scalar element of a vector
(2) As a collection of rows
- Can write a matrix in terms of rows
$$A = \begin{bmatrix} - & b_{1}^T& - \\ - & b_{2}^T& - \\ &\vdots& \\ - & b_{m}^T& - \end{bmatrix} $$
- $b_{i} \in \mathbb{R}^{n}$
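To make the two views concrete, the following sketch (with an assumed small example matrix) extracts a column $a_i$ and a row $b_i^T$ using numpy slicing:
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # example 2 x 3 matrix (m = 2, n = 3)

a1 = A[:, [0]]                 # first column a_1, a vector in R^m
b1 = A[[0], :]                 # first row b_1^T, where b_1 is a vector in R^n
print(a1)
print(b1)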
1.4. Vector-Vector Products¶
- Inner product: $x, y \in \mathbb{R}^{n}$
$$\langle x,y \rangle = x^{T}y = \sum\limits_{i=1}^{n}x_{i}\,y_{i} \quad \in \mathbb{R} $$
The inner product operator takes two column vectors as input and produces a scalar value as output.
The summation $\sum\limits_{i=1}^{n}x_{i}\,y_{i}$ calculates the product of corresponding components of $ x $ and $ y $ and sums them up.
The inner product can be thought of as a measure of similarity between two column vectors. We will discuss this in more detail later.
- In function spaces, the inner product extends to integrals:
$$\langle f, g \rangle = \int f(x) g(x) \,dx$$
x = np.array([[1],
[1]])
y = np.array([[2],
[3]])
print(x.T.dot(y))
x = np.asmatrix(x)
y = np.asmatrix(y)
print(x.T*y)
z = x.T*y
print(z.A)
1.5. Matrix-Vector Products¶
Given $A \in \mathbb{R}^{m \times n}$
$$ x \in \mathbb{R}^{n} \quad \Longrightarrow \quad Ax \in \mathbb{R}^{m}$$
When we multiply $A$ by $x$, the result is
$$Ax \in \mathbb{R}^m$$
that is, the output is an $m$-dimensional vector.
Intuition
- Writing $A$ by rows, each entry of $Ax$ is an inner product between $x$ and a row of $A$
$$A = \begin{bmatrix} - &b_{1}^{T} & - \\ -& b_{2}^{T}&- \\ &\vdots& \\ -& b_{m}^{T}&- \end{bmatrix} ,\qquad Ax \in \mathbb{R}^{m} = \begin{bmatrix} b_{1}^{T}x \\ b_{2}^{T}x \\ \vdots \\ b_{m}^{T}x \end{bmatrix} $$
- Note: $b_i^T x$ can be interpreted as a measure of the similarity between $b_i$ and $x$.
- Writing $A$ by columns, $Ax$ is a linear combination of the columns of $A$, with coefficients given by $x$
$$A = \begin{bmatrix} \mid&\mid&&\mid\\ a_{1} & a_{2} & \cdots & a_{n}\\ \mid&\mid&&\mid\\ \end{bmatrix} ,\qquad Ax \in \mathbb{R}^{m} = \sum\limits_{i=1}^{n}a_{i}x_{i}$$
- Note: $Ax$ is a linear combination of the columns of $A$, meaning it lies in the column space (or span of the columns) of $A$.
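The following sketch, using an assumed small example, verifies that the row view (inner products) and the column view (linear combination of the columns) give the same result as the direct product:
import numpy as np

A = np.array([[4, -5],
              [-2, 3]])
x = np.array([2, 1])

# Row view: each entry of Ax is an inner product b_i^T x
row_view = np.array([A[i, :].dot(x) for i in range(A.shape[0])])

# Column view: Ax is a linear combination of the columns of A
col_view = sum(x[i] * A[:, i] for i in range(A.shape[1]))

print(A.dot(x))    # direct matrix-vector product
print(row_view)
print(col_view)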
2. Norms (Strength or Distance in a Linear Space)¶
Question: We know how to compute the distance between two real numbers. Can we extend this concept to define the distance between two vectors?
A norm provides a general concept of distance in linear algebra. A norm is a function that assigns a non-negative length (or size) to a vector, helping us define "distance" in different ways.
- A vector norm is any function $f : \mathbb{R}^{n} \rightarrow \mathbb{R}$ with
- Non-negativity: $f(x) \geq 0 \;$ and $\;f(x) = 0 \quad \Longleftrightarrow \quad x = 0$
- Scalar multiplication: $f(ax) = \lvert a \rvert f(x) \;$ for $\; a \in \mathbb{R}$
- Triangle inequality: $f(x + y) \leq f(x) + f(y)$
- $l_{2}$ norm
$$\left\lVert x \right\rVert _{2} = \left( \sum\limits_{i=1}^{n} x_i^2 \right)^{\frac{1}{2}} = \sqrt{\sum\limits_{i=1}^{n}x_{i}^2}$$
- $l_{1}$ norm
$$\left\lVert x \right\rVert _{1} = \sum\limits_{i=1}^{n} \left\lvert x_{i} \right\rvert$$
- $l_p$ norm
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{\frac{1}{p}}$$
- $l_{\infty}$ norm
$$\|x\|_{\infty} = \max_{i} |x_i|$$
- $\lVert x\rVert = \lVert x - 0\rVert$ measures length of vector (from origin)
- $\lVert x - y\rVert$ gives the distance between two vectors $x$ and $y$
Question: Can we define the distance between two matrices?
x = np.array([[4],
[3]])
np.linalg.norm(x, 2)
np.linalg.norm(x, 1)
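The other norms above can be computed the same way; a short sketch, flattening x to a 1-D vector so that np.linalg.norm accepts general $p$-norms:
v = x.flatten()                     # treat x as a 1-D vector

print(np.linalg.norm(v, np.inf))    # l_infinity norm: max|x_i| = 4
print(np.linalg.norm(v, 3))         # l_p norm with p = 3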
2.1. Orthogonality¶
- Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if
$$x^Ty = 0$$
- They are orthonormal if, in addition,
$$\lVert x \rVert _{2} = \lVert y \rVert _{2} = 1 $$
- The concept of orthogonality is naturally defined in Euclidean space. However, can this idea be extended to functional spaces? In other words, can you conceptualize a notion of orthogonality between two functions?
$$\langle f, g \rangle = \int f(x) g(x) \,dx = 0$$
x = np.matrix([[1],[2]])
y = np.matrix([[2],[-1]])
x.T*y
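To illustrate the functional-space analogue mentioned above, a rough numerical sketch (assuming the functions $\sin x$ and $\cos x$ on the interval $[0, 2\pi]$) approximates the inner-product integral with a Riemann sum:
import numpy as np

# Numerical check that sin and cos are (approximately) orthogonal on [0, 2*pi]
t = np.linspace(0, 2*np.pi, 10001)
f = np.sin(t)
g = np.cos(t)

dt = t[1] - t[0]
inner = np.sum(f*g) * dt    # Riemann-sum approximation of the integral of f(x) g(x) dx
print(inner)                # approximately 0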
2.2. Angle between Vectors¶
- For any $x, y \in \mathbb{R}^n$, the Cauchy–Schwarz inequality gives
$$\quad \lvert x^Ty \rvert \leq \lVert x \rVert \, \lVert y \rVert$$
- The (unsigned) angle between vectors in $\mathbb{R}^n$ is defined as
$$ \begin{align*} \theta &= \angle(x,y) = \cos^{-1}\frac{x^Ty}{\lVert x \rVert \lVert y \rVert}\\ \\ \text{thus}\qquad x^Ty &= \lVert x \rVert \lVert y\rVert \cos\theta \end{align*} $$
- Note:
- The concept of an angle is fundamentally derived from the notion of distance and is closely related to the concept of similarity.
- In the context of functional spaces, it is important to explore how the idea of an angle can be formally defined between two functions.
- $\{ x \mid x^Ty \leq 0\} $ defines a halfspace with outward normal vector $y$, and boundary passing through 0
- Given a vector $y$, a linear space can be divided into two halves. This property plays a fundamental role in classification, where it is used to separate data points into distinct categories. We will explore this concept in more detail later.
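A short sketch (with assumed example vectors) computing the angle from the formula above:
import numpy as np

x = np.array([1.0, 1.0])
y = np.array([2.0, 0.0])

cos_theta = x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)        # unsigned angle in radians

print(np.degrees(theta))            # 45 degrees for this example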
3. Matrix and Linear Transformation¶
from IPython.display import YouTubeVideo
YouTubeVideo('-JFdkgJrkyc', width = "560", height = "315")
- Vector
$$ \vec x = \begin{bmatrix} x_{1}\\ x_{2}\\ x_{3} \end{bmatrix} $$
- Matrix and Transformation
- A matrix is a mathematical object that can be used to represent a linear transformation.
- A matrix is not just an array of numbers but a powerful tool for encoding and understanding linear transformations. By studying how matrices act on vectors, we gain a deeper insight into geometry, algebra, and real-world applications across various fields.
$$
M=
\begin{bmatrix}
m_{11} & m_{12} & m_{13}\\
m_{21} & m_{22} & m_{23}\\
m_{31} & m_{32} & m_{33}\\
\end{bmatrix}
$$
$$\begin{array}\ \vec y& = &M \vec x\\ \begin{bmatrix}\space \\ \space \\ \space \end{bmatrix} & = &\begin{bmatrix} & & \\ & & \\ & &\end{bmatrix}\begin{bmatrix} \space \\ \space \\ \space \end{bmatrix} \end{array} $$
$$\begin{array}\ \qquad \quad \text{Given} & & \qquad \text{Interpret}\\ \text{linear transformation} & \longrightarrow & \text{matrix}\\ \text{matrix} & \longrightarrow & \text{linear transformation}\\ \end{array} $$
$$\begin{array}{c}\ \vec x\\ \text{input} \end{array} \begin{array}{c}\ \quad \text{linear transformation}\\ \implies \end{array} \quad \begin{array}{l} \vec y\\ \text{output} \end{array}$$
$$\text{transformation} =\text{rotate + stretch/compress}$$
In the equation
$$y = Mx$$
- $x$ represents the input vector in the transformation
- $M$ is the transformation matrix that defines how $x$ is mapped to another space
- $y$ is the output vector, obtained by applying the linear transformation represented by $M$ to $x$
3.1. Linear Transformation¶
A linear transformation is a function $T: X \rightarrow Y$ between two vector spaces $X$ and $Y$ over the same field that satisfies two properties: superposition and homogeneity.
- Superposition (additivity)
$$T(x_1+x_2) = T(x_1)+T(x_2)$$
- Homogeneity
$$T(kx) = kT(x)$$
Common examples of linear transformations include:
- Rotation
- Reflection
- Scaling
- Projection
Linear vs. Non-linear
$$\begin{array}{c}\\ \text{linear}& & \text{non-linear}\\ &\\ f(x) = 0 & & f(x) = x + c\\ f(x) = kx & & f(x) = x^2\\ f(x(t)) = \frac{dx(t)}{dt} & & f(x) = \sin x\\ f(x(t)) = \int_{a}^{b} x(t)dt & & \end{array}$$
- The function $f(x)=x+c$ is not strictly linear, as it does not satisfy the condition $f(0)=0$. Instead, it is classified as an affine transformation, which is a more general form of linear mapping that includes a translation component.
3.2. Rotation¶
$$ \vec y = R(\theta) \vec x$$
Let's find the matrix $R(\theta)$ that rotates a vector by an angle $\theta$.
- If $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix} $,
\begin{equation*} \begin{bmatrix} \cos(\theta)\\ \sin(\theta) \end{bmatrix}= R(\theta) \begin{bmatrix} 1\\ 0 \end{bmatrix} \end{equation*}
- If $x = \begin{bmatrix} 0 \\ 1 \end{bmatrix} $,
\begin{equation*} \begin{bmatrix} -\sin(\theta)\\ \cos(\theta) \end{bmatrix}= R(\theta) \begin{bmatrix} 0\\ 1 \end{bmatrix} \end{equation*}
$$\begin{array}\\ &\begin{array}\\ M\vec{x}_1 = \vec{y}_1\\ M\vec{x}_2 = \vec{y}_2\\ \end{array}& =& M \begin{bmatrix} \vec{x}_1 & \vec{x}_2 \end{bmatrix}& =& \begin{bmatrix} \vec{y}_1 & \vec{y}_2 \end{bmatrix}\\\\ \Longrightarrow &\begin{bmatrix} \cos(\theta)& -\sin(\theta)\\ \sin(\theta)& \cos(\theta) \end{bmatrix}&=& R(\theta) \begin{bmatrix} 1& 0\\ 0& 1 \end{bmatrix} & = & R(\theta)I \quad =\quad R(\theta)\\ \end{array}$$
Notes:
The method used to derive $R(\theta)$ here differs somewhat from what we have learned so far. However, it is a more direct approach, since it follows immediately from the definition of rotation.
As shown, rotation is a linear transformation because it can be represented by a matrix $M$. However, at first glance, rotation may not intuitively appear to be linear. Nonetheless, since it satisfies the properties of additivity and homogeneity, rotation is indeed a linear transformation.
import numpy as np
theta = 90/180*np.pi
R = np.matrix([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]])
x = np.matrix([[1],[0]])
y = R*x
print(y)
3.3. Stretch/Compress¶
$$\begin{array}\\ \vec y = &k\vec x\\ & \uparrow\\ & \text{scalar (not matrix)}\\ \\ \vec y = &k I \vec x & \text{where } I = \text{ Identity matrix}\\ \\ \vec y = &\begin{bmatrix}k&0\\0&k\end{bmatrix}\vec x \end{array}$$
Example
$T$: stretch by $a$ along the $\hat x$-direction and by $b$ along the $\hat y$-direction
Compute the corresponding matrix $A$.
$$\vec y = A \vec x$$
$$\begin{array}\\ \begin{bmatrix}ax_1\\ bx_2\end{bmatrix}& = A\begin{bmatrix}x_1\\ x_2\end{bmatrix} \Longrightarrow A = \,?\\\\ & = \begin{bmatrix}a & 0\\ 0 & b\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} \end{array}$$
$$\begin{array}\\ A\begin{bmatrix}1\\0\end{bmatrix} & = \begin{bmatrix}a\\0\end{bmatrix} \\ A\begin{bmatrix}0\\1\end{bmatrix} & = \begin{bmatrix}0\\b\end{bmatrix} \\\\ A\begin{bmatrix}1 & 0\\ 0 &1\end{bmatrix} & = A = \begin{bmatrix}a & 0\\0 & b\end{bmatrix} \\ \end{array}$$
More importantly, by looking at $A = \begin{bmatrix}a & 0\\0 & b\end{bmatrix}$, can you determine what kind of linear transformation $T$ it represents?
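As a quick numerical check of the matrix derived above, a small sketch with assumed values $a = 2$ and $b = 3$:
import numpy as np

a, b = 2, 3
A = np.matrix([[a, 0],
               [0, b]])            # stretch by a along x, by b along y

x = np.matrix([[1],[1]])
y = A*x
print(y)                           # [[2], [3]]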
3.4. Projection¶
$P$: Projection onto $\hat x$ - axis
$$\begin{array}{c}\\ & P & \\ \begin{bmatrix}x_1\\x_2\end{bmatrix} & \implies & \begin{bmatrix}x_1\\ 0\end{bmatrix}\\ \vec x & & \vec y \end{array}$$
$$\vec y = P\vec x = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \end{bmatrix}$$
$$
\begin{array}\\
P \begin{bmatrix} 1 \\ 0 \end{bmatrix} & = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\\
P \begin{bmatrix} 0 \\ 1 \end{bmatrix} & = \begin{bmatrix} 0 \\ 0 \end{bmatrix}\\\\
P \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} & = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\end{array}
$$
Again, at first glance, projection may not intuitively appear to be linear. Nonetheless, since it satisfies the properties of additivity and homogeneity, projection is indeed a linear transformation.
Question: Provide an explanation, based on the definition of projection, for why the matrix $P$ is not invertible.
import numpy as np
P = np.matrix([[1, 0],
[0, 0]])
x = np.matrix([[1],[1]])
y = P*x
print(y)
3.5. Linear Transformation and Basis Representation¶
If $\vec {v}_1$ and $\vec {v}_2$ form a basis, and we know $T(\vec {v}_1) = \vec {\omega}_1$ and $T(\vec {v}_2) = \vec {\omega}_2$,
then, for any $\vec x$,
$$\begin{array}{l} \vec x & = a_1\vec v_1 + a_2\vec v_2 & (a_1 \;\text{and } a_2 \;\text{unique})\\ \\ T(\vec x) & = T(a_1\vec v_1 + a_2\vec v_2) \\ & = a_1T(\vec v_1) + a_2T(\vec v_2)\\ & = a_1\vec {\omega}_1 + a_2\vec {\omega}_2\\ \end{array}$$
Explanation
Suppose that computing $T(\vec{x})$ for a new input $\vec{x}$ directly takes approximately 2 hours. However, if $\vec{\omega}_1$ and $\vec{\omega}_2$ have already been precomputed offline, the output for a new input $\vec{x}$ can be obtained immediately.
Rather than computing $T(\vec{x})$ from scratch (a process that takes 2 hours), the transformation can be rapidly evaluated by simply taking the linear combination
$$a_1 \vec{\omega}_1 + a_2 \vec{\omega}_2$$
where the coefficients $a_1$ and $a_2$ are quickly and uniquely determined from the representation of $\vec{x}$ in the chosen basis. This approach significantly reduces the computational overhead by leveraging the precomputed outputs.
This is why a linear system greatly simplifies problem-solving.
The key insight is that we only need to observe how the basis vectors are transformed linearly.
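A minimal sketch of this idea, assuming a hypothetical transformation matrix M (standing in for the expensive map $T$) and the standard basis:
import numpy as np

# Hypothetical "expensive" linear map T, represented here by a matrix M
M = np.array([[2, 1],
              [0, 3]])

v1 = np.array([1, 0])            # basis vector v_1
v2 = np.array([0, 1])            # basis vector v_2
w1 = M.dot(v1)                   # precomputed offline: omega_1 = T(v_1)
w2 = M.dot(v2)                   # precomputed offline: omega_2 = T(v_2)

x = np.array([4, 5])             # new input: x = 4*v_1 + 5*v_2, so a_1 = 4, a_2 = 5
a1, a2 = 4, 5

print(a1*w1 + a2*w2)             # fast evaluation via precomputed outputs
print(M.dot(x))                  # same result as computing T(x) directly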
4. Eigenvalue and Eigenvector¶
Eigen-analysis, which involves the study of eigenvalues and eigenvectors, plays a fundamental role in numerous engineering disciplines. Its frequent appearance across various domains is not incidental but rather a consequence of its deep mathematical and physical significance. Eigen-analysis provides insights into the inherent characteristics of linear transformations.
$$ A \vec v = \lambda \vec v$$
$$ \begin{array}\\ \lambda & = &\begin{cases} \text{positive}\\ 0\\ \text{negative} \end{cases}\\ \lambda \vec v & : & \text{stretched vector}\\ &&\text{(same direction with } \vec v)\\ A \vec v & : &\text{linearly-transformed vector}\\ &&(\text{generally rotate + stretch}) \end{array}$$
Intuitive interpretation of eigenvector:
$$A \vec v \; \text{ parallel to }\; \vec v$$
An eigenvector of a matrix $A$ is a nonzero vector $\vec{v}$ that remains in the same direction after the linear transformation $A\vec{v}$, though it may be scaled by a scalar $\lambda$.
Note:
At a high level, to understand the dynamics of $A$ (or the transformation represented by $A$), it is crucial to identify and analyze its invariants. In other words, to comprehend how things change, we must first understand what remains unchanged. Eigenvectors serve this purpose by revealing the directions that remain unaltered under transformation, while eigenvalues indicate the scaling effect along these directions. This fundamental principle underlies many applications in physics, engineering, and data science, where understanding invariant structures provides deeper insights into system behavior and stability.
Or more interpretively:
"While things compete and evolve, their fundamental nature remains unchanged."
This idiom conveys the idea that, despite external competition and transformation, there are underlying principles or essences that remain constant. It aligns well with the notion that to understand change, one must first recognize what is invariant.
4.1. Basis and Eigenvector Representation in Linear Transformation¶
- If $\vec {v}_1$ and $\vec {v}_2$ form a basis and are eigenvectors of $T$, and we know
$$ \begin{align*} T(\vec {v}_1) &= \vec {\omega}_1 = \lambda_1 \vec{v}_1 \\ T(\vec {v}_2) &= \vec {\omega}_2 = \lambda_2 \vec{v}_2 \end{align*} $$
- Then, for any $\vec x$
$$\begin{array}{l} \vec x & = a_1\vec v_1 + a_2\vec v_2 & (a_1 \;\text{and } a_2 \;\text{unique})\\ \\ T(\vec x) & = T(a_1\vec v_1 + a_2\vec v_2) \\ & = a_1T(\vec v_1) + a_2T(\vec v_2)\\ & = a_1 \lambda_1\vec {v}_1 + a_2 \lambda_2 \vec {v}_2\\ & = \lambda_1 a_1 \vec {v}_1 + \lambda_2 a_2 \vec {v}_2\\ \end{array}$$
- The only thing we need to do is observe how each basis vector is independently scaled
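A short sketch along these lines, assuming an example matrix whose eigenvectors form a basis:
import numpy as np

A = np.array([[2, 1],
              [1, 2]])                  # example matrix with eigenvalues 3 and 1

lam, V = np.linalg.eig(A)
v1, v2 = V[:, 0], V[:, 1]               # eigenvectors (columns of V) form a basis

x = np.array([4, 5])
a1, a2 = np.linalg.solve(V, x)          # coordinates of x in the eigenvector basis

print(lam[0]*a1*v1 + lam[1]*a2*v2)      # each eigen-direction is simply scaled
print(A.dot(x))                         # same as applying A directly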
4.2. How to Compute Eigenvalue & Eigenvector¶
The typical method for computing eigenvalues and eigenvectors:
$$A \vec{v} = \lambda \vec v = \lambda I \, \vec v$$
$$ \begin{array}{ll} \implies & A\vec v - \lambda I \vec v = (A - \lambda I)\vec v = 0\\ \\ \implies & \vec v = 0 \;(\text{trivial solution, not of interest}) \;\text{ or }\; (A - \lambda I) \text{ is not invertible}\\ \\ \implies & \text{det}(A - \lambda I) = 0 \end{array} $$
Alternatively, we can use the definition of an eigenvector directly to find the eigenvalues and eigenvectors of the linear transformation.
Given a matrix $A$, consider the corresponding linear transformation it represents. From this linear transformation, we can identify eigenvectors, which are direction-invariant vectors—meaning they remain in the same direction after the transformation, though they may be scaled by an eigenvalue.
Through the following examples, we will see how eigenvectors remain direction-invariant under a given linear transformation and how eigenvalues determine their scaling behavior.
4.2.1. Example: Projection¶
$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ : projection onto $\hat x$- axis. Find eigenvalues and eigenvectors.
- If $x = \begin{bmatrix} 0 \\ 1 \end{bmatrix} $,
\begin{equation*} \vec y = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = A \vec x = 0 \cdot \vec x\\ \therefore \; \lambda_1 = 0 \space \;\text{and}\; \space \vec {v}_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \end{equation*}
- If $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix} $,
\begin{equation*} \vec y = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = A \vec x = 1 \cdot \vec x\\ \therefore \; \lambda_2 = 1 \space \;\text{and}\; \space \vec {v}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \end{equation*}
import numpy as np
A = np.array([[1, 0],
[0, 0]])
D, V = np.linalg.eig(A)
print('D :', D)
print('V :', V)
For any $\vec x$ in the plane, $\; P\vec x = \vec x \; \implies \; \lambda = 1$
For any $\vec x$ perpendicular to the plane, $\; P\vec x = \vec 0 \; \implies \; \lambda = 0$
4.2.3. Example: Rotation¶
Eigen-analysis of a rotation transformation provides insight into how vectors behave under rotation in a given vector space. This analysis involves finding eigenvalues and eigenvectors of a rotation matrix, which describe the invariant directions and scaling factors of the transformation.
1. Rotation Matrix in 2D
A rotation matrix in two-dimensional space is given by:
$$ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} $$
where $\theta$ is the angle of rotation.
The eigenvalues of $ R(\theta) $ are found by solving the characteristic equation:
$$ \begin{align*} \det(R(\theta) - \lambda I) &= 0 \\\\ \begin{vmatrix} \cos\theta - \lambda & -\sin\theta \\ \sin\theta & \cos\theta - \lambda \end{vmatrix} &= 0 \\\\ (\cos\theta - \lambda)^2 + \sin^2\theta &= 0\\\\ \lambda^2 - 2\lambda \cos\theta + 1 &= 0 \\\\ \lambda &= \cos\theta \pm i \sin\theta \end{align*} $$
This gives the two complex eigenvalues:
$$\lambda_1 = e^{i\theta}, \quad \lambda_2 = e^{-i\theta}$$
Since these eigenvalues have unit magnitude ($ |\lambda| = 1 $), they confirm that rotation preserves vector magnitudes, only altering directions.
To find the eigenvectors, solve:
$$(R(\theta) - \lambda I) \vec{v} = 0$$
For $ \lambda_1 = e^{i\theta} $ and $ \lambda_2 = e^{-i\theta} $, the corresponding eigenvectors are complex and take the form:
$$ \vec{v}_1 = \begin{bmatrix} 1 \\ i \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 1 \\ -i \end{bmatrix}. $$
Since these eigenvectors are not real, it indicates that a pure rotation in 2D has no real eigenvectors, meaning no real direction remains invariant.
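A quick numerical check of these complex eigenvalues, assuming $\theta = 90^{\circ}$ as an example:
import numpy as np

theta = np.pi/2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

lam, V = np.linalg.eig(R)
print(lam)                         # approximately [i, -i] = [e^{i*pi/2}, e^{-i*pi/2}]
print(np.abs(lam))                 # both eigenvalues have unit magnitude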
2. Generalization to 3D Rotation
In three dimensions, rotation matrices depend on an axis of rotation. A typical rotation around the $ z $-axis is:
$$ R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}. $$
Eigenvalues:
$$ \lambda_1 = e^{i\theta}, \quad \lambda_2 = e^{-i\theta}, \quad \lambda_3 = 1. $$
Eigenvectors:
The eigenvector corresponding to $ \lambda_3 = 1 $ is $ \vec{v}_3 = (0, 0, 1)^T $, indicating that the rotation leaves the $ z $-axis invariant.
The other two eigenvectors correspond to complex plane rotations in the $ xy $-plane.
5. System of Linear Equations¶
Depending on the number of equations relative to the number of unknowns, we classify such systems into three types:
- well-determined linear systems
- under-determined linear systems
- over-determined linear systems
from IPython.display import YouTubeVideo
YouTubeVideo('qHOwNxQIfpo', width = "560", height = "315")
5.1. Well-Determined Linear Systems¶
A system where the number of equations equals the number of unknowns ($m=n$).
- System of linear equations
$$\begin{array}{c}\ 2x_1 + 3x_2 & = 7\\ x_1 + 4x_2 & = 6 \end{array} \quad \implies \quad \begin{array}{l}\begin{align*} x_1^{*} & = 2\\ x_2^{*} & = 1 \end{align*}\end{array}$$
- Geometric point of view
- Matrix form
$$\begin{array}{c}\ a_{11}x_1 + a_{12}x_2 = b_1\\ a_{21}x_1 + a_{22}x_2 = b_2 \end{array} \begin{array}{c}\ \quad \text{Matrix form}\\ \implies \end{array} \quad \begin{array}{l} \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2} \end{bmatrix} \end{array}$$
\begin{array}{l} \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \large AX = B \end{array}
\begin{array}{l} \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \therefore & X^{*} = A^{-1} B \quad\text{ if } A^{-1} \text{ exists} \end{array}
- Typically has a unique solution if the coefficient matrix is invertible (i.e., has full rank).
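A minimal sketch verifying the example above with numpy (np.linalg.solve handles the square, invertible case directly):
import numpy as np

A = np.array([[2, 3],
              [1, 4]])
B = np.array([[7],
              [6]])

X = np.linalg.solve(A, B)          # unique solution since A is square and invertible
print(X)                           # [[2.], [1.]]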
5.2. Under-Determined Linear Systems¶
A system where the number of equations is less than the number of unknowns ($m < n$).
- System of linear equations
$$2x_1 + 3x_2 = 7 \quad \Longrightarrow \quad \text{Many solutions}$$
- Geometric point of view
- Matrix form
$$\begin{array}{c}\ a_{11}x_1 + a_{12}x_2 = b_1 \end{array} \begin{array}{c}\ \quad \text{Matrix form}\\ \implies \end{array} \quad \begin{array}{l} \begin{bmatrix} a_{11} & a_{12} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = b_{1}\\ \end{array}$$
\begin{array}{l} \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \large AX = B \end{array}
\begin{array}{l} \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \therefore \; \text{ Many Solutions when $A$ is} \color{red}{\text{ fat}} \end{array}
- Usually has infinitely many solutions
5.3. Over-Determined Linear Systems¶
A system where the number of equations is greater than the number of unknowns ($ m > n$).
- System of linear equations
$$\begin{array}{c}\ 2x_1 + 3x_2 & = 7\\ x_1 + 4x_2 & = 6\\ x_1 + x_2 & = 4 \end{array} \; \implies \; \begin{array}{l}\begin{align*} \text{No solutions} \end{align*}\end{array}$$
- Geometric point of view
- Matrix form
$$\begin{array}{c}\ a_{11}x_1 + a_{12}x_2 = b_1\\ a_{21}x_1 + a_{22}x_2 = b_2\\ a_{31}x_1 + a_{32}x_2 = b_3\\ \end{array} \begin{array}{c}\ \quad \text{Matrix form}\\ \implies \end{array} \quad \begin{array}{l} \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3} \end{bmatrix} \end{array}$$
\begin{array}{l} \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \large AX = B \end{array}
\begin{array}{l} \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \quad & \therefore \; \text{ No Solutions when $A$ is } \color{red}{\text{skinny}} \end{array}
- Generally, such systems do not have an exact solution
5.4. Summary of Linear Systems¶
$$\large AX = B$$
- Square: Well-determined Linear Systems
$$ \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ \end{bmatrix} $$
- Fat: Under-determined Linear Systems
$$ \begin{bmatrix} a_{11} & a_{12}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}=b_1 $$
- Skinny: Over-determined Linear Systems
$$ \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3}\\ \end{bmatrix} $$
6. Solution of a System of Linear Equations¶
6.1. Least-Norm Solution¶
- For under-determined linear system
$$ \begin{bmatrix} a_{11}&a_{12}\\ \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} = b_1 \quad\text{ or }\quad AX = B$$
- There are many solutions
Find the solution of $AX = B$ that minimizes $\lVert X \rVert$ or $\lVert X \rVert^2$
Optimization problem
$$\begin{align*} \min \; & \; \lVert X \rVert ^2\\ \text{s. t.} \; & AX = B \end{align*}$$
- Geometric point of view
- The least-norm solution finds the smallest possible solution vector $X^*$ that satisfies the equation $A X = B$.
- Among the infinitely many possible solutions, it chooses the one closest to the origin in Euclidean space
- Used when minimal-energy solution is desired.
- Select one solution among many solutions (optional)
$$X^{*} = A^T \left( AA^T \right)^{-1}B \quad \text{Least norm solution}$$
- Often arises in control problems
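A minimal sketch applying this formula to the under-determined example $2x_1 + 3x_2 = 7$ from above:
import numpy as np

A = np.matrix([[2, 3]])
B = np.matrix([[7]])

X_star = A.T*(A*A.T).I*B           # least-norm solution X* = A^T (A A^T)^{-1} B
print(X_star)                      # approximately [[1.0769], [1.6154]]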
6.2. Least-Square Solution¶
- For over-determined linear system
$$ \begin{align*} \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix} &\neq \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3} \end{bmatrix} \quad\text{ or }\quad AX \neq B \\ \\ x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \end{bmatrix} &\neq \begin{bmatrix} b_{1}\\ b_{2}\\ b_{3} \end{bmatrix} \end{align*} $$
- If no exact solution exists, an approximate solution can be found by minimizing the error $\lVert E \rVert = \lVert B-AX \rVert$:
Find $X$ that minimizes $\lVert E \rVert$ or $\lVert E \rVert^2$
Optimization problem
$$ \begin{align*} \min\limits_{X}{\lVert E\rVert}^2 & = \min\limits_{X}{\lVert AX - B\rVert}^2\\ X^{*} & = \left( A^TA \right)^{-1}A^TB\\ B^{*} = AX^{*} & = A\left( A^TA \right)^{-1}A^TB \end{align*} $$
- Geometric point of view
- The least-squares solution $X^*$ ensures that the error vector $E = B - AX^*$ is orthogonal to the column space of $A$. In other words, $AX^*$ is the closest point to $B$ within the subspace spanned by the columns of $A$.
- Often arises in estimation problems
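A minimal sketch applying the least-squares formula to the over-determined example above ($2x_1+3x_2=7,\; x_1+4x_2=6,\; x_1+x_2=4$):
import numpy as np

A = np.matrix([[2, 3],
               [1, 4],
               [1, 1]])
B = np.matrix([[7],
               [6],
               [4]])

X_star = (A.T*A).I*A.T*B           # least-squares solution X* = (A^T A)^{-1} A^T B
print(X_star)
print(A*X_star)                    # B* = A X*, the closest point to B in the column space of A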
6.3. Geometric Point of View: Projection¶
From a geometric perspective, the least-squares solution of a system of linear equations is closely related to projection onto a subspace.
To develop a foundational understanding of projection, it is useful to first consider the simplest case: the projection of a vector $X$ onto another vector $ Y$. In this case, the goal is to determine the component of $X$ that lies in the direction of $Y$. This provides insight into the more general concept of projection onto a subspace, which is fundamental in linear algebra and its applications.
6.3.1. Vector Projection¶
- The vector projection of a vector $X$ on (or onto) a nonzero vector $Y$ is the orthogonal projection of $X$ onto a straight line parallel to $Y$
$$ \begin{align*} W & = \omega\hat{Y}= \omega \frac{Y}{\lVert Y \rVert}, \;\text{where } \omega = \lVert W \rVert\\\\ \omega & = \lVert X \rVert \cos \theta = \lVert X \rVert \frac{X \cdot Y}{\lVert X \rVert \lVert Y \rVert} = \frac{X \cdot Y}{\lVert Y \rVert}\\\\ W & = \omega \hat{Y} = \frac{X \cdot Y}{\lVert Y \rVert}\frac{Y}{\lVert Y \rVert} = \frac{X \cdot Y}{\lVert Y \rVert \lVert Y \rVert}Y = \frac{X^T Y}{Y^T Y}Y = \frac{\langle X, Y \rangle}{\langle Y, Y \rangle}Y\\\\ & = Y\frac{X^T Y}{Y^T Y} = Y\frac{Y^T X}{Y^T Y} = \frac{YY^T}{Y^T Y}X = PX \end{align*} $$
- Note: Given $Y$, $X$ is the input and $W$ is the output.
- A preferable approach to computing $\omega$ and $W$
$$ \begin{align*} Y & \perp \left( X - W \right)\\\\ \implies \;& Y^T \left( X - W \right) = Y^T \left( X - \omega \frac{Y}{\lVert Y \rVert} \right) = 0\\\\ \implies \;& \omega = \frac{Y^T X}{Y^T Y}\lVert Y \rVert\\\\ & W = \omega \frac{Y}{\lVert Y \rVert} = \frac{Y^TX}{Y^TY}Y = \frac{\langle X, Y \rangle}{\langle Y, Y \rangle}Y \end{align*} $$
import numpy as np
X = np.matrix([[1],[1]])
Y = np.matrix([[2],[0]])
print(X)
print(Y)
print(Y.T*Y)
omega = (X.T*Y)/(Y.T*Y)
print(float(omega))
omega = float(omega)
W = omega*Y
print(W)
6.3.2. Orthogonal Projection onto a Subspace¶
Building on the insight from the example above, consider the more general concept of projection onto a subspace.
Projection of $B$ onto the subspace $U = \text{span}\{\vec A_1, \vec A_2\}$
Orthogonality
$$ \begin{align*} \vec A_1 &\perp \left( AX^{*}-B\right)\\ \vec A_2 &\perp \left( AX^{*}-B\right)\\\\ A = \begin{bmatrix} \vec A_1 & \vec A_2\end{bmatrix} &\perp \left( AX^{*}-B\right) \end{align*} $$
$$ \begin{align*} A^T \left(AX^{*} - B \right) & = 0\\ A^TAX^{*} & = A^TB\\ X^{*} & = \left( A^TA \right)^{-1}A^TB\\ B^{*} = AX^{*} & = A\left( A^TA \right)^{-1}A^TB \end{align*} $$
import numpy as np
A = np.matrix([[1,0],[0,1],[0,0]])
B = np.matrix([[1],[1],[1]])
X = (A.T*A).I*A.T*B
print(X)
Bstar = A*X
print(Bstar)