Linear Algebra
Table of Contents
from IPython.display import YouTubeVideo
YouTubeVideo('ub4oe2im8T8', width = "560", height = "315")
Consider a system of two linear equations with two unknowns:
This system can be written compactly in vector-matrix form as $Ax = b$, where:
To solve for $x$, we apply the inverse of matrix $A$ (assuming it exists):
We won't compute $A^{-1}$ by hand. Instead, we will use numpy
in Python to compute the solution efficiently and accurately.
import numpy as np
A = np.array([[4, -5], [-2, 3]])
print(A)
b = np.array([[-13], [9]])
print(b)
$A^{-1} b$
x = np.linalg.inv(A).dot(b)
print(x)
If you are not familiar with linalg, convert the data into matrix form using asmatrix and work with matrix operations directly:
A = np.asmatrix(A)
b = np.asmatrix(b)
x = A.I*b
print(x)
Consider a general system of $m$ linear equations in $n$ unknowns:
This system can be concisely expressed in matrix-vector form as:
Where
This compact representation is fundamental in linear algebra, enabling efficient computation and deeper theoretical analysis using matrix operations.
A matrix is more than just an array of numbers - it is a structured representation of a linear transformation. There are two ways to view a matrix: as a collection of column vectors or row vectors.
(1) Matrix as a collection of columns
(2) Matrix as a collection of rows
Understanding both views is essential in linear algebra, especially when dealing with matrix operations such as multiplication, projection, rank, and transformations in both column space and row space.
Definition of inner product: $x, y \in \mathbb{R}^{n}$
The inner product operator takes two column vectors as input and produces a scalar value as output.
The summation $\sum\limits_{i=1}^{n}x_{i}\,y_{i}$ calculates the product of corresponding components of $ x $ and $ y $ and sums them up.
The inner product can be thought of as a measure of similarity between two column vectors. We will discuss this in more detail later.
In function spaces, the inner product extends to integrals:
x = np.array([[1], [1]])
y = np.array([[2], [3]])
print(x.T.dot(y))
x = np.asmatrix(x)
y = np.asmatrix(y)
print(x.T*y)
Given $A \in \mathbb{R}^{m \times n}$, we multiply $A$ and $x$:
which means the output is an $m$-dimensional vector.
Intuition
(1) Writing $A$ by rows, each entry of $Ax$ is an inner product between $x$ and a row of $A$
(2) Writing $A$ by columns, $Ax$ is a linear combination of the columns of $A$, with coefficients given by $x$
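As a quick numerical check, the short sketch below (with a small matrix chosen purely for illustration) computes $Ax$ directly and then reproduces it both as row-wise inner products and as a linear combination of the columns of $A$.
import numpy as np

A = np.array([[4, -5],
              [-2, 3]])
x = np.array([[1],
              [2]])

# direct matrix-vector product
print(A.dot(x))

# view (1): each entry of Ax is an inner product between a row of A and x
print(np.array([[A[0, :].dot(x).item()],
                [A[1, :].dot(x).item()]]))

# view (2): Ax is a linear combination of the columns of A with coefficients from x
print(x[0, 0]*A[:, [0]] + x[1, 0]*A[:, [1]])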
Question: We know how to compute the distance between two real numbers. Can we extend this concept to define the distance between two vectors?
A norm provides a general concept of distance in linear algebra. It is a function that assigns a non-negative length (or size) to a vector, allowing us to define "distance" in different ways.
A vector norm is any function $f : \mathbb{R}^{n} \rightarrow \mathbb{R}$ with
(1) Non-negativity: $f(x) \geq 0 \;$ and $\;f(x) = 0 \quad \Longleftrightarrow \quad x = 0$
(2) Scalar multiplication: $f(ax) = \lvert a \rvert f(x) \;$ for $\; a \in \mathbb{R}$
(3) Triangle inequality: $f(x + y) \leq f(x) + f(y)$
Question: Can we define the distance between two matrices?
x = np.array([[4], [3]])
np.linalg.norm(x, 2)
np.linalg.norm(x, 1)
Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if
They are orthonormal if, in addition,
The concept of orthogonality is naturally defined in Euclidean space. However, can this idea be extended to function spaces? In other words, can you conceptualize a notion of orthogonality between two functions?
x = np.matrix([[1],[2]])
y = np.matrix([[2],[-1]])
x.T*y
Cauchy-Schwarz Inequality for any $x, y \in \mathbb{R}^n$:
Then, angle between vectors in $\mathbb{R}^n$ is defined as
Note on Angles, Distance, and Similarity:
Halfspaces and Classification:
The set
defines a halfspace in $\mathbb{R}^n$
The boundary is the hyperplane $x^Ty=0$, which passes through the origin.
The vector $y$ acts as the outward normal to this hyperplane - it's perpendicular to the boundary and determines which side is "positive" or "negative."
Given a vector $y$, a linear space can be divided into two halves. This property plays a fundamental role in classification, where it is used to separate data points into distinct categories. We will explore this concept in more detail later.
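The small sketch below (with an arbitrarily chosen normal vector $y$ and a few test points) illustrates this idea: the sign of $x^Ty$ tells us on which side of the hyperplane each point lies.
import numpy as np

y = np.array([[1], [2]])            # normal vector defining the hyperplane x^T y = 0

points = [np.array([[1], [1]]),
          np.array([[-3], [1]]),
          np.array([[2], [-1]])]

for x in points:
    # sign of x^T y: +1 and -1 are the two halfspaces, 0 means x lies on the hyperplane
    print(x.ravel(), '->', np.sign(x.T.dot(y).item()))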
from IPython.display import YouTubeVideo
YouTubeVideo('-JFdkgJrkyc', width = "560", height = "315")
Matrix and Transformation
In the equation
$$\vec y = M \vec x$$

Interpreting Matrices and Transformations:
If you're given a linear transformation, you can represent it with a matrix.
If you're given a matrix, it tells you how vectors get transformed (linearly).
A linear transformation is a function $T: X \rightarrow Y$ between two vector spaces $X$ and $Y$ over the same field that satisfies two properties: additivity (superposition) and homogeneity.
Common examples of linear transformations include:
Linear vs. Non-linear functions
$$\begin{array}{lcl}
\text{linear} & \qquad & \text{non-linear}\\[4pt]
f(x) = 0 & & f(x) = x + c\\
f(x) = kx & & f(x) = x^2\\
f(x(t)) = \dfrac{dx(t)}{dt} & & f(x) = \sin x\\
f(x(t)) = \int_{a}^{b} x(t)\,dt & &
\end{array}$$

The function $f(x)=x+c$ is not strictly linear, as it does not satisfy the condition $f(0)=0$. Instead, it is classified as an affine transformation, which is a more general form of linear mapping that includes a translation component.
Let's find matrix $R(\theta)$
Consider a linear transformation represented by a matrix $M$. When applied to a set of input vectors $\vec x_1$ and $\vec x_2$, the transformation produces corresponding output vectors $\vec y_1$ and $\vec y_2$, respectively:
This relationship can be compactly expressed in matrix form as:
This formulation emphasizes that a linear transformation applied to multiple vectors is equivalent to applying the matrix to a matrix whose columns consist of those vectors. The result is a matrix whose columns are the corresponding transformed vectors.
Notes:
The method used to derive $R(\theta)$ here differs somewhat from the approach we have seen so far. However, it is a more direct and conceptually intuitive way of finding $R(\theta)$ from the definition of rotation.
As shown, rotation is a linear transformation because it can be represented by a matrix $M$. However, at first glance, rotation may not intuitively appear to be linear. Nonetheless, since it satisfies the properties of additivity and homogeneity, rotation is indeed a linear transformation.
import numpy as np
theta = 90/180*np.pi
R = np.matrix([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]])
x = np.matrix([[1],[0]])
y = R*x
print(y)
A fundamental class of linear transformations involves scaling a vector - either stretching or compressing it - without altering its direction. This transformation is defined by multiplying the vector by a scalar $k$:
This operation simply scales the magnitude (length) of the vector $\vec{x}$ while maintaining its orientation (unless $k < 0$, which also reverses the direction).
To express this scaling transformation in matrix form, we can write:
where $I$ is the identity matrix, and the scalar multiplication is applied uniformly across all dimensions. For example, in two dimensions:
Example
Let $T$ be a linear transformation that:
We are to compute the corresponding matrix $A$ such that:
Solution:
More importantly, by looking at $A = \begin{bmatrix}a & 0\\0 & b\end{bmatrix}$, can you determine what kind of linear transformation the corresponding $T$ represents?
$P$: Projection onto $\hat x$ - axis
$$
\begin{aligned}
P \begin{bmatrix} 1 \\ 0 \end{bmatrix} &= \begin{bmatrix} 1 \\ 0 \end{bmatrix}\\[4pt]
P \begin{bmatrix} 0 \\ 1 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix}\\[8pt]
P \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} &= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\end{aligned}
$$
Again, at first glance, projection may not intuitively appear to be linear. Nonetheless, since it satisfies the properties of additivity and homogeneity, projection is indeed a linear transformation.
Question: Provide an explanation, based on the definition of projection, of why the matrix $P$ is not invertible.
import numpy as np
P = np.matrix([[1, 0],
[0, 0]])
x = np.matrix([[1],[1]])
y = P*x
print(y)
If $\vec {v}_1$ and $\vec {v}_2$ form a basis, and we know
Then, for any $\vec x$, we can uniquely express it as a linear combination of the basis vectors:
Since $T$ is linear, it follows that:
Explanation and Computational Significance
Suppose that computing $T(\vec{x})$ for a new input $\vec{x}$ directly takes approximately 2 hours. However, if $\vec{\omega}_1 $ and $ \vec{\omega}_2 $ have already been precomputed offline, the output for a new input $\vec x $ can be obtained immediately.
Rather than computing $T(\vec{x})$ from scratch (a process that takes 2 hours), the transformation can be rapidly evaluated by simply taking the linear combination
where the coefficients $a_1$ and $a_2$ are quickly and uniquely determined from the representation of $\vec{x}$ in the chosen basis.
This approach significantly reduces the computational overhead by leveraging the precomputed outputs.
This is why a linear system greatly simplifies problem-solving.
The key insight is that we only need to observe how the basis vectors are transformed linearly.
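A minimal sketch of this idea, using a small matrix $M$ as a stand-in for the (expensive) transformation $T$ and an arbitrarily chosen basis: the outputs of the basis vectors are "precomputed", and the output for a new input is assembled as a linear combination.
import numpy as np

M = np.array([[2, 1],
              [0, 3]])              # stands in for the expensive transformation T

v1 = np.array([[1], [0]])           # basis vectors
v2 = np.array([[1], [1]])

w1 = M.dot(v1)                      # T(v1), precomputed offline
w2 = M.dot(v2)                      # T(v2), precomputed offline

x = np.array([[3], [2]])            # new input

# coefficients a1, a2 such that x = a1*v1 + a2*v2
a = np.linalg.solve(np.hstack([v1, v2]), x)

# T(x) as a linear combination of the precomputed outputs
print(a[0, 0]*w1 + a[1, 0]*w2)
print(M.dot(x))                     # direct computation agrees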
Eigen-analysis, which involves the study of eigenvalues and eigenvectors, plays a fundamental role in numerous areas of science and engineering. Its frequent appearance across various domains is not incidental but rather a consequence of its deep mathematical and physical significance. Eigen-analysis provides insights into the inherent characteristics of linear transformations.
Fundamental Equation
Where
Intuition and Interpretation
An eigenvector is a special vector that does not change direction under the transformation defined by $ A $. Instead, it is only scaled by a factor $ \lambda $:
In this sense:
Deeper Insight: Invariance and Change
At a high level, to understand the dynamics of $A$ (or the transformation represented by $A$), it is crucial to identify and analyze its invariants. In other words, to comprehend how things change, we must first understand what remains unchanged. Eigenvectors serve this purpose by revealing the directions that remain unaltered under transformation, while eigenvalues indicate the scaling effect along these directions. This fundamental principle underlies many applications in physics, engineering, and data science, where understanding invariant structures provides deeper insights into system behavior and stability.
Or more interpretively:
"While things compete and evolve, their fundamental nature remains unchanged."
This idiom conveys the idea that, despite external competition and transformation, there are underlying principles or essences that remain constant. It aligns well with the notion that to understand change, one must first recognize what is invariant.
If $\vec {v}_1$ and $\vec {v}_2$ form a basis and are eigenvectors, and we know
Then, for any $\vec x$
The only thing we need is to observe how each basis vector is independently scaled
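A small numerical sketch (with an arbitrarily chosen symmetric matrix, so that its eigenvectors form a basis): $A\vec x$ is recovered by scaling each eigen-direction by its eigenvalue.
import numpy as np

A = np.array([[3, 1],
              [1, 3]])

lamb, V = np.linalg.eig(A)          # eigenvalues and eigenvectors (as columns of V)

x = np.array([[2], [1]])

# coefficients of x in the eigenvector basis: x = a1*v1 + a2*v2
a = np.linalg.solve(V, x)

# each eigen-direction is simply scaled by its eigenvalue
print(a[0, 0]*lamb[0]*V[:, [0]] + a[1, 0]*lamb[1]*V[:, [1]])
print(A.dot(x))                     # direct computation agrees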
To find the eigenvalues and eigenvectors of a matrix $A$, we begin with the defining equation of eigen-analysis:
This can be rewritten as:
This is a homogeneous system of equations. For nontrivial solutions $\vec v \neq \vec 0$ to exist, the matrix $(A-\lambda I)$ must be singular - i.e., it must not have an inverse. Hence, the condition for $\lambda$ to be an eigenvalue is:
This equation is known as the characteristic equation of the matrix $A$.
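As a quick numerical aside, numpy can return the coefficients of the characteristic polynomial of a square matrix via np.poly; its roots are the eigenvalues (the matrix values here are chosen only for illustration).
import numpy as np

A = np.array([[4, -5],
              [-2, 3]])

coeffs = np.poly(A)                 # coefficients of det(lambda*I - A)
print(coeffs)

print(np.roots(coeffs))             # roots of the characteristic polynomial
print(np.linalg.eig(A)[0])          # agree with the eigenvalues of A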
Alternative (Geometric) Perspective
Beyond the algebraic procedure, eigenvectors and eigenvalues can also be understood directly from the definition of eigen-analysis.
Given a matrix $A$, consider the corresponding linear transformation it represents. From this linear transformation, we can identify eigenvectors, which are direction-invariant vectors - meaning they remain in the same direction after the transformation, though they may be scaled by an eigenvalue.
Through the following examples, we will see how eigenvectors remain direction-invariant under a given linear transformation and how eigenvalues determine their scaling behavior.
$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ : projection onto $\hat x$- axis. Find eigenvalues and eigenvectors.
If $\vec x = \begin{bmatrix} 0 \\ 1 \end{bmatrix} $,
If $\vec x = \begin{bmatrix} 1 \\ 0 \end{bmatrix} $,
import numpy as np
A = np.array([[1, 0],
[0, 0]])
D, V = np.linalg.eig(A)
print('D =', D)
print('\nV =')
print(V)
For any $\vec x$ in the plane, $\; P\vec x = \vec x \; \implies \; \lambda = 1$
For any $\vec x$ perpendicular to the plane, $\; P\vec x = \vec 0 \; \implies \; \lambda = 0$
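A minimal sketch, assuming the plane in question is the $xy$-plane in $\mathbb{R}^3$: the projection matrix has eigenvalues $1, 1, 0$, and the eigenvector with $\lambda = 0$ is the normal direction.
import numpy as np

P = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 0]])           # projection onto the x-y plane

D, V = np.linalg.eig(P)
print('D =', D)                     # eigenvalues 1, 1, 0
print('\nV =')
print(V)                            # lambda = 1: vectors in the plane, lambda = 0: the normal direction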
Eigen-analysis of a rotation transformation provides insight into how vectors behave under rotation in a given vector space. This analysis involves finding eigenvalues and eigenvectors of a rotation matrix, which describe the invariant directions and scaling factors of the transformation.
Rotation Matrix in 2D
A rotation matrix in two-dimensional space is given by:
where $\theta$ is the angle of rotation.
The eigenvalues of $ R(\theta) $ are found by solving the characteristic equation:
This gives the two complex eigenvalues:
Since these eigenvalues have unit magnitude ($ |\lambda| = 1 $), they confirm that rotation preserves vector magnitudes, only altering directions.
To find the eigenvectors, solve:
For $ \lambda_1 = e^{i\theta} $ and $ \lambda_2 = e^{-i\theta} $, the corresponding eigenvectors are complex and take the form:
Since these eigenvectors are not real, it indicates that a pure rotation in 2D has no real eigenvectors, meaning no real direction remains invariant.
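The sketch below (using $\theta = 60^{\circ}$ for illustration) confirms this numerically: the eigenvalues form a complex-conjugate pair with unit magnitude, and the eigenvectors are complex.
import numpy as np

theta = 60/180*np.pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

D, V = np.linalg.eig(R)
print('D =', D)                     # e^{i theta} and e^{-i theta}
print('|D| =', np.abs(D))           # both have unit magnitude
print('\nV =')
print(V)                            # complex eigenvectors: no real invariant direction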
Generalization to 3D Rotation
In three dimensions, rotation matrices depend on an axis of rotation. A typical rotation around the $ z $-axis is:
Eigenvalues:
Eigenvectors:
The eigenvector corresponding to $ \lambda_3 = 1 $ is $ \vec{v}_3 = (0, 0, 1)^T $, indicating that the rotation leaves the $ z $-axis invariant.
The other two eigenvectors correspond to complex plane rotations in the $ xy $-plane.
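A short numerical check of the 3D case (rotation about the $z$-axis, with $\theta = 45^{\circ}$ chosen for illustration): one eigenvalue equals $1$ with its eigenvector along the $z$-axis, while the other two form a complex pair.
import numpy as np

theta = 45/180*np.pi
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0,              0,             1]])

D, V = np.linalg.eig(Rz)
print('D =', D)                     # e^{i theta}, e^{-i theta}, and 1
print('\nV =')
print(V)                            # the eigenvector for lambda = 1 points along the z-axis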
Depending on the number of equations relative to the number of unknowns, we classify such systems into three types:
from IPython.display import YouTubeVideo
YouTubeVideo('qHOwNxQIfpo', width = "560", height = "315")
A well-determined linear system is one in which the number of equations is equal to the number of unknowns ($m=n$).
System of Linear Equations
Consider the following system with two equations and two unknowns:
This solution satisfies both equations simultaneously.
Geometric Interpretation
From a geometric perspective, each equation represents a line in the two-dimensional plane $ \mathbb{R}^2 $. The solution $ (x_1, x_2)^* = (2, 1) $ corresponds to the point of intersection of these two lines.
Matrix Form
When $m=n$ and the matrix of coefficients $A$ is invertible, the solution to the system $Ax=b$ is given by:
An under-determined system is one in which the number of equations is less than the number of unknowns ($m < n$). Such systems typically do not have a unique solution - instead, they admit infinitely many solutions.
System of Linear Equations
Consider the equation:
This is a single equation in two unknowns, which defines a line in $\mathbb{R}^2$. Every point $(x_1, x_2)$ that lies on this line satisfies the equation, so the solution set is infinite.
Geometric Interpretation
Matrix Form
A general under-determined system can be written as:
When the matrix $A$ is fat (i.e., it has more columns than rows), the system typically has infinitely many solutions.
An over-determined linear system is one in which the number of equations exceeds the number of unknowns ($ m > n$).
System of Linear Equations
Here, three equations attempt to constrain only two variables. In most cases, such a system is inconsistent, meaning there is no solution that satisfies all equations simultaneously.
Geometric Interpretation
Matrix Form
The system can be written compactly as:
When $A$ is skinny (more rows than columns), the system generally has no exact solution, unless the equations happen to be perfectly consistent - a rare case in practice.
The least-norm approach is applicable to under-determined linear systems, where the number of unknowns exceeds the number of equations:
Consider the linear system:
Since the system is under-determined, there are infinitely many solutions.
Optimization Objective
To select a unique solution from this infinite set, we impose an additional criterion:
This leads to the following constrained optimization problem:
This is known as the least-norm solution.
Geometric Interpretation
Closed-Form Expression for the Least-Norm Solution
This approach is common in control systems, where one often seeks a control input with minimum energy or least actuator effort that still achieves the desired output.
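A minimal sketch of the closed-form least-norm solution $x^{*} = A^T(AA^T)^{-1}b$, on a small under-determined system chosen for illustration; the pseudo-inverse returns the same minimum-norm solution.
import numpy as np

A = np.array([[2, 1]])              # 1 equation, 2 unknowns (fat matrix)
b = np.array([[4]])

# closed-form least-norm solution
x_star = A.T.dot(np.linalg.inv(A.dot(A.T))).dot(b)
print(x_star)

# the pseudo-inverse gives the same minimum-norm solution
print(np.linalg.pinv(A).dot(b))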
The least-squares solution is applied to over-determined systems - linear systems where the number of equations exceeds the number of unknowns:
Such systems typically have no exact solution, especially when the equations are inconsistent due to noise, measurement errors, or redundancy. In these cases, we seek an approximate solution that minimizes the discrepancy between the left and right sides of the system.
Given the linear system:
This system has no exact solution, but we can compute an approximate solution $x^*$ that minimizes the residual error:
Optimization Objective
We define the least-squares problem as:
The solution that minimizes this expression is given by:
The corresponding approximation of $b$ is:
The least-squares solution is commonly used in estimation problems, such as in linear regression, system identification, and data fitting, where an exact match to the observed data is not possible, and an approximate but optimal solution is desired.
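A small numerical sketch of the least-squares solution on a hypothetical over-determined system: the closed-form $x^{*} = (A^TA)^{-1}A^Tb$ agrees with numpy's built-in solver.
import numpy as np

A = np.array([[1, 0],
              [0, 1],
              [1, 1]])              # 3 equations, 2 unknowns (skinny matrix)
b = np.array([[1], [1], [3]])

# closed-form least-squares solution
x_star = np.linalg.inv(A.T.dot(A)).dot(A.T).dot(b)
print(x_star)

# numpy's least-squares solver agrees
print(np.linalg.lstsq(A, b, rcond=None)[0])

# corresponding approximation of b
print(A.dot(x_star))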
From a geometric perspective, the least-squares solution of a linear system is fundamentally connected to the concept of orthogonal projection onto a subspace. To build a clear intuition for this idea, it is helpful to first examine the simplest case - the projection of one vector onto another. This forms the foundation for understanding projection onto subspaces, which plays a central role in linear algebra and its many applications.
The vector projection of a vector $\vec x$ on (or onto) a nonzero vector $\vec y$ is the orthogonal projection of $\vec x$ onto a straight line parallel to $\vec y$.
Algebraic Derivation
Let:
where $ \omega = \lVert \vec \omega \rVert $ is the scalar projection of $ \vec x $ onto $ \vec y $, and $ \hat{y} $ is the unit vector in the direction of $ \vec y $.
The scalar component $ \omega $ is given by:
Therefore, the full vector projection is:
Using inner product notation:
Matrix Interpretation
This projection can also be written in the form:
Here, $ P $ is the projection matrix that maps $ \vec x $ onto the line defined by $ \vec y $.
Alternative Derivation via Orthogonality Condition
An elegant and insightful approach to computing the projection relies on the orthogonality condition:
The error vector $ \vec \omega - \vec x $ is orthogonal to $ \vec y $
Mathematically:
Substituting $ \vec \omega = \omega \frac{\vec y}{\lVert \vec y \rVert} $, we have:
$$\vec y^T \left( \omega \frac{\vec y}{\lVert \vec y \rVert} - \vec x \right) = 0 \quad \Rightarrow \quad \omega = \frac{\vec y^T \vec x}{\vec y^T \vec y} \lVert \vec y \rVert$$

Substituting back:
This confirms the same projection formula, now derived through a geometrically motivated orthogonality condition.
This idea extends naturally to projection onto subspaces, which underlies the least-squares solution in over-determined systems (to be developed in Section 6.3.2).
import numpy as np
X = np.matrix([[1],[1]])
Y = np.matrix([[2],[0]])
print(X)
print(Y)
print(Y.T*Y)
omega = (X.T*Y)/(Y.T*Y)
print(float(omega))
omega = float(omega)
W = omega*Y
print(W)
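Following the matrix interpretation above, the same projection can also be carried out with the projection matrix $P = \dfrac{\vec y \, \vec y^T}{\vec y^T \vec y}$, using the same vectors as in the cell above:
import numpy as np

X = np.matrix([[1], [1]])
Y = np.matrix([[2], [0]])

P = Y*Y.T/float(Y.T*Y)              # projection matrix onto the line spanned by Y
print(P)

print(P*X)                          # same result as the scalar-projection computation above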
Building on the insight from the previous example, we now extend the concept of projection to a subspace.
Consider the projection of a vector $\vec b$ onto a subspace $U$ spanned by the vectors $\vec a_1$ and $\vec a_2$.
The goal is to find the point $A \vec x^*$ in the subspace that is closest to $\vec b$ in terms of Euclidean distance.
Orthogonality Condition
The residual vector $A \vec x^* - \vec b$ must be orthogonal to every vector in the subspace:
This leads to the normal equations:
import numpy as np
A = np.matrix([[1,0],[0,1],[0,0]])
B = np.matrix([[1],[1],[1]])
X = (A.T*A).I*A.T*B
print(X)
Bstar = A*X
print(Bstar)
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')