Linear Algebra
Table of Contents
from IPython.display import YouTubeVideo
YouTubeVideo('ub4oe2im8T8', width = "560", height = "315")
Consider a system of two linear equations with two unknowns:
This system can be written compactly in vector-matrix form as $Ax = b$, where:
To solve for $x$, we apply the inverse of matrix $A$ (assuming it exists):
We won't compute $A^{-1}$ by hand. Instead, we will use numpy
in Python to compute the solution efficiently and accurately.
import numpy as np
A = np.array([[4, -5], [-2, 3]])
print(A)
b = np.array([[-13], [9]])
print(b)
$A^{-1} b$
x = np.linalg.inv(A).dot(b)
print(x)
If you are not familiar with linalg, convert the data into matrix form using asmatrix and work with matrix operations directly:
A = np.asmatrix(A)
b = np.asmatrix(b)
x = A.I*b
print(x)
Consider a general system of $m$ linear equations in $n$ unknowns:
This system can be concisely expressed in matrix-vector form as:
Where
This compact representation is fundamental in linear algebra, enabling efficient computation and deeper theoretical analysis using matrix operations.
A matrix is more than just an array of numbers - it is a structured representation of a linear transformation. There are two ways to view a matrix: as a collection of column vectors or row vectors.
(1) Matrix as a collection of columns
(2) Matrix as a collection of rows
Understanding both views is essential in linear algebra, especially when dealing with matrix operations such as multiplication, projection, rank, and transformations in both column space and row space.
Definition of inner product: $x, y \in \mathbb{R}^{n}$
The inner product operator takes two column vectors as input and produces a scalar value as output.
The summation $\sum\limits_{i=1}^{n}x_{i}\,y_{i}$ calculates the product of corresponding components of $ x $ and $ y $ and sums them up.
The inner product can be thought of as a measure of similarity between two column vectors. We will discuss this in more detail later.
In function spaces, the inner product extends to integrals:
x = np.array([[1], [1]])
y = np.array([[2], [3]])
print(x.T.dot(y))
x = np.asmatrix(x)
y = np.asmatrix(y)
print(x.T*y)
Given $A \in \mathbb{R}^{m \times n}$, we multiply $A$ and $x$:
which means the output is an $m$-dimensional vector.
Intuition
(1) Writing $A$ by rows, each entry of $Ax$ is an inner product between $x$ and a row of $A$
(2) Writing $A$ by columns, $Ax$ is a linear combination of the columns of $A$, with coefficients given by $x$
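As a quick numerical check, the short sketch below (with a small matrix chosen purely for illustration) computes $Ax$ directly and then reproduces it both as row-wise inner products and as a linear combination of the columns of $A$.
import numpy as np

A = np.array([[4, -5],
              [-2, 3]])
x = np.array([[1],
              [2]])

# direct matrix-vector product
print(A.dot(x))

# view (1): each entry of Ax is an inner product between a row of A and x
print(np.array([[A[0, :].dot(x).item()],
                [A[1, :].dot(x).item()]]))

# view (2): Ax is a linear combination of the columns of A with coefficients from x
print(x[0, 0]*A[:, [0]] + x[1, 0]*A[:, [1]])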
Question: We know how to compute the distance between two real numbers. Can we extend this concept to define the distance between two vectors?
A norm provides a general concept of distance in linear algebra. It is a function that assigns a non-negative length (or size) to a vector, allowing us to define "distance" in different ways.
A vector norm is any function $f : \mathbb{R}^{n} \rightarrow \mathbb{R}$ with
(1) Non-negativity: $f(x) \geq 0 \;$ and $\;f(x) = 0 \quad \Longleftrightarrow \quad x = 0$
(2) Scalar multiplication: $f(ax) = \lvert a \rvert f(x) \;$ for $\; a \in \mathbb{R}$
(3) Triangle inequality: $f(x + y) \leq f(x) + f(y)$
Question: Can we define the distance between two matrices?
x = np.array([[4], [3]])
np.linalg.norm(x, 2)
np.linalg.norm(x, 1)
Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if
They are orthonormal if, in addition,
The concept of orthogonality is naturally defined in Euclidean space. However, can this idea be extended to function spaces? In other words, can you conceptualize a notion of orthogonality between two functions?
x = np.matrix([[1],[2]])
y = np.matrix([[2],[-1]])
x.T*y
Cauchy-Schwarz Inequality for any $x, y \in \mathbb{R}^n$:
Then, angle between vectors in $\mathbb{R}^n$ is defined as
Note on Angles, Distance, and Similarity:
Halfspaces and Classification:
The set
defines a halfspace in $\mathbb{R}^n$
The boundary is the hyperplane $x^Ty=0$, which passes through the origin.
The vector $y$ acts as the outward normal to this hyperplane - it's perpendicular to the boundary and determines which side is "positive" or "negative."
Given a vector $y$, a linear space can be divided into two halves. This property plays a fundamental role in classification, where it is used to separate data points into distinct categories. We will explore this concept in more detail later.
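The small sketch below (with an arbitrarily chosen normal vector $y$ and a few test points) illustrates this idea: the sign of $x^Ty$ tells us on which side of the hyperplane each point lies.
import numpy as np

y = np.array([[1], [2]])            # normal vector defining the hyperplane x^T y = 0

points = [np.array([[1], [1]]),
          np.array([[-3], [1]]),
          np.array([[2], [-1]])]

for x in points:
    # sign of x^T y: +1 and -1 are the two halfspaces, 0 means x lies on the hyperplane
    print(x.ravel(), '->', np.sign(x.T.dot(y).item()))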
from IPython.display import YouTubeVideo
YouTubeVideo('-JFdkgJrkyc', width = "560", height = "315")
Matrix and Transformation
In the equation
$$\vec y = M \vec x$$

Interpreting Matrices and Transformations:
If you're given a linear transformation, you can represent it with a matrix.
If you're given a matrix, it tells you how vectors get transformed (linearly).
A linear transformation is a function $T: X \rightarrow Y$ between two vector spaces $X$ and $Y$ over the same field that satisfies two properties: additivity (superposition) and homogeneity.
Common examples of linear transformations include:
Linear vs. Non-linear functions
$$\begin{array}{lcl}
\text{linear} & \qquad & \text{non-linear}\\[4pt]
f(x) = 0 & & f(x) = x + c\\
f(x) = kx & & f(x) = x^2\\
f(x(t)) = \dfrac{dx(t)}{dt} & & f(x) = \sin x\\
f(x(t)) = \int_{a}^{b} x(t)\,dt & &
\end{array}$$

The function $f(x)=x+c$ is not strictly linear, as it does not satisfy the condition $f(0)=0$. Instead, it is classified as an affine transformation, which is a more general form of linear mapping that includes a translation component.
Let's find matrix $R(\theta)$
Consider a linear transformation represented by a matrix $M$. When applied to a set of input vectors $\vec x_1$ and $\vec x_2$, the transformation produces corresponding output vectors $\vec y_1$ and $\vec y_2$, respectively:
This relationship can be compactly expressed in matrix form as:
This formulation emphasizes that a linear transformation applied to multiple vectors is equivalent to applying the matrix to a matrix whose columns consist of those vectors. The result is a matrix whose columns are the corresponding transformed vectors.
Notes:
The method used to derive $R(\theta)$ here differs somewhat from the approach we have seen so far. However, it is a more direct and conceptually intuitive way of finding $R(\theta)$ from the definition of rotation.
As shown, rotation is a linear transformation because it can be represented by a matrix $M$. However, at first glance, rotation may not intuitively appear to be linear. Nonetheless, since it satisfies the properties of additivity and homogeneity, rotation is indeed a linear transformation.
import numpy as np
theta = 90/180*np.pi
R = np.matrix([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]])
x = np.matrix([[1],[0]])
y = R*x
print(y)
A fundamental class of linear transformations involves scaling a vector - either stretching or compressing it - without altering its direction. This transformation is defined by multiplying the vector by a scalar $k$:
This operation simply scales the magnitude (length) of the vector $\vec{x}$ while maintaining its orientation (unless $k < 0$, which also reverses the direction).
To express this scaling transformation in matrix form, we can write:
where $I$ is the identity matrix, and the scalar multiplication is applied uniformly across all dimensions. For example, in two dimensions:
Example
Let $T$ be a linear transformation that:
We are to compute the corresponding matrix $A$ such that:
Solution:
More importantly, by looking at $A = \begin{bmatrix}a & 0\\0 & b\end{bmatrix}$, can you determine what kind of linear transformation the corresponding $T$ represents?
$P$: Projection onto $\hat x$ - axis
$$
\begin{aligned}
P \begin{bmatrix} 1 \\ 0 \end{bmatrix} &= \begin{bmatrix} 1 \\ 0 \end{bmatrix}\\[4pt]
P \begin{bmatrix} 0 \\ 1 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix}\\[8pt]
P \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} &= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\end{aligned}
$$
Again, at first glance, projection may not intuitively appear to be linear. Nonetheless, since it satisfies the properties of additivity and homogeneity, projection is indeed a linear transformation.
Question: Provide an explanation, based on the definition of projection, of why the matrix $P$ is not invertible.
import numpy as np
P = np.matrix([[1, 0],
[0, 0]])
x = np.matrix([[1],[1]])
y = P*x
print(y)
If $\vec {v}_1$ and $\vec {v}_2$ form a basis, and we know
Then, for any $\vec x$, we can uniquely express it as a linear combination of the basis vectors:
Since $T$ is linear, it follows that:
Explanation and Computational Significance
Suppose that computing $T(\vec{x})$ for a new input $\vec{x}$ directly takes approximately 2 hours. However, if $\vec{\omega}_1 $ and $ \vec{\omega}_2 $ have already been precomputed offline, the output for a new input $\vec x $ can be obtained immediately.
Rather than computing $T(\vec{x})$ from scratch (a process that takes 2 hours), the transformation can be rapidly evaluated by simply taking the linear combination
where the coefficients $a_1$ and $a_2$ are quickly and uniquely determined from the representation of $\vec{x}$ in the chosen basis.
This approach significantly reduces the computational overhead by leveraging the precomputed outputs.
This is why a linear system greatly simplifies problem-solving.
The key insight is that we only need to observe how the basis vectors are transformed linearly.
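A minimal sketch of this idea, using a small matrix $M$ as a stand-in for the (expensive) transformation $T$ and an arbitrarily chosen basis: the outputs of the basis vectors are "precomputed", and the output for a new input is assembled as a linear combination.
import numpy as np

M = np.array([[2, 1],
              [0, 3]])              # stands in for the expensive transformation T

v1 = np.array([[1], [0]])           # basis vectors
v2 = np.array([[1], [1]])

w1 = M.dot(v1)                      # T(v1), precomputed offline
w2 = M.dot(v2)                      # T(v2), precomputed offline

x = np.array([[3], [2]])            # new input

# coefficients a1, a2 such that x = a1*v1 + a2*v2
a = np.linalg.solve(np.hstack([v1, v2]), x)

# T(x) as a linear combination of the precomputed outputs
print(a[0, 0]*w1 + a[1, 0]*w2)
print(M.dot(x))                     # direct computation agrees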
Eigen-analysis, which involves the study of eigenvalues and eigenvectors, plays a fundamental role in numerous areas of science and engineering. Its frequent appearance across various domains is not incidental but rather a consequence of its deep mathematical and physical significance. Eigen-analysis provides insights into the inherent characteristics of linear transformations.
Fundamental Equation
Where
Intuition and Interpretation
An eigenvector is a special vector that does not change direction under the transformation defined by $ A $. Instead, it is only scaled by a factor $ \lambda $:
In this sense:
Deeper Insight: Invariance and Change
At a high level, to understand the dynamics of $A$ (or the transformation represented by $A$), it is crucial to identify and analyze its invariants. In other words, to comprehend how things change, we must first understand what remains unchanged. Eigenvectors serve this purpose by revealing the directions that remain unaltered under transformation, while eigenvalues indicate the scaling effect along these directions. This fundamental principle underlies many applications in physics, engineering, and data science, where understanding invariant structures provides deeper insights into system behavior and stability.
Or more interpretively:
"While things compete and evolve, their fundamental nature remains unchanged."
This idiom conveys the idea that, despite external competition and transformation, there are underlying principles or essences that remain constant. It aligns well with the notion that to understand change, one must first recognize what is invariant.
If $\vec {v}_1$ and $\vec {v}_2$ form a basis and are eigenvectors, and we know
Then, for any $\vec x$
The only thing we need is to observe how each basis vector is independently scaled
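A small numerical sketch (with an arbitrarily chosen symmetric matrix, so that its eigenvectors form a basis): $A\vec x$ is recovered by scaling each eigen-direction by its eigenvalue.
import numpy as np

A = np.array([[3, 1],
              [1, 3]])

lamb, V = np.linalg.eig(A)          # eigenvalues and eigenvectors (as columns of V)

x = np.array([[2], [1]])

# coefficients of x in the eigenvector basis: x = a1*v1 + a2*v2
a = np.linalg.solve(V, x)

# each eigen-direction is simply scaled by its eigenvalue
print(a[0, 0]*lamb[0]*V[:, [0]] + a[1, 0]*lamb[1]*V[:, [1]])
print(A.dot(x))                     # direct computation agrees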
To find the eigenvalues and eigenvectors of a matrix $A$, we begin with the defining equation of eigen-analysis:
This can be rewritten as:
This is a homogeneous system of equations. For nontrivial solutions $\vec v \neq \vec 0$ to exist, the matrix $(A-\lambda I)$ must be singular - i.e., it must not have an inverse. Hence, the condition for $\lambda$ to be an eigenvalue is:
This equation is known as the characteristic equation of the matrix $A$.
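As a quick numerical aside, numpy can return the coefficients of the characteristic polynomial of a square matrix via np.poly; its roots are the eigenvalues (the matrix values here are chosen only for illustration).
import numpy as np

A = np.array([[4, -5],
              [-2, 3]])

coeffs = np.poly(A)                 # coefficients of det(lambda*I - A)
print(coeffs)

print(np.roots(coeffs))             # roots of the characteristic polynomial
print(np.linalg.eig(A)[0])          # agree with the eigenvalues of A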
Alternative (Geometric) Perspective
Beyond the algebraic procedure, eigenvectors and eigenvalues can also be understood directly from the definition of eigen-analysis.
Given a matrix $A$, consider the corresponding linear transformation it represents. From this linear transformation, we can identify eigenvectors, which are direction-invariant vectors - meaning they remain in the same direction after the transformation, though they may be scaled by an eigenvalue.
Through the following examples, we will see how eigenvectors remain direction-invariant under a given linear transformation and how eigenvalues determine their scaling behavior.
$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ : projection onto $\hat x$- axis. Find eigenvalues and eigenvectors.
If $\vec x = \begin{bmatrix} 0 \\ 1 \end{bmatrix} $,
If $\vec x = \begin{bmatrix} 1 \\ 0 \end{bmatrix} $,
import numpy as np
A = np.array([[1, 0],
[0, 0]])
D, V = np.linalg.eig(A)
print('D =', D)
print('\nV =')
print(V)
For any $\vec x$ in the plane, $\; P\vec x = \vec x \; \implies \; \lambda = 1$
For any $\vec x$ perpendicular to the plane, $\; P\vec x = \vec 0 \; \implies \; \lambda = 0$
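A minimal sketch, assuming the plane in question is the $xy$-plane in $\mathbb{R}^3$: the projection matrix has eigenvalues $1, 1, 0$, and the eigenvector with $\lambda = 0$ is the normal direction.
import numpy as np

P = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 0]])           # projection onto the x-y plane

D, V = np.linalg.eig(P)
print('D =', D)                     # eigenvalues 1, 1, 0
print('\nV =')
print(V)                            # lambda = 1: vectors in the plane, lambda = 0: the normal direction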
Eigen-analysis of a rotation transformation provides insight into how vectors behave under rotation in a given vector space. This analysis involves finding eigenvalues and eigenvectors of a rotation matrix, which describe the invariant directions and scaling factors of the transformation.
Rotation Matrix in 2D
A rotation matrix in two-dimensional space is given by:
where $\theta$ is the angle of rotation.
The eigenvalues of $ R(\theta) $ are found by solving the characteristic equation:
This gives the two complex eigenvalues:
Since these eigenvalues have unit magnitude ($ |\lambda| = 1 $), they confirm that rotation preserves vector magnitudes, only altering directions.
To find the eigenvectors, solve:
For $ \lambda_1 = e^{i\theta} $ and $ \lambda_2 = e^{-i\theta} $, the corresponding eigenvectors are complex and take the form:
Since these eigenvectors are not real, it indicates that a pure rotation in 2D has no real eigenvectors, meaning no real direction remains invariant.
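The sketch below (using $\theta = 60^{\circ}$ for illustration) confirms this numerically: the eigenvalues form a complex-conjugate pair with unit magnitude, and the eigenvectors are complex.
import numpy as np

theta = 60/180*np.pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

D, V = np.linalg.eig(R)
print('D =', D)                     # e^{i theta} and e^{-i theta}
print('|D| =', np.abs(D))           # both have unit magnitude
print('\nV =')
print(V)                            # complex eigenvectors: no real invariant direction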
Generalization to 3D Rotation
In three dimensions, rotation matrices depend on an axis of rotation. A typical rotation around the $ z $-axis is:
Eigenvalues:
Eigenvectors:
The eigenvector corresponding to $ \lambda_3 = 1 $ is $ \vec{v}_3 = (0, 0, 1)^T $, indicating that the rotation leaves the $ z $-axis invariant.
The other two eigenvectors correspond to complex plane rotations in the $ xy $-plane.
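A short numerical check of the 3D case (rotation about the $z$-axis, with $\theta = 45^{\circ}$ chosen for illustration): one eigenvalue equals $1$ with its eigenvector along the $z$-axis, while the other two form a complex pair.
import numpy as np

theta = 45/180*np.pi
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0,              0,             1]])

D, V = np.linalg.eig(Rz)
print('D =', D)                     # e^{i theta}, e^{-i theta}, and 1
print('\nV =')
print(V)                            # the eigenvector for lambda = 1 points along the z-axis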
Depending on the number of equations relative to the number of unknowns, we classify such systems into three types:
from IPython.display import YouTubeVideo
YouTubeVideo('qHOwNxQIfpo', width = "560", height = "315")
A well-determined linear system is one in which the number of equations is equal to the number of unknowns ($m=n$).
System of Linear Equations
Consider the following system with two equations and two unknowns:
This solution satisfies both equations simultaneously.
Geometric Interpretation
From a geometric perspective, each equation represents a line in the two-dimensional plane $ \mathbb{R}^2 $. The solution $ (x_1, x_2)^* = (2, 1) $ corresponds to the point of intersection of these two lines.
Matrix Form
When $m=n$ and the matrix of coefficients $A$ is invertible, the solution to the system $Ax=b$ is given by:
An under-determined system is one in which the number of equations is less than the number of unknowns ($m < n$). Such systems typically do not have a unique solution - instead, they admit infinitely many solutions.
System of Linear Equations
Consider the equation:
This is a single equation in two unknowns, which defines a line in $\mathbb{R}^2$. Every point $(x_1, x_2)$ that lies on this line satisfies the equation, so the solution set is infinite.
Geometric Interpretation
Matrix Form
A general under-determined system can be written as:
When the matrix $A$ is fat (i.e., it has more columns than rows), the system typically has infinitely many solutions.
An over-determined linear system is one in which the number of equations exceeds the number of unknowns ($ m > n$).
System of Linear Equations
Here, three equations attempt to constrain only two variables. In most cases, such a system is inconsistent, meaning there is no solution that satisfies all equations simultaneously.
Geometric Interpretation
Matrix Form
The system can be written compactly as:
When $A$ is skinny (more rows than columns), the system generally has no exact solution, unless the equations happen to be perfectly consistent - a rare case in practice.
The least-norm approach is applicable to under-determined linear systems, where the number of unknowns exceeds the number of equations:
Consider the linear system:
Since the system is under-determined, there are infinitely many solutions.
Optimization Objective
To select a unique solution from this infinite set, we impose an additional criterion:
This leads to the following constrained optimization problem:
This is known as the least-norm solution.
Geometric Interpretation
Closed-Form Expression for the Least-Norm Solution
This approach is common in control systems, where one often seeks a control input with minimum energy or least actuator effort that still achieves the desired output.
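A minimal sketch of the closed-form least-norm solution $x^{*} = A^T(AA^T)^{-1}b$, on a small under-determined system chosen for illustration; the pseudo-inverse returns the same minimum-norm solution.
import numpy as np

A = np.array([[2, 1]])              # 1 equation, 2 unknowns (fat matrix)
b = np.array([[4]])

# closed-form least-norm solution
x_star = A.T.dot(np.linalg.inv(A.dot(A.T))).dot(b)
print(x_star)

# the pseudo-inverse gives the same minimum-norm solution
print(np.linalg.pinv(A).dot(b))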
The least-squares solution is applied to over-determined systems - linear systems where the number of equations exceeds the number of unknowns:
Such systems typically have no exact solution, especially when the equations are inconsistent due to noise, measurement errors, or redundancy. In these cases, we seek an approximate solution that minimizes the discrepancy between the left and right sides of the system.
Given the linear system:
This system has no exact solution, but we can compute an approximate solution $x^*$ that minimizes the residual error:
Optimization Objective
We define the least-squares problem as:
The solution that minimizes this expression is given by:
The corresponding approximation of $b$ is:
The least-squares solution is commonly used in estimation problems, such as in linear regression, system identification, and data fitting, where an exact match to the observed data is not possible, and an approximate but optimal solution is desired.
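A small numerical sketch of the least-squares solution on a hypothetical over-determined system: the closed-form $x^{*} = (A^TA)^{-1}A^Tb$ agrees with numpy's built-in solver.
import numpy as np

A = np.array([[1, 0],
              [0, 1],
              [1, 1]])              # 3 equations, 2 unknowns (skinny matrix)
b = np.array([[1], [1], [3]])

# closed-form least-squares solution
x_star = np.linalg.inv(A.T.dot(A)).dot(A.T).dot(b)
print(x_star)

# numpy's least-squares solver agrees
print(np.linalg.lstsq(A, b, rcond=None)[0])

# corresponding approximation of b
print(A.dot(x_star))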
From a geometric perspective, the least-squares solution of a linear system is fundamentally connected to the concept of orthogonal projection onto a subspace. To build a clear intuition for this idea, it is helpful to first examine the simplest case - the projection of one vector onto another. This forms the foundation for understanding projection onto subspaces, which plays a central role in linear algebra and its many applications.
The vector projection of a vector $\vec x$ on (or onto) a nonzero vector $\vec y$ is the orthogonal projection of $\vec x$ onto a straight line parallel to $\vec y$.
Algebraic Derivation
Let:
where $ \omega = \lVert \vec \omega \rVert $ is the scalar projection of $ \vec x $ onto $ \vec y $, and $ \hat{y} $ is the unit vector in the direction of $ \vec y $.
The scalar component $ \omega $ is given by:
Therefore, the full vector projection is:
Using inner product notation:
Matrix Interpretation
This projection can also be written in the form:
Here, $ P $ is the projection matrix that maps $ \vec x $ onto the line defined by $ \vec y $.
Alternative Derivation via Orthogonality Condition
An elegant and insightful approach to computing the projection relies on the orthogonality condition:
The error vector $ \vec \omega - \vec x $ is orthogonal to $ \vec y $
Mathematically:
Substituting $ \vec \omega = \omega \frac{\vec y}{\lVert \vec y \rVert} $, we have:
$$\vec y^T \left( \omega \frac{\vec y}{\lVert \vec y \rVert} - \vec x \right) = 0 \quad \Rightarrow \quad \omega = \frac{\vec y^T \vec x}{\vec y^T \vec y} \lVert \vec y \rVert$$

Substituting back:
This confirms the same projection formula, now derived through a geometrically motivated orthogonality condition.
This idea extends naturally to projection onto subspaces, which underlies the least-squares solution in over-determined systems (to be developed in Section 6.3.2).
import numpy as np
X = np.matrix([[1],[1]])
Y = np.matrix([[2],[0]])
print(X)
print(Y)
print(Y.T*Y)
omega = (X.T*Y)/(Y.T*Y)
print(float(omega))
omega = float(omega)
W = omega*Y
print(W)
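Following the matrix interpretation above, the same projection can also be carried out with the projection matrix $P = \dfrac{\vec y \, \vec y^T}{\vec y^T \vec y}$, using the same vectors as in the cell above:
import numpy as np

X = np.matrix([[1], [1]])
Y = np.matrix([[2], [0]])

P = Y*Y.T/float(Y.T*Y)              # projection matrix onto the line spanned by Y
print(P)

print(P*X)                          # same result as the scalar-projection computation above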
Building on the insight from the previous example, we now extend the concept of projection to a subspace.
Consider the projection of a vector $\vec b$ onto a subspace $U$ spanned by the vectors $\vec a_1$ and $\vec a_2$.
The goal is to find the point $A \vec x^*$ in the subspace that is closest to $\vec b$ in terms of Euclidean distance.
Orthogonality Condition
The residual vector $A \vec x^* - \vec b$ must be orthogonal to every vector in the subspace:
This leads to the normal equations:
import numpy as np
A = np.matrix([[1,0],[0,1],[0,0]])
B = np.matrix([[1],[1],[1]])
X = (A.T*A).I*A.T*B
print(X)
Bstar = A*X
print(Bstar)
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')