Ellipse and Gaussian Distribution
Table of Contents
1. Coordinates¶
Basis and Coordinate Representations
Any vector $\vec{r}$ in a vector space can be represented with respect to different bases. For example, consider the following two bases:
$$ \text{basis } \{\hat{x}_1,\; \hat{x}_2\} \quad \text{or} \quad \text{basis } \{\hat{y}_1,\; \hat{y}_2\} $$
Then, the vector $\vec{r}$ can be expressed as coordinate vectors in these bases:
$$ \begin{align*} \vec{r}_X &= \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \text{: coordinates in basis } \{\hat{x}_1,\; \hat{x}_2\} \\\\ \vec{r}_Y &= \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad \text{: coordinates in basis } \{\hat{y}_1,\; \hat{y}_2\} \end{align*} $$
Although $\vec{r}_X$ and $\vec{r}_Y$ contain different values, they both represent the same geometric vector $\vec{r}$.
Coordinate Transformation
The same vector $\vec{r}$ can be written in either basis:
$$ \vec{r} = x_1 \hat{x}_1 + x_2 \hat{x}_2 = y_1 \hat{y}_1 + y_2 \hat{y}_2 $$
We can rewrite this relationship in matrix form:
$$ \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \hat{y}_1 & \hat{y}_2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} $$
Let:
$$ U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}, \quad I = \begin{bmatrix} \hat{y}_1 & \hat{y}_2 \end{bmatrix} \; (= \text{identity}) $$
Assuming $\{\hat{y}_1,\; \hat{y}_2\}$ is the standard basis, we interpret $I$ as the identity matrix. Then the transformation becomes:
$$ U \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad \Rightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = U^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} $$
If $U$ is orthogonal (i.e., the basis vectors $\hat{x}_1, \hat{x}_2$ are orthonormal), then $U^{-1} = U^T$, and we get:
$$ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = U^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = U^T \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
Coordinate Change Summary
The transformation between coordinate representations can be summarized as:
$$ \begin{align*} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} &\quad &\xrightarrow{U} \quad &\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \\\\ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} &\quad &\xrightarrow{U^T} \quad &\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \end{align*} $$
This framework is fundamental in linear algebra and geometry, especially in understanding transformations such as rotations, projections, and changes of basis.
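As a quick numerical sanity check of these relations (the 30° rotation below is an arbitrary choice of orthonormal basis, used only for illustration):

import numpy as np

# orthonormal basis {x1_hat, x2_hat}: here, the standard basis rotated by 30 degrees
c30, s30 = np.cos(np.pi/6), np.sin(np.pi/6)
U = np.array([[c30, -s30],
              [s30,  c30]])          # columns are x1_hat and x2_hat

y = np.array([2.0, 1.0])             # coordinates of r in the standard basis {y1_hat, y2_hat}
x = U.T @ y                          # coordinates of the same vector in {x1_hat, x2_hat}

print("x = U^T y =", x)
print("U x       =", U @ x)          # transforming back recovers the original coordinates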
2. Ellipse¶
Why Study Ellipses?
In this chapter, our primary goal is to understand the Gaussian distribution, a fundamental building block in probability, statistics, and machine learning. However, before we formally define the Gaussian distribution, we will first study the geometry of ellipses.
Why? Because the level sets (or contours of constant density) of a multivariate Gaussian distribution correspond to points $x$ that satisfy the following quadratic form:
$$ (x - \mu)^T \Sigma^{-1} (x - \mu) = c $$
for some constant $c > 0$. This is precisely the equation of an ellipse. These level sets capture the geometry of uncertainty in a Gaussian model and are crucial for visualizing, interpreting, and working with Gaussian distributions.
By understanding ellipses first, we build the necessary geometric intuition to fully grasp the shape and structure of Gaussian distributions.
2.1. Equation of an Ellipse¶
(1) Unit Circle
The standard unit circle in $\mathbb{R}^2$ is defined by:
$$ x_1^2 + x_2^2 = 1 \implies \begin{bmatrix}x_1 & x_2 \end{bmatrix} \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = 1 $$
$$ x^T x = 1 $$
(2) Independent Ellipse (Axis-Aligned)
An axis-aligned ellipse centered at the origin has the form:
$$ \begin{align*} \frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} = 1 &\implies \begin{bmatrix}x_1 & x_2 \end{bmatrix} \begin{bmatrix}\frac{1}{a^2} & 0 \\ 0 & \frac{1}{b^2} \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = 1 \\\\ & \implies \begin{bmatrix}x_1 & x_2 \end{bmatrix} \Sigma_x^{-1} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = x^T \Sigma_x^{-1} x = 1\\ \\ &\text{where} \; \Sigma_x^{-1} =\begin{bmatrix}\frac{1}{a^2} & 0\\ 0 & \frac{1}{b^2} \end{bmatrix}, \,\Sigma_x = \begin{bmatrix}a^2 & 0\\ 0 & b^2 \end{bmatrix} \end{align*} $$
This ellipse is aligned with the coordinate axes $\{\hat{x}_1, \hat{x}_2\}$ and stretches along them by $a$ and $b$ respectively.
(3) Dependent Ellipse (Rotated)
- Coordinate changes
To express the same ellipse under a rotated coordinate system $\{\hat{y}_1, \hat{y}_2\}$, we use a change of coordinates via an orthogonal matrix $U$:
$$ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = U^T \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad \text{or equivalently} \quad x = U^T y,\; y = Ux $$
This transforms the coordinate system, rotating the ellipse while preserving distances and angles.
Ellipse Equation in Original Coordinates
We start from the standard ellipse equation in $x$-coordinates:
$$ x^T \Sigma_x^{-1} x = 1 $$
where $\Sigma_x$ is a symmetric positive-definite matrix representing the shape and scale of the ellipse aligned with the coordinate axes.
Transforming to $y$-Coordinates
To express this ellipse in the rotated $y$-coordinate system, substitute $x = U^T y$ into the equation:
$$ (U^T y)^T \Sigma_x^{-1} (U^T y) = y^T U \Sigma_x^{-1} U^T y = 1 $$
We define the new inverse covariance in $y$-coordinates as:
$$\Sigma_y^{-1} = U \Sigma_x^{-1} U^T$$
Thus, the ellipse equation in $y$-coordinates becomes:
$$ y^T \Sigma_y^{-1} y = 1 $$
Final Form of the Rotated Ellipse
Since the inverse of $\Sigma_y^{-1}$ is:
$$\Sigma_y = U \Sigma_x U^T$$
the equation of the rotated ellipse is:
$$ y^T \Sigma_y^{-1} y = 1 \quad \text{with} \quad \Sigma_y = U \Sigma_x U^T $$
This expresses the same ellipse, but now in a rotated (dependent) coordinate system. The axes of the ellipse are no longer aligned with the coordinate axes unless $\Sigma_y$ is diagonal.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
theta = np.arange(0, 2*np.pi, 0.01)
x1 = np.cos(theta)
x2 = np.sin(theta)
plt.figure(figsize=(4, 4))
plt.plot(x1, x2)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()
x1 = 3*np.cos(theta)    # axis-aligned ellipse with semi-axes a = 3, b = 1
x2 = np.sin(theta)
plt.figure(figsize = (4, 4))
plt.plot(x1, x2)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()
$$ U = \left[\hat x_1 \;\hat x_2 \right] = \begin{bmatrix}\frac{1}{2} & -\frac{\sqrt{3}}{2}\\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}$$
U = np.array([[1/2, -np.sqrt(3)/2],
              [np.sqrt(3)/2, 1/2]])

x = np.vstack([x1, x2])    # 2 x N array of points on the axis-aligned ellipse
y = U @ x                  # rotate every point: y = Ux

print("U = ", U, "\n")

plt.figure(figsize = (4, 4))
plt.plot(y[0, :], y[1, :])
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()

Sx = np.array([[9, 0],
               [0, 1]])
Sy = U @ Sx @ U.T          # covariance of the rotated ellipse: Sigma_y = U Sigma_x U^T

print("Sx = \n", Sx, "\n")
print("Sy = \n", Sy)
2.2. From Covariance Matrix to Ellipse¶
So far, we have examined how to determine the covariance matrix $\Sigma$ from a given (possibly rotated) ellipse. Now we consider a more common and practically useful question: "Given a covariance matrix $\Sigma$, how can we visualize or reconstruct the corresponding ellipse?"
To answer this, we aim to determine the following:
- The major axis $a$ and minor axis $b$ of the ellipse
- The diagonal covariance matrix $\Sigma_x$
- The rotation matrix $U$ that orients the ellipse
The key idea: eigenvectors and eigenvalues
For a symmetric matrix $A$, the eigen-decomposition gives
$$ A = S \Lambda S^T \qquad \text{where the columns of } S = [\upsilon_1\; \upsilon_2] \text{ are the eigenvectors of } A, \quad \Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} $$
In our case:
$$ \Sigma_y = U \Sigma_x U^T = U \Lambda U^T $$
where:
- $U = [\hat{x}_1\; \hat{x}_2]$ are eigenvectors of $\Sigma_y$
- $\Lambda = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix}$ contains eigenvalues of $\Sigma_y$
Step 1: Eigenvalue Decomposition
Suppose we are given a symmetric positive definite matrix $\Sigma_y$.
We perform eigen-decomposition:
$$ \begin{align*} \Sigma_y \hat{x}_1 &= \lambda_1 \hat{x}_1 \\ \Sigma_y \hat{x}_2 &= \lambda_2 \hat{x}_2 \end{align*} \quad \Rightarrow \quad \Sigma_y U = U \Lambda \quad \Rightarrow \quad \Sigma_y = U \Lambda U^T $$
- $U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}$ is an orthogonal matrix whose columns are the eigenvectors of $\Sigma_y$
- $\Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$ is the diagonal matrix of eigenvalues
Step 2: Identify Ellipse Parameters
- The axes of the ellipse align with the eigenvectors in $U$
- The lengths of the axes are given by the square roots of the eigenvalues
That is,
$$ a = \sqrt{\lambda_1}, \quad b = \sqrt{\lambda_2} $$
We now define:
$$ \Sigma_x = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix} = \Lambda $$
and
$$ \Sigma_y = U \Sigma_x U^T $$
So the original ellipse equation in $y$-coordinates:
$$ y^T \Sigma_y^{-1} y = 1 $$
can be interpreted in the $x$-coordinates (i.e., eigenbasis) as:
$$ x^T \Sigma_x^{-1} x = 1 $$
which is an axis-aligned ellipse.
Note that the transformation between the two coordinate systems is:
$$ x = U^T y \quad \text{or} \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = U^T \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} $$
D, U = np.linalg.eig(Sy)

# sort the eigenvalues (and matching eigenvectors) in descending order
idx = np.argsort(-D)
D = D[idx]
U = U[:, idx]

print("D = \n", np.diag(D), "\n")
print("U = \n", U)
Summary
- Independent ellipse: represented in basis $\{\hat{x}_1,\; \hat{x}_2\}$
- Dependent ellipse: represented in basis $\{\hat{y}_1,\; \hat{y}_2\}$
Given a covariance matrix $\Sigma_y$, we can reconstruct the corresponding ellipse by following these steps:
- Compute the eigenvalues $\lambda_1, \lambda_2$ of $\Sigma_y$
- Compute the corresponding eigenvectors and form the matrix $U$
Then the coordinate transformation is:
$$ \begin{align*} x &= U^T y \\\\ U &= \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} \end{align*} $$
The lengths of the ellipse axes are given by:
$$ a = \sqrt{\lambda_1}, \quad b = \sqrt{\lambda_2} $$
Define the diagonal covariance in the independent (rotated) coordinates:
$$ \Sigma_x = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix} $$
Then the full covariance in the original coordinates is:
$$\Sigma_y = U \Sigma_x U^T$$
This process reconstructs both the shape (through eigenvalues) and the orientation (through eigenvectors) of the ellipse from the given covariance matrix $\Sigma_y$.
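Putting these steps together, the sketch below redraws the rotated ellipse directly from $\Sigma_y$, reusing the sorted eigenvalues D and eigenvectors U computed in the eigen-decomposition cell above (it assumes that cell has been run):

# reconstruct the ellipse y^T Sy^{-1} y = 1 from its eigen-decomposition Sy = U D U^T
theta = np.arange(0, 2*np.pi, 0.01)

a, b = np.sqrt(D[0]), np.sqrt(D[1])          # semi-axis lengths from the eigenvalues
x_aligned = np.vstack([a*np.cos(theta),      # axis-aligned ellipse in the eigenbasis
                       b*np.sin(theta)])
y_rotated = U @ x_aligned                    # rotate back into the original coordinates: y = Ux

plt.figure(figsize = (4, 4))
plt.plot(y_rotated[0, :], y_rotated[1, :])
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()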
3. Gaussian Distribution¶
Gaussian distributions are foundational in probability, statistics, and machine learning due to their mathematical tractability and their role in modeling natural phenomena. In this section, we begin with the univariate Gaussian and build toward the multivariate case.
3.1. Standard Univariate Normal Distribution¶
The most basic form of a Gaussian distribution is the standard normal distribution, defined over a single variable $y$. It is symmetric around zero and has unit variance:
$$ P_Y\left(Y=y\right)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^2\right) \sim \mathcal{N}(0,1^2)$$
This distribution is bell-shaped, centered at zero, and is widely used for normalization and standardization of data.
$$ \frac{1}{2} y^2 = \text{const} \implies \text{prob. contour}$$
The exponent $\frac{1}{2} y^2 = \text{const}$ defines level sets (or contours) of equal probability density. In one dimension, a level set is just a pair of points symmetric about zero, but in higher dimensions these contours generalize to ellipses.
y = np.arange(-10, 10, 0.1)
ProbG = 1/np.sqrt(2*np.pi)*np.exp(-1/2*y**2)
plt.figure(figsize = (6, 4))
plt.plot(y, ProbG)
plt.ylim([0, 1])
plt.show()
from scipy.stats import norm
ProbG2 = norm.pdf(y)
plt.figure(figsize = (6, 4))
plt.plot(y, ProbG2)
plt.ylim([0, 1])
plt.show()
x = np.random.randn(5000,1)
plt.figure(figsize = (6, 4))
plt.hist(x, bins = 51, density = True, alpha = 0.4)
plt.plot(y, ProbG2, label = 'G2')
plt.show()
3.2. Univariate Normal Distribution¶
The general form of a univariate Gaussian includes parameters for mean and variance:
- $\mu$: the mean, which determines the center of the distribution
- $\sigma^2$: the variance, which controls the spread or width of the distribution
The probability density function (pdf) is given by:
$$ \mathcal{N}(x;\,\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{1}{2}\frac{\left(x-\mu\right)^2}{\sigma^2}\right) $$
This expression shows that the density decays exponentially as $x$ moves away from the mean $\mu$. The decay rate is controlled by the variance $\sigma^2$.
Standardization: Any Gaussian random variable $x \sim \mathcal{N}(\mu, \sigma^2)$ can be converted into a standard normal variable using:
$$ \begin{align*}x &\sim \mathcal{N}\left(\mu,\sigma^2\right) \\ \implies & P_Y\left(y\right) = P_X\left(x\right), \quad y = \frac{x-\mu}{\sigma}, \quad \text{or equivalently} \quad x = \sigma y + \mu \\\\ P_X\left(X=x\right) &\sim \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) \\ &= \exp\left(-\frac{1}{2}\frac{\left(x-\mu\right)^2}{\sigma^2}\right) \end{align*}$$
The transformation maps $x$ into a standard normal variable $y \sim \mathcal{N}(0, 1)$.
Properties:
$$ \begin{align*} E[x] &= \mu \\ \text{var}(x) &= \sigma^2 \end{align*} $$
These identities follow from integrating the pdf and are essential in statistical estimation.
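As a quick empirical check of the standardization and of these moment identities (the values $\mu = 2$, $\sigma = 3$ below are arbitrary):

# standardizing x ~ N(mu0, sigma0^2) gives a variable with mean close to 0 and variance close to 1
mu0, sigma0 = 2, 3
xs = mu0 + sigma0*np.random.randn(100000)
ys = (xs - mu0)/sigma0

print("sample mean of x:", xs.mean(), "  (mu =", mu0, ")")
print("sample var  of x:", xs.var(), "  (sigma^2 =", sigma0**2, ")")
print("sample mean of y:", ys.mean(), "  (should be close to 0)")
print("sample var  of y:", ys.var(), "  (should be close to 1)")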
mu = 2
sigma = 3
x = np.arange(-10, 10, 0.1)
ProbG3 = 1/(np.sqrt(2*np.pi)*sigma) * np.exp(-1/2*(x-mu)**2/(sigma**2))
plt.figure(figsize = (6, 4))
plt.plot(y, ProbG, label = 'G1')
plt.plot(x, ProbG3, label = 'G3')
plt.legend()
plt.show()
ProbG2 = norm.pdf(x, 2, 3)
plt.figure(figsize = (6, 4))
plt.plot(y, ProbG, label = 'G1')
plt.plot(x, ProbG2, label = 'G2')
plt.legend()
plt.show()
x = mu + sigma*np.random.randn(5000,1)
plt.figure(figsize = (6, 4))
plt.hist(x, bins = 51, density = True, alpha = 0.4)
plt.plot(y, ProbG2, label = 'G2')
# plt.ylim([0, 0.4])
plt.show()
3.3. Multivariate Gaussian Models¶
In higher dimensions, the Gaussian distribution generalizes to random vectors. Let $x \in \mathbb{R}^n$. Then, the multivariate normal distribution is defined as:
$$ \mathcal{N}(x;\, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) $$
- $\mu \in \mathbb{R}^n$: the mean vector
- $\Sigma \in \mathbb{R}^{n \times n}$: the covariance matrix, symmetric and positive definite
- $|\Sigma|$: the determinant of the covariance matrix
- $\Sigma^{-1}$: the precision matrix, inversely related to spread
The exponent defines an ellipsoidal contour:
$$ \Delta^2 = (x - \mu)^T \Sigma^{-1} (x - \mu) $$
This quadratic form is constant along the boundary of probability density contours. These contours are always centered at $\mu$ and shaped according to $\Sigma$.
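To visualize these elliptical contours, here is a minimal sketch using scipy.stats.multivariate_normal (the mean and covariance below are arbitrary illustrative values):

from scipy.stats import multivariate_normal

mu = np.array([0, 0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# evaluate the density on a grid and draw its contours: they are ellipses centered at mu
g1, g2 = np.meshgrid(np.arange(-4, 4, 0.05), np.arange(-4, 4, 0.05))
density = multivariate_normal(mu, Sigma).pdf(np.dstack([g1, g2]))

plt.figure(figsize = (4, 4))
plt.contour(g1, g2, density)
plt.axis('equal')
plt.show()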
3.3.1. Two Independent Variables¶
Let $X_1$ and $X_2$ be independent Gaussian random variables with their own means and variances. Then their joint distribution is:
$$ \begin{align*} P(X_1 = x_1, X_2 = x_2) &= P_{X_1}(x_1) \cdot P_{X_2}(x_2) \\ &\sim \exp\left( -\frac{1}{2} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \right] \right) \end{align*} $$
In matrix form:
$$ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} $$
Then:
$$ P(x) \sim \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) $$
Ellipse Interpretation:
The density contour of this bivariate Gaussian (taking $\mu = 0$ for simplicity) becomes:
$$ \begin{align*}\frac{x_1^2}{\sigma_{x_1}^2} + \frac{x_2^2}{\sigma_{x_2}^2} &= c \;\;\;\text{(ellipse)} \\ \\ \begin{bmatrix} x_1 & x_2\end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_{x_1}^2} & 0 \\ 0 & \frac{1}{\sigma_{x_2}^2} \end{bmatrix} \begin{bmatrix}x_1 \\ x_2\end{bmatrix} &= c \qquad (\sigma_{x_1} < \sigma_{x_2}) \end{align*}$$
This is the equation of an axis-aligned ellipse. Its shape is determined by the variances and its axes are aligned with the coordinate system.
Summary in a matrix form
$$ \begin{align*} \mathcal{N}\left(0,\Sigma_x\right) &\sim \exp\left(-\frac{1}{2}x^T \Sigma_x^{-1} x\right) \\\\ \mathcal{N}\left(\mu_x,\Sigma_x\right) &\sim \exp\left(-\frac{1}{2}\left(x-\mu_x\right)^T \Sigma_x^{-1} \left(x-\mu_x\right)\right) \end{align*}$$
mu = np.array([0, 0])
sigma = np.eye(2)
m = 1000
x = np.random.multivariate_normal(mu, sigma, m)
plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8, 8])
plt.show()
mu = np.array([0, 0])
sigma = np.array([[1, 0], [0, 4]])
m = 1000
x = np.random.multivariate_normal(mu, sigma, m)
plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8, 8])
plt.show()
When the variables are not independent, the covariance matrix $\Sigma$ has off-diagonal terms, and the density contours of the joint distribution become rotated ellipses.
Suppose:
$$ x = U^T y \quad \text{and} \quad y = Ux $$
Then:
$$ x^T \Sigma_x^{-1} x = y^T U \Sigma_x^{-1} U^T y = y^T \Sigma_y^{-1} y $$
So the transformed covariance is:
$$ \Sigma_y = U \Sigma_x U^T $$
Affine Transform of a Gaussian:
$$x \sim \mathcal{N}(\mu_x, \Sigma_x), \quad y = Ax + b$$
Then:
$$ y \sim \mathcal{N}(A \mu_x + b, A \Sigma_x A^T) $$
The result is still Gaussian. This property makes Gaussians extremely useful in probabilistic modeling, as linear operations preserve normality.
Summary:
$$ \begin{align*} & x \sim \mathcal{N}(\mu_x,\Sigma_x) \text{ and } y = Ax + b \; \text{affine transformation}\\ \\ \implies \; & y \sim \mathcal{N}(\mu_y,\Sigma_y) = \mathcal{N}(A\mu_x+b,A\Sigma_xA^T)\\\\ \implies \; & y \;\; \text{is also Gaussian with} \;\; \mu_y = A\mu_x+b, \;\; \Sigma_y = A \Sigma_x A^T \end{align*}$$
mu = np.array([0, 0])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])
m = 1000
x = np.random.multivariate_normal(mu, sigma, m)
plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()
3.4. Decouple using Covariance Matrix¶
Given a dataset assumed to follow a Gaussian distribution, we estimate the covariance matrix from data and analyze its eigenstructure.
Let $y$ be a centered random vector (i.e., $E[y] = 0$). Then the sample covariance matrix is:
$$ \Sigma_y = \begin{bmatrix} \text{var}(y_1) & \text{cov}(y_1, y_2) \\ \text{cov}(y_2, y_1) & \text{var}(y_2) \end{bmatrix} $$
We now perform eigen-analysis:
$$ \Sigma_y \hat{x}_1 = \lambda_1 \hat{x}_1, \quad \Sigma_y \hat{x}_2 = \lambda_2 \hat{x}_2 $$
These eigenvectors give the directions of the major and minor axes of the data distribution, and the eigenvalues determine the length of those axes.
If:
$$ U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} $$
Then:
$$ \Sigma_y = U \Sigma_x U^T $$
where $\Sigma_x$ is diagonal:
$$ \Sigma_x = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} $$
This diagonalization decouples the Gaussian and transforms it into an axis-aligned ellipse in a new coordinate system.
$$\begin{array}{Icr} \text{eigen-analysis} \\ \\\\ \Sigma_y\hat{x}_1 = \lambda_1\hat{x}_1 \\ \Sigma_y\hat{x}_2 = \lambda_2\hat{x}_2 \end{array} \quad \quad \begin{array}{Icr} \Sigma_x^{-1} = \begin{bmatrix} \frac{1}{\sqrt{\lambda_1}^2} & 0 \\ 0 & \frac{1}{\sqrt{\lambda_2}^2} \end{bmatrix} \\\\ \Sigma_x = \begin{bmatrix}\sqrt{\lambda_1}^2 & 0 \\ 0 & \sqrt{\lambda_2}^2 \end{bmatrix} \end{array} $$
$$\begin{array}{Icr} \begin{align*} \Sigma_y \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} &= \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\\ &= \begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix} \Sigma_{x} \\ \\ \Sigma_y &= U \Sigma_x U^T \end{align*} \end{array} \quad \quad \quad \begin{array}{Icr} y = U x \implies U^Ty = x \\ \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} = U \\ \\ \\ \end{array}$$
S = np.cov(x.T)
print ("S = \n", S)
D, U = np.linalg.eig(S)
idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]
print ("U = \n", U, '\n')
print ("D = \n", D)
xp = np.arange(-10, 10)
plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.plot(xp, U[1,0]/U[0,0]*xp, label = 'u1')    # line through the origin along the first eigenvector
plt.plot(xp, U[1,1]/U[0,1]*xp, label = 'u2')    # line through the origin along the second eigenvector
plt.axis('equal')
plt.ylim([-8, 8])
plt.legend()
plt.show()
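As a check (reusing the samples x and the eigenvector matrix U from the cells above), rotating every sample into the eigenbasis should make the sample covariance approximately diagonal, with the eigenvalues on the diagonal:

# decouple: each row of x is a sample y, so x @ U computes U^T y for every sample
x_dec = x @ U

plt.figure(figsize = (6, 4))
plt.plot(x_dec[:, 0], x_dec[:, 1], '.')
plt.axis('equal')
plt.ylim([-8, 8])
plt.show()

print("covariance in the decoupled coordinates = \n", np.cov(x_dec.T))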
4. Properties of Gaussian Distribution¶
Gaussian distributions are not only widely applicable but also mathematically elegant. One of their most powerful features is that they are closed under various operations: linear transformations, marginalization, and conditioning. This means that applying these operations to a Gaussian random variable results in another Gaussian.
Key properties:
- Symmetric about the mean
- Fully specified by mean and covariance
- Uncorrelated Gaussian variables are independent
Closed under:
- Linear and affine transformations
- Dimension reduction: marginalization and conditioning
These properties are crucial in statistical inference and Bayesian analysis.
4.1. Affine Transformation¶
Suppose $x \sim \mathcal{N}(\mu_x, \Sigma_x)$ is an $n$-dimensional Gaussian random vector.
Let $y$ be defined as an affine transformation of $x$:
$$ y = Ax + b $$
Then $y$ is also a Gaussian random vector. Its distribution is given by:
$$ \begin{align*} E[y] &= A E[x] + b = A \mu_x + b \\\\ \text{cov}(y) &= A \text{cov}(x) A^T = A \Sigma_x A^T \end{align*} $$
This result shows that the Gaussian family is closed under affine transformation. This property makes Gaussian models especially suitable for analysis in systems where data undergoes linear operations.
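A minimal numerical check of this property (the matrices $A$, $b$, $\mu_x$, $\Sigma_x$ below are arbitrary illustrative choices):

# empirical verification that y = Ax + b has mean A mu_x + b and covariance A Sigma_x A^T
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
b = np.array([1.0, -1.0])

mu_x = np.array([0.0, 0.0])
Sigma_x = np.array([[1.0, 0.0],
                    [0.0, 4.0]])

xs = np.random.multivariate_normal(mu_x, Sigma_x, 100000)
ys = xs @ A.T + b                      # apply y = Ax + b to every sample (rows are samples)

print("sample mean of y:", ys.mean(axis = 0), "   A mu_x + b:", A @ mu_x + b)
print("sample cov of y: \n", np.cov(ys.T))
print("A Sigma_x A^T: \n", A @ Sigma_x @ A.T)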
4.2. Marginal Probability of a Gaussian¶
Let $x \sim \mathcal{N}(\mu, \Sigma)$ and partition $x$ as follows:
$$ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} $$
We are interested in the marginal distribution of $x_1$ alone.
Since $x_1 = \begin{bmatrix} I & 0 \end{bmatrix} x$, this is a linear transformation. Hence, $x_1$ is also Gaussian:
$$ \begin{align*} E[x_1] &= \mu_1 \\\\ \text{cov}(x_1) &= \Sigma_{11} \end{align*} $$
Although not immediately obvious, the marginal distribution of a subset of jointly Gaussian variables is also Gaussian. This fact is essential in probabilistic modeling and dimensionality reduction.
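A small empirical illustration (the mean and covariance below are arbitrary): sampling from a joint Gaussian and keeping only the first coordinate reproduces $\mathcal{N}(\mu_1, \Sigma_{11})$:

# marginalization: drop the other coordinate and compare with N(mu_1, Sigma_11)
mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, 1.5],
                  [1.5, 3.0]])

samples = np.random.multivariate_normal(mu, Sigma, 100000)
x1_samples = samples[:, 0]

print("marginal mean: formula =", mu[0], ",  sample =", x1_samples.mean())
print("marginal var : formula =", Sigma[0, 0], ",  sample =", x1_samples.var())

grid = np.arange(-8, 10, 0.1)
plt.figure(figsize = (6, 2))
plt.hist(x1_samples, bins = 51, density = True, alpha = 0.4)
plt.plot(grid, norm.pdf(grid, mu[0], np.sqrt(Sigma[0, 0])))
plt.show()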
4.3. Component of a Gaussian Random Vector¶
Let $x \sim \mathcal{N}(0, \Sigma)$ and let $c \in \mathbb{R}^n$ be a unit vector. Define:
$$ y = c^T x $$
This represents the projection (or component) of $x$ along the direction $c$.
Then $y$ is a scalar Gaussian random variable:
$$ \begin{align*} E[y] &= 0 \\\\ \text{var}(y) &= c^T \Sigma c \end{align*} $$
Hence:
$$ E[y^2] = c^T \Sigma c $$
This quantity measures how much variance exists in the direction $c$.
The direction that minimizes this variance corresponds to the eigenvector of $\Sigma$ with the smallest eigenvalue:
$$ \min_{\lVert c \rVert=1} c^T \Sigma c = \lambda_{\text{min}} $$
This observation is closely related to Principal Component Analysis (PCA), where the largest variance direction corresponds to the largest eigenvalue.
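A short sketch of this fact (the covariance matrix below is an arbitrary example): the variance of the projection $c^T x$ changes with the direction $c$ and is bounded below by $\lambda_{\text{min}}$:

# variance of the component y = c^T x along a few unit directions
Sigma = np.array([[2.5, -1.5],
                  [-1.5, 2.5]])

lam, V = np.linalg.eigh(Sigma)               # eigenvalues in ascending order
print("eigenvalues:", lam)

for ang in [0, np.pi/6, np.pi/4, np.pi/2]:
    c = np.array([np.cos(ang), np.sin(ang)])             # a unit vector
    print("angle = %.2f rad,  c^T Sigma c = %.3f" % (ang, c @ Sigma @ c))

c_min = V[:, 0]                              # eigenvector of the smallest eigenvalue
print("minimizing direction:", c_min, ",  c^T Sigma c =", c_min @ Sigma @ c_min)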
4.4. Conditional Probability of a Gaussian Random Vector¶
Suppose the joint distribution of two Gaussian vectors is given by:
$$ \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N} \left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{bmatrix} \right) $$
Then the conditional distribution of $x$ given $y$ is also Gaussian:
Conditional Distribution
$$ x \mid y \sim \mathcal{N} \left( \mu_x + \Sigma_{xy} \Sigma_y^{-1} (y - \mu_y), \; \Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{yx} \right) $$
Conditional Mean
The expected value of $x$ given $y$:
$$ E[x \mid y] = \mu_x + \Sigma_{xy} \Sigma_y^{-1} (y - \mu_y) $$
This is a linear function of $y$, showing how our belief about $x$ changes when $y$ is observed.
Conditional Covariance
The uncertainty (covariance) about $x$ given $y$:
$$ \text{cov}(x \mid y) = \Sigma_x - \Sigma_{xy} \Sigma_y^{-1} \Sigma_{yx} $$
This matrix is always less than or equal to $\Sigma_x$ in the positive semi-definite sense. Intuitively, observing $y$ reduces uncertainty about $x$, making the conditional distribution narrower.
This result is essential in Bayesian inference, Kalman filtering, and Gaussian Process regression, where observations refine our knowledge of unknown variables.
mu = np.array([0, 1])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])
m = 1000000
x = np.random.multivariate_normal(mu, sigma, m)
xp = np.arange(-10, 10)
plt.figure(figsize = (4, 4))
plt.plot(x[:, 0], x[:, 1], '.', alpha = 0.1)
plt.axis([-6, 6, -6, 6])
plt.axvline(0, 0, 1, c = 'k', linestyle = '--')
plt.axhline(0, 0, 1, c = 'k', linestyle = '--')
plt.axhline(4, 0, 1, c = 'b')
plt.axhline(1, 0.45, 0.55, c = 'k')
plt.axvline(0, 0.53, 0.63, c = 'k')
plt.show()
x1 = x[:, 0]
x2 = x[:, 1]
plt.figure(figsize = (6, 2))
plt.xlim(-6, 6)
plt.axvline(0, 0, 1, c = 'k')
plt.hist(x1, bins = 51, density = True, alpha = 0.4)
plt.figure(figsize = (6, 2))
plt.xlim(-6, 6)
plt.axvline(1, 0, 1, c = 'k')
plt.hist(x2, bins = 51, density = True, alpha = 0.4)
plt.show()
z = []
# approximate conditioning: keep x1 for samples whose x2 falls in a narrow band around 4
for i in range(m):
    if x[i, 1] > 3.8 and x[i, 1] < 4.2:
        z.append(x[i, 0])
plt.figure(figsize = (6, 2))
plt.hist(z, bins = 51, density = True, alpha = 0.4)
plt.axvline(0, 0, 1, c = 'k')
plt.xlim(-6, 6)
plt.show()
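As a check, we can apply the conditional-Gaussian formula to the $\mu$ and $\Sigma$ used above, conditioning on the second coordinate being 4 (the empirical slice z only approximates this with the band $3.8 < x_2 < 4.2$):

# conditional mean and variance of x1 given x2 = 4, from the formula above
mu1, mu2 = mu[0], mu[1]
S11, S12 = sigma[0, 0], sigma[0, 1]
S21, S22 = sigma[1, 0], sigma[1, 1]

cond_mean = mu1 + S12/S22*(4 - mu2)
cond_var = S11 - S12*S21/S22

print("analytical E[x1 | x2 = 4]   =", cond_mean)
print("analytical var(x1 | x2 = 4) =", cond_var)
print("empirical mean of the slice =", np.mean(z))
print("empirical var of the slice  =", np.var(z))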
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')