Ellipse and Gaussian Distribution
1. Coordinates
Basis and Coordinate Representations
Any vector in a vector space can be represented with respect to different bases. For example, consider two different bases for the same space, $\{\hat{x}_1, \hat{x}_2\}$ and $\{\hat{y}_1, \hat{y}_2\}$.
The same vector can then be expressed as a coordinate vector in each of these bases.
Although the two coordinate vectors contain different values, they both represent the same geometric vector.
Coordinate Transformation
The same vector can be written as a linear combination of either set of basis vectors.
We can rewrite this relationship in matrix form by collecting each basis into a matrix and each coordinate pair into a column vector.
Assuming $\{\hat{x}_1, \hat{x}_2\}$ is the standard basis, its basis matrix is the identity, so the two coordinate vectors are related by multiplication by the matrix $U$ whose columns are the other basis vectors.
If $U$ is orthogonal (i.e., the basis vectors are orthonormal), then $U^{-1} = U^T$, and the inverse coordinate change is simply multiplication by $U^T$.
Coordinate Change Summary
The transformation between the two coordinate representations is simply multiplication by the basis matrix $U$ in one direction and by $U^{-1} = U^T$ (for an orthonormal basis) in the other.
This framework is fundamental in linear algebra and geometry, especially in understanding transformations such as rotations, projections, and changes of basis.
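As a quick numerical illustration (a sketch; the 45-degree rotation below is an arbitrary choice, not from the text above), the coordinates of a vector with respect to an orthonormal basis stored in the columns of $U$ are obtained with $U^T$, and multiplying by $U$ converts them back:
import numpy as np

# columns of U form an orthonormal basis (the standard basis rotated by 45 degrees)
U = np.array([[np.cos(np.pi/4), -np.sin(np.pi/4)],
              [np.sin(np.pi/4),  np.cos(np.pi/4)]])

x = np.array([1.0, 2.0])    # coordinates in the standard basis
c = U.T @ x                 # coordinates of the same vector in the basis {u1, u2}
print(c)
print(U @ c)                # recovers x, since U U^T = I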
2. Ellipse
Why Study Ellipses?
In this chapter, our primary goal is to understand the Gaussian distribution, a fundamental building block in probability, statistics, and machine learning. However, before we formally define the Gaussian distribution, we will first study the geometry of ellipses.
Why? Because the level sets (or contours of constant density) of a multivariate Gaussian distribution correspond to the points $x$ that satisfy the following quadratic form:
$$(x - \mu)^T \Sigma^{-1} (x - \mu) = c$$
for some constant $c > 0$. This is precisely the equation of an ellipse (an ellipsoid in higher dimensions). These level sets capture the geometry of uncertainty in a Gaussian model and are crucial for visualizing, interpreting, and working with Gaussian distributions.
By understanding ellipses first, we build the necessary geometric intuition to fully grasp the shape and structure of Gaussian distributions.
2.1. Equation of an Ellipse
(1) Unit Circle
The standard unit circle in $\mathbb{R}^2$ is defined by:
$$x_1^2 + x_2^2 = 1$$
(2) Independent Ellipse (Axis-Aligned)
An axis-aligned ellipse centered at the origin has the form:
$$\frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} = 1$$
This ellipse is aligned with the coordinate axes and stretches along them by $a$ and $b$ respectively.
(3) Dependent Ellipse (Rotated Ellipse)
- Coordinate changes
To express the same ellipse under a rotated coordinate system $y$, we use a change of coordinates via an orthogonal matrix $U$:
$$y = Ux \quad \Longleftrightarrow \quad x = U^T y$$
This transforms the coordinate system, rotating the ellipse while preserving distances and angles.
Ellipse Equation in Original Coordinates
We start from the standard ellipse equation in $x$-coordinates:
$$x^T \Sigma_x^{-1} x = 1$$
where $\Sigma_x$ is a symmetric positive-definite matrix representing the shape and scale of the ellipse aligned with the coordinate axes.
Transforming to $y$-Coordinates
To express this ellipse in the rotated $y$-coordinate system, substitute $x = U^T y$ into the equation:
$$(U^T y)^T \Sigma_x^{-1} (U^T y) = y^T U \Sigma_x^{-1} U^T y = 1$$
We define the new inverse covariance in $y$-coordinates as:
$$\Sigma_y^{-1} = U \Sigma_x^{-1} U^T$$
Thus, the ellipse equation in $y$-coordinates becomes:
$$y^T \Sigma_y^{-1} y = 1$$
Final Form of the Rotated Ellipse
Since the inverse of $U \Sigma_x^{-1} U^T$ is:
$$\Sigma_y = U \Sigma_x U^T$$
the equation of the rotated ellipse is:
$$y^T \left(U \Sigma_x U^T\right)^{-1} y = 1$$
This expresses the same ellipse, but now in a rotated (dependent) coordinate system. The axes of the ellipse are no longer aligned with the coordinate axes unless $\Sigma_y$ is diagonal.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# points on the unit circle
theta = np.arange(0, 2*np.pi, 0.01)
x1 = np.cos(theta)
x2 = np.sin(theta)
plt.figure(figsize=(4, 4))
plt.plot(x1, x2)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()
# axis-aligned ellipse with semi-axes a = 3 and b = 1
x1 = 3*np.cos(theta)
x2 = np.sin(theta)
plt.figure(figsize = (4, 4))
plt.plot(x1, x2)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()
# rotate the ellipse by 60 degrees: y = U x
U = np.matrix([[1/2, -np.sqrt(3)/2],
               [np.sqrt(3)/2, 1/2]])
x = np.matrix([x1, x2])
y = U*x
print("U = ", U, "\n")
plt.figure(figsize = (4, 4))
plt.plot(y[0,:].T, y[1,:].T)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()
# covariance matrices of the axis-aligned and rotated ellipses
Sx = np.matrix([[9, 0],
                [0, 1]])
Sy = U*Sx*U.T
print ("Sx = \n", Sx, "\n")
print ("Sy = \n", Sy)
2.2. From Covariance Matrix to Ellipse
So far, we have examined how to determine the covariance matrix from a given (possibly rotated) ellipse. Now we consider a more common and practically useful question: "Given a covariance matrix $\Sigma_y$, how can we visualize or reconstruct the corresponding ellipse?"
To answer this, we aim to determine the following:
The major axis and minor axis of the ellipse
The diagonal covariance matrix $\Sigma_x$
The rotation matrix $U$ that orients the ellipse
The key idea: eigenvectors and eigenvalues
In our case:
$$\Sigma_y = U \Lambda U^T$$
where:
the columns of $U$ are eigenvectors of $\Sigma_y$
the diagonal matrix $\Lambda$ contains the eigenvalues of $\Sigma_y$
Step 1: Eigenvalue Decomposition
Suppose we are given a symmetric positive-definite matrix $\Sigma_y$.
We perform the eigen-decomposition:
$$\Sigma_y = U \Lambda U^T$$
- $U$ is an orthogonal matrix whose columns are the eigenvectors of $\Sigma_y$
- $\Lambda$ is the diagonal matrix of eigenvalues
Step 2: Identify Ellipse Parameters
- The axes of the ellipse align with the eigenvectors in $U$
- The lengths of the axes are given by the square roots of the eigenvalues
That is, the semi-axis lengths are $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$.
We now define:
$$\Sigma_x = \Lambda = \begin{bmatrix}\lambda_1 & 0 \\ 0 & \lambda_2\end{bmatrix}$$
and
$$x = U^T y$$
So the original ellipse equation in $y$-coordinates:
$$y^T \Sigma_y^{-1} y = 1$$
can be interpreted in the $x$-coordinates (i.e., the eigenbasis) as:
$$x^T \Sigma_x^{-1} x = 1$$
which is an axis-aligned ellipse.
Note that the transformation between coordinates is:
$$y = Ux \quad \Longleftrightarrow \quad x = U^T y$$
# eigen-decomposition of Sy, sorted in descending order of eigenvalue
D, U = np.linalg.eig(Sy)
idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]
print ("D = \n", np.diag(D), "\n")
print ("U = \n", U)
Summary
- Independent (axis-aligned) ellipse: represented in the $x$-coordinates (the eigenbasis)
- Dependent (rotated) ellipse: represented in the $y$-coordinates
Given a covariance matrix $\Sigma_y$, we can reconstruct the corresponding ellipse by following these steps:
Compute the eigenvalues $\lambda_1 \geq \lambda_2$ of $\Sigma_y$
Compute the corresponding eigenvectors $u_1, u_2$ and form the matrix $U = \begin{bmatrix} u_1 & u_2 \end{bmatrix}$
Then the coordinate transformation is:
$$y = Ux$$
The lengths of the ellipse axes are given by:
$$\sqrt{\lambda_1} \quad \text{and} \quad \sqrt{\lambda_2}$$
Define the diagonal covariance in the independent (eigen-aligned) coordinates:
$$\Sigma_x = \begin{bmatrix}\lambda_1 & 0 \\ 0 & \lambda_2\end{bmatrix}$$
Then the full covariance in the original coordinates is:
$$\Sigma_y = U \Sigma_x U^T$$
This process reconstructs both the shape (through the eigenvalues) and the orientation (through the eigenvectors) of the ellipse from the given covariance matrix $\Sigma_y$.
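As a short sketch tying this together (reusing Sy, D, and U computed in the cells above), we can reconstruct and plot the rotated ellipse directly from the covariance matrix:
theta = np.arange(0, 2*np.pi, 0.01)

# axis-aligned ellipse in the eigenbasis; the semi-axis lengths are sqrt(eigenvalues)
x_indep = np.vstack([np.sqrt(D[0])*np.cos(theta),
                     np.sqrt(D[1])*np.sin(theta)])

# rotate into the original coordinates: y = U x
y_ellipse = np.asarray(U) @ x_indep

plt.figure(figsize = (4, 4))
plt.plot(y_ellipse[0,:], y_ellipse[1,:])
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()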
3. Gaussian Distribution
Gaussian distributions are foundational in probability, statistics, and machine learning due to their mathematical tractability and their role in modeling natural phenomena. In this section, we begin with the univariate Gaussian and build toward the multivariate case.
3.1. Standard Univariate Normal Distribution
The most basic form of a Gaussian distribution is the standard normal distribution, defined over a single variable $x$. It is symmetric around zero and has unit variance:
$$p(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$
This distribution is bell-shaped, centered at zero, and is widely used for normalization and standardization of data.
The exponent $-\tfrac{x^2}{2}$ defines level sets (or contours) of equal probability density. In one dimension, these are simply intervals on the real line, but in higher dimensions they generalize to ellipses.
y = np.arange(-10, 10, 0.1)
ProbG = 1/np.sqrt(2*np.pi)*np.exp(-1/2*y**2)
plt.figure(figsize = (6, 4))
plt.plot(y, ProbG)
plt.ylim([0, 1])
plt.show()
from scipy.stats import norm
ProbG2 = norm.pdf(y)
plt.figure(figsize = (6, 4))
plt.plot(y, ProbG2)
plt.ylim([0, 1])
plt.show()
x = np.random.randn(5000,1)
plt.figure(figsize = (6, 4))
plt.hist(x, bins = 51, density = True, alpha = 0.4)
plt.plot(y, ProbG2, label = 'G2')
plt.show()
3.2. Univariate Normal Distribution
The general form of a univariate Gaussian, $x \sim \mathcal{N}(\mu, \sigma^2)$, includes parameters for the mean and variance:
- $\mu$: the mean, which determines the center of the distribution
- $\sigma^2$: the variance, which controls the spread or width of the distribution
The probability density function (pdf) is given by:
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
This expression shows that the density decays exponentially as $x$ moves away from the mean $\mu$. The decay rate is controlled by the variance $\sigma^2$.
Standardization: Any Gaussian random variable $x \sim \mathcal{N}(\mu, \sigma^2)$ can be converted into a standard normal variable using:
$$z = \frac{x - \mu}{\sigma}$$
The transformation maps $x$ into a standard normal variable $z \sim \mathcal{N}(0, 1)$.
Properties:
$$\int_{-\infty}^{\infty} p(x)\,dx = 1, \qquad \mathbb{E}[x] = \int_{-\infty}^{\infty} x\,p(x)\,dx = \mu, \qquad \operatorname{var}(x) = \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx = \sigma^2$$
These identities follow from integrating the pdf and are essential in statistical estimation.
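As a quick numerical check (a sketch, not part of the original text), standardized samples from $\mathcal{N}(\mu, \sigma^2)$ should have mean close to 0 and standard deviation close to 1:
# draw samples from N(mu, sigma^2) and standardize them
mu, sigma = 2, 3
x = mu + sigma*np.random.randn(5000)
z = (x - mu)/sigma
print(np.mean(z), np.std(z))    # approximately 0 and 1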
mu = 2
sigma = 3
x = np.arange(-10, 10, 0.1)
ProbG3 = 1/(np.sqrt(2*np.pi)*sigma) * np.exp(-1/2*(x-mu)**2/(sigma**2))
plt.figure(figsize = (6, 4))
plt.plot(y, ProbG, label = 'G1')
plt.plot(x, ProbG3, label = 'G3')
plt.legend()
plt.show()
ProbG2 = norm.pdf(x, 2, 3)
plt.figure(figsize = (6, 4))
plt.plot(y, ProbG, label = 'G1')
plt.plot(x, ProbG2, label = 'G2')
plt.legend()
plt.show()
x = mu + sigma*np.random.randn(5000,1)
plt.figure(figsize = (6, 4))
plt.hist(x, bins = 51, density = True, alpha = 0.4)
plt.plot(y, ProbG2, label = 'G2')
# plt.ylim([0, 0.4])
plt.show()
3.3. Multivariate Gaussian Models
In higher dimensions, the Gaussian distribution generalizes to random vectors. Let $x \in \mathbb{R}^n$ with $x \sim \mathcal{N}(\mu, \Sigma)$. Then the multivariate normal density is defined as:
$$p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
- $\mu$: the mean vector
- $\Sigma$: the covariance matrix, symmetric and positive definite
- $|\Sigma|$: the determinant of the covariance matrix
- $\Sigma^{-1}$: the precision matrix, inversely related to the spread
The exponent defines an ellipsoidal contour:
$$(x-\mu)^T \Sigma^{-1} (x-\mu) = c$$
This quadratic form is constant along the boundary of probability density contours. These contours are always centered at $\mu$ and shaped according to $\Sigma$.
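As a small sketch (the covariance below is an illustrative choice, not from the text), points with the same value of $(x-\mu)^T \Sigma^{-1} (x-\mu)$ all have the same density under the multivariate normal pdf:
from scipy.stats import multivariate_normal

mu = np.array([0, 0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
rv = multivariate_normal(mu, Sigma)

# points on the contour (x - mu)^T Sigma^{-1} (x - mu) = 1, built using Sigma = L L^T
L = np.linalg.cholesky(Sigma)
theta = np.arange(0, 2*np.pi, 0.5)
pts = mu + (L @ np.vstack([np.cos(theta), np.sin(theta)])).T

print(rv.pdf(pts))    # all values are (approximately) equal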
3.3.1. Two Independent Variables
Let $x_1$ and $x_2$ be independent Gaussian random variables with their own means and variances. Then their joint distribution factorizes as:
$$p(x_1, x_2) = p(x_1)\,p(x_2) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2} - \frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right)$$
In matrix form:
$$\mu = \begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix}, \qquad \Sigma = \begin{bmatrix}\sigma_1^2 & 0 \\ 0 & \sigma_2^2\end{bmatrix}$$
Then:
$$x = \begin{bmatrix}x_1 \\ x_2\end{bmatrix} \sim \mathcal{N}(\mu, \Sigma)$$
Ellipse Interpretation:
The density contour of this bivariate Gaussian becomes:
$$\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} = c$$
This is the equation of an axis-aligned ellipse. Its shape is determined by the variances $\sigma_1^2$ and $\sigma_2^2$, and its axes are aligned with the coordinate system.
Summary in matrix form
mu = np.array([0, 0])
sigma = np.eye(2)
m = 1000
x = np.random.multivariate_normal(mu, sigma, m)
plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8, 8])
plt.show()
mu = np.array([0, 0])
sigma = np.array([[1, 0], [0, 4]])
m = 1000
x = np.random.multivariate_normal(mu, sigma, m)
plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8, 8])
plt.show()
When the variables are not independent, the covariance matrix has off-diagonal terms, and the joint distribution involves a rotated ellipse.
Suppose $x \sim \mathcal{N}(0, \Sigma_x)$ with diagonal $\Sigma_x$, and:
$$y = Ux$$
Then:
$$\operatorname{cov}(y) = \mathbb{E}[yy^T] = \mathbb{E}[Uxx^TU^T] = U\,\mathbb{E}[xx^T]\,U^T$$
So the transformed covariance is:
$$\Sigma_y = U \Sigma_x U^T$$
Affine Transform of a Gaussian:
More generally, suppose $x \sim \mathcal{N}(\mu_x, \Sigma_x)$ and $y = Ax + b$.
Then:
$$y \sim \mathcal{N}(A\mu_x + b,\; A\Sigma_x A^T)$$
The result is still Gaussian. This property makes Gaussians extremely useful in probabilistic modeling, as linear operations preserve normality.
Summary:
mu = np.array([0, 0])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])
m = 1000
x = np.random.multivariate_normal(mu, sigma, m)
plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()
3.4. Decouple using Covariance Matrix
Given a dataset assumed to follow a Gaussian distribution, we estimate the covariance matrix from data and analyze its eigenstructure.
Let $x$ be a centered random vector (i.e., $\mathbb{E}[x] = 0$). Then the sample covariance matrix is:
$$\Sigma = \operatorname{cov}(x) = \mathbb{E}[xx^T] \approx \frac{1}{m}\sum_{i=1}^{m} x^{(i)}\left(x^{(i)}\right)^T$$
We now perform the eigen-analysis:
$$\Sigma = U \Lambda U^T$$
These eigenvectors give the directions of the major and minor axes of the data distribution, and the eigenvalues determine the lengths of those axes.
If:
$$x' = U^T x$$
Then:
$$\operatorname{cov}(x') = U^T \Sigma\, U = \Lambda$$
where $\Lambda$ is diagonal:
$$\Lambda = \begin{bmatrix}\lambda_1 & 0 \\ 0 & \lambda_2\end{bmatrix}$$
This diagonalization decouples the Gaussian and transforms it into an axis-aligned ellipse in a new coordinate system.
S = np.cov(x.T)
print ("S = \n", S)
D, U = np.linalg.eig(S)
idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]
print ("U = \n", U, '\n')
print ("D = \n", D)
xp = np.arange(-10, 10)
plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
# lines through the origin along the two eigenvector directions
plt.plot(xp, U[1,0]/U[0,0]*xp, label = 'u1')
plt.plot(xp, U[1,1]/U[0,1]*xp, label = 'u2')
plt.axis('equal')
plt.ylim([-8, 8])
plt.legend()
plt.show()
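As a quick check (a sketch reusing the samples x and eigenvectors U from the cells above), projecting the data onto the eigenvector basis should approximately diagonalize the sample covariance:
xe = x @ U             # coordinates of each sample in the eigenvector basis
print(np.cov(xe.T))    # off-diagonal entries should be close to zero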
4. Properties of Gaussian Distribution
Gaussian distributions are not only widely applicable but also mathematically elegant. One of their most powerful features is their closedness under various operations: linear transformations, marginalization, and conditioning. This means that applying these operations to a Gaussian random variable results in another Gaussian.
Key properties:
- Symmetric about the mean
- Fully specified by mean and covariance
- Uncorrelated jointly Gaussian variables are independent
Closed under:
- Linear and affine transformations
- Dimension reduction: marginalization and conditioning
These properties are crucial in statistical inference and Bayesian analysis.
4.1. Affine Transformation
Suppose $x \sim \mathcal{N}(\mu_x, \Sigma_x)$ is an $n$-dimensional Gaussian random vector.
Let $y$ be defined as an affine transformation of $x$:
$$y = Ax + b$$
Then $y$ is also a Gaussian random vector. Its distribution is given by:
$$y \sim \mathcal{N}(A\mu_x + b,\; A\Sigma_x A^T)$$
This result shows that the Gaussian family is closed under affine transformation. This property makes Gaussian models especially suitable for analysis in systems where data undergoes linear operations.
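The following sketch verifies this numerically by sampling (the particular $A$, $b$, $\mu_x$, and $\Sigma_x$ below are arbitrary choices, not from the text):
# empirical check: if x ~ N(mu_x, Sigma_x) and y = A x + b,
# then y has mean A mu_x + b and covariance A Sigma_x A^T
mu_x = np.array([1.0, 2.0])
Sigma_x = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
b = np.array([3.0, -1.0])

xs = np.random.multivariate_normal(mu_x, Sigma_x, 100000)
ys = xs @ A.T + b

print(ys.mean(axis = 0), A @ mu_x + b)
print(np.cov(ys.T))
print(A @ Sigma_x @ A.T)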
4.2. Marginal Probability of a Gaussian
Let $x \sim \mathcal{N}(\mu, \Sigma)$ and partition $x$, $\mu$, and $\Sigma$ as follows:
$$x = \begin{bmatrix}x_1 \\ x_2\end{bmatrix}, \qquad \mu = \begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix}, \qquad \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}$$
We are interested in the marginal distribution of $x_1$ alone.
Since $x_1 = \begin{bmatrix}I & 0\end{bmatrix} x$, this is a linear transformation of $x$. Hence, $x_1$ is also Gaussian:
$$x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$$
Although not immediately obvious, the marginal distribution of a subset of jointly Gaussian variables is also Gaussian. This fact is essential in probabilistic modeling and dimensionality reduction.
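A quick empirical check (a sketch; the joint parameters below are illustrative, not from the text): the first coordinate of samples drawn from the joint Gaussian should follow $\mathcal{N}(\mu_1, \Sigma_{11})$:
mu = np.array([1.0, -2.0])
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

xs = np.random.multivariate_normal(mu, Sigma, 100000)
x1 = xs[:, 0]
print(x1.mean(), x1.var())    # approximately mu[0] = 1 and Sigma[0, 0] = 3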
4.3. Component of a Gaussian Random Vector
Let $x \sim \mathcal{N}(\mu, \Sigma)$ and let $u$ be a unit vector. Define:
$$y = u^T x$$
This represents the projection (or component) of $x$ along the direction $u$.
Then $y$ is a scalar Gaussian random variable:
$$y \sim \mathcal{N}(u^T\mu,\; u^T\Sigma u)$$
Hence:
$$\operatorname{var}(y) = u^T \Sigma u$$
This quantity measures how much variance exists in the direction $u$.
The direction $u$ that minimizes this variance corresponds to the eigenvector of $\Sigma$ with the smallest eigenvalue:
$$\min_{\|u\|=1} u^T \Sigma u = \lambda_{\min}$$
This observation is closely related to Principal Component Analysis (PCA), where the largest variance direction corresponds to the largest eigenvalue.
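The following sketch (with an illustrative $\Sigma$, not from the text) checks that the variance of the projection equals $u^T \Sigma u$ and that it is minimized along the eigenvector with the smallest eigenvalue:
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
xs = np.random.multivariate_normal(np.zeros(2), Sigma, 100000)

D, U = np.linalg.eig(Sigma)
u_min = U[:, np.argmin(D)]    # eigenvector with the smallest eigenvalue

y = xs @ u_min                # component of each sample along u_min
print(y.var(), u_min @ Sigma @ u_min, D.min())    # all approximately equal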
4.4. Conditional Probability of a Gaussian Random Vector
Suppose the joint distribution of two Gaussian vectors is given by:
$$\begin{bmatrix}x_1 \\ x_2\end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix}, \begin{bmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}\right)$$
Then the conditional distribution of $x_1$ given $x_2$ is also Gaussian:
Conditional Distribution
$$x_1 \mid x_2 \sim \mathcal{N}\left(\mu_{1|2},\; \Sigma_{1|2}\right)$$
Conditional Mean
The expected value of $x_1$ given $x_2$:
$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$$
This is a linear function of $x_2$, showing how our belief about $x_1$ changes when $x_2$ is observed.
Conditional Covariance
The uncertainty (covariance) about $x_1$ given $x_2$:
$$\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
This matrix is always less than or equal to $\Sigma_{11}$ in the positive semi-definite sense. Intuitively, observing $x_2$ reduces uncertainty about $x_1$, making the conditional distribution narrower.
This result is essential in Bayesian inference, Kalman filtering, and Gaussian Process regression, where observations refine our knowledge of unknown variables.
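Before simulating, we can evaluate the formulas above for the example used in the following cells ($\mu = [0, 1]^T$, $\Sigma = \tfrac{1}{2}\begin{bmatrix}5 & -3 \\ -3 & 5\end{bmatrix}$, conditioning on $x_2 = 4$); this is a small sketch of the analytic result:
# conditional mean and variance of x1 given x2 = 4
mu1, mu2 = 0.0, 1.0
S11, S12, S21, S22 = 2.5, -1.5, -1.5, 2.5

x2_obs = 4.0
mu_cond = mu1 + S12/S22*(x2_obs - mu2)
var_cond = S11 - S12/S22*S21
print(mu_cond, var_cond)    # the conditioned samples in the cells below should match these values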
mu = np.array([0, 1])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])
m = 1000000
x = np.random.multivariate_normal(mu, sigma, m)
xp = np.arange(-10, 10)
plt.figure(figsize = (4, 4))
plt.plot(x[:, 0], x[:, 1], '.', alpha = 0.1)
plt.axis([-6, 6, -6, 6])
plt.axvline(0, 0, 1, c = 'k', linestyle = '--')
plt.axhline(0, 0, 1, c = 'k', linestyle = '--')
plt.axhline(4, 0, 1, c = 'b')
plt.axhline(1, 0.45, 0.55, c = 'k')
plt.axvline(0, 0.53, 0.63, c = 'k')
plt.show()
x1 = x[:, 0]
x2 = x[:, 1]
plt.figure(figsize = (6, 2))
plt.xlim(-6, 6)
plt.axvline(0, 0, 1, c = 'k')
plt.hist(x1, bins = 51, density = True, alpha = 0.4)
plt.figure(figsize = (6, 2))
plt.xlim(-6, 6)
plt.axvline(1, 0, 1, c = 'k')
plt.hist(x2, bins = 51, density = True, alpha = 0.4)
plt.show()
z = []
for i in range(m):
if x[i,1] > 3.8 and x[i,1] < 4.2:
z.append(x[i,0])
plt.figure(figsize = (6, 2))
plt.hist(z, bins = 51, density = True, alpha = 0.4)
plt.axvline(0, 0, 1, c = 'k')
plt.xlim(-6, 6)
plt.show()