Ellipse and Gaussian Distribution

Table of Contents





1. Coordinates

Basis and Coordinate Representations

Any vector r in a vector space can be represented with respect to different bases. For example, consider the following two bases:


$$\text{basis } \{\hat{x}_1, \hat{x}_2\} \quad \text{or} \quad \text{basis } \{\hat{y}_1, \hat{y}_2\}$$



Then, the vector r can be expressed as coordinate vectors in these bases:


$$\vec{r}_X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}: \text{ coordinates in basis } \{\hat{x}_1, \hat{x}_2\}, \qquad \vec{r}_Y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}: \text{ coordinates in basis } \{\hat{y}_1, \hat{y}_2\}$$


Although $\vec{r}_X$ and $\vec{r}_Y$ contain different values, they both represent the same geometric vector $\vec{r}$.


Coordinate Transformation

The same vector r can be written in either basis:


$$\vec{r} = x_1\hat{x}_1 + x_2\hat{x}_2 = y_1\hat{y}_1 + y_2\hat{y}_2$$


We can rewrite this relationship in matrix form:


$$\begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \hat{y}_1 & \hat{y}_2 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$


Let:


$$U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}, \qquad I = \begin{bmatrix} \hat{y}_1 & \hat{y}_2 \end{bmatrix} \;(= \text{identity})$$


Assuming $\hat{y}_1, \hat{y}_2$ form the standard basis, we interpret $I$ as the identity matrix. Then the transformation becomes:


$$U\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad\Longrightarrow\quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = U^{-1}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$


If $U$ is orthogonal (i.e., the basis vectors $\hat{x}_1, \hat{x}_2$ are orthonormal), then $U^{-1} = U^T$, and we get:


$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = U^{-1}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = U^T\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$


Coordinate Change Summary

The transformation between coordinate representations can be summarized as:


$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \;\xrightarrow{\;U\;}\; \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \;\xrightarrow{\;U^T\;}\; \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$


This framework is fundamental in linear algebra and geometry, especially in understanding transformations such as rotations, projections, and changes of basis.
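
As a quick numerical sketch of this change of basis (the 60° rotation below is our own choice for illustration; the same matrix reappears in the ellipse examples later), we can check with NumPy that $U$ and $U^T$ map between the two coordinate representations of the same vector:

In [ ]:
import numpy as np

# columns of U are the basis vectors x̂1, x̂2 written in the standard basis (a 60° rotation)
U = np.array([[1/2, -np.sqrt(3)/2],
              [np.sqrt(3)/2, 1/2]])

rY = np.array([1, 2])    # coordinates of r in the standard basis {ŷ1, ŷ2}
rX = U.T @ rY            # coordinates of the same vector in the basis {x̂1, x̂2}

print(rX)
print(U @ rX)            # maps back to rY: both coordinate vectors describe the same r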





2. Ellipse

Why Study Ellipses?

In this chapter, our primary goal is to understand the Gaussian distribution, a fundamental building block in probability, statistics, and machine learning. However, before we formally define the Gaussian distribution, we will first study the geometry of ellipses.


Why? Because the level sets (or contours of constant density) of a multivariate Gaussian distribution correspond to points x that satisfy the following quadratic form:


$$(x - \mu)^T \Sigma^{-1} (x - \mu) = c$$


for some constant c>0. This is precisely the equation of an ellipse. These level sets capture the geometry of uncertainty in a Gaussian model and are crucial for visualizing, interpreting, and working with Gaussian distributions.


By understanding ellipses first, we build the necessary geometric intuition to fully grasp the shape and structure of Gaussian distributions.


2.1. Equation of an Ellipse

(1) Unit Circle



The standard unit circle in $\mathbb{R}^2$ is defined by:


$$x_1^2 + x_2^2 = 1 \quad\Longleftrightarrow\quad \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 1$$

$$x^T x = 1$$



(2) Independent Ellipse (Axis-Aligned)



An axis-aligned ellipse centered at the origin has the form:


$$\frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} = 1 \quad\Longleftrightarrow\quad \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} \frac{1}{a^2} & 0 \\ 0 & \frac{1}{b^2} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 1$$

$$\begin{bmatrix} x_1 & x_2 \end{bmatrix}\Sigma_x^{-1}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x^T\Sigma_x^{-1}x = 1, \qquad \text{where } \Sigma_x^{-1} = \begin{bmatrix} \frac{1}{a^2} & 0 \\ 0 & \frac{1}{b^2} \end{bmatrix}, \quad \Sigma_x = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix}$$


This ellipse is aligned with the coordinate axes $\{\hat{x}_1, \hat{x}_2\}$ and stretches along them by $a$ and $b$ respectively.


(3) Dependent ellipse (Rotated ellipse)

  • Coordinate changes


To express the same ellipse under a rotated coordinate system $\{\hat{y}_1, \hat{y}_2\}$, we use a change of coordinates via an orthogonal matrix $U$:


$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = U^T\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad\text{or equivalently}\quad x = U^Ty, \quad y = Ux$$


This transforms the coordinate system, rotating the ellipse while preserving distances and angles.


Ellipse Equation in Original Coordinates

We start from the standard ellipse equation in x-coordinates:


$$x^T\Sigma_x^{-1}x = 1$$


where Σx is a symmetric positive-definite matrix representing the shape and scale of the ellipse aligned with the coordinate axes.


Transforming to y-Coordinates

To express this ellipse in the rotated $y$-coordinate system, substitute $x = U^Ty$ into the equation:


$$(U^Ty)^T\Sigma_x^{-1}(U^Ty) = y^T U\Sigma_x^{-1}U^T y = 1$$


We define the new inverse covariance in y-coordinates as:


$$\Sigma_y^{-1} = U\Sigma_x^{-1}U^T$$


Thus, the ellipse equation in y-coordinates becomes:


$$y^T\Sigma_y^{-1}y = 1$$


Final Form of the Rotated Ellipse

Since the inverse of $\Sigma_y^{-1}$ is:


$$\Sigma_y = U\Sigma_x U^T$$


the equation of the rotated ellipse is:


$$y^T\Sigma_y^{-1}y = 1 \qquad\text{with}\qquad \Sigma_y = U\Sigma_x U^T$$


This expresses the same ellipse, but now in a rotated (dependent) coordinate system. The axes of the ellipse are no longer aligned with the coordinate axes unless Σy is diagonal.


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
theta = np.arange(0, 2*np.pi, 0.01)
x1 = np.cos(theta)
x2 = np.sin(theta)

plt.figure(figsize=(4, 4))
plt.plot(x1, x2)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()
In [ ]:
x1 = 3*np.cos(theta)    # axis-aligned ellipse with a = 3, b = 1
x2 = np.sin(theta)

plt.figure(figsize = (4, 4))
plt.plot(x1, x2)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()

$$U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}$$

In [ ]:
U = np.matrix([[1/2, -np.sqrt(3)/2],
              [np.sqrt(3)/2, 1/2]])

x = np.matrix([x1, x2])

y = U*x
print("U = ", U, "\n")

plt.figure(figsize = (4, 4))
plt.plot(y[0,:].T, y[1,:].T)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()
U =  [[ 0.5       -0.8660254]
 [ 0.8660254  0.5      ]] 

In [ ]:
Sx = np.matrix([[9, 0],
               [0, 1]])

Sy = U*Sx*U.T

print ("Sx = \n", Sx, "\n")
print ("Sy = \n", Sy)
Sx = 
 [[9 0]
 [0 1]] 

Sy = 
 [[3.         3.46410162]
 [3.46410162 7.        ]]

2.2. From Covariance Matrix to Ellipse

So far, we have examined how to determine the covariance matrix Σ from a given (possibly rotated) ellipse. Now we consider a more common and practically useful question: "Given a covariance matrix Σ, how can we visualize or reconstruct the corresponding ellipse?"


To answer this, we aim to determine the following:

  • The semi-axis lengths $a$ and $b$ of the ellipse

  • The diagonal covariance matrix Σx

  • The rotation matrix U that orients the ellipse



The key idea: eigenvectors and eigenvalues


$$A = S\Lambda S^T \qquad\text{where } S = \begin{bmatrix} \upsilon_1 & \upsilon_2 \end{bmatrix} \text{ are the eigenvectors of } A, \quad \Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$


In our case:


$$\Sigma_y = U\Sigma_x U^T = U\Lambda U^T$$


where:


  • $U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}$ contains the eigenvectors of $\Sigma_y$

  • $\Lambda = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix}$ contains the eigenvalues of $\Sigma_y$



Step 1: Eigenvalue Decomposition

Suppose we are given a symmetric positive definite matrix Σy.

We perform eigen-decomposition:


$$\Sigma_y\hat{x}_1 = \lambda_1\hat{x}_1, \quad \Sigma_y\hat{x}_2 = \lambda_2\hat{x}_2 \quad\Longrightarrow\quad \Sigma_y U = U\Lambda \quad\Longrightarrow\quad \Sigma_y = U\Lambda U^T$$


  • $U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}$ is an orthogonal matrix whose columns are the eigenvectors of $\Sigma_y$
  • $\Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$ is the diagonal matrix of eigenvalues


Step 2: Identify Ellipse Parameters

  • The axes of the ellipse align with the eigenvectors in U
  • The lengths of the axes are given by the square roots of the eigenvalues

That is,


$$a = \sqrt{\lambda_1}, \qquad b = \sqrt{\lambda_2}$$


We now define:


$$\Sigma_x = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix} = \Lambda$$


and


$$\Sigma_y = U\Sigma_x U^T$$


So the original ellipse equation in y-coordinates:


$$y^T\Sigma_y^{-1}y = 1$$


can be interpreted in the x-coordinates (i.e., eigenbasis) as:


$$x^T\Sigma_x^{-1}x = 1$$


which is an axis-aligned ellipse.


Note that the transformation between the two coordinate systems is:


$$x = U^Ty \qquad\text{or}\qquad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = U^T\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$


In [ ]:
D, U = np.linalg.eig(Sy)

idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]

print ("D = \n", np.diag(D), "\n")
print ("U = \n", U)
D = 
 [[9. 0.]
 [0. 1.]] 

U = 
 [[-0.5       -0.8660254]
 [-0.8660254  0.5      ]]

Summary



  • Independent ellipse: represented in basis $\{\hat{x}_1, \hat{x}_2\}$
  • Dependent ellipse: represented in basis $\{\hat{y}_1, \hat{y}_2\}$

Given a covariance matrix Σy, we can reconstruct the corresponding ellipse by following these steps:

  • Compute the eigenvalues $\lambda_1, \lambda_2$ of $\Sigma_y$

  • Compute the corresponding eigenvectors and form the matrix U


Then the coordinate transformation is:


$$x = U^Ty, \qquad U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}$$


The lengths of the ellipse axes are given by:


$$a = \sqrt{\lambda_1}, \qquad b = \sqrt{\lambda_2}$$


Define the diagonal covariance in the independent (rotated) coordinates:


$$\Sigma_x = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix}$$


Then the full covariance in the original coordinates is:


$$\Sigma_y = U\Sigma_x U^T$$


This process reconstructs both the shape (through eigenvalues) and the orientation (through eigenvectors) of the ellipse from the given covariance matrix Σy.
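
As a sketch of this reconstruction (reusing the $\Sigma_y$ computed in the rotated-ellipse example above; the plotting recipe itself is our own), we can map the unit circle through $U\,\operatorname{diag}(a, b)$ to draw the ellipse:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

Sy = np.array([[3., 3.46410162],
               [3.46410162, 7.]])                    # covariance from the earlier example

D, U = np.linalg.eig(Sy)                             # eigenvalues (about 9 and 1) and eigenvectors
a, b = np.sqrt(D)                                    # semi-axis lengths

theta = np.linspace(0, 2*np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # unit circle in the eigenbasis

ellipse = U @ np.diag([a, b]) @ circle               # scale by the axes, then rotate into y-coordinates

plt.figure(figsize = (4, 4))
plt.plot(ellipse[0, :], ellipse[1, :])
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.show()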





3. Gaussian Distribution

Gaussian distributions are foundational in probability, statistics, and machine learning due to their mathematical tractability and their role in modeling natural phenomena. In this section, we begin with the univariate Gaussian and build toward the multivariate case.


3.1. Standard Univariate Normal Distribution

The most basic form of a Gaussian distribution is the standard normal distribution, defined over a single variable y. It is symmetric around zero and has unit variance:



$$P_Y(Y = y) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^2\right) \;\sim\; \mathcal{N}(0, 1^2)$$


This distribution is bell-shaped, centered at zero, and is widely used for normalization and standardization of data.


$$\frac{1}{2}y^2 = \text{const} \quad\Longrightarrow\quad \text{probability contour}$$


The exponent $\frac{1}{2}y^2 = \text{const}$ defines level sets (or contours) of equal probability density. In one dimension these are simply pairs of points symmetric about zero, but in higher dimensions they generalize to ellipses.


In [ ]:
y = np.arange(-10, 10, 0.1)

ProbG = 1/np.sqrt(2*np.pi)*np.exp(-1/2*y**2)

plt.figure(figsize = (6, 4))
plt.plot(y, ProbG)
plt.ylim([0, 1])
plt.show()
In [ ]:
from scipy.stats import norm

ProbG2 = norm.pdf(y)

plt.figure(figsize = (6, 4))
plt.plot(y, ProbG2)
plt.ylim([0, 1])
plt.show()
In [ ]:
x = np.random.randn(5000,1)

plt.figure(figsize = (6, 4))
plt.hist(x, bins = 51, density = True, alpha = 0.4)
plt.plot(y, ProbG2, label = 'G2')
plt.show()

3.2. Univariate Normal Distribution

The general form of a univariate Gaussian includes parameters for mean and variance:

  • μ: the mean, which determines the center of the distribution
  • σ2: the variance, which controls the spread or width of the distribution

The probability density function (pdf) is given by:


$$\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$$


This expression shows that the density decays exponentially as x moves away from the mean μ. The decay rate is controlled by the variance σ2.


Standardization: Any Gaussian random variable $x \sim \mathcal{N}(\mu, \sigma^2)$ can be converted into a standard normal variable using:


$$x \sim \mathcal{N}(\mu, \sigma^2) \quad\Longrightarrow\quad P_Y(y) = P_X(x), \quad y = \frac{x - \mu}{\sigma}, \quad\text{or equivalently}\quad x = \sigma y + \mu$$

$$P_X(X = x) \;\propto\; \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) = \exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$$


The transformation maps $x$ into a standard normal variable $y \sim \mathcal{N}(0, 1)$.


Properties:


$$\mathbb{E}[x] = \mu, \qquad \operatorname{var}(x) = \sigma^2$$


These identities follow from integrating the pdf and are essential in statistical estimation.
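
A quick empirical sketch of both standardization and the moment identities (the values $\mu = 2$, $\sigma = 3$ are arbitrary choices for illustration):

In [ ]:
import numpy as np

mu, sigma = 2, 3
x = mu + sigma*np.random.randn(100000)    # samples from N(mu, sigma^2)

y = (x - mu)/sigma                        # standardized samples

print(x.mean(), x.var())                  # approximately mu and sigma^2
print(y.mean(), y.var())                  # approximately 0 and 1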


In [ ]:
mu = 2
sigma = 3

x = np.arange(-10, 10, 0.1)

ProbG3 = 1/(np.sqrt(2*np.pi)*sigma) * np.exp(-1/2*(x-mu)**2/(sigma**2))

plt.figure(figsize = (6, 4))
plt.plot(y, ProbG, label = 'G1')
plt.plot(x, ProbG3, label = 'G3')
plt.legend()
plt.show()
In [ ]:
ProbG2 = norm.pdf(x, 2, 3)

plt.figure(figsize = (6, 4))
plt.plot(y, ProbG, label = 'G1')
plt.plot(x, ProbG2, label = 'G2')
plt.legend()
plt.show()
In [ ]:
x = mu + sigma*np.random.randn(5000,1)

plt.figure(figsize = (6, 4))
plt.hist(x, bins = 51, density = True, alpha = 0.4)
plt.plot(y, ProbG2, label = 'G2')
# plt.ylim([0, 0.4])
plt.show()

3.3. Multivariate Gaussian Models

In higher dimensions, the Gaussian distribution generalizes to random vectors. Let $x \in \mathbb{R}^n$. Then the multivariate normal distribution is defined as:


$$\mathcal{N}(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$


  • $\mu \in \mathbb{R}^n$: the mean vector
  • $\Sigma \in \mathbb{R}^{n \times n}$: the covariance matrix, symmetric and positive definite
  • $|\Sigma|$: the determinant of the covariance matrix
  • $\Sigma^{-1}$: the precision matrix, inversely related to spread

The exponent defines an ellipsoidal contour:


$$\Delta^2 = (x - \mu)^T\Sigma^{-1}(x - \mu)$$


This quadratic form is constant along the boundary of probability density contours. These contours are always centered at μ and shaped according to Σ.
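
As a sketch of these elliptical contours (the mean and covariance below are arbitrary choices for illustration), the density can be evaluated on a grid with `scipy.stats.multivariate_normal` and contoured:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

mu = np.array([0, 0])
Sigma = np.array([[2.5, -1.5],
                  [-1.5, 2.5]])

g1, g2 = np.meshgrid(np.linspace(-6, 6, 200), np.linspace(-6, 6, 200))
grid = np.dstack([g1, g2])                         # grid of (x1, x2) points, shape (200, 200, 2)

pdf = multivariate_normal(mu, Sigma).pdf(grid)     # density evaluated at every grid point

plt.figure(figsize = (4, 4))
plt.contour(g1, g2, pdf)                           # each contour line is an ellipse centered at mu
plt.axis('equal')
plt.show()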



3.3.1. Two Independent Variables

Let X1 and X2 be independent Gaussian random variables with their own means and variances. Then their joint distribution is:


$$P(X_1 = x_1, X_2 = x_2) = P_{X_1}(x_1)\,P_{X_2}(x_2) \;\propto\; \exp\left(-\frac{1}{2}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]\right)$$


In matrix form:


$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$$


Then:


$$P(x) \;\propto\; \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$


Ellipse Interpretation:

The density contour of this bivariate Gaussian becomes:


$$\frac{x_1^2}{\sigma_{x_1}^2} + \frac{x_2^2}{\sigma_{x_2}^2} = c \;\;\text{(ellipse)} \quad\Longleftrightarrow\quad \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} \frac{1}{\sigma_{x_1}^2} & 0 \\ 0 & \frac{1}{\sigma_{x_2}^2} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = c \qquad (\sigma_{x_1} < \sigma_{x_2})$$


This is the equation of an axis-aligned ellipse. Its shape is determined by the variances and its axes are aligned with the coordinate system.


Summary in matrix form


$$\mathcal{N}(0, \Sigma_x) \;\propto\; \exp\left(-\frac{1}{2}x^T\Sigma_x^{-1}x\right), \qquad \mathcal{N}(\mu_x, \Sigma_x) \;\propto\; \exp\left(-\frac{1}{2}(x-\mu_x)^T\Sigma_x^{-1}(x-\mu_x)\right)$$


In [ ]:
mu = np.array([0, 0])
sigma = np.eye(2)

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)

plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8, 8])
plt.show()
In [ ]:
mu = np.array([0, 0])
sigma = np.array([[1, 0], [0, 4]])

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)

plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8, 8])
plt.show()

3.3.2. Two Dependent Variables



When the variables are not independent, the covariance matrix Σ has off-diagonal terms, and the joint distribution involves a rotated ellipse.

Suppose:


$$x = U^Ty \qquad\text{and}\qquad y = Ux$$


Then:


$$x^T\Sigma_x^{-1}x = y^T U\Sigma_x^{-1}U^T y = y^T\Sigma_y^{-1}y$$


So the transformed covariance is:


$$\Sigma_y = U\Sigma_x U^T$$


Affine Transform of a Gaussian:


$$x \sim \mathcal{N}(\mu_x, \Sigma_x), \qquad y = Ax + b$$


Then:


$$y \sim \mathcal{N}(A\mu_x + b, \; A\Sigma_x A^T)$$


The result is still Gaussian. This property makes Gaussians extremely useful in probabilistic modeling, as linear operations preserve normality.


Summary:

$$x \sim \mathcal{N}(\mu_x, \Sigma_x) \text{ and } y = Ax + b \quad\xrightarrow{\;\text{affine transformation}\;}\quad y \sim \mathcal{N}(\mu_y, \Sigma_y) = \mathcal{N}(A\mu_x + b, \; A\Sigma_x A^T)$$

$$y \text{ is also Gaussian with } \mu_y = A\mu_x + b, \qquad \Sigma_y = A\Sigma_x A^T$$
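
A minimal sketch of this construction (reusing the rotation $U$ and the diagonal $\Sigma_x$ from the ellipse examples, purely for illustration): rotate independent samples with $y = Ux$ and compare the sample covariance against $U\Sigma_x U^T$.

In [ ]:
import numpy as np

U = np.array([[1/2, -np.sqrt(3)/2],
              [np.sqrt(3)/2, 1/2]])                  # orthogonal (rotation) matrix
Sx = np.diag([9., 1.])                               # diagonal covariance of the independent variables

m = 100000
x = np.random.multivariate_normal([0, 0], Sx, m)     # independent samples, one per row
y = x @ U.T                                          # y = U x, applied row-wise

print(U @ Sx @ U.T)    # theoretical Sigma_y
print(np.cov(y.T))     # sample covariance, approximately equal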


In [ ]:
mu = np.array([0, 0])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)

plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()

3.4. Decouple using Covariance Matrix

Given a dataset assumed to follow a Gaussian distribution, we estimate the covariance matrix from data and analyze its eigenstructure.


Let y be a centered random vector (i.e., E[y]=0). Then the sample covariance matrix is:


$$\Sigma_y = \begin{bmatrix} \operatorname{var}(y_1) & \operatorname{cov}(y_1, y_2) \\ \operatorname{cov}(y_2, y_1) & \operatorname{var}(y_2) \end{bmatrix}$$


We now perform eigen-analysis:


$$\Sigma_y\hat{x}_1 = \lambda_1\hat{x}_1, \qquad \Sigma_y\hat{x}_2 = \lambda_2\hat{x}_2$$


These eigenvectors give the directions of the major and minor axes of the data distribution, and the eigenvalues determine the length of those axes.


If:


$$U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}$$


Then:


$$\Sigma_y = U\Sigma_x U^T$$


where Σx is diagonal:


$$\Sigma_x = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$


This diagonalization decouples the Gaussian and transforms it into an axis-aligned ellipse in a new coordinate system.



$$\text{eigen-analysis:} \quad \Sigma_y\hat{x}_1 = \lambda_1\hat{x}_1, \quad \Sigma_y\hat{x}_2 = \lambda_2\hat{x}_2 \quad\Longrightarrow\quad \Sigma_x^{-1} = \begin{bmatrix} \frac{1}{\lambda_1} & 0 \\ 0 & \frac{1}{\lambda_2} \end{bmatrix}, \quad \Sigma_x = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$



$$\Sigma_y\begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}\Sigma_x \quad\Longrightarrow\quad \Sigma_y = U\Sigma_x U^T$$

$$y = Ux, \qquad U^Ty = x, \qquad U = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}$$


In [ ]:
S = np.cov(x.T)
print ("S = \n", S)
S = 
 [[ 2.4683757  -1.30004182]
 [-1.30004182  2.31394541]]
In [ ]:
D, U = np.linalg.eig(S)

idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]

print ("U = \n", U, '\n')
print ("D = \n", D)
U = 
 [[ 0.72776709  0.68582437]
 [-0.68582437  0.72776709]] 

D = 
 [3.69349342 1.08882769]
In [ ]:
xp = np.arange(-10, 10)

plt.figure(figsize = (6, 4))
plt.plot(x[:,0], x[:,1], '.')
plt.plot(xp, U[1,0]/U[0,0]*xp, label = 'u1')
plt.plot(xp, U[1,1]/U[0,1]*xp, label = 'u2')
plt.axis('equal')
plt.ylim([-8, 8])
plt.legend()
plt.show()
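
Continuing this example, a short sketch of the decoupling itself (variable names below are our own): project the samples onto the eigenbasis via $x = U^Ty$ and check that the sample covariance becomes approximately diagonal, with the eigenvalues on the diagonal.

In [ ]:
import numpy as np

# regenerate dependent samples with the same covariance as above (for illustration)
Sigma = 1./2.*np.array([[5, -3], [-3, 5]])
y_samples = np.random.multivariate_normal([0, 0], Sigma, 10000)

S = np.cov(y_samples.T)
D, U = np.linalg.eig(S)

x_decoupled = y_samples @ U      # each row is U^T y, i.e., coordinates in the eigenbasis

print(np.cov(x_decoupled.T))     # approximately diagonal, with D on the diagonal
print(D)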




4. Properties of Gaussian Distribution


Gaussian distributions are not only widely applicable but also mathematically elegant. One of their most powerful features is their closedness under various operations: linear transformations, marginalization, and conditioning. This means that applying these operations to a Gaussian random variable results in another Gaussian.


Key properties:

  • Symmetric about the mean
  • Fully specified by mean and covariance
  • Uncorrelated Gaussian variables are independent

Closed under:

  • Linear and affine transformations
  • Dimension reduction: marginalization and conditioning

These closure properties are crucial in statistical inference and Bayesian analysis.


4.1. Affine Transformation

Suppose $x \sim \mathcal{N}(\mu_x, \Sigma_x)$ is an $n$-dimensional Gaussian random vector.


Let y be defined as an affine transformation of x:


$$y = Ax + b$$


Then y is also a Gaussian random vector. Its distribution is given by:


$$\mathbb{E}[y] = A\,\mathbb{E}[x] + b = A\mu_x + b, \qquad \operatorname{cov}(y) = A\operatorname{cov}(x)A^T = A\Sigma_x A^T$$


This result shows that the Gaussian family is closed under affine transformation. This property makes Gaussian models especially suitable for analysis in systems where data undergoes linear operations.
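
A small numerical sketch of this closure property ($A$, $b$, and the parameters below are arbitrary choices for illustration):

In [ ]:
import numpy as np

mu_x = np.array([1., 2.])
Sigma_x = np.array([[2., 0.5],
                    [0.5, 1.]])

A = np.array([[1., 2.],
              [0., 1.]])
b = np.array([3., -1.])

m = 200000
x = np.random.multivariate_normal(mu_x, Sigma_x, m)
y = x @ A.T + b                          # affine transform applied row-wise

print(A @ mu_x + b, y.mean(axis = 0))    # theoretical vs sample mean
print(A @ Sigma_x @ A.T)                 # theoretical covariance A Sigma_x A^T
print(np.cov(y.T))                       # sample covariance, approximately equal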



4.2. Marginal Probability of a Gaussian

Let $x \sim \mathcal{N}(\mu, \Sigma)$ and partition $x$ as follows:


$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$


We are interested in the marginal distribution of x1 alone.


Since $x_1 = \begin{bmatrix} I & 0 \end{bmatrix}x$, this is a linear transformation. Hence, $x_1$ is also Gaussian:


$$\mathbb{E}[x_1] = \mu_1, \qquad \operatorname{cov}(x_1) = \Sigma_{11}$$


Although not immediately obvious, the marginal distribution of a subset of jointly Gaussian variables is also Gaussian. This fact is essential in probabilistic modeling and dimensionality reduction.
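
A brief sketch of this fact (the mean and covariance below are arbitrary choices for illustration): draw joint samples and compare the histogram of $x_1$ with the univariate density $\mathcal{N}(\mu_1, \Sigma_{11})$.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu = np.array([0., 1.])
Sigma = np.array([[2.5, -1.5],
                  [-1.5, 2.5]])

x = np.random.multivariate_normal(mu, Sigma, 100000)

grid = np.linspace(-6, 6, 200)

plt.figure(figsize = (6, 2))
plt.hist(x[:, 0], bins = 51, density = True, alpha = 0.4)        # empirical marginal of x1
plt.plot(grid, norm.pdf(grid, mu[0], np.sqrt(Sigma[0, 0])))      # analytic marginal N(mu_1, Sigma_11)
plt.xlim(-6, 6)
plt.show()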



4.3. Component of a Gaussian Random Vector

Let $x \sim \mathcal{N}(0, \Sigma)$ and let $c \in \mathbb{R}^n$ be a unit vector. Define:


$$y = c^Tx$$


This represents the projection (or component) of x along the direction c.


Then y is a scalar Gaussian random variable:


$$\mathbb{E}[y] = 0, \qquad \operatorname{var}(y) = c^T\Sigma c$$


Hence:


$$\mathbb{E}[y^2] = c^T\Sigma c$$


This quantity measures how much variance exists in the direction c.


The direction that minimizes this variance corresponds to the eigenvector of Σ with the smallest eigenvalue:


$$\min_{\|c\| = 1} c^T\Sigma c = \lambda_{\min}$$


This observation is closely related to Principal Component Analysis (PCA), where the largest variance direction corresponds to the largest eigenvalue.
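
A sketch of this variance-minimization property (the covariance below is the one used in the sampling examples above, reused for illustration): sweep unit directions $c(\theta) = (\cos\theta, \sin\theta)$, compute $c^T\Sigma c$, and compare the minimum with the smallest eigenvalue.

In [ ]:
import numpy as np

Sigma = 1./2.*np.array([[5, -3], [-3, 5]])

theta = np.linspace(0, np.pi, 1000)
c = np.vstack([np.cos(theta), np.sin(theta)])        # unit direction vectors, one per column

variances = np.einsum('ik,ij,jk->k', c, Sigma, c)    # c^T Sigma c for every direction

print(variances.min())                               # approximately the smallest eigenvalue
print(np.linalg.eigvalsh(Sigma))                     # eigenvalues of Sigma (here 1 and 4)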



4.4. Conditional Probability of a Gaussian Random Vector

Suppose the joint distribution of two Gaussian vectors is given by:


$$\begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \; \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{bmatrix}\right)$$


Then the conditional distribution of x given y is also Gaussian:


Conditional Distribution


$$x \mid y \;\sim\; \mathcal{N}\left(\mu_x + \Sigma_{xy}\Sigma_y^{-1}(y - \mu_y), \;\; \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{yx}\right)$$


Conditional Mean

The expected value of x given y:


$$\mathbb{E}[x \mid y] = \mu_x + \Sigma_{xy}\Sigma_y^{-1}(y - \mu_y)$$


This is a linear function of y, showing how our belief about x changes when y is observed.


Conditional Covariance

The uncertainty (covariance) about x given y:


$$\operatorname{cov}(x \mid y) = \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{yx}$$


This matrix is always less than or equal to Σx in the positive semi-definite sense. Intuitively, observing y reduces uncertainty about x, making the conditional distribution narrower.


This result is essential in Bayesian inference, Kalman filtering, and Gaussian Process regression, where observations refine our knowledge of unknown variables.


In [ ]:
mu = np.array([0, 1])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])

m = 1000000
x = np.random.multivariate_normal(mu, sigma, m)
In [ ]:
xp = np.arange(-10, 10)

plt.figure(figsize = (4, 4))
plt.plot(x[:, 0], x[:, 1], '.', alpha = 0.1)
plt.axis([-6, 6, -6, 6])
plt.axvline(0, 0, 1, c = 'k', linestyle = '--')
plt.axhline(0, 0, 1, c = 'k', linestyle = '--')
plt.axhline(4, 0, 1, c = 'b')    # horizontal line marking the conditioning value y = 4
plt.axhline(1, 0.45, 0.55, c = 'k')
plt.axvline(0, 0.53, 0.63, c = 'k')
plt.show()
In [ ]:
x1 = x[:, 0]
x2 = x[:, 1]

plt.figure(figsize = (6, 2))
plt.xlim(-6, 6)
plt.axvline(0, 0, 1, c = 'k')
plt.hist(x1, bins = 51, density = True, alpha = 0.4)

plt.figure(figsize = (6, 2))
plt.xlim(-6, 6)
plt.axvline(1, 0, 1, c = 'k')
plt.hist(x2, bins = 51, density = True, alpha = 0.4)
plt.show()
In [ ]:
z = []
for i in range(m):
    if x[i,1] > 3.8 and x[i,1] < 4.2:    # keep samples whose second component is close to the observed value y = 4
        z.append(x[i,0])
In [ ]:
plt.figure(figsize = (6, 2))
plt.hist(z, bins = 51, density = True, alpha = 0.4)
plt.axvline(0, 0, 1, c = 'k')
plt.xlim(-6, 6)
plt.show()
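
For comparison with the empirical histogram above, a sketch of the analytic conditional (reusing the samples `z` and the imports already in scope in this notebook): plugging $\mu$, $\Sigma$, and the observed value $y \approx 4$ into the formulas gives the conditional mean and variance directly.

In [ ]:
from scipy.stats import norm

# joint parameters used in the demo above: mu = [0, 1], Sigma = [[2.5, -1.5], [-1.5, 2.5]]
mu1, mu2 = 0., 1.
S11, S12, S22 = 2.5, -1.5, 2.5
y_obs = 4.

cond_mean = mu1 + S12/S22*(y_obs - mu2)    # = -1.8
cond_var = S11 - S12**2/S22                # = 1.6

grid = np.arange(-6, 6, 0.1)

plt.figure(figsize = (6, 2))
plt.hist(z, bins = 51, density = True, alpha = 0.4)                # empirical conditional samples
plt.plot(grid, norm.pdf(grid, cond_mean, np.sqrt(cond_var)))       # analytic conditional density
plt.xlim(-6, 6)
plt.show()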
In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')