Ellipse and Gaussian Distribution


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Coordinates

  • Coordinates with basis
    • basis $\{\hat x_1 \;\hat x_2 \}$ or basis $\{\hat y_1 \;\hat y_2 \}$




$$ \begin{align*} \vec{r}_X &= \begin{bmatrix}x_1\\x_2\end{bmatrix} \text{: coordinate of} \; \vec r \text{ in basis} \; \{\hat x_1 \;\hat x_2 \} \\ \vec{r}_Y &= \begin{bmatrix}y_1\\y_2\end{bmatrix} \text{: coordinate of} \;\vec r \text{ in basis} \;\{\hat y_1 \;\hat y_2 \} \end{align*} $$


  • Coordinate transformation
$$\begin{align*} \vec r = x_1\hat{x}_1 + x_2\hat{x}_2 & = y_1\hat{y}_1 + y_2\hat{y}_2 \\\\ \begin{bmatrix}\hat{x}_1&\hat{x}_2\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} &=\begin{bmatrix}\hat{y}_1&\hat{y}_2\end{bmatrix}\begin{bmatrix}y_1\\y_2\end{bmatrix} \\ \\ U\begin{bmatrix}x_1\\x_2\end{bmatrix} &= \begin{bmatrix}y_1\\y_2\end{bmatrix} \quad \left( U = \begin{bmatrix}\hat{x}_1 & \hat{x}_2\end{bmatrix}, I = \begin{bmatrix}\hat{y}_1 & \hat{y}_2\end{bmatrix} \right)\\\\ \begin{bmatrix}x_1\\x_2\end{bmatrix} &= U^{-1}\begin{bmatrix}y_1\\y_2\end{bmatrix}= U^{T}\begin{bmatrix}y_1\\y_2\end{bmatrix} \end{align*}$$
  • Coordinate change


$$ \begin{align*} \begin{bmatrix}x_1\\x_2\end{bmatrix} \quad &\underrightarrow{U} \quad \begin{bmatrix}y_1\\y_2\end{bmatrix} \\\\ \begin{bmatrix}y_1\\y_2\end{bmatrix} \quad &\underrightarrow{U^T} \quad \begin{bmatrix}x_1\\x_2\end{bmatrix} \end{align*} $$

2. Equation of an Ellipse

  • Unit circle




$$ \begin{align*} & x_1^2 + x_2^2 = 1 \implies \\ \\ &\begin{bmatrix}x_1 & x_2 \end{bmatrix} \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = 1 \end{align*} $$
          
  • Independent ellipse




$$ \begin{align*} \frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} = 1 &\implies \begin{bmatrix}x_1 & x_2 \end{bmatrix} \begin{bmatrix}\frac{1}{a^2} & 0 \\ 0 & \frac{1}{b^2} \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = 1 \\ & \implies \begin{bmatrix}x_1 & x_2 \end{bmatrix} \Sigma_x^{-1} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = 1\\ \\ &\text{where} \; \Sigma_x^{-1} =\begin{bmatrix}\frac{1}{a^2} & 0\\ 0 & \frac{1}{b^2} \end{bmatrix}, \,\Sigma_x = \begin{bmatrix}a^2 & 0\\ 0 & b^2 \end{bmatrix} \end{align*} $$
  • Dependent ellipse (Rotated ellipse)
    • Coordinate changes



$$\begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = U^T \begin{bmatrix}y_1 \\ y_2 \end{bmatrix},\quad \begin{array}{Icr} \begin{align} &x= U^Ty \\ &Ux=y \end{align} \end{array} $$
  • Now we know in basis $\{\hat x_1 \;\hat x_2 \} $


$$ x^T \Sigma_x^{-1}x = 1 \quad \text{and} \quad \Sigma_x = \begin{bmatrix}a^2 & 0\\ 0 & b^2 \end{bmatrix} $$

  • Then, we can find $\Sigma_y$ such that


$$ \begin{align*} y^T \Sigma_y^{-1}y &= 1 \quad \text{and} \quad \Sigma_y = ?\\\\ \implies x^T \Sigma_x^{-1} x &= y^T U\Sigma_x^{-1} U^Ty = 1 \quad (\Sigma_y^{-1} \; \text{: similar matrix to } \Sigma_x^{-1}) \\\\ \therefore \;\Sigma_y^{-1} &= U\Sigma_x^{-1}U^T \quad \text{or}\\\\ \Sigma_y &= U\Sigma_x U^T \end{align*}$$


  • The quation of dependent ellipse


$$y^T \Sigma_y^{-1}y = 1 \quad \text{where} \quad \Sigma_y = U\Sigma_x U^T$$
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
theta = np.arange(0,2*np.pi,0.01)
x1 = np.cos(theta)
x2 = np.sin(theta)

plt.figure(figsize=(6,6))
plt.plot(x1, x2)
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.show()
In [3]:
x1 = 3*np.cos(theta);
x2 = np.sin(theta);

plt.figure(figsize=(6,6))
plt.plot(x1, x2)
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.show()
$$ U = \left[\hat x_1 \;\hat x_2 \right] = \begin{bmatrix}\frac{1}{2} & -\frac{\sqrt{3}}{2}\\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}$$
In [4]:
U = np.matrix([[1/2, -np.sqrt(3)/2], 
              [np.sqrt(3)/2, 1/2]])

x = np.matrix([x1, x2])

y = U*x
print("U = \n", U)

plt.figure(figsize=(6,6))
plt.plot(y[0,:].T, y[1,:].T)
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.show()
U = 
 [[ 0.5       -0.8660254]
 [ 0.8660254  0.5      ]]
In [5]:
Sx = np.matrix([[9, 0],
               [0, 1]])

Sy = U*Sx*U.T

print ("Sx = \n", Sx, "\n")
print ("Sy = \n", Sy)
Sx = 
 [[9 0]
 [0 1]] 

Sy = 
 [[3.         3.46410162]
 [3.46410162 7.        ]]

2.1. Question (Reverse Problem)

  • Given $\Sigma_y^{-1}$ (or $\Sigma_y)$,
    • how to find $a$ (major axis) and $b$ (minor axis)
    • how to find the $\Sigma_x$
    • how to find the proper matrix $U$
  • eigenvectors of $\Sigma$


$$A=S \Lambda S^T \qquad \text{where } S=[\upsilon_1 \; \upsilon_2] \text{ eigenvector of } A, \; \text{and }\Lambda=\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$


$$ \text{here, }\, \Sigma_y = U\Sigma_x U^T = U \Lambda U^T \qquad \text{where } U = \begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix} \text{ eigenvector of } \Sigma_y, \; \text{and }\Lambda=\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix} $$




$$\begin{align*} \text{eigen-analysis} \begin{cases}\Sigma_y\hat{x}_1=\lambda_1 \hat{x}_1\\ \Sigma_y\hat{x}_2=\lambda_2 \hat{x}_2\\ \end{cases} \; \implies \; & \Sigma_y \underbrace{\begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix}}_{U} = \underbrace{\begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix}\lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}}_{\Lambda} \\ \\ & \Sigma_y U = U\Lambda \\ & \Sigma_y = U \Lambda U^T = U \Sigma_x U^T \end{align*}$$



$$\begin{array}{Icr} \\x=U^Ty \\\\ \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = U^T \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \end{array} \quad \quad \begin{array}{Icr} \begin{align*} & a=\sqrt{\lambda_1}\\ & b = \sqrt{\lambda_2}\\\\ & \text{major axis}=\hat{x}_1\\ & \text{minor axis}=\hat{x}_2 \end{align*} \end{array}$$


  • Again, it is an eigenvalue and eigenvector problem.
In [6]:
D, U = np.linalg.eig(Sy)

idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]

print ("D = \n", np.diag(D))
print ("U = \n", U)
D = 
 [[9. 0.]
 [0. 1.]]
U = 
 [[-0.5       -0.8660254]
 [-0.8660254  0.5      ]]

2.2. Summary



$$ \begin{align*} x &=U^Ty \\\\ U &=\begin{bmatrix}\hat{x}_1 & \hat{x}_2\end{bmatrix} \end{align*}$$
  • Independent ellipse in $\{\hat{x}_1, \hat{x}_2 \}$

  • Dependent ellipse in $\{ \hat{y}_1, \hat{y}_2 \}$

  • Decouple

    • Diagonalize
    • Eigen-analysis

3. Gaussian Distribution

3.1. Standard Univariate Normal Distribution $\sim \mathcal{N}(0,1^2)$



$$ P_Y\left(Y=y\right)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^2\right)$$


$$ \frac{1}{2} y^2 = \text{const} \implies \text{prob. contour}$$

In [7]:
y = np.arange(-10,10,0.1)

ProbG = 1/np.sqrt(2*np.pi)*np.exp(-1/2*y**2)

plt.figure(figsize=(10,6))
plt.plot(y, ProbG)
plt.show()
In [8]:
from scipy.stats import norm

ProbG2 = norm.pdf(y)

plt.figure(figsize=(10,6))
plt.plot(y, ProbG2)
plt.show()
In [9]:
x = np.random.randn(5000,1)

plt.figure(figsize=(10,6))
plt.hist(x, bins=51, normed=True)
plt.plot(y, ProbG2, label='G2')
plt.show()
c:\users\seungchul\appdata\local\programs\python\python35\lib\site-packages\matplotlib\axes\_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")

3.2. Univariate Normal Distribution $\sim \mathcal{N}\left(\mu,\sigma^2\right)$

  • Gaussian or normal distribution, 1D (mean $\mu$, variance $\sigma^2$)


$$ \mathcal{N}(x;\,\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{1}{2}\frac{\left(x-\mu\right)^2}{\sigma^2}\right) $$

$$ \begin{align*}x &\sim \mathcal{N}\left(\mu,\sigma^2\right) \\ & \implies P_Y\left(y\right) = P_X\left(x\right), \quad y = \frac{x-\mu}{\sigma}, \quad x=\sigma y + \mu\\\\ P_X\left(X=x\right) &\sim \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) \\ &= \exp\left(-\frac{1}{2}\frac{\left(x-\mu\right)^2}{\sigma^2}\right) \\\\ E[x] &= \mu \\ \text{var}(x) &= E[(x-\mu)^2] = \sigma^2 \end{align*}$$



In [10]:
mu = 2
sigma = 3

x = np.arange(-10, 10, 0.1)

ProbG3 = 1/(np.sqrt(2*np.pi)*sigma) * np.exp(-1/2*(x-mu)**2/(sigma**2))

plt.figure(figsize=(10,6))
plt.plot(y,ProbG, label='G1')
plt.plot(x,ProbG3, label='G3')
plt.legend()
plt.show()
In [11]:
ProbG2 = norm.pdf(x, 2, 3)

plt.figure(figsize=(10,6))
plt.plot(y,ProbG, label='G1')
plt.plot(x,ProbG2, label='G2')
plt.legend()
plt.show()
In [12]:
x = mu + sigma*np.random.randn(5000,1)

plt.figure(figsize=(10,6))
plt.hist(x, bins=51, normed=True)
plt.plot(y,ProbG2, label='G2')
plt.ylim([0,0.4])
plt.show()

3.3. Multivariate Gaussian Models

  • Similar to a univariate case, but in a matrix form


$$ \begin{align*} \mathcal{N}\big(x;\, \mu,\Sigma\big) &= \frac{1}{(2\pi)^{\frac{n}{2}}\lvert\Sigma\rvert^{\frac{1}{2}}}\exp\left(-\frac{1}{2}\left(x-\mu\right)^T\Sigma^{-1}\left(x-\mu\right)\right) \\ \\ \mu &= \text{length}\,\, n \,\,\text{column vector} \\ \Sigma &= n \times n \,\,\text{matrix (covariance matrix)} \\ \lvert\Sigma\rvert &= \text{matrix determinant}\\\\ E[x] &= \mu \\ \text{cov}(x) &= E[(x-\mu)(x-\mu)^T] = \Sigma \end{align*}$$


  • Multivariate Gaussian models and ellipse
    • Ellipse shows constant $\Delta^2$ value...


$$ \Delta^2 = (x-\mu)^T\Sigma^{-1} (x-\mu)$$

3.3.1. Two Independent Variables


$$ \begin{align*} P\left(X_1=x_1,X_2=x_2\right) &= P_{X_1}\left(x_1\right)P_{X_2}\left(x_2\right) \\\\ &\sim \exp \left(-\frac{1}{2} \frac{\left(x_1-\mu_{x_1}\right)^2}{\sigma_{x_1}^2}\right) \cdot \exp \left(-\frac{1}{2} \frac{\left(x_2-\mu_{x_2}\right)^2}{\sigma_{x_2}^2}\right) \\\\ &\sim\exp \left(-\frac{1}{2}\left(\frac{\left(x_1-\mu_{x_1}\right)^2}{\sigma_{x_1}^2} + \frac{\left(x_2-\mu_{x_2}\right)^2}{\sigma_{x_2}^2}\right)\right)\\ \end{align*} $$

  • In a matrix form
$$ P(x_1) \cdot P(x_2)=\frac{1}{Z_1Z_2}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) $$


$$\left ( x=\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix}\sigma_{x_1}^2 & 0 \\ 0 & \sigma_{x_2}^2 \end{bmatrix} \right )$$




$$ \begin{align*}\frac{x_1^2}{\sigma_{x_1}^2} + \frac{x_2^2}{\sigma_{x_2}^2} &= c \;\;\;\text{(ellipse)} \\ \\ \begin{bmatrix} x_1 & x_2\end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_{x_1}^2} & 0 \\ 0 & \frac{1}{\sigma_{x_2}^2} \end{bmatrix} \begin{bmatrix}x_1 \\ x_2\end{bmatrix} &= c \qquad (\sigma_{x_1} < \sigma_{x_2}) \end{align*}$$


  • Summary in a matrix form
$$ \begin{align*} \mathcal{N}\left(0,\Sigma_x\right) &\sim \exp\left(-\frac{1}{2}x^T \Sigma_x^{-1} x\right) \\\\ \mathcal{N}\left(\mu_x,\Sigma_x\right) &\sim \exp\left(-\frac{1}{2}\left(x-\mu_x\right)^T \Sigma_x^{-1} \left(x-\mu_x\right)\right) \end{align*}$$
In [13]:
mu = np.array([0, 0])
sigma = np.eye(2)

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)
print(x.shape)

plt.figure(figsize=(10,6))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()
(1000, 2)
In [14]:
mu = np.array([0, 0])
sigma = np.array([[1, 0], [0, 4]])

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)

plt.figure(figsize=(10,6))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()

3.3.2. Two Dependent Variables





  • Compute $P_Y(y)$ from $P_X(x)$
$$\begin{align*} P_X(x) = P_Y(y) \;\ \text{where}\;\; & x = \begin{bmatrix}x_1 \\ x_2\end{bmatrix}, y = \begin{bmatrix}y_1 \\ y_2 \end{bmatrix} \end{align*}$$
  • Relationship between $y$ and $x$
$$ x = \begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix}^T y = U^Ty $$



$$x^T\Sigma_x^{-1} x = y^T U\Sigma_x^{-1}U^Ty = y^T \Sigma_y^{-1} y\\ $$

$$ \begin{align*} \therefore \;\; \Sigma_y^{-1} &= U\Sigma_x^{-1} U^T \\\\ \Sigma_y &= U \Sigma_x U^T \end{align*} $$
  • $\Sigma_x$ : covariance matrix of $x$

  • $\Sigma_y$ : covariance matrix of $y$

  • If $U$ is an eigenvector matrix of $\Sigma_y$, then $\Sigma_x$ is a diagonal matrix
In [15]:
mu = np.array([0, 0])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)

plt.figure(figsize=(10,6))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()

Remark

$$ \begin{align*} & x \sim \mathcal{N}(\mu_x,\Sigma_x) \text{ and } y = Ax + b \; \text{affine transformation}\\ \\ \implies \; & y \sim \mathcal{N}(\mu_y,\Sigma_y) = \mathcal{N}(A\mu_x+b,A\Sigma_xA^T)\\\\ \implies \; & y \;\; \text{is also Gaussian with} \;\; \mu_y = A\mu_x+b, \;\; \Sigma_y = A \Sigma_x A^T \end{align*}$$

3.4. Decouple using Covariance Matrix

Given data, how to find $\Sigma_y$ and major (or minor) axis (assume $\mu_y = 0$)

  • This is a statistical problem
  • Estimating a Gussian model to given data


$$ \Sigma_y = \begin{bmatrix} \text{var}(y_1) & \text{cov}(y_1,y_2)\\ \text{cov}(y_2,y_1) & \text{var}(y_2) \end{bmatrix}$$



$$\begin{array}{Icr} \text{eigen-analysis} \\ \\ \Sigma_y\hat{x}_1 = \lambda_1\hat{x}_1 \\ \Sigma_y\hat{x}_2 = \lambda_2\hat{x}_2 \end{array} \quad \quad \begin{array}{Icr} \Sigma_x^{-1} = \begin{bmatrix} \frac{1}{\sqrt{\lambda_1}^2} & 0 \\ 0 & \frac{1}{\sqrt{\lambda_2}^2} \end{bmatrix} \\ \Sigma_x = \begin{bmatrix}\sqrt{\lambda_1}^2 & 0 \\ 0 & \sqrt{\lambda_2}^2 \end{bmatrix} \end{array} $$


$$\begin{array}{Icr} \begin{align*} \Sigma_y \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} &= \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\\ &= \begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix} \Sigma_{x} \\ \\ \Sigma_y &= U \Sigma_x U^T \end{align*} \end{array} \quad \quad \quad \begin{array}{Icr} y = U x \implies U^Ty = x \\ \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} = U \\ \\ \\ \end{array}$$
In [16]:
S = np.cov(x.T)
print ("S = \n", S)
S = 
 [[ 2.52002523 -1.52473469]
 [-1.52473469  2.5267346 ]]
In [17]:
D, U = np.linalg.eig(S)

idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]

print ("U = \n", U)
print ("D = \n", D)
U = 
 [[ 0.70632848 -0.70788423]
 [-0.70788423 -0.70632848]]
D = 
 [4.0481183  0.99864153]
In [18]:
xp = np.arange(-10, 10)

plt.figure(figsize=(10,6))
plt.plot(x[:,0],x[:,1],'.')
plt.plot(xp, U[1,0]/U[0,0]*xp, label='u1')
plt.plot(xp, U[1,1]/U[0,1]*xp, label='u2')
plt.axis('equal')
plt.ylim([-8, 8])
plt.legend()
plt.show()

4. Properties of Gaussian Distribution

  • Symmetric about the mean
  • Parameterized
  • Uncorrelated $\Longrightarrow$ independent
  • Gaussian distributions are closed to
    • Linear transformation
    • Affine transformation
    • Reduced dimension of multivariate Gaussian
      • Marginalization (projection)
      • Conditioning (slice) $\Longrightarrow$ Highly related to inference