Ellipse and Gaussian Distribution

By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Coordinates¶

Coordinates with basis
- basis $\{\hat x_1 \;\hat x_2 \}$ or basis $\{\hat y_1 \;\hat y_2 \}$

$$ \begin{align*} \vec{r}_X &= \begin{bmatrix}x_1\\x_2\end{bmatrix} \text{: coordinate of} \; \vec r \text{ in basis} \; \{\hat x_1 \;\hat x_2 \} \\ \vec{r}_Y &= \begin{bmatrix}y_1\\y_2\end{bmatrix} \text{: coordinate of} \;\vec r \text{ in basis} \;\{\hat y_1 \;\hat y_2 \} \end{align*} $$

Coordinate transformation

$$\begin{align*} \vec r = x_1\hat{x}_1 + x_2\hat{x}_2 & = y_1\hat{y}_1 + y_2\hat{y}_2 \\\\ \begin{bmatrix}\hat{x}_1&\hat{x}_2\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} &=\begin{bmatrix}\hat{y}_1&\hat{y}_2\end{bmatrix}\begin{bmatrix}y_1\\y_2\end{bmatrix} \\ \\ U\begin{bmatrix}x_1\\x_2\end{bmatrix} &= \begin{bmatrix}y_1\\y_2\end{bmatrix} \quad \left( U = \begin{bmatrix}\hat{x}_1 & \hat{x}_2\end{bmatrix}, I = \begin{bmatrix}\hat{y}_1 & \hat{y}_2\end{bmatrix} \right)\\\\ \begin{bmatrix}x_1\\x_2\end{bmatrix} &= U^{-1}\begin{bmatrix}y_1\\y_2\end{bmatrix}= U^{T}\begin{bmatrix}y_1\\y_2\end{bmatrix} \end{align*}$$

Coordinate change

$$ \begin{align*} \begin{bmatrix}x_1\\x_2\end{bmatrix} \quad &\underrightarrow{U} \quad \begin{bmatrix}y_1\\y_2\end{bmatrix} \\\\ \begin{bmatrix}y_1\\y_2\end{bmatrix} \quad &\underrightarrow{U^T} \quad \begin{bmatrix}x_1\\x_2\end{bmatrix} \end{align*} $$

2. Equation of an Ellipse¶

Unit circle

$$ \begin{align*} & x_1^2 + x_2^2 = 1 \implies \\ \\ &\begin{bmatrix}x_1 & x_2 \end{bmatrix} \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = 1 \end{align*} $$

Independent ellipse

$$ \begin{align*} \frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} = 1 &\implies \begin{bmatrix}x_1 & x_2 \end{bmatrix} \begin{bmatrix}\frac{1}{a^2} & 0 \\ 0 & \frac{1}{b^2} \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = 1 \\ & \implies \begin{bmatrix}x_1 & x_2 \end{bmatrix} \Sigma_x^{-1} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = 1\\ \\ &\text{where} \; \Sigma_x^{-1} =\begin{bmatrix}\frac{1}{a^2} & 0\\ 0 & \frac{1}{b^2} \end{bmatrix}, \,\Sigma_x = \begin{bmatrix}a^2 & 0\\ 0 & b^2 \end{bmatrix} \end{align*} $$

Dependent ellipse (Rotated ellipse)
- Coordinate changes

$$\begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = U^T \begin{bmatrix}y_1 \\ y_2 \end{bmatrix},\quad \begin{array}{Icr} \begin{align} &x= U^Ty \\ &Ux=y \end{align} \end{array} $$

Now we know in basis $\{\hat x_1 \;\hat x_2 \} $

$$ x^T \Sigma_x^{-1}x = 1 \quad \text{and} \quad \Sigma_x = \begin{bmatrix}a^2 & 0\\ 0 & b^2 \end{bmatrix} $$

Then, we can find $\Sigma_y$ such that

$$ \begin{align*} y^T \Sigma_y^{-1}y &= 1 \quad \text{and} \quad \Sigma_y = ?\\\\ \implies x^T \Sigma_x^{-1} x &= y^T U\Sigma_x^{-1} U^Ty = 1 \quad (\Sigma_y^{-1} \; \text{: similar matrix to } \Sigma_x^{-1}) \\\\ \therefore \;\Sigma_y^{-1} &= U\Sigma_x^{-1}U^T \quad \text{or}\\\\ \Sigma_y &= U\Sigma_x U^T \end{align*}$$

The quation of dependent ellipse

$$y^T \Sigma_y^{-1}y = 1 \quad \text{where} \quad \Sigma_y = U\Sigma_x U^T$$

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

theta = np.arange(0,2*np.pi,0.01)
x1 = np.cos(theta)
x2 = np.sin(theta)

plt.figure(figsize=(6,6))
plt.plot(x1, x2)
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.show()

x1 = 3*np.cos(theta);
x2 = np.sin(theta);

plt.figure(figsize=(6,6))
plt.plot(x1, x2)
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.show()

$$ U = \left[\hat x_1 \;\hat x_2 \right] = \begin{bmatrix}\frac{1}{2} & -\frac{\sqrt{3}}{2}\\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}$$

U = np.matrix([[1/2, -np.sqrt(3)/2], 
              [np.sqrt(3)/2, 1/2]])

x = np.matrix([x1, x2])

y = U*x
print("U = \n", U)

plt.figure(figsize=(6,6))
plt.plot(y[0,:].T, y[1,:].T)
plt.xlim([-4,4])
plt.ylim([-4,4])
plt.show()

U = 
 [[ 0.5       -0.8660254]
 [ 0.8660254  0.5      ]]

Sx = np.matrix([[9, 0],
               [0, 1]])

Sy = U*Sx*U.T

print ("Sx = \n", Sx, "\n")
print ("Sy = \n", Sy)

Sx = 
 [[9 0]
 [0 1]] 

Sy = 
 [[3.         3.46410162]
 [3.46410162 7.        ]]

2.1. Question (Reverse Problem)¶

Given $\Sigma_y^{-1}$ (or $\Sigma_y)$,
- how to find $a$ (major axis) and $b$ (minor axis)
- how to find the $\Sigma_x$
- how to find the proper matrix $U$

eigenvectors of $\Sigma$

$$A=S \Lambda S^T \qquad \text{where } S=[\upsilon_1 \; \upsilon_2] \text{ eigenvector of } A, \; \text{and }\Lambda=\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$

$$ \text{here, }\, \Sigma_y = U\Sigma_x U^T = U \Lambda U^T \qquad \text{where } U = \begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix} \text{ eigenvector of } \Sigma_y, \; \text{and }\Lambda=\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix} $$

$$\begin{align*} \text{eigen-analysis} \begin{cases}\Sigma_y\hat{x}_1=\lambda_1 \hat{x}_1\\ \Sigma_y\hat{x}_2=\lambda_2 \hat{x}_2\\ \end{cases} \; \implies \; & \Sigma_y \underbrace{\begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix}}_{U} = \underbrace{\begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix}\lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}}_{\Lambda} \\ \\ & \Sigma_y U = U\Lambda \\ & \Sigma_y = U \Lambda U^T = U \Sigma_x U^T \end{align*}$$

$$\begin{array}{Icr} \\x=U^Ty \\\\ \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = U^T \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \end{array} \quad \quad \begin{array}{Icr} \begin{align*} & a=\sqrt{\lambda_1}\\ & b = \sqrt{\lambda_2}\\\\ & \text{major axis}=\hat{x}_1\\ & \text{minor axis}=\hat{x}_2 \end{align*} \end{array}$$

Again, it is an eigenvalue and eigenvector problem.

D, U = np.linalg.eig(Sy)

idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]

print ("D = \n", np.diag(D))
print ("U = \n", U)

D = 
 [[9. 0.]
 [0. 1.]]
U = 
 [[-0.5       -0.8660254]
 [-0.8660254  0.5      ]]

2.2. Summary¶

$$ \begin{align*} x &=U^Ty \\\\ U &=\begin{bmatrix}\hat{x}_1 & \hat{x}_2\end{bmatrix} \end{align*}$$

Independent ellipse in $\{\hat{x}_1, \hat{x}_2 \}$
Dependent ellipse in $\{ \hat{y}_1, \hat{y}_2 \}$
Decouple
- Diagonalize
- Eigen-analysis

3. Gaussian Distribution¶

3.1. Standard Univariate Normal Distribution $\sim \mathcal{N}(0,1^2)$¶

$$ P_Y\left(Y=y\right)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^2\right)$$

$$ \frac{1}{2} y^2 = \text{const} \implies \text{prob. contour}$$

y = np.arange(-10,10,0.1)

ProbG = 1/np.sqrt(2*np.pi)*np.exp(-1/2*y**2)

plt.figure(figsize=(10,6))
plt.plot(y, ProbG)
plt.show()

from scipy.stats import norm

ProbG2 = norm.pdf(y)

plt.figure(figsize=(10,6))
plt.plot(y, ProbG2)
plt.show()

x = np.random.randn(5000,1)

plt.figure(figsize=(10,6))
plt.hist(x, bins=51, normed=True)
plt.plot(y, ProbG2, label='G2')
plt.show()

c:\users\seungchul\appdata\local\programs\python\python35\lib\site-packages\matplotlib\axes\_axes.py:6521: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")

3.2. Univariate Normal Distribution $\sim \mathcal{N}\left(\mu,\sigma^2\right)$¶

Gaussian or normal distribution, 1D (mean $\mu$, variance $\sigma^2$)

$$ \mathcal{N}(x;\,\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{1}{2}\frac{\left(x-\mu\right)^2}{\sigma^2}\right) $$

$$ \begin{align*}x &\sim \mathcal{N}\left(\mu,\sigma^2\right) \\ & \implies P_Y\left(y\right) = P_X\left(x\right), \quad y = \frac{x-\mu}{\sigma}, \quad x=\sigma y + \mu\\\\ P_X\left(X=x\right) &\sim \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) \\ &= \exp\left(-\frac{1}{2}\frac{\left(x-\mu\right)^2}{\sigma^2}\right) \\\\ E[x] &= \mu \\ \text{var}(x) &= E[(x-\mu)^2] = \sigma^2 \end{align*}$$

mu = 2
sigma = 3

x = np.arange(-10, 10, 0.1)

ProbG3 = 1/(np.sqrt(2*np.pi)*sigma) * np.exp(-1/2*(x-mu)**2/(sigma**2))

plt.figure(figsize=(10,6))
plt.plot(y,ProbG, label='G1')
plt.plot(x,ProbG3, label='G3')
plt.legend()
plt.show()

ProbG2 = norm.pdf(x, 2, 3)

plt.figure(figsize=(10,6))
plt.plot(y,ProbG, label='G1')
plt.plot(x,ProbG2, label='G2')
plt.legend()
plt.show()

x = mu + sigma*np.random.randn(5000,1)

plt.figure(figsize=(10,6))
plt.hist(x, bins=51, normed=True)
plt.plot(y,ProbG2, label='G2')
plt.ylim([0,0.4])
plt.show()

3.3. Multivariate Gaussian Models¶

Similar to a univariate case, but in a matrix form

$$ \begin{align*} \mathcal{N}\big(x;\, \mu,\Sigma\big) &= \frac{1}{(2\pi)^{\frac{n}{2}}\lvert\Sigma\rvert^{\frac{1}{2}}}\exp\left(-\frac{1}{2}\left(x-\mu\right)^T\Sigma^{-1}\left(x-\mu\right)\right) \\ \\ \mu &= \text{length}\,\, n \,\,\text{column vector} \\ \Sigma &= n \times n \,\,\text{matrix (covariance matrix)} \\ \lvert\Sigma\rvert &= \text{matrix determinant}\\\\ E[x] &= \mu \\ \text{cov}(x) &= E[(x-\mu)(x-\mu)^T] = \Sigma \end{align*}$$

Multivariate Gaussian models and ellipse
- Ellipse shows constant $\Delta^2$ value...

$$ \Delta^2 = (x-\mu)^T\Sigma^{-1} (x-\mu)$$

3.3.1. Two Independent Variables¶

$$ \begin{align*} P\left(X_1=x_1,X_2=x_2\right) &= P_{X_1}\left(x_1\right)P_{X_2}\left(x_2\right) \\\\ &\sim \exp \left(-\frac{1}{2} \frac{\left(x_1-\mu_{x_1}\right)^2}{\sigma_{x_1}^2}\right) \cdot \exp \left(-\frac{1}{2} \frac{\left(x_2-\mu_{x_2}\right)^2}{\sigma_{x_2}^2}\right) \\\\ &\sim\exp \left(-\frac{1}{2}\left(\frac{\left(x_1-\mu_{x_1}\right)^2}{\sigma_{x_1}^2} + \frac{\left(x_2-\mu_{x_2}\right)^2}{\sigma_{x_2}^2}\right)\right)\\ \end{align*} $$

In a matrix form

$$ P(x_1) \cdot P(x_2)=\frac{1}{Z_1Z_2}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right) $$

$$\left ( x=\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix}\sigma_{x_1}^2 & 0 \\ 0 & \sigma_{x_2}^2 \end{bmatrix} \right )$$

$$ \begin{align*}\frac{x_1^2}{\sigma_{x_1}^2} + \frac{x_2^2}{\sigma_{x_2}^2} &= c \;\;\;\text{(ellipse)} \\ \\ \begin{bmatrix} x_1 & x_2\end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_{x_1}^2} & 0 \\ 0 & \frac{1}{\sigma_{x_2}^2} \end{bmatrix} \begin{bmatrix}x_1 \\ x_2\end{bmatrix} &= c \qquad (\sigma_{x_1} < \sigma_{x_2}) \end{align*}$$

Summary in a matrix form

$$ \begin{align*} \mathcal{N}\left(0,\Sigma_x\right) &\sim \exp\left(-\frac{1}{2}x^T \Sigma_x^{-1} x\right) \\\\ \mathcal{N}\left(\mu_x,\Sigma_x\right) &\sim \exp\left(-\frac{1}{2}\left(x-\mu_x\right)^T \Sigma_x^{-1} \left(x-\mu_x\right)\right) \end{align*}$$

mu = np.array([0, 0])
sigma = np.eye(2)

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)
print(x.shape)

plt.figure(figsize=(10,6))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()

(1000, 2)

mu = np.array([0, 0])
sigma = np.array([[1, 0], [0, 4]])

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)

plt.figure(figsize=(10,6))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()

3.3.2. Two Dependent Variables¶

Compute $P_Y(y)$ from $P_X(x)$

$$\begin{align*} P_X(x) = P_Y(y) \;\ \text{where}\;\; & x = \begin{bmatrix}x_1 \\ x_2\end{bmatrix}, y = \begin{bmatrix}y_1 \\ y_2 \end{bmatrix} \end{align*}$$

Relationship between $y$ and $x$

$$ x = \begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix}^T y = U^Ty $$

$$x^T\Sigma_x^{-1} x = y^T U\Sigma_x^{-1}U^Ty = y^T \Sigma_y^{-1} y\\ $$

$$ \begin{align*} \therefore \;\; \Sigma_y^{-1} &= U\Sigma_x^{-1} U^T \\\\ \Sigma_y &= U \Sigma_x U^T \end{align*} $$

$\Sigma_x$ : covariance matrix of $x$
$\Sigma_y$ : covariance matrix of $y$

If $U$ is an eigenvector matrix of $\Sigma_y$, then $\Sigma_x$ is a diagonal matrix

mu = np.array([0, 0])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])

m = 1000
x = np.random.multivariate_normal(mu, sigma, m)

plt.figure(figsize=(10,6))
plt.plot(x[:,0], x[:,1], '.')
plt.axis('equal')
plt.ylim([-8,8])
plt.show()

Remark

$$ \begin{align*} & x \sim \mathcal{N}(\mu_x,\Sigma_x) \text{ and } y = Ax + b \; \text{affine transformation}\\ \\ \implies \; & y \sim \mathcal{N}(\mu_y,\Sigma_y) = \mathcal{N}(A\mu_x+b,A\Sigma_xA^T)\\\\ \implies \; & y \;\; \text{is also Gaussian with} \;\; \mu_y = A\mu_x+b, \;\; \Sigma_y = A \Sigma_x A^T \end{align*}$$

3.4. Decouple using Covariance Matrix¶

Given data, how to find $\Sigma_y$ and major (or minor) axis (assume $\mu_y = 0$)

This is a statistical problem
Estimating a Gussian model to given data

$$ \Sigma_y = \begin{bmatrix} \text{var}(y_1) & \text{cov}(y_1,y_2)\\ \text{cov}(y_2,y_1) & \text{var}(y_2) \end{bmatrix}$$

$$\begin{array}{Icr} \text{eigen-analysis} \\ \\ \Sigma_y\hat{x}_1 = \lambda_1\hat{x}_1 \\ \Sigma_y\hat{x}_2 = \lambda_2\hat{x}_2 \end{array} \quad \quad \begin{array}{Icr} \Sigma_x^{-1} = \begin{bmatrix} \frac{1}{\sqrt{\lambda_1}^2} & 0 \\ 0 & \frac{1}{\sqrt{\lambda_2}^2} \end{bmatrix} \\ \Sigma_x = \begin{bmatrix}\sqrt{\lambda_1}^2 & 0 \\ 0 & \sqrt{\lambda_2}^2 \end{bmatrix} \end{array} $$

$$\begin{array}{Icr} \begin{align*} \Sigma_y \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} &= \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\\ &= \begin{bmatrix}\hat{x}_1 & \hat{x}_2 \end{bmatrix} \Sigma_{x} \\ \\ \Sigma_y &= U \Sigma_x U^T \end{align*} \end{array} \quad \quad \quad \begin{array}{Icr} y = U x \implies U^Ty = x \\ \begin{bmatrix} \hat{x}_1 & \hat{x}_2 \end{bmatrix} = U \\ \\ \\ \end{array}$$

S = np.cov(x.T)
print ("S = \n", S)

S = 
 [[ 2.52002523 -1.52473469]
 [-1.52473469  2.5267346 ]]

D, U = np.linalg.eig(S)

idx = np.argsort(-D)
D = D[idx]
U = U[:,idx]

print ("U = \n", U)
print ("D = \n", D)

U = 
 [[ 0.70632848 -0.70788423]
 [-0.70788423 -0.70632848]]
D = 
 [4.0481183  0.99864153]

xp = np.arange(-10, 10)

plt.figure(figsize=(10,6))
plt.plot(x[:,0],x[:,1],'.')
plt.plot(xp, U[1,0]/U[0,0]*xp, label='u1')
plt.plot(xp, U[1,1]/U[0,1]*xp, label='u2')
plt.axis('equal')
plt.ylim([-8, 8])
plt.legend()
plt.show()

4. Properties of Gaussian Distribution¶

Symmetric about the mean

Parameterized

Uncorrelated $\Longrightarrow$ independent

Gaussian distributions are closed to
- Linear transformation
- Affine transformation
- Reduced dimension of multivariate Gaussian
  - Marginalization (projection)
  - Conditioning (slice) $\Longrightarrow$ Highly related to inference

4.1. Affine Transformation¶

Suppose $x \sim \mathcal{N}(\mu_x, \Sigma_x)$

Consider affine transformation of $x$

$$y = Ax + b$$

Then it is amazing that $y$ is Gaussian with

$$ \begin{align*} E[y] & = AE[x] + b = A\mu_x + b\\\\ \text{cov}(y) & = \Sigma_y = A\text{cov}(x)A^T = A\Sigma_x A^T \end{align*} $$

4.2 Marginal Probability of a Gaussian¶

Suppose $x \sim \mathcal{N}(\mu, \Sigma)$

$$x = \begin{bmatrix} x_1 \\ x_2\end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2\end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11}& \Sigma_{12}\\ \Sigma_{21}& \Sigma_{22} \end{bmatrix}$$

Let's look at the component $x_1$

$$ \begin{align*} x_1 &= \begin{bmatrix} I & 0\end{bmatrix} x = Ax \quad (\text{affine transformation})\\\\ E[x_1] &= \begin{bmatrix} I & 0\end{bmatrix} E[x] = \mu_1 \\\\ \text{cov}(x_1) &= \begin{bmatrix} I & 0\end{bmatrix} \text{cov}(x) \begin{bmatrix} I \\ 0\end{bmatrix} = \begin{bmatrix} I & 0\end{bmatrix}\Sigma \begin{bmatrix} I \\ 0\end{bmatrix} = \Sigma_{11} \end{align*} $$

In fact, the random vector $x_1$ is also Gaussian. (this is not obvious)

4.3. Component of a Gaussian Random Vector¶

Suppose $x \sim \mathcal{N}(0, \Sigma)$, $c \in \mathbb{R}^n$ be a unit vector

$$y = c^Tx$$

$y$ is the component of $x$ in the direction $c$

$y$ is Gaussian with $E[y] = 0, \quad \text{cov}(y) = c^T \Sigma c$

So $E[y^2] = c^T \Sigma c$

The unit vector $c$ that minimizes $c^T \Sigma c$ is the eigenvector of $\Sigma$ with the smallest eigenvalue

$$E[y^2] = \lambda_{\text{min}}$$

Notice that we have seen this in PCA

4.4. Conditional Probability of a Gaussian Random Vector¶

$$\begin{bmatrix} x \\ y\end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_x \\ \mu_y\end{bmatrix}, \begin{bmatrix} \Sigma_{x}& \Sigma_{xy}\\ \Sigma_{yx}& \Sigma_{y} \end{bmatrix} \right)$$

The conditional pdf of $x$ given $y$ is Gaussian

$$x \mid y \sim \mathcal{N} \left(\mu_x + \Sigma_{xy}\Sigma_{y}^{-1}(y-\mu_y), \; \Sigma_{x} - \Sigma_{xy}\Sigma_{y}^{-1}\Sigma_{yx} \right)$$

The conditional mean is

$$E[x \mid y] = \mu_x + \Sigma_{xy}\Sigma_{y}^{-1}(y-\mu_y)$$

The conditional covariance is

$$\text{cov}(x \mid y) = \Sigma_{x} - \Sigma_{xy}\Sigma_{y}^{-1}\Sigma_{yx} \leq \Sigma_{x}$$

Notice that conditional confidence intervals are narrower. i.e., measuring $y$ gives information about $x$

mu = np.array([0, 1])
sigma = 1./2.*np.array([[5, -3], [-3, 5]])

m = 1000000
x = np.random.multivariate_normal(mu, sigma, m)

xp = np.arange(-10, 10)

plt.figure(figsize=(10,10))
plt.plot(x[:,0],x[:,1],'.')
plt.axis('equal')
plt.axis([-6, 6, -6, 6])
plt.axvline(0, 0, 1, c = 'k', linestyle = '--')
plt.axhline(0, 0, 1, c = 'k', linestyle = '--')
plt.axhline(4, 0, 1, c = 'b')
plt.axhline(1, 0.45, 0.55, c = 'k')
plt.axvline(0, 0.53, 0.63, c = 'k')
plt.show()

x1 = x[:,0]
x2 = x[:,1]

plt.figure(figsize=(10,3))
plt.xlim(-6,6)
plt.axvline(0, 0, 1, c = 'k')
plt.hist(x1, bins=51, normed=True, alpha = 0.7)

plt.figure(figsize=(10,3))
plt.xlim(-6,6)
plt.axvline(1, 0, 1, c = 'k')
plt.hist(x2, bins=51, normed=True, alpha = 0.7)
plt.show()

z = []
for i in range(m):
    if x[i,1] > 3.8 and x[i,1] < 4.2:
        z.append(x[i,0])

plt.figure(figsize=(10,3))
plt.hist(z, bins=51, normed=True, alpha = 0.7)
plt.axvline(0, 0, 1, c = 'k')
plt.xlim(-6,6)
plt.show()

%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')