By Prof. Seungchul Lee

http://iailab.kaist.ac.kr/

Industrial AI Lab at KAIST



**Correlation of Two Random Variables**

$$ \begin{align*} \text{Sample variance} : S_{xx} &= \frac{1}{m-1} \sum\limits_{i=1}^{m}\left(x^{(i)}-\bar x\right)^2 \\ \text{Sample covariance} : S_{xy} &= \frac{1}{m-1} \sum\limits_{i=1}^{m}\left(x^{(i)}-\bar x\right)\left(y^{(i)}-\bar y \right)\\ \text{Sample covariance matrix} : S &= \begin{bmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{bmatrix}\\ \text{Sample correlation coefficient} : r &= \frac{S_{xy}}{ \sqrt {S_{xx}\cdot S_{yy}} } \end{align*}$$

- Measures the strength of the **linear** relationship between two variables, $x$ and $y$
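The sample statistics above can be computed directly. A minimal NumPy sketch on synthetic paired data (the data and seed are illustrative, not from the lecture), cross-checked against NumPy's built-in estimators:

```python
import numpy as np

# Synthetic paired samples (illustrative only)
np.random.seed(0)
x = np.random.randn(100)
y = 0.8 * x + 0.3 * np.random.randn(100)

m = len(x)
Sxx = np.sum((x - x.mean())**2) / (m - 1)                 # sample variance of x
Syy = np.sum((y - y.mean())**2) / (m - 1)                 # sample variance of y
Sxy = np.sum((x - x.mean()) * (y - y.mean())) / (m - 1)   # sample covariance

r = Sxy / np.sqrt(Sxx * Syy)                              # sample correlation coefficient

# Cross-check against NumPy's built-ins
assert np.isclose(Sxy, np.cov(x, y)[0, 1])
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

Note the $\frac{1}{m-1}$ factor: both the hand-rolled estimates and `np.cov`/`np.corrcoef` use the unbiased (Bessel-corrected) form by default.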

**Correlation Coefficient Plot**

- Plots correlation coefficients among pairs of variables
- http://rpsychologist.com/d3/correlation/

**Covariance Matrix**

$$ \Sigma = \begin{bmatrix} E[(X_1-\mu_1)(X_1-\mu_1)]& E[(X_1-\mu_1)(X_2-\mu_2)] & \cdots &E[(X_1-\mu_1)(X_n-\mu_n)]\\ E[(X_2-\mu_2)(X_1-\mu_1)]& E[(X_2-\mu_2)(X_2-\mu_2)] & \cdots &E[(X_2-\mu_2)(X_n-\mu_n)]\\ \vdots & \vdots & \ddots & \vdots\\ E[(X_n-\mu_n)(X_1-\mu_1)]& E[(X_n-\mu_n)(X_2-\mu_2)] & \cdots &E[(X_n-\mu_n)(X_n-\mu_n)]\\ \end{bmatrix}$$
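For a data matrix with one example per row, the sample estimate of this covariance matrix is obtained by centering each column and forming the outer-product average. A short sketch (with illustrative random data), verified against `np.cov`:

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(200, 3)   # 200 examples, 3 features (illustrative data)

mu = X.mean(axis=0)
Xc = X - mu                   # center each feature (column)

# Entry (i, j) estimates E[(X_i - mu_i)(X_j - mu_j)]
Sigma = Xc.T @ Xc / (X.shape[0] - 1)

# Matches NumPy's estimator when rows are observations
assert np.allclose(Sigma, np.cov(X, rowvar=False))
```

The resulting matrix is symmetric with the sample variances on the diagonal and the pairwise sample covariances off the diagonal.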

How?

Idea: highly correlated data contains redundant features

Now consider a change of axes

Each example $x$ has 2 features $\{u_1,u_2\}$

Consider ignoring the feature $u_2$ for each example

Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{u_1\}$

Are we losing much information by throwing away $u_2$ ?

No. Most of the data spread is along $u_1$ (very little variance along $u_2$)

- Data $\rightarrow$ projection onto unit vector $\hat{u}_1$
- PCA is used when we want projections capturing maximum variance directions
- Principal Components (PC): directions of maximum variability in the data
- Roughly speaking, PCA does a change of axes that can represent the data in a succinct manner
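The change of axes described above can be sketched numerically: the principal directions $\hat{u}_1, \hat{u}_2$ are the eigenvectors of the sample covariance matrix, and projecting onto $\hat{u}_1$ keeps nearly all of the spread. A minimal sketch with synthetic correlated 2-D data (the data and seed are assumptions for illustration):

```python
import numpy as np

np.random.seed(2)
# Correlated 2-D data: most of the variance lies along one direction
x1 = np.random.randn(300)
x2 = 2 * x1 + 0.3 * np.random.randn(300)
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)           # 2x2 sample covariance matrix

# eigh returns eigenvalues in ascending order for a symmetric matrix
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, -1]                    # direction of maximum variance (first PC)
u2 = eigvecs[:, 0]                     # direction of minimum variance

z = Xc @ u1                            # each 2-D example becomes the 1-D value u1^T x

# Fraction of total variance captured by u1 alone;
# close to 1 here, so little is lost by discarding u2
explained = eigvals[-1] / eigvals.sum()
```

Because the variance along $\hat{u}_2$ is tiny, `explained` is close to 1: keeping only the $\hat{u}_1$ coordinate discards very little information, which is exactly the dimensionality-reduction argument made above.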