Dimension Reduction


By Prof. Seungchul Lee
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST

Table of Contents

1. Multivariate Statistics
2. Principal Component Analysis (PCA)

Correlation of Two Random Variables


$$ \begin{align*} \text{Sample variance} : S_x &= \frac{1}{m-1} \sum\limits_{i=1}^{m}\left(x^{(i)}-\bar x\right)^2 \\ \text{Sample covariance} : S_{xy} &= \frac{1}{m-1} \sum\limits_{i=1}^{m}\left(x^{(i)}-\bar x\right)\left(y^{(i)}-\bar y \right)\\ \text{Sample covariance matrix} : S &= \begin{bmatrix} S_x & S_{xy} \\ S_{yx} & S_y \end{bmatrix}\\ \text{Sample correlation coefficient} : r &= \frac{S_{xy}}{ \sqrt {S_{x}\cdot S_{y}} } \end{align*}$$
  • Strength of linear relationship between two variables, $x$ and $y$
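The definitions above can be sketched directly in NumPy. The data below are illustrative (not from the lecture); the sample statistics are computed from their formulas and checked against NumPy's built-ins.

```python
import numpy as np

# Hypothetical 1-D samples x and y (illustrative data only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

m = len(x)
S_x = np.sum((x - x.mean()) ** 2) / (m - 1)                # sample variance of x
S_y = np.sum((y - y.mean()) ** 2) / (m - 1)                # sample variance of y
S_xy = np.sum((x - x.mean()) * (y - y.mean())) / (m - 1)   # sample covariance

r = S_xy / np.sqrt(S_x * S_y)                              # sample correlation coefficient

# Sanity check against NumPy's implementations
assert np.isclose(S_xy, np.cov(x, y)[0, 1])
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

Note that `np.cov` and `np.corrcoef` return 2×2 matrices for two inputs; the off-diagonal entry is the pairwise statistic.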

Correlation Coefficient Plot



Covariance Matrix


$$ \Sigma = \begin{bmatrix} E[(X_1-\mu_1)(X_1-\mu_1)]& E[(X_1-\mu_1)(X_2-\mu_2)] & \cdots &E[(X_1-\mu_1)(X_n-\mu_n)]\\ E[(X_2-\mu_2)(X_1-\mu_1)]& E[(X_2-\mu_2)(X_2-\mu_2)] & \cdots &E[(X_2-\mu_2)(X_n-\mu_n)]\\ \vdots & \vdots & \ddots & \vdots\\ E[(X_n-\mu_n)(X_1-\mu_1)]& E[(X_n-\mu_n)(X_2-\mu_2)] & \cdots &E[(X_n-\mu_n)(X_n-\mu_n)]\\ \end{bmatrix}$$
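The matrix above is defined with expectations (the population covariance); for a finite data set the sample version replaces each expectation with an average over $m-1$. A minimal sketch with synthetic data (the data matrix `X` is an assumption for illustration):

```python
import numpy as np

# Hypothetical data matrix: m samples (rows) x n features (columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                 # subtract the mean mu_j from each feature
Sigma = Xc.T @ Xc / (X.shape[0] - 1)    # (n x n) sample covariance matrix

# Matches NumPy's built-in (rowvar=False: columns are variables)
assert np.allclose(Sigma, np.cov(X, rowvar=False))
```

The $(j,k)$ entry of `Sigma` is the sample analogue of $E[(X_j-\mu_j)(X_k-\mu_k)]$, so the diagonal holds the feature variances.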

2. Principal Component Analysis (PCA)


2.1. Motivation

  • Can we describe high-dimensional data in a "simpler" way?

$\quad \rightarrow$ Dimension reduction without losing too much information
$\quad \rightarrow$ Find a low-dimensional, yet useful representation of the data

  • How?

Idea: highly correlated data contain redundant features




  • Now consider a change of axes

    • Each example $x$ has 2 features $\{u_1,u_2\}$

    • Consider ignoring the feature $u_2$ for each example

    • Each 2-dimensional example $x$ now becomes 1-dimensional $x = \{u_1\}$

    • Are we losing much information by throwing away $u_2$ ?

    • No. Most of the data spread is along $u_1$ (very little variance along $u_2$)
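The claim above can be checked numerically: project the data onto two orthogonal unit directions and compare the variances. The data and the directions $u_1, u_2$ below are assumptions for illustration.

```python
import numpy as np

# Hypothetical correlated 2-D data, spread mostly along the (1, 1) direction
rng = np.random.default_rng(1)
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])
X = X - X.mean(axis=0)

u1 = np.array([1.0, 1.0]) / np.sqrt(2)    # assumed dominant direction (unit vector)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)   # orthogonal direction

z1 = X @ u1   # 1-D representation kept
z2 = X @ u2   # component thrown away

# Almost all the spread lives along u1, so little information is lost
print(z1.var(), z2.var())
```

Dropping $z_2$ keeps one number per example while discarding only the small-variance component.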



  • Data $\rightarrow$ projection onto unit vector $\hat{u}_1$
    • PCA is used when we want projections capturing maximum variance directions
    • Principal Components (PC): directions of maximum variability in the data
    • Roughly speaking, PCA does a change of axes that can represent the data in a succinct manner
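The change of axes described above can be sketched via the eigendecomposition of the sample covariance matrix: the principal components are its eigenvectors, ordered by eigenvalue. The synthetic data below is an assumption for illustration; any centered data matrix works the same way.

```python
import numpy as np

# Illustrative 2-D data with a dominant direction
rng = np.random.default_rng(2)
t = rng.normal(size=(300, 1))
X = np.hstack([t, 2 * t]) + 0.1 * rng.normal(size=(300, 2))

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)   # eigh: symmetric matrix, ascending eigenvalues
order = np.argsort(eigvals)[::-1]      # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]                    # first PC: direction of maximum variance
Z = Xc @ pc1                           # 1-D projection of the data

# The variance captured along PC1 equals the largest eigenvalue
assert np.isclose(Z.var(ddof=1), eigvals[0])
```

Keeping only the top-$k$ eigenvectors gives the $k$-dimensional representation that preserves the most variance, which is exactly the succinct change of axes described above.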