Artificial Neural Networks (ANN)


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

0. Video Lectures

In [3]:
%%html 
<center><iframe src="https://www.youtube.com/embed/blDtzUuJtiE?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [2]:
%%html 
<center><iframe src="https://www.youtube.com/embed/6O_WHmBUff4?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [1]:
%%html 
<center><iframe src="https://www.youtube.com/embed/DZgihzTgVQ8?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

1. Recall Perceptron

Perceptron


XOR Problem

  • Minsky-Papert controversy on XOR (see the sketch after the truth table below)
    • XOR is not linearly separable
    • exposes a fundamental limitation of the single-layer perceptron
$x_1$   $x_2$   $x_1$ XOR $x_2$
 0       0             0
 0       1             1
 1       0             1
 1       1             0
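
As a quick check (a sketch, not from the original notebook; it assumes scikit-learn is available), fitting a linear perceptron on the four points of the truth table above shows that no single hyperplane separates the two classes:

In [ ]:
# Sketch: a single linear perceptron cannot fit XOR,
# since the two classes are not linearly separable.
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                      # XOR labels

clf = Perceptron(max_iter=1000, tol=None).fit(X, y)
print(clf.score(X, y))                          # stays below 1.0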



2. From Perceptron to Multi-Layer Perceptron (MLP)

2.1. Perceptron for $h_{\omega}(x)$

  • Neurons compute the weighted sum of their inputs

  • A neuron is activated, or 'fires', when the sum $a$ is positive


$$ \begin{align*} a &= \omega_0 + \omega_1 x_1 + \omega_2 x_2 \\ \\ \hat{y} &= g(a) = \begin{cases} 1 & a > 0\\ 0 & \text{otherwise} \end{cases} \end{align*} $$
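
The equation above translates directly into code. A minimal sketch (function and weight names are illustrative):

In [ ]:
# Minimal perceptron: weighted sum a = w0 + w1*x1 + w2*x2,
# followed by the hard threshold g(a).
def perceptron(x1, x2, omega):
    w0, w1, w2 = omega
    a = w0 + w1 * x1 + w2 * x2
    return 1 if a > 0 else 0                    # g(a): step activation

# e.g., omega = (-1.5, 1, 1) implements logical AND on {0, 1} inputs
print([perceptron(x1, x2, (-1.5, 1, 1)) for x1, x2 in [(0,0), (0,1), (1,0), (1,1)]])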



  • A step function is not differentiable, so gradient-based learning cannot be applied to it


  • One layer is often not enough
    • a single neuron can realize only one hyperplane (one linear decision boundary)

2.2. Multi-Layer Perceptron (MLP) = Artificial Neural Network (ANN)

Multi-neurons
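
Several neurons reading the same inputs can be evaluated at once by stacking their weight vectors into a matrix. A NumPy sketch (the weights and shapes are illustrative):

In [ ]:
# Sketch: each row of W holds one neuron's weights, so W @ x computes
# every neuron's weighted sum in a single matrix-vector product.
import numpy as np

x = np.array([0.5, -1.0])           # two inputs
W = np.array([[1.0,  2.0],          # neuron 1 weights
              [0.5, -0.5],          # neuron 2 weights
              [2.0,  1.0]])         # neuron 3 weights
b = np.array([0.1, -0.2, 0.0])      # one bias per neuron

a = W @ x + b                       # three weighted sums at once
print(a)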



Differentiable activation function
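
A common differentiable replacement for the step function is the sigmoid; a sketch of the function and its derivative (the particular activation pictured in the lecture may differ):

In [ ]:
# The sigmoid g(a) = 1 / (1 + exp(-a)) is smooth everywhere, and its
# derivative takes the convenient form g(a) * (1 - g(a)).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)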




In a compact representation
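
Absorbing the bias into the weight vector, the single neuron above can be written compactly (consistent with the equation in Section 2.1):

$$ \hat{y} = g\left(\omega^T x\right), \qquad x = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix}, \quad \omega = \begin{bmatrix} \omega_0 \\ \omega_1 \\ \omega_2 \end{bmatrix} $$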




Multi-layer perceptron
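
Chaining two such layers gives the MLP forward pass. A NumPy sketch (the layer sizes and the choice of sigmoid are illustrative):

In [ ]:
# Sketch of a two-layer MLP forward pass: hidden features h = g(W1 x + b1),
# then the output neuron combines them, y_hat = g(W2 h + b2).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)        # hidden layer
    return sigmoid(W2 @ h + b2)     # output layer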


2.3. Another Perspective: ANN as Kernel Learning

We can represent this “neuron” as follows:

  • The main weakness of linear predictors is their lack of capacity. For classification, the populations have to be linearly separable.

  • The XOR example can be solved by pre-processing the data to make the two populations linearly separable.
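
One concrete pre-processing (a sketch; this feature choice is ours, not necessarily the lecture's) is to append the product feature $x_1 x_2$, after which a single hyperplane separates the XOR classes:

In [ ]:
# Lifting XOR into [x1, x2, x1*x2] makes it linearly separable:
# e.g., the hyperplane x1 + x2 - 2*x1*x2 - 0.5 = 0 separates the classes.
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

X_lifted = np.hstack([X, X[:, [0]] * X[:, [1]]])
clf = Perceptron(max_iter=1000, tol=None).fit(X_lifted, y)
print(clf.score(X_lifted, y))       # reaches 1.0 on the lifted features

The hidden layer of an MLP learns such a lifting from data instead of requiring it to be hand-crafted.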