**Artificial Neural Networks (ANN)**


# 1. Recall Perceptron

**Perceptron**

**XOR Problem**

- Minsky-Papert controversy on XOR
- XOR is not linearly separable: no single line can put $(0,1)$ and $(1,0)$ on one side and $(0,0)$ and $(1,1)$ on the other
- This is the key limitation of the single-layer perceptron (see the sketch after this list)
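
As a quick check (a sketch, not part of the original notes; it assumes `scikit-learn` is available), training a linear perceptron on the four XOR points never reaches perfect accuracy:

```python
import numpy as np
from sklearn.linear_model import Perceptron

# The four XOR points and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = Perceptron(max_iter=1000, tol=None, random_state=0)
clf.fit(X, y)

# At most 3 of the 4 points can ever be classified correctly
# by a single hyperplane, so the score stays <= 0.75.
print(clf.score(X, y))
```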

# 2. From Perceptron to Multi-Layer Perceptron (MLP)¶

## 2.1. Perceptron for $h_{\omega}(x)$

Neurons compute the weighted sum of their inputs.

A neuron is activated (it “fires”) when the sum $a$ is positive:

$$ \begin{align*} a &= \omega_0 + \omega_1 x_1 + \omega_2 x_2 \\ \\ \hat{y} &= g(a) = \begin{cases} 1 & a > 0\\ 0 & \text{otherwise} \end{cases} \end{align*} $$
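
A minimal Python sketch of these two equations (the weights $\omega_0 = -1.5$, $\omega_1 = \omega_2 = 1$ are illustrative choices, not from the notes; they happen to implement an AND gate):

```python
def perceptron(x1, x2, w0=-1.5, w1=1.0, w2=1.0):
    a = w0 + w1 * x1 + w2 * x2   # weighted sum of the inputs
    return 1 if a > 0 else 0     # step activation g(a)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, perceptron(x1, x2))   # fires only on (1, 1): an AND gate
```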

- A step function is not differentiable, so the weights cannot be trained by gradient descent (a smooth alternative is given below)

- One layer is often not enough: a single perceptron separates the input space with only one hyperplane
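
A standard smooth replacement for the step function, which makes gradient-based learning possible, is the sigmoid (used in the sketches below):

$$ \begin{align*} g(a) = \sigma(a) = \frac{1}{1+e^{-a}}, \qquad \sigma'(a) = \sigma(a)\left(1-\sigma(a)\right) \end{align*} $$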

## 2.2. Multi-layer Perceptron = Artificial Neural Networks (ANN)

A multi-layer perceptron extends the single perceptron in three ways:

- Multiple neurons per layer, rather than a single output unit
- A differentiable activation function (e.g. the sigmoid above), so that gradients can be computed
- A compact representation: each layer computes $h = g(W x + b)$, and stacking such layers gives the multi-layer perceptron (see the sketch below)
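
As a concrete sketch of this compact form (the weights below are hand-picked for illustration, not taken from the notes), two sigmoid hidden units are enough to realize XOR:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden layer in compact form: h = g(W1 @ x + b1).
# Hand-picked weights: unit 1 ~ OR(x1, x2), unit 2 ~ AND(x1, x2).
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])

# Output layer: y_hat = g(W2 @ h + b2) ~ OR(x1, x2) AND NOT AND(x1, x2) = XOR.
W2 = np.array([20.0, -20.0])
b2 = -10.0

def mlp(x):
    h = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ h + b2)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(float(mlp(np.array(x))), 3))   # ~0, ~1, ~1, ~0
```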

## 2.3. Another Perspective: ANN as Kernel Learning

We can represent this “neuron” from another perspective: the hidden layer computes nonlinear features $\phi(x)$ of the input, and the output layer is then just a linear predictor operating on those features.

The main weakness of linear predictors is their lack of capacity: for classification, the populations must be linearly separable.

The XOR example can be solved by pre-processing the data to make the two populations linearly separable.
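
For instance (a standard construction, assumed here rather than taken from the notes), appending the product feature $x_1 x_2$ lifts the four XOR points into a space where one hyperplane suffices:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])   # XOR labels

# Pre-process: map (x1, x2) -> (x1, x2, x1*x2).
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# In the lifted space, a = x1 + x2 - 2*x1*x2 - 0.5 is positive
# exactly on the two XOR-positive points.
w, b = np.array([1.0, 1.0, -2.0]), -0.5
print((phi @ w + b > 0).astype(int))   # [0 1 1 0] matches y
```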