ANN Training


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Recursive Algorithm

  • One of the central ideas of computer science

  • Depends on solutions to smaller instances of the same problem (= subproblems)

  • A function calls itself (something impossible in the physical world, but natural in programming)



In [1]:
%%html
<center><iframe src="https://www.youtube.com/embed/t4MSwiqfLaY?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
  • Factorial example


$$n ! = n \cdot (n-1) \cdots 2 \cdot 1$$

In [2]:
# iterative

n = 5

m = 1
for i in range(n):
    m = m*(i+1)
    
print(m)
120
In [3]:
def fac(n):
    if n <= 1:    # base case (also guards against n = 0)
        return 1
    else:
        return n*fac(n-1)
In [4]:
# recursive

fac(5)
Out[4]:
120

2. Dynamic Programming

  • Dynamic Programming: a general, powerful algorithm design technique; roughly, recursion plus memoization (solve each subproblem once, store the result, and reuse it)

  • Fibonacci numbers: $F_n = F_{n-1} + F_{n-2}, \quad F_1 = F_2 = 1$

In [5]:
# naive Fibonacci (exponential time: the same subproblems are recomputed)

def fib(n):
    if n <= 2:
        return 1
    else:
        return fib(n-1) + fib(n-2)    
In [6]:
fib(10)
Out[6]:
55
In [7]:
# Memoized DP Fibonacci

def mfib(n):
    global memo                     # cache: memo[n-1] stores fib(n)

    if memo[n-1] != 0:              # reuse the stored subproblem solution
        return memo[n-1]
    elif n <= 2:
        return 1
    else:
        memo[n-1] = mfib(n-1) + mfib(n-2)
        return memo[n-1]
In [8]:
import numpy as np

n = 10
memo = np.zeros(n)
mfib(n)
Out[8]:
55.0
In [9]:
n = 30
%timeit fib(30)
172 ms ± 830 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [10]:
memo = np.zeros(n)
%timeit mfib(30)
402 ns ± 3.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
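
The same memoization idea can be written without a global array. A minimal sketch, assuming only the Python standard library, using functools.lru_cache to cache return values automatically:

from functools import lru_cache

@lru_cache(maxsize=None)            # remember every computed value
def cfib(n):
    if n <= 2:
        return 1
    return cfib(n-1) + cfib(n-2)    # subproblems are looked up, not recomputed

print(cfib(30))                     # 832040

Like mfib, each cfib(k) is computed once and then served from the cache, which is why the memoized version runs orders of magnitude faster in the timings above.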

3. Training Neural Networks

$=$ Learning or estimating the weights and biases of a multi-layer perceptron from training data

3.1. Optimization

3 key components

  1. objective function $f(\cdot)$
  2. decision variable or unknown $\omega$
  3. constraints $g(\cdot)$

In mathematical form:



$$\begin{align*} \min_{\omega} \quad &f(\omega) \\ \text{subject to} \quad &g(\omega) \leq 0 \end{align*}$$
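
As a toy illustration of these components (a sketch with a made-up, unconstrained objective): take $f(\omega) = (\omega - 3)^2$, whose gradient is $2(\omega - 3)$, and minimize it by repeatedly stepping downhill:

# gradient descent on f(w) = (w - 3)^2
w = 0.0
alpha = 0.1                 # learning rate (step size)

for _ in range(100):
    grad = 2*(w - 3)        # analytic gradient of f
    w = w - alpha*grad      # gradient descent update

print(w)                    # approaches the minimizer w = 3

The same loop, with $f$ replaced by a loss over training data, is exactly how a neural network is trained.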

3.2. Loss Function

  • Measures error between target values and predictions


$$ \min_{\omega} \sum_{i=1}^{m}\ell\left( h_{\omega}\left(x^{(i)}\right),y^{(i)}\right)$$

  • Example
    • Squared loss (for regression): $$ \frac{1}{m} \sum_{i=1}^{m} \left(h_{\omega}\left(x^{(i)}\right) - y^{(i)}\right)^2 $$
    • Cross entropy (for classification): $$ -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_{\omega}\left(x^{(i)}\right)\right) + \left(1-y^{(i)}\right)\log\left(1-h_{\omega}\left(x^{(i)}\right)\right)\right]$$
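
A minimal numpy sketch of both losses; the targets y and predictions h below are made-up numbers, just for illustration:

import numpy as np

y = np.array([0, 1, 1, 0])            # targets y^(i)
h = np.array([0.1, 0.8, 0.7, 0.2])    # predictions h_w(x^(i))

# squared loss (regression)
print(np.mean((h - y)**2))

# cross entropy (binary classification)
print(-np.mean(y*np.log(h) + (1 - y)*np.log(1 - h)))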

3.3. Learning

Learning weights and biases from data using gradient descent


$$\omega \Leftarrow \omega - \alpha \nabla_{\omega} \ell \left( h_{\omega}\left(x^{(i)}\right), y^{(i)} \right)$$
  • $\frac{\partial \ell}{\partial \omega}$: computing it independently for every weight $\omega$ requires too many computations
  • The structure of a NN makes this tractable (a one-step sketch follows this list):
    • A NN is a composition of functions
    • The chain rule differentiates a composition
    • Dynamic programming reuses shared intermediate derivatives
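
A one-step sketch of the update rule for a single training example, using a linear model $h_{\omega}(x) = \omega x$ with squared loss (model and numbers chosen only for illustration); here $\nabla_{\omega} \ell = 2(\omega x - y)x$:

# one gradient descent step for h_w(x) = w*x with squared loss (w*x - y)^2
x, y = 2.0, 10.0          # a single (made-up) training example
w, alpha = 1.0, 0.01      # initial weight and learning rate

grad = 2*(w*x - y)*x      # gradient of the loss w.r.t. w
w = w - alpha*grad        # the update rule above
print(w)                  # 1.32: one small step toward the optimum w = 5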


Backpropagation

  • Forward propagation
    • The input information propagates through the hidden units at each layer and finally produces the output
  • Backpropagation
    • Allows the information from the cost to flow backwards through the network in order to compute the gradients
  • Chain Rule

    • Computing the derivative of a composition of functions

      • $f(g(x))' = f'(g(x))\,g'(x)$

      • ${dz \over dx} = {dz \over dy} \cdot {dy \over dx}$

      • ${dz \over dw} = \left({dz \over dy} \cdot {dy \over dx}\right) \cdot {dx \over dw}$

      • ${dz \over du} = \left({dz \over dy} \cdot {dy \over dx} \cdot {dx \over dw}\right) \cdot {dw \over du}$

  • Backpropagation

    • Updates weights recursively with memory: each parenthesized product above is reused from the previous line rather than recomputed, which is exactly the dynamic programming idea
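
A minimal numpy sketch of forward and backward propagation for a tiny two-layer sigmoid network with squared loss; the sizes, data, and variable names are all made up, and this shows one gradient step, not a full training loop:

import numpy as np

def sigmoid(z):
    return 1/(1 + np.exp(-z))

x = np.array([[0.5], [0.2]])    # input (2 x 1)
y = np.array([[1.0]])           # target
w1 = np.random.randn(3, 2)      # layer-1 weights
w2 = np.random.randn(1, 3)      # layer-2 weights

# forward propagation: input -> hidden -> output, storing activations
a1 = sigmoid(w1 @ x)            # hidden activations (3 x 1)
a2 = sigmoid(w2 @ a1)           # network output (1 x 1)

# backpropagation: chain rule layer by layer, reusing stored values
d2 = (a2 - y)*a2*(1 - a2)       # output delta for loss 0.5*(a2 - y)^2
d1 = (w2.T @ d2)*a1*(1 - a1)    # hidden delta, reusing d2 (memoization)

alpha = 0.1
w2 -= alpha*(d2 @ a1.T)         # gradient descent updates
w1 -= alpha*(d1 @ x.T)

Note how d1 is built from d2: the derivative information computed at the output layer flows backwards and is reused, which is why backpropagation is dynamic programming applied to the chain rule.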

Optimization procedure


  • It is not easy to compute the gradients of a network numerically in general.
    • The good news: people have already done all the "hard work" of developing numerical solvers (or libraries)
    • There is a wide range of tools: $\color{Red}{\text{TensorFlow}}$
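
For example, a minimal Keras sketch in TensorFlow (the layer sizes, activations, and loss here are arbitrary choices, not a recommended architecture); the library differentiates the loss and applies gradient descent for us:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),      # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid')     # output layer
])

# gradients are computed by automatic differentiation (backpropagation)
model.compile(optimizer='sgd', loss='binary_crossentropy')
# model.fit(X_train, y_train, epochs=10)   # X_train, y_train: your data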

Summary

  • Learning weights and biases from data using gradient descent


4. Other Tutorials

In [11]:
%%html
<center><iframe src="https://www.youtube.com/embed/aircAruvnKk?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [12]:
%%html
<center><iframe src="https://www.youtube.com/embed/IHZwWFHWa-w?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [13]:
%%html
<center><iframe src="https://www.youtube.com/embed/Ilg3gGewQ5U?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [14]:
%%html
<center><iframe src="https://www.youtube.com/embed/tIeHLnjs5U8?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [15]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')