Optimization
http://iailab.kaist.ac.kr/
Industrial AI Lab at KAIST
Table of Contents
1. Optimization¶
from IPython.display import YouTubeVideo
YouTubeVideo('iiKKPAF1inU', width="560", height="315", frameborder="0")
Optimization is an important tool in 1) engineering problem solving and 2) decision science
- People optimize
- Nature optimizes
3 key components
- objective function
- decision variable or unknown
- constraints
Procedures
- The process of identifying objective, variables, and constraints for a given problem is known as "modeling"
- Once the model has been formulated, an optimization algorithm can be used to find its solution.
In mathematical expression
$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad &g_i(x) \leq 0, \qquad i=1,\cdots,m \end{align*} $$
$\;\;\; $where
$x=\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} \in \mathbb{R}^n$ is the decision variable
$f: \mathbb{R}^n \rightarrow \mathbb{R}$ is an objective function
Feasible region: $\mathcal{C} = \{x: g_i(x) \leq 0, \quad i=1, \cdots,m\}$
Remark: the following formulations are equivalent
$$\begin{align*} \min_{x} f(x) \quad&\leftrightarrow \quad \max_{x} -f(x)\\ \quad g_i(x) \leq 0\quad&\leftrightarrow \quad -g_i(x) \geq 0\\ h(x) = 0 \quad&\leftrightarrow \quad \begin{cases} h(x) \leq 0 \quad \text{and} \\ h(x) \geq 0 \end{cases} \end{align*} $$
- The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms
- A wide range of tools that can take optimization problems in "natural" forms and compute a solution
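For instance, here is a minimal sketch (not from the original notes) using SciPy's scipy.optimize.minimize, which accepts problems close to the standard form above; the particular objective and constraint are illustrative assumptions. Note that SciPy writes inequality constraints as $c(x) \geq 0$, which is exactly the sign-flip equivalence noted above.

from scipy.optimize import minimize

# Illustrative problem: minimize (x1 - 1)^2 + (x2 - 2)^2
# subject to g(x) = x1 + x2 - 2 <= 0
f = lambda x: (x[0] - 1)**2 + (x[1] - 2)**2

# SciPy's 'ineq' constraints mean c(x) >= 0, so g(x) <= 0 is passed as -g(x) >= 0
cons = [{'type': 'ineq', 'fun': lambda x: -(x[0] + x[1] - 2)}]

res = minimize(f, x0=[0.0, 0.0], constraints=cons)
print(res.x)    # approximately [0.5, 1.5]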
2. Solving Optimization Problems¶
from IPython.display import YouTubeVideo
YouTubeVideo('CqYJhPOFPGk', width="560", height="315", frameborder="0")
- Starting with the unconstrained, one-dimensional case
To find the minimum point $x^*$, we can look at the derivative of the function, $f'(x)$
- Any location where $f'(x) = 0$ will be a "flat" point in the function
For convex problems, this is guaranteed to be a minimum
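For a concrete 1-D illustration (the function below is an illustrative assumption, not from the lecture), SymPy can compute $f'(x)$ and solve $f'(x) = 0$ symbolically:

import sympy as sp

# Convex 1-D example: f(x) = (x - 2)^2 + 1
x = sp.symbols('x')
f = (x - 2)**2 + 1

df = sp.diff(f, x)                    # f'(x) = 2*(x - 2)
stationary = sp.solve(sp.Eq(df, 0), x)
print(stationary)                     # [2] -> the minimizer, since f is convex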
Generalization for multivariate function $f:\mathbb{R}^n \rightarrow \ \mathbb{R}$
- The gradient of $f$ must be zero
$$ \nabla _x f(x) = 0$$
- The gradient is an $n$-dimensional vector containing the partial derivatives of $f$ with respect to each dimension
$$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \qquad \qquad \nabla _x f(x) = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{bmatrix} $$
- For continuously differentiable $f$ and unconstrained optimization, the optimal point must satisfy $\nabla _x f(x^*)=0$
2.1. How To Find $\nabla _x f(x) = 0$: Analytic Approach¶
Direct solution
- In some cases, it is possible to analytically compute $x^*$ such that $ \nabla _x f(x^*)=0$
$$ \begin{align*} f(x) &= 2x_1^2+ x_2^2 + x_1 x_2 -6 x_1 -5 x_2\\\\ \Longrightarrow \nabla _x f(x) &= \begin{bmatrix} 4x_1+x_2-6\\ 2x_2 + x_1 -5 \end{bmatrix} = \begin{bmatrix}0\\0 \end{bmatrix}\\\\ \therefore x^* &= \begin{bmatrix} 4 & 1\\ 1 & 2 \end{bmatrix} ^{-1} \begin{bmatrix} 6 \\ 5\\ \end{bmatrix} = \begin{bmatrix} 1 \\ 2\\ \end{bmatrix} \end{align*} $$
- Note: Matrix derivatives
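As a quick numerical check (not part of the original notebook), the linear system arising from $\nabla _x f(x^*) = 0$ above can be solved with NumPy:

import numpy as np

# Zero-gradient condition: [[4, 1], [1, 2]] x = [6, 5]
A = np.array([[4.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 5.0])

x_star = np.linalg.solve(A, b)
print(x_star)    # [1. 2.]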
2.2. How To Find $\nabla _x f(x) = 0$: Iterative Approach¶
Iterative methods
- More commonly, the condition that the gradient equals zero will not have an analytical solution, requiring iterative methods
- The gradient points in the direction of "steepest ascent" for function $f$
3. Gradient Descent¶
- Since the gradient points in the direction of steepest ascent, this motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient
$$ x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$
- Gradient Descent
$$\text{Repeat : } x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$
- Gradient Descent in Higher Dimension
$$\text{Repeat : } x \leftarrow x - \alpha \nabla _x f(x)$$
Example
$$ \begin{align*} \min& \quad (x_1-3)^{2} + (x_2-3)^{2}\\\\ =\min& \quad \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \end{align*} $$
$$\begin{align*} f &= \frac{1}{2}X^THX + g^TX \\ \nabla f &= HX+g \end{align*}$$
- update rule
$$ X_{i+1} = X_{i} - \alpha_i \nabla f(X_i)$$
import numpy as np

# Quadratic objective f(X) = 1/2 X^T H X + g^T X, with gradient H X + g
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])
g = -np.array([[6.0],
               [6.0]])

x = np.zeros((2, 1))        # initial point
alpha = 0.2                 # step size

for i in range(25):
    df = H @ x + g          # gradient at the current point
    x = x - alpha*df        # gradient descent update

print(x)
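As a sanity check (not in the original notebook), the same minimizer follows in closed form from $\nabla f = HX + g = 0$, reusing H and g from the cell above:

# Closed-form solution of H X = -g
x_star = np.linalg.solve(H, -g)
print(x_star)    # [[3.] [3.]] -- matches the gradient descent result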
3.2. Where Will We Converge?¶
For non-convex functions, gradient descent may converge to a different local minimum depending on where it starts. Two practical remedies:
$\bullet \, \text{Random initialization}$
$\bullet \, \text{Multiple trials}$
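A minimal sketch of these two ideas (the specific non-convex function below is an illustrative assumption, not from the lecture): run gradient descent several times from random starting points and compare where each trial ends up.

import numpy as np

# Non-convex 1-D function with two local minima (illustration only)
f  = lambda x: x**4 - 4*x**2 + x
df = lambda x: 4*x**3 - 8*x + 1

alpha = 0.01
rng = np.random.default_rng(0)
results = []

for trial in range(5):
    x = rng.uniform(-3, 3)          # random initialization
    for _ in range(1000):           # plain gradient descent
        x = x - alpha*df(x)
    results.append(x)

print(np.round(results, 3))         # different trials can land in different local minima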
4. Practically Solving Optimization Problems¶
The good news: for many classes of optimization problems, people have already done all the “hard work” of developing numerical algorithms
- A wide range of tools that can take optimization problems in “natural” forms and compute a solution
Gradient descent
- Easy to implement
- Very general; can be applied to any differentiable loss function
- Requires less memory and computation (especially for stochastic variants)
- Neural networks/deep learning
- TensorFlow
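A rough sketch of the same idea with TensorFlow (assuming TensorFlow 2.x; not part of the original notebook): the quadratic example from Section 3 minimized with automatic differentiation and the built-in SGD optimizer.

import tensorflow as tf

x = tf.Variable([0.0, 0.0])
opt = tf.keras.optimizers.SGD(learning_rate=0.2)

for _ in range(25):
    with tf.GradientTape() as tape:
        loss = (x[0] - 3.0)**2 + (x[1] - 3.0)**2    # same objective as Section 3
    grads = tape.gradient(loss, [x])
    opt.apply_gradients(zip(grads, [x]))            # x <- x - alpha * gradient

print(x.numpy())    # approaches [3. 3.]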
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')