Optimization

1. Optimization

  • an important tool in 1) engineering problem solving and 2) decision science

  • people optimize

  • nature optimizes


3 key components

  1. objective
  2. decision variable or unknown
  3. constraints

Procedures

  1. The process of identifying the objective, variables, and constraints for a given problem is known as "modeling"
  2. Once the model has been formulated, an optimization algorithm can be used to find its solution.

In mathematical form


$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad &g_i(x) \leq 0, \qquad i=1,\cdots,m \end{align*} $$

$\;\;\; $where

  • $x=\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} \in \mathbb{R}^n$ is the decision variable

  • $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is an objective function

  • Feasible region: $\mathcal{C} = \{x: g_i(x) \leq 0, \quad i=1, \cdots,m\}$


Remarks) equivalent formulations


$$\begin{align*} \min_{x} f(x) \quad&\leftrightarrow \quad \max_{x} -f(x)\\ \quad g_i(x) \leq 0\quad&\leftrightarrow \quad -g_i(x) \geq 0\\ h(x) = 0 \quad&\leftrightarrow \quad \begin{cases} h(x) \leq 0 \quad \text{and} \\ h(x) \geq 0 \end{cases} \end{align*} $$


  • The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms
    • A wide range of tools that can take optimization problems in "natural" forms and compute a solution

2. Convex Optimization



  • Convex optimization is an extremely powerful subset of all optimization problems

$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad & x \in \mathcal{C} \end{align*} $$

$\;\;\; $where

  • $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function and

  • Feasible region $\mathcal{C}$ is a convex set


Remarks)

  • Key property of convex optimization:

    • All local solutions are global solutions
  • We will use CVX (or CVXPY) as a convex optimization solver

Linear Interpolation between Two Points



$$ \begin{align*} \vec{z} &= \vec{y} + \theta (\vec{x} - \vec{y}) = \theta \vec{x} + (1-\theta) \vec{y},\qquad 0 \leq \theta \leq 1\\ \\ \text{or }\;\; \vec{z} &= \alpha \vec{x} + \beta \vec{y}, \qquad \alpha + \beta = 1 \;\;\text{and}\; 0 \leq \alpha, \beta \end{align*} $$

Convex Function



  • For any $x,y \in \mathbb{R}^n$ and $\theta \in [0,1]$

$$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)$$
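
A quick numerical spot-check of this inequality, sketched here as an illustrative addition for the convex function $f(x) = \lVert x \rVert^2$ (not part of the original notes):

import numpy as np

# spot-check the convexity inequality for f(x) = ||x||^2 at random points
f = lambda x: np.sum(x**2)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    theta = rng.uniform()
    lhs = f(theta*x + (1 - theta)*y)
    rhs = theta*f(x) + (1 - theta)*f(y)
    assert lhs <= rhs + 1e-9   # convexity inequality (up to round-off)

print("inequality held on all sampled points")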

Convex Set



  • For any $x,y \in \mathcal{C}$ and $\theta \in [0,1]$

$$\theta x + (1-\theta)y \in \mathcal{C}$$

3. Solving Optimization Problems

  • Starting with the unconstrained, one-dimensional case


  • To find the minimum point $x^*$, we can look at the derivative of the function $f'(x)$:

    • Any location where $f'(x) = 0$ will be a "flat" point in the function
  • For convex problems, this is guaranteed to be a minimum

  • Generalization to a multivariate function $f:\mathbb{R}^n \rightarrow \mathbb{R}$

    • The gradient of $f$ must be zero

$$ \nabla _x f(x) = 0$$


  • The gradient is an $n$-dimensional vector containing the partial derivatives with respect to each dimension

$$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \quad \quad \quad \nabla _x f(x) = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \vdots\\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix} $$


  • For continuously differentiable $f$ and unconstrained optimization, the optimal point must satisfy $\nabla _x f(x^*)=0$

3.1. How To Find $\nabla _x f(x) = 0$: Analytic Approach

  • Direct solution
    • In some cases, it is possible to analytically compute $x^*$ such that $ \nabla _x f(x^*)=0$

$$ \begin{align*} f(x) &= 2x_1^2+ x_2^2 + x_1 x_2 -6 x_1 -5 x_2\\\\ \Longrightarrow \nabla _x f(x) &= \begin{bmatrix} 4x_1+x_2-6\\ 2x_2 + x_1 -5 \end{bmatrix} = \begin{bmatrix}0\\0 \end{bmatrix}\\\\ \therefore x^* &= \begin{bmatrix} 4 & 1\\ 1 & 2 \end{bmatrix} ^{-1} \begin{bmatrix} 6 \\ 5\\ \end{bmatrix} = \begin{bmatrix} 1 \\ 2\\ \end{bmatrix} \end{align*} $$
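
As a sanity check (a small numerical sketch added here, not from the original notes), the same $x^*$ can be obtained by solving the $2 \times 2$ linear system from the first-order condition with NumPy:

import numpy as np

# first-order condition grad f(x) = 0 is the linear system below
A = np.array([[4, 1],
              [1, 2]])
b = np.array([[6],
              [5]])

x_star = np.linalg.solve(A, b)
print(x_star)              # expected [[1.], [2.]]
print(A @ x_star - b)      # gradient at x_star, numerically zero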


  • Note: Matrix derivatives

Examples

  • Affine function $g(x) = a^Tx + b$

$$\nabla g(x) = a$$


  • Quadratic function $g(x) = x^T P x + q^T x + r,\qquad P = P^T$

$$\nabla g(x) = 2Px + q$$


  • $g(x) = \lVert Ax - b \rVert ^2 = x^TA^TAx - 2b^TAx + b^Tb$

$$\nabla g(x) = 2A^TAx-2A^Tb$$
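
These matrix-derivative identities can be spot-checked against finite differences; the sketch below (an illustrative addition with randomly generated $A$, $b$, $x$) compares the analytic gradient $2A^TAx - 2A^Tb$ with a numerical approximation:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
b = rng.normal(size=(5, 1))
x = rng.normal(size=(3, 1))

g = lambda x: np.sum((A @ x - b)**2)          # g(x) = ||Ax - b||^2
grad_analytic = 2*A.T @ A @ x - 2*A.T @ b     # gradient from the identity above

# central finite differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.zeros_like(x)
for i in range(x.shape[0]):
    e = np.zeros_like(x)
    e[i] = eps
    grad_numeric[i] = (g(x + e) - g(x - e)) / (2*eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))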



Note: Revisit Least-Square Solution of $J(x) = \lVert Ax - y \rVert ^2$


$$ \begin{align*} J(x) &= (Ax-y)^T(Ax-y)\\ &=(x^TA^T - y^T)(Ax - y)\\ &=x^TA^TAx - x^TA^Ty - y^TAx + y^Ty\\\\ \frac{\partial J}{\partial x} &= A^TAx + (A^TA)^Tx - A^Ty - (y^TA)^T \\ &=2A^TAx - 2A^Ty = 0\\\\ &\Rightarrow (A^TA)x = A^Ty\\\\ \therefore x^* &= (A^TA)^{-1}A^Ty \end{align*} $$
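
For a quick numerical confirmation (an illustrative sketch with random data, not from the original notes), the normal-equations solution $(A^TA)^{-1}A^Ty$ can be compared with NumPy's built-in least-squares routine:

import numpy as np

# random overdetermined system (illustrative data)
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 3))
y = rng.normal(size=(10, 1))

x_normal_eq = np.linalg.inv(A.T @ A) @ A.T @ y    # (A^T A)^{-1} A^T y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)   # NumPy's least-squares solver

print(np.allclose(x_normal_eq, x_lstsq))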


3.2. How To Find $\nabla _x f(x) = 0$: Iterative Approach

  • Iterative methods

    • More commonly, the condition that the gradient equals zero will not have an analytical solution, requiring iterative methods





  • The gradient points in the direction of "steepest ascent" for function $f$

3.2.1. Gradient Descent

  • This motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient

$$ x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$



  • Gradient Descent

$$\text{Repeat : } x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$



  • Gradient Descent in Higher Dimension

$$\text{Repeat : } x \leftarrow x - \alpha \nabla _x f(x)$$






3.2.2. Choosing Step Size

  • Learning rate (step size) $\alpha$; see the sketch below for its effect on convergence
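
The sketch below (an illustrative addition, not from the original notes) runs gradient descent on the simple convex function $f(x) = x^2$, whose gradient is $2x$, for several step sizes: a very small $\alpha$ makes slow progress, a moderate $\alpha$ converges quickly, and for this function $\alpha > 1$ makes the iterates diverge.

# gradient descent on f(x) = x^2, whose gradient is 2x
def run_gd(alpha, x0=5.0, iters=20):
    x = x0
    for _ in range(iters):
        x = x - alpha * 2*x    # x <- x - alpha * f'(x)
    return x

for alpha in [0.01, 0.1, 0.45, 1.1]:
    print(alpha, run_gd(alpha))
# alpha = 0.01: barely moves; alpha = 0.1 or 0.45: converges toward 0;
# alpha = 1.1: the iterates grow in magnitude (divergence)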



3.2.3. Where Will We Converge?



  • Random initialization

  • Multiple trials

3.2.4. Example


$$ \begin{align*} \min& \quad (x_1-3)^{2} + (x_2-3)^{2}\\\\ =\min& \quad \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \end{align*} $$



$$ \begin{align*} f &= \frac{1}{2}X^THX + g^TX \\ \nabla f &= HX+g \end{align*} $$


  • update rule

$$ X_{i+1} = X_{i} - \alpha_i \nabla f(X_i)$$

In [ ]:
import numpy as np

# quadratic objective f(x) = 1/2 x^T H x + g^T x  (see the formulation above)
H = np.array([[2, 0], [0, 2]])
g = -np.array([[6], [6]])

x = np.zeros((2, 1))   # initial guess
alpha = 0.2            # step size (learning rate)

# gradient descent: x <- x - alpha * grad f(x),  with grad f(x) = Hx + g
for i in range(25):
    df = H @ x + g
    x = x - alpha*df

print(x)
[[2.99999147]
 [2.99999147]]

4. Practically Solving Optimization Problems

  • The good news: for many classes of optimization problems, people have already done all the “hard work” of developing numerical algorithms

    • A wide range of tools that can take optimization problems in “natural” forms and compute a solution
  • We will use CVX (or CVXPY) as an optimization solver

  • Gradient descent

    • Neural networks/deep learning

4.1. Example: Linear Programming (Convex)


  • Linear programming: both the objective function and the constraints are linear

$$ \begin{align*} \max \; & \; 3x_1 + \frac{3}{2}x_2 \quad \quad \leftarrow \text{objective function}\\ \\ \text{subject to} \; & -1 \leq x_1 \leq 2 \quad \leftarrow \text{constraints}\\ & \quad 0 \leq x_2 \leq 3 \end{align*} $$

Method 1: graphical approach


$$ 3 x_1 + 1.5 x_2 = C \qquad \Rightarrow \qquad x_2 = -2 x_1 + \frac{2}{3}C $$
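
Working out the corner point (added here for reference): the objective increases as the level line $x_2 = -2x_1 + \frac{2}{3}C$ is pushed up and to the right, so the maximum over the box is attained at the corner $x_1 = 2$, $x_2 = 3$:

$$3(2) + \frac{3}{2}(3) = 6 + 4.5 = 10.5,$$

which matches the CVXPY result below (the solver reports $-10.5$ because it minimizes the negated objective).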




Method 2: CVXPY-based solver


$$ \begin{array}{Icr}\begin{align*} \max_{x} \quad & 3x_1 + {3 \over 2}x_2 \\ \text{subject to} \quad & -1 \leq x_1 \leq 2 \\ & \quad 0 \leq x_2 \leq 3 \end{align*}\end{array} \quad\implies\quad \begin{array}{I} \quad \min_{x} \quad & - \begin{bmatrix} 3 \\ 3 / 2 \end{bmatrix}^T \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ \text{subject to} \quad & \begin{bmatrix} -1 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq \begin{bmatrix} 2 \\ 3 \\ \end{bmatrix} \end{array} $$


In [ ]:
import numpy as np
import cvxpy as cvx

# maximize f^T x  <=>  minimize -f^T x
f = np.array([[3], [3/2]])
lb = np.array([[-1], [0]])   # lower bounds on x
ub = np.array([[2], [3]])    # upper bounds on x

x = cvx.Variable([2, 1])

obj = cvx.Minimize(-f.T @ x)
constraints = [lb <= x, x <= ub]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[1.99999999]
 [2.99999999]]
-10.499999966365493

4.2. Example: Quadratic Programming (Convex)


$$ \begin{array}{Icr}\begin{align*} \min \quad &\frac{1}{2}x_1^2+3x_1+4x_2 \\ \text{subject to} \quad & x_1+3x_2 \geq 15 \\ & 2x_1+5x_2 \leq 100 \\ & 3x_1+4x_2 \leq 80 \\ & x_1,x_2 \geq 0 \\ \end{align*}\end{array} \quad\implies\quad \begin{array}{I} \min_{X} \quad & X^T HX + f^T X \\ \text{subject to} \quad & AX \leq b \\ & A_{eq}X = b_{eq} \\ & lb \leq X \leq ub \end{array} $$


In [ ]:
# QP data: minimize x^T H x + f^T x  subject to  Ax <= b, x >= lb
f = np.array([[3], [4]])
H = np.array([[1/2, 0], [0, 0]])           # only the 1/2 x_1^2 quadratic term

A = np.array([[-1, -3], [2, 5], [3, 4]])   # x1 + 3x2 >= 15 written as -x1 - 3x2 <= -15
b = np.array([[-15], [100], [80]])
lb = np.array([[0], [0]])

x = cvx.Variable([2, 1])

obj = cvx.Minimize(cvx.quad_form(x, H) + f.T @ x)
constraints = [A @ x <= b, lb <= x]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[-1.15591357e-25]
 [ 5.00000000e+00]]
20.000000000000004
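
At the reported solution $x^* \approx (0, 5)$, the first constraint is active ($0 + 3 \cdot 5 = 15$) and the objective evaluates to $\frac{1}{2}\cdot 0^2 + 3\cdot 0 + 4\cdot 5 = 20$, in agreement with the solver output.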

4.3. Example: Best Location at a Concert Hall

  • Find the best location to listen to the singer's voice





$$\begin{align*} \min \quad & \sqrt{(x_1-3)^{2}+(x_2-3)^{2}} \\ \Rightarrow \quad \min \quad & (x_1-3)^{2} + (x_2-3)^{2} \\ \text{subject to} \quad & x_1 + x_2 \leq 3 \\ & x_1 \geq 0 \\ & x_2 \geq 0 \end{align*} $$


$$\begin{align*} \Rightarrow \quad & x_1^{2} - 6x_1 + 9 + x_2^{2} - 6x_2 + 9 \\ & = x_1^{2} + x_2^{2} - 6 x_1 - 6x_2 + 18 \\ & = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \\ & \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq 3 \\ & \begin{bmatrix} 0 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \end{align*} $$


In [ ]:
# concert-hall problem in QP form: H = I, f = (-6, -6)^T (the constant +18 is dropped)
f = np.array([[-6], [-6]])
H = np.array([[1, 0], [0, 1]])

A = np.array([1, 1])     # x1 + x2 <= 3
b = 3
lb = np.array([[0], [0]])

x = cvx.Variable([2, 1])

obj = cvx.Minimize(cvx.quad_form(x, H) + f.T @ x)
constraints = [A @ x <= b, lb <= x]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
[[1.5]
 [1.5]]
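
This agrees with the geometry: the unconstrained minimizer $(3,3)$ violates $x_1 + x_2 \leq 3$, so the optimum is the projection of $(3,3)$ onto the boundary $x_1 + x_2 = 3$, which by symmetry is $x_1 = x_2 = 1.5$.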

4.4. Example: Shortest Distance




$$ \min \; (d_1 + d_2) = \min \left( \left\lVert \vec{a} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2 + \left\lVert \vec{b} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2 \right)$$


  • $ \vec{a} \rightarrow \begin{bmatrix}x\\0\end{bmatrix}$: walk with an empty bucket
  • $ \begin{bmatrix}x\\0\end{bmatrix} \rightarrow \vec{b}$: walk with a water-filled bucket

$$\implies \min d_1 + \mu d_2$$


In [ ]:
# endpoints a and b; the crossing point x is constrained to lie on the floor (x2 = 0)
a = np.array([[0], [1]])
b = np.array([[4], [2]])

Aeq = np.array([0, 1])   # picks out x2
beq = 0

x = cvx.Variable([2, 1])

mu = 1                   # relative cost of the water-filled leg
obj = cvx.Minimize(cvx.norm(a-x, 2) + mu*cvx.norm(b-x, 2))
constraints = [Aeq @ x == beq]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[1.33325114e+00]
 [5.33304239e-12]]
5.0000000010880985
In [ ]:
print(np.sqrt(4**2 + 3**2))
5.0
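
This is the classic reflection argument: reflecting $\vec b = (4,2)$ across the line $x_2 = 0$ gives $(4,-2)$, and the shortest total path from $\vec a = (0,1)$ has length $\lVert (4,-2) - (0,1) \rVert = \sqrt{4^2 + 3^2} = 5$, matching the optimal value reported above for $\mu = 1$.

To see how the weight $\mu$ shifts the crossing point, the sketch below (an illustrative addition, not from the original notes) re-solves the problem for a few values of $\mu$; a larger $\mu$ penalizes the water-filled leg more, pulling the crossing point toward $\vec b$:

import numpy as np
import cvxpy as cvx

a = np.array([[0], [1]])
b = np.array([[4], [2]])
Aeq = np.array([0, 1])   # constrains x2 = 0 (the crossing point stays on the floor)

for mu in [1, 2, 5]:
    x = cvx.Variable([2, 1])
    obj = cvx.Minimize(cvx.norm(a - x, 2) + mu*cvx.norm(b - x, 2))
    prob = cvx.Problem(obj, [Aeq @ x == 0])
    prob.solve()
    print(mu, x.value.flatten())   # x1 moves toward b as mu grows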

4.5. Example: Supply Chain

  • Find a point that minimizes the sum of the transportation costs (or distance) from this point to $n$ destination points


In [ ]:
# three destination points (vertices of an equilateral triangle)
a = np.array([[np.sqrt(3)], [0]])
b = np.array([[-np.sqrt(3)], [0]])
c = np.array([[0], [3]])

x = cvx.Variable([2, 1])

# sum of Euclidean distances to the three points (geometric median)
obj = cvx.Minimize(cvx.norm(a-x, 2) + cvx.norm(b-x, 2) + cvx.norm(c-x, 2))
# obj = cvx.Minimize(cvx.norm(a-x, 1) + cvx.norm(b-x, 1) + cvx.norm(c-x, 1))

prob = cvx.Problem(obj)
result = prob.solve()

print(x.value)
[[-1.58674964e-16]
 [ 1.00000001e+00]]
In [ ]:
# centroid of the three points, for comparison with the solver's answer
B = np.hstack([a, b, c])
np.mean(B, 1)
Out[ ]:
array([0., 1.])
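
The solver's answer coincides with the centroid because the three points form an equilateral triangle (all pairwise distances equal $2\sqrt{3}$), so by symmetry the geometric median is the centroid $(0, 1)$.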