Optimization

1. Optimization

  • an important tool in 1) engineering problem solving and 2) decision science

  • people optimize

  • nature optimizes


3 key components

  1. objective
  2. decision variable or unknown
  3. constraints

Procedures

  1. The process of identifying the objective, variables, and constraints for a given problem is known as "modeling"
  2. Once the model has been formulated, an optimization algorithm can be used to find its solution.

In mathematical form


$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad &g_i(x) \leq 0, \qquad i=1,\cdots,m \end{align*} $$

$\;\;\; $where

  • $x=\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} \in \mathbb{R}^n$ is the decision variable

  • $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is an objective function

  • Feasible region: $\mathcal{C} = \{x: g_i(x) \leq 0, \quad i=1, \cdots,m\}$


Remarks) equivalent formulations


$$\begin{align*} \min_{x} f(x) \quad&\leftrightarrow \quad \max_{x} -f(x)\\ \quad g_i(x) \leq 0\quad&\leftrightarrow \quad -g_i(x) \geq 0\\ h(x) = 0 \quad&\leftrightarrow \quad \begin{cases} h(x) \leq 0 \quad \text{and} \\ h(x) \geq 0 \end{cases} \end{align*} $$


  • The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms
    • A wide range of tools that can take optimization problems in "natural" forms and compute a solution

2. Convex Optimization



  • Convex optimization is an extremely powerful subset of all optimization problems

$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad & x \in \mathcal{C} \end{align*} $$

$\;\;\; $where

  • $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function and

  • Feasible region $\mathcal{C}$ is a convex set


Remarks)

  • Key property of convex optimization:

    • All local solutions are global solutions
  • We will use CVX (or CVXPY) as a convex optimization solver

Linear Interpolation between Two Points



$$ \begin{align*} \vec{z} &= \vec{y} + \theta (\vec{x} - \vec{y}) = \theta \vec{x} + (1-\theta) \vec{y},\qquad 0 \leq \theta \leq 1\\ \\ \text{or }\;\; \vec{z} &= \alpha \vec{x} + \beta \vec{y}, \qquad \alpha + \beta = 1 \;\;\text{and}\; 0 \leq \alpha, \beta \end{align*} $$

Convex Function



  • For any $x,y \in \mathbb{R}^n$ and $\theta \in [0,1]$

$$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)$$
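
A quick numerical spot-check of this inequality, sketched here as an illustrative addition for the convex function $f(x) = \lVert x \rVert^2$ (not part of the original notes):

import numpy as np

# spot-check the convexity inequality for f(x) = ||x||^2 at random points
f = lambda x: np.sum(x**2)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    theta = rng.uniform()
    lhs = f(theta*x + (1 - theta)*y)
    rhs = theta*f(x) + (1 - theta)*f(y)
    assert lhs <= rhs + 1e-9   # convexity inequality (up to round-off)

print("inequality held on all sampled points")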

Convex Set



  • For any $x,y \in \mathcal{C}$ and $\theta \in [0,1]$

$$\theta x + (1-\theta)y \in \mathcal{C}$$

3. Solving Optimization Problems

  • Starting with the unconstrained, one-dimensional case


  • To find the minimum point $x^*$, we can look at the derivative of the function $f'(x)$:

    • Any location where $f'(x) = 0$ will be a "flat" point in the function
  • For convex problems, this is guaranteed to be a minimum

  • Generalization to a multivariate function $f:\mathbb{R}^n \rightarrow \mathbb{R}$

    • The gradient of $f$ must be zero

$$ \nabla _x f(x) = 0$$


  • The gradient is an $n$-dimensional vector containing the partial derivatives with respect to each dimension

$$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \quad \quad \quad \nabla _x f(x) = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \vdots\\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix} $$


  • For continuously differentiable $f$ and unconstrained optimization, the optimal point must satisfy $\nabla _x f(x^*)=0$

3.1. How To Find $\nabla _x f(x) = 0$: Analytic Approach

  • Direct solution
    • In some cases, it is possible to analytically compute $x^*$ such that $ \nabla _x f(x^*)=0$

$$ \begin{align*} f(x) &= 2x_1^2+ x_2^2 + x_1 x_2 -6 x_1 -5 x_2\\\\ \Longrightarrow \nabla _x f(x) &= \begin{bmatrix} 4x_1+x_2-6\\ 2x_2 + x_1 -5 \end{bmatrix} = \begin{bmatrix}0\\0 \end{bmatrix}\\\\ \therefore x^* &= \begin{bmatrix} 4 & 1\\ 1 & 2 \end{bmatrix} ^{-1} \begin{bmatrix} 6 \\ 5\\ \end{bmatrix} = \begin{bmatrix} 1 \\ 2\\ \end{bmatrix} \end{align*} $$
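
As a sanity check (a small numerical sketch added here, not from the original notes), the same $x^*$ can be obtained by solving the $2 \times 2$ linear system from the first-order condition with NumPy:

import numpy as np

# first-order condition grad f(x) = 0 is the linear system below
A = np.array([[4, 1],
              [1, 2]])
b = np.array([[6],
              [5]])

x_star = np.linalg.solve(A, b)
print(x_star)              # expected [[1.], [2.]]
print(A @ x_star - b)      # gradient at x_star, numerically zero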


  • Note: Matrix derivatives

Examples

  • Affine function $g(x) = a^Tx + b$

$$\nabla g(x) = a$$


  • Quadratic function $g(x) = x^T P x + q^T x + r,\qquad P = P^T$

$$\nabla g(x) = 2Px + q$$


  • $g(x) = \lVert Ax - b \rVert ^2 = x^TA^TAx - 2b^TAx + b^Tb$

$$\nabla g(x) = 2A^TAx-2A^Tb$$
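
These matrix-derivative identities can be spot-checked against finite differences; the sketch below (an illustrative addition with randomly generated $A$, $b$, $x$) compares the analytic gradient $2A^TAx - 2A^Tb$ with a numerical approximation:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
b = rng.normal(size=(5, 1))
x = rng.normal(size=(3, 1))

g = lambda x: np.sum((A @ x - b)**2)          # g(x) = ||Ax - b||^2
grad_analytic = 2*A.T @ A @ x - 2*A.T @ b     # gradient from the identity above

# central finite differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.zeros_like(x)
for i in range(x.shape[0]):
    e = np.zeros_like(x)
    e[i] = eps
    grad_numeric[i] = (g(x + e) - g(x - e)) / (2*eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))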



Note: Revisit Least-Square Solution of $J(x) = \lVert Ax - y \rVert ^2$


$$ \begin{align*} J(x) &= (Ax-y)^T(Ax-y)\\ &=(x^TA^T - y^T)(Ax - y)\\ &=x^TA^TAx - x^TA^Ty - y^TAx + y^Ty\\\\ \frac{\partial J}{\partial x} &= A^TAx + (A^TA)^Tx - A^Ty - (y^TA)^T \\ &=2A^TAx - 2A^Ty = 0\\\\ &\Rightarrow (A^TA)x = A^Ty\\\\ \therefore x^* &= (A^TA)^{-1}A^Ty \end{align*} $$
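
For a quick numerical confirmation (an illustrative sketch with random data, not from the original notes), the normal-equations solution $(A^TA)^{-1}A^Ty$ can be compared with NumPy's built-in least-squares routine:

import numpy as np

# random overdetermined system (illustrative data)
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 3))
y = rng.normal(size=(10, 1))

x_normal_eq = np.linalg.inv(A.T @ A) @ A.T @ y    # (A^T A)^{-1} A^T y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)   # NumPy's least-squares solver

print(np.allclose(x_normal_eq, x_lstsq))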


3.2. How To Find $\nabla _x f(x) = 0$: Iterative Approach

  • Iterative methods

    • More commonly, the condition that the gradient equals zero will not have an analytical solution, requiring iterative methods





  • The gradient points in the direction of "steepest ascent" for function $f$

3.2.1. Gradient Descent

  • This motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient

$$ x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$



  • Gradient Descent

$$\text{Repeat : } x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$



  • Gradient Descent in Higher Dimension

$$\text{Repeat : } x \leftarrow x - \alpha \nabla _x f(x)$$






3.2.2. Choosing Step Size

  • Learning rate (step size) $\alpha$; see the sketch below for its effect on convergence
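
The sketch below (an illustrative addition, not from the original notes) runs gradient descent on the simple convex function $f(x) = x^2$, whose gradient is $2x$, for several step sizes: a very small $\alpha$ makes slow progress, a moderate $\alpha$ converges quickly, and for this function $\alpha > 1$ makes the iterates diverge.

# gradient descent on f(x) = x^2, whose gradient is 2x
def run_gd(alpha, x0=5.0, iters=20):
    x = x0
    for _ in range(iters):
        x = x - alpha * 2*x    # x <- x - alpha * f'(x)
    return x

for alpha in [0.01, 0.1, 0.45, 1.1]:
    print(alpha, run_gd(alpha))
# alpha = 0.01: barely moves; alpha = 0.1 or 0.45: converges toward 0;
# alpha = 1.1: the iterates grow in magnitude (divergence)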



3.2.3. Where Will We Converge?



  • Random initialization

  • Multiple trials

3.2.4. Example


$$ \begin{align*} \min& \quad (x_1-3)^{2} + (x_2-3)^{2}\\\\ =\min& \quad \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \end{align*} $$



$$ \begin{align*} f &= \frac{1}{2}X^THX + g^TX \\ \nabla f &= HX+g \end{align*} $$


  • update rule

$$ X_{i+1} = X_{i} - \alpha_i \nabla f(X_i)$$

In [ ]:
import numpy as np

# quadratic objective f(x) = 1/2 x^T H x + g^T x  (see the formulation above)
H = np.array([[2, 0], [0, 2]])
g = -np.array([[6], [6]])

x = np.zeros((2, 1))   # initial guess
alpha = 0.2            # step size (learning rate)

# gradient descent: x <- x - alpha * grad f(x),  with grad f(x) = Hx + g
for i in range(25):
    df = H @ x + g
    x = x - alpha*df

print(x)
[[2.99999147]
 [2.99999147]]

4. Practically Solving Optimization Problems

  • The good news: for many classes of optimization problems, people have already done all the “hard work” of developing numerical algorithms

    • A wide range of tools that can take optimization problems in “natural” forms and compute a solution
  • We will use CVX (or CVXPY) as an optimization solver

  • Gradient descent

    • Neural networks/deep learning

4.1. Example: Linear Programming (Convex)


  • Linear programming: both the objective function and the constraints are linear

$$ \begin{align*} \max \; & \; 3x_1 + \frac{3}{2}x_2 \quad \quad \leftarrow \text{objective function}\\ \\ \text{subject to} \; & -1 \leq x_1 \leq 2 \quad \leftarrow \text{constraints}\\ & \quad 0 \leq x_2 \leq 3 \end{align*} $$

Method 1: graphical approach


$$ 3 x_1 + 1.5 x_2 = C \qquad \Rightarrow \qquad x_2 = -2 x_1 + \frac{2}{3}C $$
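
Working out the corner point (added here for reference): the objective increases as the level line $x_2 = -2x_1 + \frac{2}{3}C$ is pushed up and to the right, so the maximum over the box is attained at the corner $x_1 = 2$, $x_2 = 3$:

$$3(2) + \frac{3}{2}(3) = 6 + 4.5 = 10.5,$$

which matches the CVXPY result below (the solver reports $-10.5$ because it minimizes the negated objective).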




Method 2: CVXPY-based solver


$$ \begin{array}{Icr}\begin{align*} \max_{x} \quad & 3x_1 + {3 \over 2}x_2 \\ \text{subject to} \quad & -1 \leq x_1 \leq 2 \\ & \quad 0 \leq x_2 \leq 3 \end{align*}\end{array} \quad\implies\quad \begin{array}{I} \quad \min_{x} \quad & - \begin{bmatrix} 3 \\ 3 / 2 \end{bmatrix}^T \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ \text{subject to} \quad & \begin{bmatrix} -1 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq \begin{bmatrix} 2 \\ 3 \\ \end{bmatrix} \end{array} $$


In [ ]:
import numpy as np
import cvxpy as cvx

# maximize f^T x  <=>  minimize -f^T x
f = np.array([[3], [3/2]])
lb = np.array([[-1], [0]])   # lower bounds on x
ub = np.array([[2], [3]])    # upper bounds on x

x = cvx.Variable([2, 1])

obj = cvx.Minimize(-f.T @ x)
constraints = [lb <= x, x <= ub]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[1.99999999]
 [2.99999999]]
-10.499999966365493

4.2. Example: Quadratic Programming (Convex)


$$ \begin{array}{Icr}\begin{align*} \min \quad &\frac{1}{2}x_1^2+3x_1+4x_2 \\ \text{subject to} \quad & x_1+3x_2 \geq 15 \\ & 2x_1+5x_2 \leq 100 \\ & 3x_1+4x_2 \leq 80 \\ & x_1,x_2 \geq 0 \\ \end{align*}\end{array} \quad\implies\quad \begin{array}{I} \min_{X} \quad & X^T HX + f^T X \\ \text{subject to} \quad & AX \leq b \\ & A_{eq}X = b_{eq} \\ & lb \leq X \leq ub \end{array} $$


In [ ]:
# QP data: minimize x^T H x + f^T x  subject to  Ax <= b, x >= lb
f = np.array([[3], [4]])
H = np.array([[1/2, 0], [0, 0]])           # only the 1/2 x_1^2 quadratic term

A = np.array([[-1, -3], [2, 5], [3, 4]])   # x1 + 3x2 >= 15 written as -x1 - 3x2 <= -15
b = np.array([[-15], [100], [80]])
lb = np.array([[0], [0]])

x = cvx.Variable([2, 1])

obj = cvx.Minimize(cvx.quad_form(x, H) + f.T @ x)
constraints = [A @ x <= b, lb <= x]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[-1.15591357e-25]
 [ 5.00000000e+00]]
20.000000000000004
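
At the reported solution $x^* \approx (0, 5)$, the first constraint is active ($0 + 3 \cdot 5 = 15$) and the objective evaluates to $\frac{1}{2}\cdot 0^2 + 3\cdot 0 + 4\cdot 5 = 20$, in agreement with the solver output.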

4.3. Example: Best Location at a Concert Hall

  • Find the best location to listen to the singer's voice





$$\begin{align*} \min \quad & \sqrt{(x_1-3)^{2}+(x_2-3)^{2}} \\ \Rightarrow \quad \min \quad & (x_1-3)^{2} + (x_2-3)^{2} \\ \text{subject to} \quad & x_1 + x_2 \leq 3 \\ & x_1 \geq 0 \\ & x_2 \geq 0 \end{align*} $$


$$\begin{align*} \Rightarrow \quad & x_1^{2} - 6x_1 + 9 + x_2^{2} - 6x_2 + 9 \\ & = x_1^{2} + x_2^{2} - 6 x_1 - 6x_2 + 18 \\ & = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \\ & \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq 3 \\ & \begin{bmatrix} 0 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \end{align*} $$


In [ ]:
# concert-hall problem in QP form: H = I, f = (-6, -6)^T (the constant +18 is dropped)
f = np.array([[-6], [-6]])
H = np.array([[1, 0], [0, 1]])

A = np.array([1, 1])     # x1 + x2 <= 3
b = 3
lb = np.array([[0], [0]])

x = cvx.Variable([2, 1])

obj = cvx.Minimize(cvx.quad_form(x, H) + f.T @ x)
constraints = [A @ x <= b, lb <= x]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
[[1.5]
 [1.5]]
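
This agrees with the geometry: the unconstrained minimizer $(3,3)$ violates $x_1 + x_2 \leq 3$, so the optimum is the projection of $(3,3)$ onto the boundary $x_1 + x_2 = 3$, which by symmetry is $x_1 = x_2 = 1.5$.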

4.4. Example: Shortest Distance




$$ \min \; (d_1 + d_2) = \min \left( \left\lVert \vec{a} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2 + \left\lVert \vec{b} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2 \right)$$


  • $ \vec{a} \rightarrow \begin{bmatrix}x\\0\end{bmatrix}$: walk with an empty bucket
  • $ \begin{bmatrix}x\\0\end{bmatrix} \rightarrow \vec{b}$: walk with a water-filled bucket

$$\implies \min d_1 + \mu d_2$$


In [ ]:
# endpoints a and b; the crossing point x is constrained to lie on the floor (x2 = 0)
a = np.array([[0], [1]])
b = np.array([[4], [2]])

Aeq = np.array([0, 1])   # picks out x2
beq = 0

x = cvx.Variable([2, 1])

mu = 1                   # relative cost of the water-filled leg
obj = cvx.Minimize(cvx.norm(a-x, 2) + mu*cvx.norm(b-x, 2))
constraints = [Aeq @ x == beq]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[1.33325114e+00]
 [5.33304239e-12]]
5.0000000010880985
In [ ]:
print(np.sqrt(4**2 + 3**2))
5.0
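
This is the classic reflection argument: reflecting $\vec b = (4,2)$ across the line $x_2 = 0$ gives $(4,-2)$, and the shortest total path from $\vec a = (0,1)$ has length $\lVert (4,-2) - (0,1) \rVert = \sqrt{4^2 + 3^2} = 5$, matching the optimal value reported above for $\mu = 1$.

To see how the weight $\mu$ shifts the crossing point, the sketch below (an illustrative addition, not from the original notes) re-solves the problem for a few values of $\mu$; a larger $\mu$ penalizes the water-filled leg more, pulling the crossing point toward $\vec b$:

import numpy as np
import cvxpy as cvx

a = np.array([[0], [1]])
b = np.array([[4], [2]])
Aeq = np.array([0, 1])   # constrains x2 = 0 (the crossing point stays on the floor)

for mu in [1, 2, 5]:
    x = cvx.Variable([2, 1])
    obj = cvx.Minimize(cvx.norm(a - x, 2) + mu*cvx.norm(b - x, 2))
    prob = cvx.Problem(obj, [Aeq @ x == 0])
    prob.solve()
    print(mu, x.value.flatten())   # x1 moves toward b as mu grows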

4.5. Example: Supply Chain

  • Find a point that minimizes the sum of the transportation costs (or distance) from this point to $n$ destination points


In [ ]:
# three destination points (vertices of an equilateral triangle)
a = np.array([[np.sqrt(3)], [0]])
b = np.array([[-np.sqrt(3)], [0]])
c = np.array([[0], [3]])

x = cvx.Variable([2, 1])

# sum of Euclidean distances to the three points (geometric median)
obj = cvx.Minimize(cvx.norm(a-x, 2) + cvx.norm(b-x, 2) + cvx.norm(c-x, 2))
# obj = cvx.Minimize(cvx.norm(a-x, 1) + cvx.norm(b-x, 1) + cvx.norm(c-x, 1))

prob = cvx.Problem(obj)
result = prob.solve()

print(x.value)
[[-1.58674964e-16]
 [ 1.00000001e+00]]
In [ ]:
# centroid of the three points, for comparison with the solver's answer
B = np.hstack([a, b, c])
np.mean(B, 1)
Out[ ]:
array([0., 1.])
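
The solver's answer coincides with the centroid because the three points form an equilateral triangle (all pairwise distances equal $2\sqrt{3}$), so by symmetry the geometric median is the centroid $(0, 1)$.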