Optimization
1. Optimization¶
an important tool in 1) engineering problem solving and 2) decision science
people optimize
nature optimizes
3 key components
- objective
- decision variable or unknown
- constraints
Procedures
- The process of identifying objective, variables, and constraints for a given problem is known as "modeling"
- Once the model has been formulated, an optimization algorithm can be used to find its solution.
In mathematical form
$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad &g_i(x) \leq 0, \qquad i=1,\cdots,m \end{align*} $$
$\;\;\; $where
$x=\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} \in \mathbb{R}^n$ is the decision variable
$f: \mathbb{R}^n \rightarrow \mathbb{R}$ is an objective function
Feasible region: $\mathcal{C} = \{x: g_i(x) \leq 0, \quad i=1, \cdots,m\}$
Remarks) equivalent
$$\begin{align*} \min_{x} f(x) \quad&\leftrightarrow \quad \max_{x} -f(x)\\ \quad g_i(x) \leq 0\quad&\leftrightarrow \quad -g_i(x) \geq 0\\ h(x) = 0 \quad&\leftrightarrow \quad \begin{cases} h(x) \leq 0 \quad \text{and} \\ h(x) \geq 0 \end{cases} \end{align*} $$
- The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms
- A wide range of tools that can take optimization problems in "natural" forms and compute a solution
2. Convex Optimization¶
- Convex optimization is an extremely powerful subset of all optimization problems
$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad & x \in \mathcal{C} \end{align*} $$
$\;\;\; $where
$f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function and
Feasible region $\mathcal{C}$ is a convex set
Remarks)
Key property of convex optimization:
- All local solutions are global solutions
We will use CVX (or CVXPY) as a convex optimization solver
- https://www.cvxpy.org/
- Many examples later
Linear Interpolation between Two Points
$$ \begin{align*} \vec{z} &= \vec{y} + \theta (\vec{x} - \vec{y}) = \theta \vec{x} + (1-\theta) \vec{y},\qquad 0 \leq \theta \leq 1\\ \\ \text{or }\;\; \vec{z} &= \alpha \vec{x} + \beta \vec{y}, \qquad \alpha + \beta = 1 \;\;\text{and}\; 0 \leq \alpha, \beta \end{align*} $$
Convex Function
- For any $x,y \in \mathbb{R}^n$ and $\theta \in [0,1]$
$$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)$$
Convex Set
- For any $x,y \in \mathcal{C}$ and $\theta \in [0,1]$
$$\theta x + (1-\theta)y \in \mathcal{C}$$
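These two definitions can be checked numerically. Below is a minimal sketch (not part of the original notes) that uses the convex function $f(x) = \lVert x \rVert^2$ as a test case and verifies the convexity inequality for randomly sampled pairs $x, y$ and values of $\theta$.

import numpy as np

# numerical check of f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
# using the convex test function f(x) = ||x||^2
f = lambda x: np.sum(x**2)

np.random.seed(0)
for _ in range(1000):
    x = np.random.randn(3)
    y = np.random.randn(3)
    theta = np.random.rand()
    lhs = f(theta*x + (1 - theta)*y)
    rhs = theta*f(x) + (1 - theta)*f(y)
    assert lhs <= rhs + 1e-12      # inequality holds (up to numerical tolerance)

print('convexity inequality held for all sampled points')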
3. Solving Optimization Problems¶
- Starting with the unconstrained, one-dimensional case
To find a minimum point $x^*$, we can look at the derivative of the function, $f'(x)$:
- Any location where $f'(x) = 0$ will be a "flat" point in the function
For convex problems, this is guaranteed to be a minimum
Generalization for multivariate function $f:\mathbb{R}^n \rightarrow \ \mathbb{R}$
- The gradient of $f$ must be zero
$$ \nabla _x f(x) = 0$$
- The gradient is an $n$-dimensional vector containing the partial derivatives with respect to each dimension
$$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \quad \quad \quad \nabla _x f(x) = \begin{bmatrix} \partial f(x) \over \partial x_1 \\ \vdots\\ \partial f(x) \over \partial x_n \end{bmatrix} $$
- For continuously differentiable $f$ and unconstrained optimization, optimal point must have $\nabla _x f(x^*)=0$
3.1. How To Find $\nabla _x f(x) = 0$: Analytic Approach¶
- Direct solution
- In some cases, it is possible to analytically compute $x^*$ such that $ \nabla _x f(x^*)=0$
$$ \begin{align*} f(x) &= 2x_1^2+ x_2^2 + x_1 x_2 -6 x_1 -5 x_2\\\\ \Longrightarrow \nabla _x f(x) &= \begin{bmatrix} 4x_1+x_2-6\\ 2x_2 + x_1 -5 \end{bmatrix} = \begin{bmatrix}0\\0 \end{bmatrix}\\\\ \therefore x^* &= \begin{bmatrix} 4 & 1\\ 1 & 2 \end{bmatrix} ^{-1} \begin{bmatrix} 6 \\ 5\\ \end{bmatrix} = \begin{bmatrix} 1 \\ 2\\ \end{bmatrix} \end{align*} $$
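As a quick sanity check (not part of the derivation above), the $2 \times 2$ linear system $\nabla_x f(x) = 0$ can also be solved numerically:

import numpy as np

# solve the linear system from the gradient condition: [[4, 1], [1, 2]] x = [6, 5]
A = np.array([[4, 1], [1, 2]])
b = np.array([[6], [5]])

x_star = np.linalg.solve(A, b)
print(x_star)       # expected [[1], [2]]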
- Note: Matrix derivatives
Examples
- Affine function $g(x) = a^Tx + b$
$$\nabla g(x) = a$$
- Quadratic function $g(x) = x^T P x + q^T x + r,\qquad P = P^T$
$$\nabla g(x) = 2Px + q$$
- $g(x) = \lVert Ax - b \rVert ^2 = x^TA^TAx - 2b^TAx + b^Tb$
$$\nabla g(x) = 2A^TAx-2A^Tb$$
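The last formula can be verified with a finite-difference check. The sketch below (with an arbitrarily chosen small $A$, $b$, and $x$, used only for illustration) compares the analytic gradient $2A^TAx - 2A^Tb$ against numerical differentiation.

import numpy as np

# finite-difference check of grad g(x) = 2 A^T A x - 2 A^T b for g(x) = ||Ax - b||^2
np.random.seed(1)
A = np.random.randn(4, 3)
b = np.random.randn(4, 1)
x = np.random.randn(3, 1)

g = lambda x: np.sum((A@x - b)**2)
grad_analytic = 2*A.T@A@x - 2*A.T@b

eps = 1e-6
grad_numeric = np.zeros((3, 1))
for i in range(3):
    e = np.zeros((3, 1))
    e[i] = eps
    grad_numeric[i] = (g(x + e) - g(x - e)) / (2*eps)   # central difference

print(np.max(np.abs(grad_analytic - grad_numeric)))     # should be close to zero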
Note: Revisit Least-Square Solution of $J(x) = \lVert Ax - y \rVert ^2$
$$ \begin{align*} J(x) &= (Ax-y)^T(Ax-y)\\ &=(x^TA^T - y^T)(Ax - y)\\ &=x^TA^TAx - x^TA^Ty - y^TAx + y^Ty\\\\ \frac{\partial J}{\partial x} &= A^TAx + (A^TA)^Tx - A^Ty - (y^TA)^T \\ &= 2A^TAx - 2A^Ty = 0\\\\ &\Rightarrow (A^TA)x = A^Ty\\\\ \therefore x^* &= (A^TA)^{-1}A^Ty \end{align*} $$
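The closed-form least-squares solution can be cross-checked numerically; the sketch below (with randomly generated $A$ and $y$, used only for illustration) compares $(A^TA)^{-1}A^Ty$ with `np.linalg.lstsq`.

import numpy as np

# compare the normal-equation solution with numpy's least-squares solver
np.random.seed(2)
A = np.random.randn(6, 3)
y = np.random.randn(6, 1)

x_normal = np.linalg.inv(A.T@A) @ A.T @ y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(x_normal, x_lstsq))    # True (up to numerical precision)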
3.2. How To Find $\nabla _x f(x) = 0$: Iterative Approach¶
Iterative methods
- More commonly, the condition that the gradient equals zero will not have an analytical solution, requiring iterative methods
- The gradient points in the direction of "steepest ascent" for function $f$
3.2.1. Gradient Descent¶
- This motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient
$$ x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$
- Gradient Descent
$$\text{Repeat : } x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$
- Gradient Descent in Higher Dimension
$$\text{Repeat : } x \leftarrow x - \alpha \nabla _x f(x)$$
3.2.2. Choosing Step Size¶
- Learning rate
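The effect of the step size can be seen on a simple quadratic $f(x) = x^2$, where $\nabla f(x) = 2x$. A minimal sketch (step sizes chosen only for illustration): a small $\alpha$ converges slowly, a moderate $\alpha$ converges quickly, and a too-large $\alpha$ ($\alpha > 1$ here) diverges.

import numpy as np

# gradient descent on f(x) = x^2 (gradient 2x) with different step sizes
for alpha in [0.01, 0.2, 1.1]:
    x = 3.0
    for _ in range(50):
        x = x - alpha*(2*x)
    print('alpha = {:.2f}  ->  x after 50 steps = {:.3e}'.format(alpha, x))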
3.2.3. Where Will We Converge?¶
- Random initialization
- Multiple trials
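For non-convex functions, gradient descent may converge to different local minima depending on where it starts, which is why random initialization and multiple trials matter. A minimal sketch (using the non-convex test function $f(x) = x^4 - 3x^2 + x$, chosen only for illustration) runs gradient descent from several random starting points:

import numpy as np

# gradient descent on the non-convex function f(x) = x^4 - 3x^2 + x
# different random initializations can converge to different local minima
df = lambda x: 4*x**3 - 6*x + 1

np.random.seed(3)
alpha = 0.01

for trial in range(5):
    x = np.random.uniform(-2, 2)    # random initialization
    for _ in range(500):
        x = x - alpha*df(x)
    print('trial {}: converged to x = {:+.3f}'.format(trial, x))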
3.2.4. Example¶
$$ \begin{align*} \min& \quad (x_1-3)^{2} + (x_2-3)^{2}\\\\ =\min& \quad \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \end{align*} $$
$$\begin{align*} f &= \frac{1}{2}X^THX + g^TX \\ \nabla f &= HX+g \end{align*}$$
- update rule
$$ X_{i+1} = X_{i} - \alpha_i \nabla f(X_i)$$
import numpy as np

# gradient descent on f(X) = 1/2 X^T H X + g^T X
H = np.array([[2, 0], [0, 2]])
g = -np.array([[6], [6]])

x = np.zeros((2, 1))    # initial guess
alpha = 0.2             # step size (learning rate)

for i in range(25):
    df = H@x + g        # gradient at the current point
    x = x - alpha*df    # step in the negative gradient direction

print(x)
4. Practically Solving Optimization Problems¶
The good news: for many classes of optimization problems, people have already done all the “hard work” of developing numerical algorithms
- A wide range of tools that can take optimization problems in “natural” forms and compute a solution
We will use CVX (or CVXPY) as an optimization solver
- Only for convex problems
- http://cvxr.com/cvx/
- https://www.cvxpy.org/
Gradient descent
- Neural networks/deep learning
4.1. Example: Linear Programming (Convex)¶
- Linear programming = objective function and constraints are both linear
$$ \begin{align*} \max \; & \; 3x_1 + \frac{3}{2}x_2 \quad \quad \leftarrow \text{objective function}\\ \\ \text{subject to} \; & -1 \leq x_1 \leq 2 \quad \leftarrow \text{constraints}\\ & \quad 0 \leq x_2 \leq 3 \end{align*} $$
Method 1: graphical approach
$$ 3 x_1 + 1.5 x_2 = C \qquad \Rightarrow \qquad x_2 = -2 x_1 + \frac{2}{3}C $$
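A minimal plotting sketch of this graphical approach (assuming `matplotlib` is available; it is not used elsewhere in this section): draw the rectangular feasible region and a few level lines $3x_1 + 1.5x_2 = C$; the optimum is the corner the level lines last touch as $C$ increases.

import numpy as np
import matplotlib.pyplot as plt

# feasible region: -1 <= x1 <= 2, 0 <= x2 <= 3
plt.fill([-1, 2, 2, -1], [0, 0, 3, 3], alpha=0.3, label='feasible region')

# level lines of the objective: x2 = -2*x1 + (2/3)*C
x1 = np.linspace(-2, 3, 100)
for C in [0, 3, 6, 9, 10.5]:
    plt.plot(x1, -2*x1 + (2/3)*C, 'k--', linewidth=0.8)

plt.plot(2, 3, 'ro', label='optimum (2, 3)')
plt.xlabel('x1')
plt.ylabel('x2')
plt.axis([-2, 3, -1, 4])
plt.legend()
plt.show()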
Method 2: CVXPY-based solver
$$ \begin{array}{Icr}\begin{align*} \max_{x} \quad & 3x_1 + {3 \over 2}x_2 \\ \text{subject to} \quad & -1 \leq x_1 \leq 2 \\ & \quad 0 \leq x_2 \leq 3 \end{align*}\end{array} \quad\implies\quad \begin{array}{I} \quad \min_{x} \quad & - \begin{bmatrix} 3 \\ 3 / 2 \end{bmatrix}^T \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ \text{subject to} \quad & \begin{bmatrix} -1 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq \begin{bmatrix} 2 \\ 3 \\ \end{bmatrix} \end{array} $$
import numpy as np
import cvxpy as cvx

# maximize 3*x1 + 1.5*x2  <=>  minimize -(3*x1 + 1.5*x2)
f = np.array([[3], [3/2]])
lb = np.array([[-1], [0]])      # lower bounds on x
ub = np.array([[2], [3]])       # upper bounds on x

x = cvx.Variable([2, 1])
obj = cvx.Minimize(-f.T@x)
constraints = [lb <= x, x <= ub]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
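As a cross-check (assuming `scipy` is available; it is not used elsewhere in these notes), the same LP can be solved with `scipy.optimize.linprog`:

from scipy.optimize import linprog

# minimize -(3*x1 + 1.5*x2) subject to -1 <= x1 <= 2, 0 <= x2 <= 3
res = linprog(c=[-3, -3/2], bounds=[(-1, 2), (0, 3)])

print(res.x)       # optimal point, expected [2, 3]
print(-res.fun)    # maximum value, expected 10.5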
4.2. Example: Quadratic Programming (Convex)¶
$$ \begin{array}{Icr}\begin{align*} \min \quad &\frac{1}{2}x_1^2+3x_1+4x_2 \\ \text{subject to} \quad & x_1+3x_2 \geq 15 \\ & 2x_1+5x_2 \leq 100 \\ & 3x_1+4x_2 \leq 80 \\ & x_1,x_2 \geq 0 \\ \end{align*}\end{array} \quad\implies\quad \begin{array}{I} \min_{X} \quad & X^T HX + f^T X \\ \text{subject to} \quad & AX \leq b \\ & A_{eq}X = b_{eq} \\ & lb \leq X \leq ub \end{array} $$
# quadratic programming: minimize x^T H x + f^T x subject to Ax <= b, x >= 0
f = np.array([[3], [4]])
H = np.array([[1/2, 0], [0, 0]])            # encodes (1/2)*x1^2
A = np.array([[-1, -3], [2, 5], [3, 4]])    # x1 + 3*x2 >= 15 written as -x1 - 3*x2 <= -15
b = np.array([[-15], [100], [80]])
lb = np.array([[0], [0]])

x = cvx.Variable([2, 1])
obj = cvx.Minimize(cvx.quad_form(x, H) + f.T@x)
constraints = [A@x <= b, lb <= x]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
4.3. Example: Best Location at a Concert Hall¶
- Find the best location to listen to the singer's voice
$$\begin{align*} \min \quad & \sqrt{(x_1-3)^{2}+(x_2-3)^{2}} \\ \Rightarrow \quad \min \quad & (x_1-3)^{2} + (x_2-3)^{2} \\ \text{subject to} \quad & x_1 + x_2 \leq 3 \\ & x_1 \geq 0 \\ & x_2 \geq 0 \end{align*} $$
$$\begin{align*} \Rightarrow \quad & x_1^{2} - 6x_1 + 9 + x_2^{2} - 6x_2 + 9 \\ & = x_1^{2} + x_2^{2} - 6 x_1 - 6x_2 + 18 \\ & = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \\ & \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq 3 \\ & \begin{bmatrix} 0 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \end{align*} $$
# best location: minimize (x1-3)^2 + (x2-3)^2 subject to x1 + x2 <= 3, x >= 0
f = np.array([[-6], [-6]])
H = np.array([[1, 0], [0, 1]])
A = np.array([1, 1])
b = 3
lb = np.array([[0], [0]])

x = cvx.Variable([2, 1])
obj = cvx.Minimize(cvx.quad_form(x, H) + f.T@x)
constraints = [A@x <= b, lb <= x]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
4.4. Example: Shortest Distance¶
$$ \min \; d_1 + d_2 = \min \left\lVert \vec{a} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2 + \left\lVert \vec{b} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2$$
- $ \vec{a} \rightarrow \begin{bmatrix}x\\0\end{bmatrix}$: walk with an empty bucket
- $ \begin{bmatrix}x\\0\end{bmatrix} \rightarrow \vec{b}$: walk with a water-filled bucket
$$\implies \min d_1 + \mu d_2$$
# point on the x-axis (x2 = 0) minimizing d1 + mu*d2
a = np.array([[0], [1]])
b = np.array([[4], [2]])

Aeq = np.array([0, 1])      # constraint x2 = 0 (the point lies on the x-axis)
beq = 0

x = cvx.Variable([2, 1])
mu = 1                      # relative cost of the second leg (water-filled bucket)

obj = cvx.Minimize(cvx.norm(a-x, 2) + mu*cvx.norm(b-x, 2))
constraints = [Aeq@x == beq]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)

# check: for mu = 1, reflecting a across the x-axis gives total distance sqrt(4^2 + 3^2) = 5
print(np.sqrt(4**2 + 3**2))
4.5. Example: Supply Chain¶
- Find a point that minimizes the sum of the transportation costs (or distance) from this point to $n$ destination points
# geometric median: minimize the sum of distances to the three destination points
a = np.array([[np.sqrt(3)], [0]])
b = np.array([[-np.sqrt(3)], [0]])
c = np.array([[0], [3]])

x = cvx.Variable([2, 1])

obj = cvx.Minimize(cvx.norm(a-x, 2) + cvx.norm(b-x, 2) + cvx.norm(c-x, 2))
#obj = cvx.Minimize(cvx.norm(a-x, 1) + cvx.norm(b-x, 1) + cvx.norm(c-x, 1))

prob = cvx.Problem(obj)
result = prob.solve()

print(x.value)

# compare with the centroid (mean of the three points); in general it is not the minimizer
B = np.hstack([a, b, c])
print(np.mean(B, 1))