Optimization

Table of Contents



1. Optimization

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('iiKKPAF1inU', width = "560", height = "315")
Out[ ]:

Optimization is a mathematical discipline that focuses on finding the best solution to a problem within a defined set of constraints. It involves maximizing or minimizing an objective function, which represents the goal of the optimization process, such as minimizing costs, maximizing efficiency, or achieving the best performance in a system.



Lifeguard Problem (Image source)

As a lifeguard, you spot a swimmer in distress, calling for help offshore. Your goal is to reach the swimmer as quickly as possible. You can:

  • Run on the beach at a speed $v_r$.
  • Swim in the water at a slower speed $v_s$ (since swimming is typically slower than running).

The challenge is to determine the optimal path that minimizes the total rescue time.

Your solution would be

  • If $v_r > v_s$, the lifeguard should run further along the beach before entering the water.

  • If $v_r = v_s$, the shortest path (a straight line) is optimal.

  • If $v_s$ is much slower, the lifeguard should cover more distance on land before diving in.



1.1. Key Components of an Optimization Problem

Every optimization problem consists of the following three essential components:

(1) Objective Function

  • The objective function defines the goal of the optimization problem, which could be either maximization (e.g., profit, efficiency) or minimization (e.g., cost, error).

  • Mathematically, an objective function is represented as:
    $$\min/\max \; f(x)$$
    where $f(x)$ is a function of the decision variables.


(2) Decision Variables (Unknowns)

  • Decision variables are the unknowns that need to be determined to optimize the objective function.

  • These variables define the solution space and can be continuous or discrete, depending on the nature of the problem.


(3) Constraints

  • Constraints define the limitations or restrictions that the decision variables must satisfy.

  • These can be equality constraints (e.g., physical laws, conservation equations) or inequality constraints (e.g., resource limits, capacity restrictions).

  • These constraints ensure that the solution remains feasible within the defined conditions.


1.2. Procedures in Optimization

(1) Modeling

  • The process of defining the objective function, decision variables, and constraints for a given problem is known as modeling.

  • This step translates a real-world problem into a mathematical formulation that can be analyzed and solved.


(2) Solving the Optimization Problem

  • Once the model has been formulated, optimization algorithms are applied to find the optimal solution.

1.3. Mathematical Expression


The general form of an optimization problem can be expressed as:


$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad &g_i(x) \leq 0, \qquad i=1,\cdots,m \end{align*} $$


where

  • $x=\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} \in \mathbb{R}^n$ is the decision variable

  • $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is an objective function

  • Feasible region: $\mathcal{C} = \{x: g_i(x) \leq 0, \quad i=1, \cdots,m\}$



Remarks) Equivalent Formulations

The optimization problem can be equivalently transformed using the following relationships:


$$\begin{align*} \min_{x} f(x) \quad&\longleftrightarrow \quad \max_{x} -f(x)\\ \quad g_i(x) \leq 0\quad&\longleftrightarrow \quad -g_i(x) \geq 0\\ h(x) = 0 \quad&\longleftrightarrow \quad \begin{cases} h(x) \leq 0 \quad \text{and} \\ h(x) \geq 0 \end{cases} \end{align*} $$


These equivalences allow flexibility in reformulating optimization problems based on the problem structure and solution methods. In particular:

  • A maximization problem can always be converted into a minimization problem by negating the objective function.

  • Inequalities can be rewritten in an equivalent form by flipping the inequality sign.

  • Equality constraints can be rewritten as two separate inequality constraints.


The good news:

For many classes of optimization problems, extensive research has already been conducted, and efficient numerical algorithms have been developed to solve them. This means that:

  • Advanced Optimization Methods Exist: Researchers have designed robust algorithms tailored for different types of optimization problems, including linear, nonlinear, convex, and combinatorial optimization.

  • Automation of Problem Solving: Many software tools and libraries can process optimization problems in a "natural" mathematical form and compute solutions efficiently.

  • Broad Applicability: These tools are widely used in engineering, economics, machine learning, finance, and other disciplines, making optimization accessible for real-world applications.

As a result, practitioners do not need to develop optimization algorithms from scratch but can leverage existing solver packages. The availability of these tools significantly reduces the complexity of implementing optimization solutions, allowing users to focus on modeling rather than algorithmic details.


2. Convex Optimization


Convex vs. Nonconvex Problems

  • Convex optimizaiton is an extremely powerful subset of all optimization problems




Definition of convex optimization


$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad & x \in \mathcal{C} \end{align*} $$


where


  • $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function and

  • Feasible region $\mathcal{C}$ is a convex set


Remarks)

  • Key property of convex optimization:

    • Any local minimum is also a global minimum
  • We will use CVXPY as a convex optimization solver


In the definition of convex optimization, two fundamental concepts are used:

  • Convex Functions
  • Convex Sets

Before formally defining these concepts, let’s first examine the idea of linear interpolation, as it serves as a foundation for understanding convexity.


Linear Interpolation between Two Points


Given two points $\vec x$ and $\vec y$, the interpolated value at any point between them is obtained using a convex combination of the two points.



$$ \begin{align*} \vec{z} &= \vec{y} + \theta (\vec{x} - \vec{y}) = \theta \vec{x} + (1-\theta) \vec{y},\qquad 0 \leq \theta \leq 1\\ \\ \text{or }\;\; \vec{z} &= \alpha \vec{x} + \beta \vec{y}, \qquad \alpha + \beta = 1 \;\;\text{and}\; 0 \leq \alpha, \beta \end{align*} $$


This simple idea of convex combinations naturally leads to the definitions of convex functions and convex sets.


Convex Function



  • For any $x,y \in \mathbb{R}^n$ and $\theta \in [0,1]$

$$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)$$


  • This condition ensures that the function lies below the line segment connecting any two points on its graph, meaning it has a "bowl-shaped" or "U-shaped" curvature.

Convex Set



  • For a $x,y \in \mathcal{C}$ and $\theta \in [0,1]$

$$\theta x + (1-\theta)y \in \mathcal{C}$$


  • If, for any two points $x_1, x_2 \in \mathcal{C}$, the line segment connecting them also lies within $\mathcal{C}$

  • A convex set does not have "dents" or "holes"


3. Solving Optimization Problems

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('CqYJhPOFPGk', width = "560", height = "315")
Out[ ]:

  • To build intuition, we first focus on the unconstrained, one-dimensional case, where we aim to find the minimum of a function $f(x)$ without constraints.



  • To find minimum point $x^*$, we can look at the derivave of the function, $f'(x)$
    • Any location where $f'(x)$ = 0 will be a "flat" point in the function
    • For convex problems, this is guaranteed to be a minimum

  • Generalization for multivariate function $f:\mathbb{R}^n \rightarrow \ \mathbb{R}$

    • The gradient of $f$ must be zero

$$ \nabla _x f(x^*) = 0$$


  • Gradient is a n-dimensional vector containing partial derivatives with respect to each dimension

$$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \quad \quad \quad \nabla _x f(x) = \begin{bmatrix} \partial f(x) \over \partial x_1 \\ \vdots\\ \partial f(x) \over \partial x_n \end{bmatrix} $$


  • Each component $\partial f \over \partial x_i$ represents the rate of change of $f(x)$ with respect to $x_i$, holding all other variables constant.

  • For continuously differentiable $f$ and unconstrained optimization, optimal point must have $\nabla _x f(x^*)=0$
    • This means that at $x^*$, the function has zero slope in all directions

3.1. How to Find $\nabla _x f(x) = 0$: Analytic Approach


  • Direct solution
    • In some cases, it is possible to analytically compute $x^*$ such that $ \nabla _x f(x^*)=0$

  • For example,

$$ \begin{align*} f(x) &= 2x_1^2+ x_2^2 + x_1 x_2 -6 x_1 -5 x_2\\\\ \Longrightarrow \nabla _x f(x) &= \begin{bmatrix} 4x_1+x_2-6\\ 2x_2 + x_1 -5 \end{bmatrix} = \begin{bmatrix}0\\0 \end{bmatrix}\\\\ \therefore x^* &= \begin{bmatrix} 4 & 1\\ 1 & 2 \end{bmatrix} ^{-1} \begin{bmatrix} 6 \\ 5\\ \end{bmatrix} = \begin{bmatrix} 1 \\ 2\\ \end{bmatrix} \end{align*} $$


  • In matrix calculus, differentiation follows principles similar to scalar calculus. However, due to the additional structure introduced by matrix and vector dimensions, special care must be taken when computing matrix derivatives. Below are fundamental differentiation rules commonly used.

Exampels


  • Affine function $g(x) = a^Tx + b$

$$\nabla g(x) = a$$


  • Quadratic function $g(x) = x^T P x + q^T x + r,\qquad P = P^T$

$$\nabla g(x) = 2Px + q$$


  • $g(x) = \lVert Ax - b \rVert ^2 = x^TA^TAx - 2b^TAx + b^Tb$

$$\nabla g(x) = 2A^TAx-2A^Tb$$



Note: Revisit Least-Square Solution of $J(x) = \lVert Ax - y \rVert ^2$


$$ \begin{align*} J(x) &= (Ax-y)^T(Ax-y)\\ &=(x^TA^T - y^T)(Ax - y)\\ &=x^TA^TAx - x^TA^Ty - y^TAx + y^Ty\\\\ \frac{\partial J}{\partial x} &= A^TAx + (A^TA)^Tx - A^Ty - (y^TA)^T \\ &=A^TAx - 2A^Ty = 0\\\\ &\Rightarrow (A^TA)x = A^Ty\\\\ \therefore x^* &= (A^TA)^{-1}A^Ty \end{align*} $$



3.2. How to Find $\nabla _x f(x) = 0$: Iterative Approach


  • Iterative methods
    • More commonly, the condition where the gradient equal zero does not have an analytical solution, require iterative methods

  • The gradient points in the direction of "steepest ascent" for function $f$






3.2.1. Gradient Descent

  • It motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient

$$ x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$




  • Gradient Descent

$$\color{red}{\text{Repeat : }} x \leftarrow x - \alpha \nabla _x f(x) \quad \quad \text{for some step size } \alpha > 0$$





  • Stopping Criteria for Gradient Descent
    • when the gradient is sufficiently small
    • if the change in function value between iterations is very small
    • if the update step is very small
    • when the number of iterations hit the hard limit

  • Gradient Descent in Higher Dimension
    • the following visual illustration in 2D can provide an intuitive understanding of the gradient descent algorithm in higher dimension






3.2.2. Choosing Step Size


Learning rate $\alpha$

  • A hyperparameter that controls the step size during parameter updates.

  • A learning rate that is too large may cause the algorithm to diverge, while a learning rate that is too small may slow down convergence.





3.2.3 Where will We Converge?




  • In machine learning and deep learning, many loss functions are non-convex, meaning they have multiple local minima, saddle points, and complex surfaces. Applying gradient descent to non-convex optimization presents unique challenges but remains effective when guided by careful implementation and enhancements.

  • For non-convex optimization problems, conducting multiple trials with random initializations is recommended.


3.2.4. Example


$$ \begin{align*} \min& \quad (x_1-3)^{2} + (x_2-3)^{2}\\\\ =\min& \quad \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \end{align*} $$



\begin{align*} f &= \frac{1}{2}X^THX + g^TX \\ \nabla f &= HX+g \end{align*}


  • Update rule

$$ X_{i+1} = X_{i} - \alpha_i \nabla f(X_i)$$

In [ ]:
import numpy as np

H = np.matrix([[2, 0],[0, 2]])
g = -np.matrix([[6],[6]])

x =  np.zeros((2,1))
alpha = 0.2

for i in range(25):
    df = H*x + g
    x = x - alpha*df

print(x)
[[2.99999147]
 [2.99999147]]

4. Practically Solving Optimization Problems

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('Qsah8HOQGUw', width = "560", height = "315")
Out[ ]:

  • The good news:
    • For many classes of optimization problems, people have already done all the “hard work” of developing numerical algorithms
    • A wide range of tools that can take optimization problems in “natural” forms and compute a solution

  • Gradient descent
    • Easy to implement
    • Very general, can be applied to any differentiable loss functions
    • Requires less memory and computations (for stochastic methods)
    • Neural networks/deep learning
    • TensorFlow

4.1. Example: Linear Programming (Convex)


  • Linear programming = objective function and constraints are both linear

$$ \begin{align*} \max \; & \; 3x_1 + \frac{3}{2}x_2 \quad \quad \leftarrow \text{objective function}\\ \\ \text{subject to} \; & -1 \leq x_1 \leq 2 \quad \leftarrow \text{constraints}\\ & \quad 0 \leq x_2 \leq 3 \end{align*} $$

Method 1: graphical approach


$$ 3 x_1 + 1.5 x_2 = C \qquad \Rightarrow \qquad x_2 = -2 x_1 + \frac{2}{3}C $$




Method 2: CVXPY-based solver


$$ \begin{array}{Icr}\begin{align*} \max_{x} \quad & 3x_1 + {3 \over 2}x_2 \\ \text{subject to} \quad & -1 \leq x_1 \leq 2 \\ & \quad 0 \leq x_2 \leq 3 \end{align*}\end{array} \quad\implies\quad \begin{array}{I} \quad \min_{x} \quad & - \begin{bmatrix} 3 \\ 3 / 2 \end{bmatrix}^T \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ \text{subject to} \quad & \begin{bmatrix} -1 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq \begin{bmatrix} 2 \\ 3 \\ \end{bmatrix} \end{array} $$


In [ ]:
import numpy as np
import cvxpy as cvx

f = np.array([[3], [3/2]])
lb = np.array([[-1], [0]])
ub = np.array([[2], [3]])

x = cvx.Variable([2,1])

obj = cvx.Minimize(-f.T@x)
constraints = [lb <= x, x <= ub]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[1.99999999]
 [2.99999999]]
-10.499999966365493

4.2. Example: Quadratic Programming (Convex)


$$ \begin{array}{Icr}\begin{align*} \min \quad &\frac{1}{2}x_1^2+3x_1+4x_2 \\ \text{subject to} \quad & x_1+3x_2 \geq 15 \\ & 2x_1+5x_2 \leq 100 \\ & 3x_1+4x_2 \leq 80 \\ & x_1,x_2 \geq 0 \\ \end{align*}\end{array} \quad\implies\quad \begin{array}{I} \min_{X} \quad & X^T HX + f^T X \\ \text{subject to} \quad & AX \leq b \\ & A_{eq}X = b_{eq} \\ & lb \leq X \leq ub \end{array} $$


In [ ]:
f = np.array([[3], [4]])
H = np.array([[1/2, 0], [0, 0]])

A = np.array([[-1, -3], [2, 5], [3, 4]])
b = np.array([[-15], [100], [80]])
lb = np.array([[0], [0]])

x = cvx.Variable([2,1])

obj = cvx.Minimize(cvx.quad_form(x, H) + f.T@x)
constraints = [A@x <= b, lb <= x]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[-1.15591357e-25]
 [ 5.00000000e+00]]
20.000000000000004

4.3. Example: Best Location at a Concert Hall

  • Find the best location to listen to singer's voice





$$\begin{align*} \min \quad & \sqrt{(x_1-3)^{2}+(x_2-3)^{2}} \\ \Rightarrow \quad \min \quad & (x_1-3)^{2} + (x_2-3)^{2} \\ \text{subject to} \quad & x_1 + x_2 \leq 3 \\ & x_1 \geq 0 \\ & x_2 \geq 0 \end{align*} $$


$$\begin{align*} \Rightarrow \quad & x_1^{2} - 6x_1 + 9 + x_2^{2} - 6x_2 + 9 \\ & = x_1^{2} + x_2^{2} - 6 x_1 - 6x_2 + 18 \\ & = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 \\ & \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq 3 \\ & \begin{bmatrix} 0 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq \begin{bmatrix} \\ \\ \end{bmatrix} \end{align*} $$


In [ ]:
f = np.array([[-6], [-6]])
H = np.array([[1,0], [0,1]])

A = np.array([1,1])
b = 3
lb = np.array([[0], [0]])

x = cvx.Variable([2,1])

obj = cvx.Minimize(cvx.quad_form(x, H) + f.T@x)
constraints = [A@x <= b, lb <= x]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
[[1.5]
 [1.5]]

4.4. Example: Shortest Distance




$$ \min \; d_1 + d_2 = \min \left\rVert \vec{a} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2 + \left\rVert \vec{b} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2$$


  • $ \vec{a} \rightarrow \begin{bmatrix}x\\0\end{bmatrix}$: walk with an empty bucket
  • $ \begin{bmatrix}x\\0\end{bmatrix} \rightarrow \vec{b}$: walk with a water-filled bucket

$$\implies \min d_1 + \mu d_2$$


In [ ]:
a = np.array([[0], [1]])
b = np.array([[4], [2]])

Aeq = np.array([0,1])
beq = 0

x = cvx.Variable([2,1])

mu = 1
obj = cvx.Minimize(cvx.norm(a-x, 2) + mu*cvx.norm(b-x, 2))
constraints = [Aeq@x == beq]

prob = cvx.Problem(obj, constraints)
result = prob.solve()

print(x.value)
print(result)
[[1.33325114e+00]
 [5.33304239e-12]]
5.0000000010880985
In [ ]:
print(np.sqrt(4**2 + 3**2))
5.0

4.5. Example: Supply Chain

  • Find a point that minimizes the sum of the transportation costs (or distance) from this point to $n$ destination points


In [ ]:
a = np.array([[np.sqrt(3)], [0]])
b = np.array([[-np.sqrt(3)], [0]])
c = np.array([[0],[3]])

x = cvx.Variable([2,1])

obj = cvx.Minimize(cvx.norm(a-x, 2) + cvx.norm(b-x, 2) + cvx.norm(c-x, 2))
#obj = cvx.Minimize(cvx.norm(a-x, 1) + cvx.norm(b-x, 1) + cvx.norm(c-x, 1))

prob = cvx.Problem(obj)
result = prob.solve()

print(x.value)
[[-1.58674964e-16]
 [ 1.00000001e+00]]
In [ ]:
B = np.hstack([a,b,c])
np.mean(B,1)
Out[ ]:
array([0., 1.])
In [ ]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')