Optimization
Table of Contents
from IPython.display import YouTubeVideo
YouTubeVideo('iiKKPAF1inU', width="560", height="315", frameborder="0")
Optimization is a mathematical discipline that focuses on finding the best solution to a problem within a defined set of constraints. It involves maximizing or minimizing an objective function, which represents the goal of the optimization process, such as minimizing costs, maximizing efficiency, or achieving the best performance in a system.
(source)
3 key components: the objective function, the decision variables (unknowns), and the constraints
Procedures: first formulate the problem by identifying the objective, variables, and constraints (modeling); then apply an optimization algorithm to find a solution of the model.
In mathematical expression
$$\begin{align*} \min_{x} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \leq 0, \quad i=1, \cdots, m \end{align*}$$

where

$x=\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} \in \mathbb{R}^n$ is the decision variable

$f: \mathbb{R}^n \rightarrow \mathbb{R}$ is the objective function

Feasible region: $\mathcal{C} = \{x: g_i(x) \leq 0, \quad i=1, \cdots,m\}$
Remarks) Equivalent formulations: maximizing $f(x)$ is the same as minimizing $-f(x)$, and a constraint $g_i(x) \geq 0$ can always be rewritten as $-g_i(x) \leq 0$, so the form above is fully general.
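For illustration, here is a minimal sketch that solves one made-up instance of this general form with scipy.optimize.minimize (the particular objective and constraint are illustrative choices, not part of the lecture). Note that SciPy's inequality convention is $\text{fun}(x) \geq 0$, so a constraint $g(x) \leq 0$ is passed as $-g(x) \geq 0$.

import numpy as np
from scipy.optimize import minimize

# illustrative objective f(x) and constraint g(x) <= 0
f = lambda x: (x[0] - 3)**2 + (x[1] - 3)**2
g = lambda x: x[0] + x[1] - 4

# SciPy expects inequality constraints in the form fun(x) >= 0
cons = [{'type': 'ineq', 'fun': lambda x: -g(x)}]

res = minimize(f, x0=np.zeros(2), constraints=cons)
print(res.x)    # approximately [2, 2], on the boundary x1 + x2 = 4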
from IPython.display import YouTubeVideo
YouTubeVideo('CqYJhPOFPGk', width="560", height="315", frameborder="0")
Generalization for a multivariate function $f:\mathbb{R}^n \rightarrow \mathbb{R}$
Gradient descent is an optimization algorithm widely used in machine learning and deep learning to minimize a loss function by iteratively updating model parameters. The core idea is to adjust the parameters in the direction of the negative gradient of the loss function with respect to the parameters, aiming to find the point where the loss function reaches its minimum value.
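In symbols, the standard update (consistent with the notation above) is

$$x_{k+1} = x_k - \alpha \, \nabla f(x_k)$$

where $\alpha > 0$ is the learning rate and $\nabla f(x_k)$ is the gradient of $f$ evaluated at the current iterate. The iteration is repeated until the gradient is approximately zero or a maximum number of steps is reached.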
Example
import numpy as np

# minimize f(x) = (1/2) x^T H x + g^T x, whose gradient is Hx + g
H = np.array([[2, 0], [0, 2]])
g = -np.array([[6.], [6.]])

x = np.zeros((2, 1))    # initial guess
alpha = 0.2             # learning rate (step size)

for _ in range(25):
    df = H @ x + g          # gradient at the current x
    x = x - alpha * df      # gradient descent update

print(x)
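As a quick sanity check, the loop above minimizes the quadratic $f(x) = \frac{1}{2}x^T H x + g^T x$, whose gradient is $Hx + g$; setting the gradient to zero gives the exact minimizer

$$x^{\star} = -H^{-1}g = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

which is the value the iterates converge to.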
Learning rate $\alpha$
A hyperparameter that controls the step size during parameter updates.
A learning rate that is too large may cause the algorithm to diverge, while a learning rate that is too small may slow down convergence.
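A minimal sketch of this effect, reusing the quadratic example above with three illustrative learning rates:

import numpy as np

H = np.array([[2, 0], [0, 2]])
g = -np.array([[6.], [6.]])

def run_gd(alpha, steps=25):
    # gradient descent on the same quadratic, with step size alpha
    x = np.zeros((2, 1))
    for _ in range(steps):
        x = x - alpha * (H @ x + g)
    return x

print(run_gd(0.2))     # converges close to [3, 3]
print(run_gd(0.01))    # too small: after 25 steps still far from [3, 3]
print(run_gd(1.5))     # too large: iterates oscillate and diverge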
In machine learning and deep learning, many loss functions are non-convex, meaning they have multiple local minima, saddle points, and complex surfaces. Gradient descent can still be applied, but it typically converges only to a local minimum (or stalls near a saddle point), and the point it reaches depends on where the iteration starts.
For non-convex optimization problems, conducting multiple trials with random initializations is recommended.
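A minimal sketch of this multi-start idea, using a made-up one-dimensional non-convex function (the function, step size, and number of restarts are illustrative assumptions):

import numpy as np

# an illustrative non-convex objective with two local minima
f  = lambda x: x**4 - 4*x**2 + 0.5*x
df = lambda x: 4*x**3 - 8*x + 0.5

rng = np.random.default_rng(0)
best_x, best_f = None, np.inf

# restart gradient descent from several random initial points
for x0 in rng.uniform(-2, 2, size=10):
    x = x0
    for _ in range(200):
        x = x - 0.01 * df(x)
    if f(x) < best_f:
        best_x, best_f = x, f(x)

print(best_x, best_f)   # keeps the best local minimum found over all restarts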