Optimization
from IPython.display import YouTubeVideo
YouTubeVideo('iiKKPAF1inU', width = "560", height = "315")
Optimization is a mathematical discipline that focuses on finding the best solution to a problem within a defined set of constraints. It involves maximizing or minimizing an objective function, which represents the goal of the optimization process, such as minimizing costs, maximizing efficiency, or achieving the best performance in a system.
Lifeguard Problem
As a lifeguard, you spot a swimmer in distress, calling for help offshore. Your goal is to reach the swimmer as quickly as possible. You can:
- Run on the beach at a speed $v_r$.
- Swim in the water at a slower speed $v_s$ (since swimming is typically slower than running).
The challenge is to determine the optimal path that minimizes the total rescue time.
The optimal path depends on the two speeds:
If $v_r > v_s$, the lifeguard should run further along the beach before entering the water.
If $v_r = v_s$, the shortest path (a straight line) is optimal.
If $v_s$ is much smaller than $v_r$, the lifeguard should cover even more distance on land before diving in.
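The three cases above can be checked numerically. The sketch below assumes a simple geometry that is not specified in the text: the lifeguard stands at $(0, -a)$ on the sand, the waterline is the line $y = 0$, and the swimmer is at $(d, b)$ in the water. The function names and the numeric values are illustrative assumptions. Because the total time is a convex function of the entry point $x$, a ternary search is enough to locate the minimum.

```python
import math

def rescue_time(x, a, b, d, v_r, v_s):
    """Total time when the lifeguard enters the water at (x, 0)."""
    run = math.hypot(x, a) / v_r        # sand leg: (0, -a) -> (x, 0)
    swim = math.hypot(d - x, b) / v_s   # water leg: (x, 0) -> (d, b)
    return run + swim

def best_entry_point(a, b, d, v_r, v_s, tol=1e-9):
    """Ternary search on [0, d]; valid because rescue_time is convex in x."""
    lo, hi = 0.0, d
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if rescue_time(m1, a, b, d, v_r, v_s) < rescue_time(m2, a, b, d, v_r, v_s):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# Equal speeds: the entry point lies on the straight line (x = 10 here).
x_equal = best_entry_point(a=10, b=10, d=20, v_r=5, v_s=5)

# Running much faster than swimming: the entry point moves toward the swimmer.
x_fast_run = best_entry_point(a=10, b=10, d=20, v_r=7, v_s=2)

print(x_equal, x_fast_run)
```

With equal speeds the entry point sits on the straight line between the two positions. When running is much faster, the entry point shifts along the beach toward the swimmer, so more of the trip is covered on land.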
1.1. Key Components of an Optimization Problem
Every optimization problem consists of the following three essential components:
(1) Objective Function
The objective function defines the goal of the optimization problem, which could be either maximization (e.g., profit, efficiency) or minimization (e.g., cost, error).
Mathematically, an objective function is represented as:
$$\min/\max \; f(x)$$
where $f(x)$ is a function of the decision variables.
(2) Decision Variables (Unknowns)
Decision variables are the unknowns that need to be determined to optimize the objective function.
These variables define the solution space and can be continuous or discrete, depending on the nature of the problem.
(3) Constraints
Constraints define the limitations or restrictions that the decision variables must satisfy.
These can be equality constraints (e.g., physical laws, conservation equations) or inequality constraints (e.g., resource limits, capacity restrictions).
These constraints ensure that the solution remains feasible within the defined conditions.
1.2. Procedures in Optimization
(1) Modeling
The process of defining the objective function, decision variables, and constraints for a given problem is known as modeling.
This step translates a real-world problem into a mathematical formulation that can be analyzed and solved.
(2) Solving the Optimization Problem
- Once the model has been formulated, optimization algorithms are applied to find the optimal solution.
1.3. Mathematical Expression
The general form of an optimization problem can be expressed as:
$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad &g_i(x) \leq 0, \qquad i=1,\cdots,m \end{align*} $$
where
- $x=\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} \in \mathbb{R}^n$ is the decision variable
- $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is an objective function
- Feasible region: $\mathcal{C} = \{x: g_i(x) \leq 0, \quad i=1, \cdots,m\}$
Remarks) Equivalent Formulations
The optimization problem can be equivalently transformed using the following relationships:
$$\begin{align*} \min_{x} f(x) \quad&\longleftrightarrow \quad \max_{x} -f(x)\\ \quad g_i(x) \leq 0\quad&\longleftrightarrow \quad -g_i(x) \geq 0\\ h(x) = 0 \quad&\longleftrightarrow \quad \begin{cases} h(x) \leq 0 \quad \text{and} \\ h(x) \geq 0 \end{cases} \end{align*} $$
These equivalences allow flexibility in reformulating optimization problems based on the problem structure and solution methods. In particular:
A maximization problem can always be converted into a minimization problem by negating the objective function.
Inequalities can be rewritten in an equivalent form by flipping the inequality sign.
Equality constraints can be rewritten as two separate inequality constraints.
The good news:
For many classes of optimization problems, extensive research has already been conducted, and efficient numerical algorithms have been developed to solve them. This means that:
Advanced Optimization Methods Exist: Researchers have designed robust algorithms tailored for different types of optimization problems, including linear, nonlinear, convex, and combinatorial optimization.
Automation of Problem Solving: Many software tools and libraries can process optimization problems in a "natural" mathematical form and compute solutions efficiently.
Broad Applicability: These tools are widely used in engineering, economics, machine learning, finance, and other disciplines, making optimization accessible for real-world applications.
As a result, practitioners do not need to develop optimization algorithms from scratch but can leverage existing solver packages. The availability of these tools significantly reduces the complexity of implementing optimization solutions, allowing users to focus on modeling rather than algorithmic details.
2. Convex Optimization
Convex vs. Nonconvex Problems
- Convex optimization is an extremely powerful subset of all optimization problems
Definition of convex optimization
A convex optimization problem is one of the most important and well-studied classes of optimization problems. It is defined as:
$$\begin{align*} \min_{x} \quad &f(x) \\ \text{subject to} \quad & x \in \mathcal{C} \end{align*} $$
where
$f: \mathbb{R}^n \rightarrow \mathbb{R}$ is a convex function and
Feasible region $\mathcal{C}$ is a convex set
Remarks)
Key property of convex optimization:
- Any local minimum is also a global minimum
We will use CVXPY, a Python modeling library for convex optimization, to solve these problems
- https://www.cvxpy.org/
- Many examples later
In the definition of convex optimization, two fundamental concepts are used:
- Convex Functions
- Convex Sets
Before we formally define these, it is helpful to first introduce the concept of linear interpolation, which serves as the geometric and algebraic basis for understanding convexity.
Linear Interpolation between Two Points
Given two points $\vec x$ and $\vec y$, the interpolated value at any point between them is obtained using a convex combination of the two points.
$$ \begin{align*} \vec{z} &= \vec{y} + \theta (\vec{x} - \vec{y}) = \theta \vec{x} + (1-\theta) \vec{y},\qquad 0 \leq \theta \leq 1\\ \\ \text{or }\;\; \vec{z} &= \alpha \vec{x} + \beta \vec{y}, \qquad \alpha + \beta = 1 \;\;\text{and}\; 0 \leq \alpha, \beta \end{align*} $$
This simple idea of convex combinations naturally leads to the definitions of convex functions and convex sets.
Convex Function
For any $x,y \in \mathbb{R}^n$ and $\theta \in [0,1]$
$$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)$$
This condition ensures that the graph of the function lies on or below the line segment (chord) connecting any two points on it, meaning it has a "bowl-shaped" or "U-shaped" curvature.
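The inequality can be probed numerically by sampling random pairs of points and values of $\theta$. This is only a sanity check under assumed sampling ranges, not a proof of convexity: passing all trials does not guarantee convexity, but a single violation proves that a function is not convex. The helper names are hypothetical.

```python
import random

def violates_convexity(f, x, y, theta, tol=1e-12):
    """True if f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y)."""
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    return lhs > rhs + tol

def looks_convex(f, trials=10_000, low=-5.0, high=5.0, seed=0):
    """Sample random (x, y, theta) and look for a violated inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(low, high), rng.uniform(low, high)
        theta = rng.random()
        if violates_convexity(f, x, y, theta):
            return False
    return True

print(looks_convex(lambda x: x**2))      # True: no violation found
print(looks_convex(lambda x: -(x**2)))   # False: the inequality fails
```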
Convex Set
A set $\mathcal{C}$ is said to be convex if, for any $x, y \in \mathcal{C}$ and any $\theta \in [0, 1]$, the convex combination also belongs to $\mathcal{C}$:
$$\theta x + (1-\theta)y \in \mathcal{C}$$
This means that the entire line segment connecting any two points in the set lies completely within the set.
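As a small illustration, the sketch below compares a disk (convex) with an annulus (not convex). Both sets and the chosen points are assumptions made for this example. The midpoint of two points of the annulus falls into the hole, so the segment between them leaves the set.

```python
import math

def in_disk(p, radius=2.0):
    """Membership test for the disk of the given radius centered at the origin."""
    return math.hypot(p[0], p[1]) <= radius

def in_annulus(p, inner=1.0, outer=2.0):
    """Membership test for the ring inner <= ||p|| <= outer."""
    return inner <= math.hypot(p[0], p[1]) <= outer

def convex_combination(x, y, theta):
    """Return theta*x + (1-theta)*y for two points given as tuples."""
    return tuple(theta * xi + (1 - theta) * yi for xi, yi in zip(x, y))

x, y = (1.5, 0.0), (-1.5, 0.0)
mid = convex_combination(x, y, 0.5)   # the origin

print(in_disk(x), in_disk(y), in_disk(mid))           # True True True
print(in_annulus(x), in_annulus(y), in_annulus(mid))  # True True False
```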
from IPython.display import YouTubeVideo
YouTubeVideo('CqYJhPOFPGk', width = "560", height = "315")
To build foundational intuition for optimization, we begin with the unconstrained, one-dimensional case, where the objective is to find the minimum of a function $f(x)$ without any constraints.
1D Case: First-Order Condition
To find the minimum point $x^*$, we analyze the derivative $f'(x)$:
Any point where $f'(x) = 0$ is a stationary point - the slope is zero, indicating a "flat" region.
If $f(x)$ is convex, such a point is guaranteed to be a global minimum.
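A minimal sketch of this idea, assuming the illustrative convex function $f(x) = (x-2)^2 + 1$ (not taken from the text): its derivative $f'(x) = 2(x-2)$ changes sign exactly once, so the stationary point can be located by bisection on $f'$.

```python
def f(x):
    return (x - 2.0) ** 2 + 1.0

def df(x):
    return 2.0 * (x - 2.0)

def bisect_root(g, lo, hi, tol=1e-10):
    """Find a root of g on [lo, hi], assuming g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid   # root is to the left of mid
        else:
            lo = mid   # root is at or to the right of mid
    return 0.5 * (lo + hi)

x_star = bisect_root(df, -10.0, 10.0)
print(x_star, f(x_star))
```

Since $f$ is convex, the stationary point found this way is the global minimizer.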
Multivariate Case
For a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, the derivative generalizes to the gradient, denoted by $\nabla_x f(x)$.
- The optimality condition in the unconstrained case is:
$$ \nabla _x f(x^*) = 0$$
- This states that at the optimal point $x^*$, the function has zero slope in all directions.
Gradient Definition
$$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \quad \quad \quad \nabla _x f(x) = \begin{bmatrix} \partial f(x) \over \partial x_1 \\ \vdots\\ \partial f(x) \over \partial x_n \end{bmatrix} $$
Each component $\frac{\partial f}{\partial x_i}$ represents the rate of change of $f(x)$ with respect to $x_i$, holding all other variables constant. For a continuously differentiable $f$ in the unconstrained setting, any optimal point $x^*$ must satisfy $\nabla _x f(x^*)=0$.
3.1. How to Find $\nabla _x f(x) = 0$: Analytic Approach
Direct (Closed-Form) Solution
In some cases, we can analytically compute the point $x^*$ that satisfies the first-order optimality condition:
$$ \nabla _x f(x^*)=0$$
Example
Consider the following function:
$$f(x) = 2x_1^2+ x_2^2 + x_1 x_2 -6 x_1 -5 x_2$$
Compute the gradient:
$$\nabla _x f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 4x_1+x_2-6\\ 2x_2 + x_1 -5 \end{bmatrix}$$
Set the gradient to zero:
$$\nabla _x f(x) = \begin{bmatrix} 4x_1+x_2-6\\ 2x_2 + x_1 -5 \end{bmatrix} = \begin{bmatrix}0\\0 \end{bmatrix}$$
Solve the linear system:
$$x^* = \begin{bmatrix} 4 & 1\\ 1 & 2 \end{bmatrix} ^{-1} \begin{bmatrix} 6 \\ 5\\ \end{bmatrix} = \begin{bmatrix} 1 \\ 2\\ \end{bmatrix}$$
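The same linear system can be solved numerically as a quick check. The sketch below uses `np.linalg.solve` rather than forming the inverse explicitly, which is the usual numerical practice; the variable names are chosen for this illustration.

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x) = 2*x1^2 + x2^2 + x1*x2 - 6*x1 - 5*x2."""
    x1, x2 = x
    return np.array([4 * x1 + x2 - 6, 2 * x2 + x1 - 5])

# Setting the gradient to zero gives the linear system M x = rhs.
M = np.array([[4.0, 1.0],
              [1.0, 2.0]])
rhs = np.array([6.0, 5.0])

x_star = np.linalg.solve(M, rhs)

print(x_star)          # stationary point
print(grad_f(x_star))  # gradient at that point, numerically zero
```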
In matrix calculus, differentiation follows principles similar to scalar calculus. However, due to the additional structure introduced by matrix and vector dimensions, special care must be taken when computing matrix derivatives. Below are fundamental differentiation rules commonly used.
Examples of Gradients in Optimization
Understanding the gradients of commonly used functions is essential for solving optimization problems analytically. Below are standard forms and their corresponding gradients:
(1) Affine function $g(x) = a^Tx + b$
$$\nabla g(x) = a$$
(2) Quadratic function $g(x) = x^T P x + q^T x + r,\qquad P = P^T$
$$\nabla g(x) = 2Px + q$$
(3) Squared norm function $g(x) = \lVert Ax - b \rVert ^2 = x^TA^TAx - 2b^TAx + b^Tb$
$$\nabla g(x) = 2A^TAx-2A^Tb$$
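These three formulas can be verified against central finite differences. The sketch below uses randomly generated data of assumed sizes, and the helper `numerical_gradient` is a hypothetical name introduced for this check.

```python
import numpy as np

def numerical_gradient(g, x, h=1e-6):
    """Central-difference approximation of the gradient of g at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (g(x + e) - g(x - e)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)

# (1) Affine: g(x) = a^T x + b0
a, b0 = rng.standard_normal(n), 0.7
err_affine = np.max(np.abs(numerical_gradient(lambda v: a @ v + b0, x) - a))

# (2) Quadratic with symmetric P: g(x) = x^T P x + q^T x + r
S = rng.standard_normal((n, n))
P = S + S.T
q, r = rng.standard_normal(n), -1.2
err_quad = np.max(np.abs(
    numerical_gradient(lambda v: v @ P @ v + q @ v + r, x) - (2 * P @ x + q)))

# (3) Squared norm: g(x) = ||A x - b||^2
A = rng.standard_normal((6, n))
b = rng.standard_normal(6)
err_norm = np.max(np.abs(
    numerical_gradient(lambda v: np.sum((A @ v - b) ** 2), x)
    - (2 * A.T @ A @ x - 2 * A.T @ b)))

print(err_affine, err_quad, err_norm)   # all should be close to zero
```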
Revisiting the Least-Squares Solution
Consider the least-squares cost function:
$$J(x) = \|Ax - y\|^2 = (Ax - y)^T (Ax - y)$$
Expanding:
$$ \begin{align*} J(x) &= (Ax - y)^T (Ax - y) \\ &= x^T A^T A x - x^T A^T y - y^T A x + y^T y \end{align*} $$
Taking the gradient:
$$ \begin{align*} \frac{\partial J}{\partial x} &= A^T A x + (A^T A)^T x - A^T y - (y^T A)^T \\ &= 2A^T A x - 2A^T y \end{align*} $$
Setting the gradient to zero:
$$ 2A^T A x - 2A^T y = 0 \quad \Rightarrow \quad A^T A x = A^T y $$
Solution:
$$ x^* = (A^T A)^{-1} A^T y $$
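The closed-form solution can be compared with NumPy's built-in least-squares routine. The data below are random and purely illustrative. The sketch solves the normal equations $A^T A x = A^T y$ with `np.linalg.solve` instead of inverting $A^T A$, which assumes $A$ has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # 20 observations, 3 unknowns
y = rng.standard_normal(20)

# Normal equations: A^T A x = A^T y
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Reference solution from NumPy's least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# The gradient 2 A^T A x - 2 A^T y should vanish at the solution
grad = 2 * A.T @ A @ x_normal - 2 * A.T @ y

print(x_normal)
print(x_lstsq)
print(np.linalg.norm(grad))
```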
3.2. How to Find $\nabla _x f(x) = 0$: Iterative Approach
Iterative Methods
In many real-world scenarios, the condition $\nabla_x f(x^*) = 0$ cannot be solved analytically. Instead, we rely on iterative optimization algorithms to approximate the solution. Iterative methods are essential tools for solving optimization problems where no closed-form solution exists.
Gradient Direction
The gradient $\nabla_x f(x)$ points in the direction of steepest ascent of the function $f(x)$.
Therefore, to minimize $f(x)$, we move in the opposite direction of the gradient.
3.2.1. Gradient Descent
Gradient descent is one of the most fundamental and widely used iterative optimization algorithms. It is particularly useful when a closed-form solution to $\nabla_x f(x) = 0$ does not exist or is intractable to compute analytically.
Update Rule
Gradient descent works by repeatedly updating the variable $x$ in the direction opposite to the gradient:
$$ x \leftarrow x - \alpha \nabla _x f(x) $$
Where:
$\nabla_x f(x)$ is the gradient of the objective function at point $x$,
$\alpha > 0$ is the step size (also called the learning rate).
The gradient descent algorithm iteratively updates the variable $x$ using the following rule:
$$\color{red}{\text{Repeat : }} x \leftarrow x - \alpha \nabla _x f(x) $$
At each iteration, we move in the direction opposite to the gradient (i.e., downhill).
Stopping Criteria for Gradient Descent
Gradient descent is an iterative algorithm, and determining when to stop is crucial for both accuracy and efficiency. Common stopping conditions include:
- the norm of the gradient falls below a small tolerance
- the change in function value between iterations is very small
- the update step is very small
- the number of iterations reaches a preset hard limit
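The four stopping conditions can be combined in one loop, as in the sketch below. It reuses the function $f(x) = 2x_1^2 + x_2^2 + x_1 x_2 - 6x_1 - 5x_2$ from the analytic example, whose minimizer is $(1, 2)$. The tolerance values, the step size, and the function name `gradient_descent` are assumptions chosen for illustration.

```python
import numpy as np

def f(x):
    return 2 * x[0]**2 + x[1]**2 + x[0] * x[1] - 6 * x[0] - 5 * x[1]

def grad_f(x):
    return np.array([4 * x[0] + x[1] - 6, 2 * x[1] + x[0] - 5])

def gradient_descent(x0, alpha=0.1, grad_tol=1e-8, f_tol=1e-14,
                     step_tol=1e-10, max_iter=10_000):
    """Gradient descent that reports which stopping condition fired."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < grad_tol:           # small gradient
            return x, k, "small gradient"
        x_new = x - alpha * g
        if abs(f(x_new) - f(x)) < f_tol:           # small change in f
            return x_new, k + 1, "small change in f"
        if np.linalg.norm(x_new - x) < step_tol:   # small update step
            return x_new, k + 1, "small step"
        x = x_new
    return x, max_iter, "iteration limit"          # hard limit reached

x_min, n_iter, reason = gradient_descent([0.0, 0.0])
print(x_min, n_iter, reason)
```

Which condition fires first depends on the tolerances. In practice they are tuned together so that none of them stops the iteration too early.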
Gradient Descent in Higher Dimensions
Gradient descent naturally extends to higher-dimensional functions. Although we cannot visualize optimization in dimensions higher than 3, 2D visualizations are commonly used to build intuition:
The gradient $\nabla_x f(x)$ in higher dimensions is a vector pointing in the direction of steepest ascent.
Gradient descent steps move opposite to this direction, descending along the surface of the function toward a minimum.
In 2D, this is often illustrated as a "ball rolling down a hill" in a contour or surface plot.
3.2.2. Choosing Step Size
Learning Rate $\alpha$
The learning rate $\alpha$ is a hyperparameter that controls the step size used in each iteration of the gradient descent algorithm:
$$ x \leftarrow x - \alpha \nabla_x f(x) $$
Why It Matters
If $\alpha$ is too small:
- Convergence becomes very slow.
- It may take many iterations to make meaningful progress.
If $\alpha$ is too large:
- The algorithm may overshoot the minimum.
- It can lead to divergence, where the function value increases or oscillates.
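The effect of the step size is easy to see on the one-dimensional function $f(x) = x^2$, where each update multiplies $x$ by $(1 - 2\alpha)$. The three values of $\alpha$ below are illustrative choices.

```python
def run_gradient_descent(alpha, x0=1.0, n_iter=20):
    """Minimize f(x) = x**2, whose gradient is 2*x, starting from x0."""
    x = x0
    for _ in range(n_iter):
        x = x - alpha * 2 * x
    return x

for alpha in (0.01, 0.4, 1.1):
    print(alpha, run_gradient_descent(alpha))
```

With $\alpha = 0.01$ the iterate is still far from zero after 20 steps, with $\alpha = 0.4$ it is essentially zero, and with $\alpha = 1.1$ the factor $(1 - 2\alpha) = -1.2$ has magnitude greater than one, so the iterates oscillate in sign and grow.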
3.2.3. Where Will We Converge?
In machine learning and deep learning, many objective or loss functions are non-convex. These functions may have:
Multiple local minima
Saddle points
Highly nonlinear surfaces with flat regions, ridges, and cliffs
Challenges in Non-Convex Optimization
While gradient descent is a powerful tool, applying it to non-convex problems introduces several challenges:
It may converge to a local minimum instead of a global one.
It can get stuck at saddle points or plateaus.
The initial starting point heavily influences the final solution.
For non-convex optimization problems, it is often recommended to perform multiple trials with random initializations. This increases the likelihood of finding a better local minimum or even a global minimum, especially when the function has a complex landscape.
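A minimal sketch of random restarts, assuming the illustrative non-convex function $f(x) = x^4 - 3x^2 + x$, which has a local minimum near $x \approx 1.13$ and a global minimum near $x \approx -1.30$. The step size, iteration count, sampling interval, and seed are assumptions.

```python
import random

def f(x):
    return x**4 - 3 * x**2 + x

def df(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, alpha=0.01, n_iter=2000):
    """Plain gradient descent with a fixed number of iterations."""
    x = x0
    for _ in range(n_iter):
        x = x - alpha * df(x)
    return x

def multi_start(n_starts=10, low=-2.0, high=2.0, seed=0):
    """Run gradient descent from random starts and keep the best result."""
    rng = random.Random(seed)
    candidates = [gradient_descent(rng.uniform(low, high))
                  for _ in range(n_starts)]
    return min(candidates, key=f)

x_from_right = gradient_descent(2.0)   # a single run started at x = 2
x_best = multi_start()                 # best of several random starts

print(x_from_right, f(x_from_right))
print(x_best, f(x_best))
```

The single run started at $x = 2$ settles in the nearby local minimum, while the multi-start version returns the candidate with the lowest function value.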
3.2.4. Example: Gradient Descent on a Quadratic Function
Consider the following minimization problem:
$$ \min \quad (x_1-3)^{2} + (x_2-3)^{2} $$
This can be rewritten in quadratic form:
$$ \min \quad \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18 $$
Let (dropping the constant term $18$, which does not change the minimizer)
$$f = \frac{1}{2}X^THX + g^TX$$
Then:
$$\nabla f = HX+g$$
Using gradient descent, the iterative update becomes:
$$ X_{i+1} = X_{i} - \alpha_i \nabla f(X_i)$$
The following code implements gradient descent for solving a quadratic optimization problem. As shown, the stopping criterion used here is a hard limit on the number of iterations. However, you are encouraged to implement and experiment with other stopping conditions.
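import numpy as np

# Quadratic objective f(X) = 1/2 X^T H X + g^T X
H = np.array([[2, 0], [0, 2]])
g = -np.array([[6], [6]])

x = np.zeros((2, 1))    # initial guess
alpha = 0.2             # step size

for i in range(25):
    df = H @ x + g      # gradient at the current point
    x = x - alpha * df  # move against the gradient

print(x)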
from IPython.display import YouTubeVideo
YouTubeVideo('Qsah8HOQGUw', width = "560", height = "315")
Convex Optimization with CVXPY
We will use CVXPY - a Python-based convex optimization modeling library.
CVXPY allows us to express optimization problems in a declarative syntax.
It supports a wide range of standard convex problems
CVXPY is limited to convex problems (where local minimum = global minimum).
4.1. Example: Linear Programming (Convex)
A linear programming (LP) problem is a special case of convex optimization in which:
The objective function is linear.
The constraints are also linear.
Maximize the linear objective:
$$ \begin{align*} \max \; & \;\; 3x_1 + \frac{3}{2}x_2\\ \\ \text{subject to} \; & -1 \leq x_1 \leq 2\\ & \quad 0 \leq x_2 \leq 3 \end{align*} $$
Method 1: Graphical Approach
Rewriting the objective function as a level set:
$$ 3 x_1 + 1.5 x_2 = C \qquad \Rightarrow \qquad x_2 = -2 x_1 + \frac{2}{3}C $$
By graphing this family of lines and identifying the one that is highest while intersecting the feasible region, we can visually find the optimal solution.
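For this small problem the graphical result can also be checked by brute force. A linear objective over a bounded box attains its maximum at one of the corners, so it suffices to evaluate the objective at the four corner points. The sketch below does exactly that and is not a general LP method.

```python
from itertools import product

def objective(x1, x2):
    return 3 * x1 + 1.5 * x2

# Corners of the box -1 <= x1 <= 2, 0 <= x2 <= 3
corners = list(product((-1, 2), (0, 3)))

best = max(corners, key=lambda p: objective(*p))
print(best, objective(*best))   # (2, 3) 10.5
```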
Method 2: CVXPY-Based Solver
In practice, we solve LPs numerically with a tool such as CVXPY. Here we convert the maximization into an equivalent minimization problem by negating the objective (CVXPY also provides `cvx.Maximize`, so this conversion is a choice rather than a requirement):
$$ \begin{aligned} \max_{x} \quad & 3x_1 + \frac{3}{2}x_2 \\ \text{subject to} \quad & -1 \leq x_1 \leq 2 \\ & 0 \leq x_2 \leq 3 \end{aligned} \quad\implies\quad \begin{aligned} \min_{x} \quad & - \begin{bmatrix} 3 \\ 3 / 2 \end{bmatrix}^T \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ \text{subject to} \quad & \begin{bmatrix} -1 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq \begin{bmatrix} 2 \\ 3 \end{bmatrix} \end{aligned} $$
Here is the CVXPY code:
import numpy as np
import cvxpy as cvx
f = np.array([[3], [3/2]])
lb = np.array([[-1], [0]])
ub = np.array([[2], [3]])
x = cvx.Variable([2,1])
obj = cvx.Minimize(-f.T@x)
constraints = [lb <= x, x <= ub]
prob = cvx.Problem(obj, constraints)
result = prob.solve()
print(x.value)
print(result)
4.2. Example: Quadratic Programming (Convex)
A quadratic program (QP) is a type of convex optimization problem where:
The objective function is quadratic.
The constraints are linear.
Minimize the following quadratic function:
$$ \begin{aligned} \min \quad &\frac{1}{2}x_1^2+3x_1+4x_2 \\ \text{subject to} \quad & x_1+3x_2 \geq 15 \\ & 2x_1+5x_2 \leq 100 \\ & 3x_1+4x_2 \leq 80 \\ & x_1,x_2 \geq 0 \end{aligned} \quad\implies\quad \begin{aligned} \min_{X} \quad & X^T HX + f^T X \\ \text{subject to} \quad & AX \leq b \\ & A_{eq}X = b_{eq} \\ & lb \leq X \leq ub \end{aligned} $$
Here is the CVXPY code:
f = np.array([[3], [4]])
H = np.array([[1/2, 0], [0, 0]])
A = np.array([[-1, -3], [2, 5], [3, 4]])
b = np.array([[-15], [100], [80]])
lb = np.array([[0], [0]])
x = cvx.Variable([2,1])
obj = cvx.Minimize(cvx.quad_form(x, H) + f.T@x)
constraints = [A@x <= b, lb <= x]
prob = cvx.Problem(obj, constraints)
result = prob.solve()
print(x.value)
print(result)
4.3. Example: Best Location at a Concert Hall
Suppose we want to find the best listening spot in a concert hall by minimizing the distance to the singer, located at the point $(3, 3)$. The listener must stay within a triangular feasible region defined by physical boundaries or constraints.
Minimize the Euclidean distance to the singer's location $(3, 3)$:
$$\min \quad \sqrt{(x_1-3)^{2}+(x_2-3)^{2}} $$
Since the square root is monotonic, this is equivalent to minimizing the squared distance:
$$\begin{align*} \min \quad & (x_1-3)^{2} + (x_2-3)^{2} \\ \text{subject to} \quad & x_1 + x_2 \leq 3 \\ & x_1 \geq 0 \\ & x_2 \geq 0 \end{align*} $$
Expanding the squared terms:
$$\begin{align*} (x_1-3)^{2} + (x_2-3)^{2} &= x_1^{2} - 6x_1 + 9 + x_2^{2} - 6x_2 + 9 \\ & = x_1^{2} + x_2^{2} - 6 x_1 - 6x_2 + 18 \end{align*} $$
In matrix form:
$$\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 6 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 18$$
Constraints in Matrix Form
(1) Linear inequality constraint:
$$\begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \leq 3$$
(2) Box constraints:
$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} \leq \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
This is a convex quadratic program (QP) and can be solved using CVXPY.
f = np.array([[-6], [-6]])
H = np.array([[1,0], [0,1]])
A = np.array([1,1])
b = 3
lb = np.array([[0], [0]])
x = cvx.Variable([2,1])
obj = cvx.Minimize(cvx.quad_form(x, H) + f.T@x)
constraints = [A@x <= b, lb <= x]
prob = cvx.Problem(obj, constraints)
result = prob.solve()
print(x.value)
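The solver output can be cross-checked geometrically. The closest point to the singer within the half-plane $x_1 + x_2 \leq 3$ is the orthogonal projection of $(3, 3)$ onto it. If that projection also satisfies $x_1, x_2 \geq 0$, it solves the full problem. The sketch below is a hand-written projection for this specific example, not what CVXPY does internally, and the helper name is hypothetical.

```python
import numpy as np

def project_onto_halfspace(p, a, b):
    """Closest point to p in the half-space {x : a^T x <= b}."""
    violation = a @ p - b
    if violation <= 0:
        return p.copy()                  # p is already feasible
    return p - violation / (a @ a) * a   # step back along the normal a

singer = np.array([3.0, 3.0])
a = np.array([1.0, 1.0])

x_closest = project_onto_halfspace(singer, a, 3.0)
is_feasible = bool(np.all(x_closest >= 0))   # nonnegativity also holds

print(x_closest, is_feasible)   # [1.5 1.5] True
```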
4.4. Example: Shortest Distance
We aim to find the optimal point along a river (represented by the x-axis) that minimizes the total walking distance between two locations:
- Walking from point $\vec{a}$ to the river
- Walking from the river to point $\vec{b}$
We define the total distance as the sum of two Euclidean distances:
$$ \min \; d_1 + d_2 = \min \left\rVert \vec{a} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2 + \left\rVert \vec{b} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2$$
Weighted Distance Version
In a more realistic scenario, consider the task of fetching water from the river:
- Walk from $\vec{a}$ to the river with an empty bucket
- Walk from the river to $\vec{b}$ carrying a full bucket
Since carrying a full bucket is harder, we apply a weighting factor $\mu > 1$ to that part of the journey:
$$ \min \; d_1 + \mu \; d_2 = \min \left\rVert \vec{a} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2 + \mu \; \left\rVert \vec{b} - \begin{bmatrix}x\\0\end{bmatrix}\right\rVert_2$$
This optimization problem is convex and can be solved numerically using CVXPY.
a = np.array([[0], [1]])
b = np.array([[4], [2]])
Aeq = np.array([0,1])
beq = 0
x = cvx.Variable([2,1])
mu = 1
obj = cvx.Minimize(cvx.norm(a-x, 2) + mu*cvx.norm(b-x, 2))
constraints = [Aeq@x == beq]
prob = cvx.Problem(obj, constraints)
result = prob.solve()
print(x.value)
print(result)
# Reference value for mu = 1: straight-line distance from a to b reflected across the river
print(np.sqrt(4**2 + 3**2))
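For the unweighted case $\mu = 1$, the answer can be obtained by the classical reflection argument: reflect $\vec{b}$ across the river and draw a straight line from $\vec{a}$ to the reflected point. The sketch below applies this to the points used in the code above. It does not cover $\mu \neq 1$, where the reflection argument no longer applies.

```python
import math

a = (0.0, 1.0)
b = (4.0, 2.0)

# Reflect b across the river (the x-axis).
b_reflected = (b[0], -b[1])

# Point where the segment from a to b_reflected crosses y = 0.
t = a[1] / (a[1] - b_reflected[1])
x_cross = a[0] + t * (b_reflected[0] - a[0])

# Total walking distance through (x_cross, 0).
total = math.hypot(x_cross - a[0], a[1]) + math.hypot(b[0] - x_cross, b[1])

print(x_cross, total)
```

The crossing point is $x = 4/3$ and the total distance equals $\sqrt{4^2 + 3^2} = 5$, the length of the straight segment to the reflected point.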
4.5. Example: Supply Chain
In this example, we aim to find the optimal facility location (e.g., a warehouse or distribution center) that minimizes the total transportation cost (modeled as Euclidean distance) to three given destinations.
The total transportation cost (objective function) is:
$$ \min_{{x}} \left\| {x} - A \right\|_2 + \left\| {x} - B \right\|_2 + \left\| {x} - C \right\|_2 $$
This can be written as:
$$ \min_{x_1, x_2} \; \left\| \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} \sqrt{3} \\ 0 \end{bmatrix} \right\|_2 + \left\| \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} -\sqrt{3} \\ 0 \end{bmatrix} \right\|_2 + \left\| \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 0 \\ 3 \end{bmatrix} \right\|_2 $$
This is a convex optimization problem that can be solved by CVXPY.
a = np.array([[np.sqrt(3)], [0]])
b = np.array([[-np.sqrt(3)], [0]])
c = np.array([[0],[3]])
x = cvx.Variable([2,1])
obj = cvx.Minimize(cvx.norm(a-x, 2) + cvx.norm(b-x, 2) + cvx.norm(c-x, 2))
#obj = cvx.Minimize(cvx.norm(a-x, 1) + cvx.norm(b-x, 1) + cvx.norm(c-x, 1))
prob = cvx.Problem(obj)
result = prob.solve()
print(x.value)
B = np.hstack([a,b,c])
np.mean(B,1)
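The point that minimizes the sum of Euclidean distances is known as the geometric median. As an independent check that does not rely on CVXPY, the sketch below uses Weiszfeld's iteration, a classical fixed-point scheme that is swapped in here for illustration. The starting point, iteration count, and the small `eps` guard against division by zero are assumptions.

```python
import numpy as np

def weiszfeld(points, x0, n_iter=200, eps=1e-12):
    """Approximate the geometric median of the rows of `points`."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points - x, axis=1)
        w = 1.0 / np.maximum(dist, eps)   # inverse-distance weights
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x

points = np.array([[np.sqrt(3), 0.0],
                   [-np.sqrt(3), 0.0],
                   [0.0, 3.0]])

x_median = weiszfeld(points, x0=[0.5, 0.5])
centroid = points.mean(axis=0)

print(x_median)
print(centroid)
```

The three destinations form an equilateral triangle, so the geometric median coincides with the centroid $(0, 1)$. For general point sets the two differ.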