Gaussian Process
Source
Table of Contents
Suppose $x \sim \mathcal{N}(\mu, \Sigma)$, where $\Sigma = \Sigma^T$ and $\Sigma \succ 0$.

Suppose $x \sim \mathcal{N}(\mu, \Sigma)$, and partition $x$ into blocks $x_1$ and $x_2$, with $\mu$ and $\Sigma$ partitioned conformably.

Let's look at the component $x_1 = [\,I \;\; 0\,]\,x$. Its marginal distribution is $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$.
Suppose $x \sim \mathcal{N}(\mu_x, \Sigma_x)$. Consider a linear function of $x$, say $y = Ax + b$; then $y \sim \mathcal{N}(A\mu_x + b,\ A\Sigma_x A^T)$.

Suppose $x \sim \mathcal{N}(0, \Sigma)$, and let $c \in \mathbb{R}^n$ be a unit vector.

Let $y = c^T x$.

Then $\mathbb{E}(y^2) = c^T \Sigma c$; maximizing over unit vectors gives $\mathbb{E}(y^2) = \lambda_{\max}(\Sigma)$, attained when $c$ is a corresponding unit eigenvector.
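As a quick numerical check (not part of the original notes; the covariance below is an arbitrary example), we can sample $x \sim \mathcal{N}(0, \Sigma)$ and confirm that the second moment of $y = c^T x$ is largest, and approximately $\lambda_{\max}(\Sigma)$, when $c$ is the top eigenvector of $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive-definite covariance (illustrative choice).
M = rng.standard_normal((3, 3))
Sigma = M @ M.T + 0.1 * np.eye(3)

eigvals, eigvecs = np.linalg.eigh(Sigma)
c_max = eigvecs[:, -1]                   # unit eigenvector for lambda_max
c_rand = rng.standard_normal(3)
c_rand /= np.linalg.norm(c_rand)         # an arbitrary unit vector

x = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)

print("lambda_max:                ", eigvals[-1])
print("E[(c_max^T x)^2]  (sample):", np.mean((x @ c_max) ** 2))
print("E[(c_rand^T x)^2] (sample):", np.mean((x @ c_rand) ** 2))  # <= lambda_max
```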
Suppose $x \sim \mathcal{N}(\mu, \Sigma)$, partitioned into blocks $x_1$ and $x_2$ as before.

Suppose we measure $x_2 = y$. We would like to find the conditional pdf of $x_1$ given $x_2 = y$.

By the completion-of-squares formula, the exponent of the joint density can be rearranged so that the conditional density can be read off directly.

If $x \sim \mathcal{N}(0, \Sigma)$, then the conditional pdf of $x_1$ given $x_2 = y$ is Gaussian:

- The conditional mean is a linear function of $y$.
- The conditional covariance is not a function of $y$; it is constant.
- Conditional confidence intervals are narrower, i.e., measuring $x_2$ gives information about $x_1$.

If $x \sim \mathcal{N}(\mu, \Sigma)$, then the conditional pdf of $x_1$ given $x_2 = y$ is Gaussian as well, obtained by applying the zero-mean result to $x - \mu$.
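A small sketch (added here, with an illustrative $2 \times 2$ covariance) that evaluates the conditional mean and variance of $x_1$ given $x_2 = y$, showing that the mean is linear in $y$, the variance does not depend on $y$, and the conditional variance is smaller than the marginal variance of $x_1$:

```python
import numpy as np

# Example covariance with correlated components (illustrative values).
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
S11, S12, S22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]

def conditional_x1_given_x2(y, mu=np.zeros(2)):
    # Conditional mean is affine in y; conditional variance is constant.
    mean = mu[0] + S12 / S22 * (y - mu[1])
    var = S11 - S12 / S22 * S12
    return mean, var

for y in [-1.0, 0.0, 2.0]:
    m, v = conditional_x1_given_x2(y)
    print(f"y = {y:+.1f}: E[x1|x2=y] = {m:+.3f}, Var[x1|x2=y] = {v:.3f}")

print("marginal Var[x1] =", S11)  # larger than the conditional variance
```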
Estimation problem: we measure $y = y_{\text{meas}}$ and would like to estimate $x$.

The MMSE estimate of $x$ given $y = y_{\text{meas}}$ is $\hat{x} = \mu_x + \Sigma_{xy}\,\Sigma_y^{-1}\,(y_{\text{meas}} - \mu_y)$.

The posterior covariance of $x$ given $y = y_{\text{meas}}$ is $\Sigma_x - \Sigma_{xy}\,\Sigma_y^{-1}\,\Sigma_{xy}^T$.
Example: linear measurements with Gaussian noise

Suppose $x \sim \mathcal{N}(\mu_x, \Sigma_x)$ and $\omega \sim \mathcal{N}(\mu_\omega, \Sigma_\omega)$ are independent, and we observe $y = Ax + \omega$.

The MMSE estimate of $x$ given $y = y_{\text{meas}}$ is $\hat{x} = \mu_x + \Sigma_x A^T (A\Sigma_x A^T + \Sigma_\omega)^{-1}(y_{\text{meas}} - A\mu_x - \mu_\omega)$, since here $\Sigma_{xy} = \Sigma_x A^T$ and $\Sigma_y = A\Sigma_x A^T + \Sigma_\omega$.

For Gaussian random variables, the MMSE estimate is equal to the MAP estimate.
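A minimal sketch of the linear-measurement case (the prior, measurement matrix, and noise covariance below are illustrative choices, not values from the notes), computing the MMSE estimate and posterior covariance with the formulas above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative prior, measurement matrix, and noise covariance.
mu_x, Sigma_x = np.zeros(2), np.eye(2)
A = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [1.0, -1.0]])
mu_w, Sigma_w = np.zeros(3), 0.1 * np.eye(3)

# Simulate one measurement y_meas = A x + w.
x_true = rng.multivariate_normal(mu_x, Sigma_x)
y_meas = A @ x_true + rng.multivariate_normal(mu_w, Sigma_w)

# Joint-Gaussian quantities: Sigma_xy = Sigma_x A^T, Sigma_y = A Sigma_x A^T + Sigma_w.
Sigma_xy = Sigma_x @ A.T
Sigma_y = A @ Sigma_x @ A.T + Sigma_w
mu_y = A @ mu_x + mu_w

x_mmse = mu_x + Sigma_xy @ np.linalg.solve(Sigma_y, y_meas - mu_y)
post_cov = Sigma_x - Sigma_xy @ np.linalg.solve(Sigma_y, Sigma_xy.T)

print("x_true:", x_true)
print("x_mmse:", x_mmse)
print("posterior covariance:\n", post_cov)
```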
The least-squares approach minimizes $\|Ax - y_{\text{meas}}\|^2 = \sum_{i=1}^m (a_i^T x - y_i)^2$,

where $A = [\,a_1\ a_2\ \cdots\ a_m\,]^T$ and $y_i$ denotes the $i$-th entry of $y_{\text{meas}}$.

Suppose instead we minimize the weighted criterion $\sum_{i=1}^m \omega_i\,(a_i^T x - y_i)^2 = (Ax - y_{\text{meas}})^T W (Ax - y_{\text{meas}})$,

where $\omega_1, \omega_2, \dots, \omega_m$ provide weights and $W = \operatorname{diag}(\omega_1, \dots, \omega_m)$.

Then the optimum is $x_{\text{wls}} = (A^T W A)^{-1} A^T W\, y_{\text{meas}}$.

Suppose we choose the weight $W = \Sigma_\omega^{-1}$; then

the WLS solution is $x_{\text{wls}} = (A^T \Sigma_\omega^{-1} A)^{-1} A^T \Sigma_\omega^{-1}\, y_{\text{meas}}$, while

the MMSE estimate is $\hat{x} = \mu_x + (\Sigma_x^{-1} + A^T \Sigma_\omega^{-1} A)^{-1} A^T \Sigma_\omega^{-1}\,(y_{\text{meas}} - A\mu_x - \mu_\omega)$, which adds the prior precision $\Sigma_x^{-1}$; the two coincide when the prior is uninformative ($\Sigma_x^{-1} \to 0$) and the means are zero.
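A short sketch (same kind of illustrative setup as above, with zero-mean prior and noise) comparing the WLS solution with $W = \Sigma_\omega^{-1}$ against the MMSE estimate in information form:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative prior, measurement matrix, and noise covariance (zero means).
Sigma_x = np.eye(2)
A = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [1.0, -1.0]])
Sigma_w = 0.1 * np.eye(3)

x_true = rng.multivariate_normal(np.zeros(2), Sigma_x)
y_meas = A @ x_true + rng.multivariate_normal(np.zeros(3), Sigma_w)

# Weighted least squares with W = Sigma_w^{-1}.
W = np.linalg.inv(Sigma_w)
x_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_meas)

# MMSE estimate in information form: the prior precision Sigma_x^{-1} is added.
x_mmse = np.linalg.solve(np.linalg.inv(Sigma_x) + A.T @ W @ A, A.T @ W @ y_meas)

print("x_true:", x_true)
print("x_wls: ", x_wls)   # ignores the prior
print("x_mmse:", x_mmse)  # shrinks toward the (zero) prior mean
```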
%%html
<center><iframe src="https://www.youtube.com/embed/dtkGq9tdYcI"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
Weight-space view
The function-space view will be discussed in the following section.
If the weight vector $\omega$ is Gaussian, then $Z_x = x^T \omega$ is a GP on $S = \mathbb{R}^d$: the values $(Z_{x_1}, \dots, Z_{x_n})$ at any finite set of inputs are jointly Gaussian.

With a non-linear basis (feature map) $\phi$, the same holds for $Z_x = \phi(x)^T \omega$.
Generally, any such model with jointly Gaussian weights defines a GP whose covariance function is determined by the basis and by the prior on $\omega$.
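A brief sketch of the weight-space view (the polynomial basis and the standard-normal prior on $\omega$ are illustrative choices): sampling weight vectors and plotting $Z_x = \phi(x)^T \omega$ gives random functions whose values on any grid are jointly Gaussian.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
xs = np.linspace(-2, 2, 200)

# Illustrative non-linear basis: phi(x) = (1, x, x^2, x^3).
Phi = np.vander(xs, N=4, increasing=True)   # shape (200, 4)

# Draw a few weight vectors omega ~ N(0, I) and plot Z_x = phi(x)^T omega.
for _ in range(5):
    omega = rng.standard_normal(4)
    plt.plot(xs, Phi @ omega, alpha=0.7)

plt.title("Weight-space view: samples of Z_x = phi(x)^T omega")
plt.show()

# Implied covariance function on the grid: k(x, x') = phi(x)^T phi(x').
K = Phi @ Phi.T
print("K is symmetric PSD; K[0, 0] =", K[0, 0])
```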
%%html
<center><iframe src="https://www.youtube.com/embed/upJ31CIVWZo"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
The key step of GP regression

Let $z \sim \mathcal{N}(\mu, K) \in \mathbb{R}^n$ and $\varepsilon \sim \mathcal{N}(0, \sigma^2 I) \in \mathbb{R}^n$ be independent, and observe $y = z + \varepsilon$.

Partition the indices into $a = [1, \cdots, l]^T$ and $b = [l+1, \cdots, n]^T$.

The conditional distribution is $p(y_a \mid y_b) = \mathcal{N}(m, D)$,

where $m = \mu_a + K_{ab}\,(K_{bb} + \sigma^2 I)^{-1}(y_b - \mu_b)$ and $D = K_{aa} + \sigma^2 I - K_{ab}\,(K_{bb} + \sigma^2 I)^{-1} K_{ba}$.
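A compact sketch of this conditioning step (the inputs, noise level $\sigma^2$, and kernel parameters are illustrative; the kernel has the same squared-exponential form as `exponential_cov` used later), assuming $\mu = 0$:

```python
import numpy as np

rng = np.random.default_rng(4)

def sq_exp(x, y, theta1=1.0, theta2=5.0):
    # Squared-exponential kernel, same form as exponential_cov below.
    return theta1 * np.exp(-theta2 / 2 * np.subtract.outer(x, y) ** 2)

# The first l inputs form the "a" block (to predict); the rest are observed ("b").
s = np.array([-0.5, 0.5, -2.0, -1.0, 0.0, 1.0, 2.0])
l, n, sigma2 = 2, 7, 0.05

K = sq_exp(s, s)
y = rng.multivariate_normal(np.zeros(n), K + sigma2 * np.eye(n))  # y = z + eps

a, b = np.arange(l), np.arange(l, n)
Kaa, Kab, Kbb = K[np.ix_(a, a)], K[np.ix_(a, b)], K[np.ix_(b, b)]
Kbb_noisy = Kbb + sigma2 * np.eye(n - l)

# m = K_ab (K_bb + sigma^2 I)^{-1} y_b  (mu = 0 here)
m = Kab @ np.linalg.solve(Kbb_noisy, y[b])
# D = K_aa + sigma^2 I - K_ab (K_bb + sigma^2 I)^{-1} K_ba
D = Kaa + sigma2 * np.eye(l) - Kab @ np.linalg.solve(Kbb_noisy, Kab.T)

print("conditional mean m:", m)
print("conditional covariance D:\n", D)
```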
%%html
<center><iframe src="https://www.youtube.com/embed/UH1d2mxwet8"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
%%html
<center><iframe src="https://www.youtube.com/embed/JdZr74mtZkU"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
Function-space view
Gaussian Processes: From the Basics to the State-of-the-Art, Imperial's ML tutorial series
%%html
<center><iframe src="https://www.youtube.com/embed/92-98SYOdlY"
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
We can describe a Gaussian process as a distribution over functions. Just as a multivariate normal distribution is completely specified by a mean vector and covariance matrix, a GP is fully specified by a mean function and a covariance function:

$$f(x) \sim \mathcal{GP}\big(m(x),\ k(x, x')\big)$$

For example, one specification of a GP might be:

$$m(x) = 0, \qquad k(x, x') = \theta_1 \exp\!\left(-\frac{\theta_2}{2}(x - x')^2\right)$$

Here, the covariance function is a squared exponential, for which values of $x$ and $x'$ that are close together result in values of $k$ closer to one, while those that are far apart return values closer to zero.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
$$k(x, x') = \theta_1 \exp\!\left(-\frac{\theta_2}{2}(x - x')^2\right)$$
def exponential_cov(x, y, params):
    # Squared-exponential covariance: params[0] * exp(-params[1]/2 * (x - y)^2),
    # evaluated for every pair of points in x and y.
    return params[0] * np.exp(-params[1] / 2 * np.subtract.outer(x, y) ** 2)
$$p(x \mid y) = \mathcal{N}\!\left(\mu_x + \Sigma_{xy}\Sigma_y^{-1}(y - \mu_y),\ \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{xy}^T\right)$$
def conditional(x_new, x, y, params):
    # Condition the (noise-free) GP prior on the observed (x, y) pairs and
    # return the posterior mean and covariance at the new points x_new.
    B = exponential_cov(x_new, x, params)
    C = exponential_cov(x, x, params)
    A = exponential_cov(x_new, x_new, params)
    mu = np.linalg.inv(C).dot(B.T).T.dot(y)
    sigma = A - B.dot(np.linalg.inv(C).dot(B.T))
    return (mu.squeeze(), sigma.squeeze())
theta = [1, 5]
sigma = exponential_cov(0, 0, theta)
xpts = np.arange(-3, 3, step=0.01)
plt.errorbar(xpts, np.zeros(len(xpts)), yerr=sigma, capsize=0)
x = [1.]
y = [np.random.normal(scale=sigma)]
y
sigma_1 = exponential_cov(x, x, theta)
def predict(x, data, kernel, params, sigma, t):
    # Posterior mean and variance at a single test point x, given training
    # inputs `data`, observed targets `t`, and training covariance `sigma`.
    k = [kernel(x, y, params) for y in data]
    Sinv = np.linalg.inv(sigma)
    y_pred = np.dot(k, Sinv).dot(t)
    sigma_new = kernel(x, x, params) - np.dot(k, Sinv).dot(k)
    return y_pred, sigma_new
x_pred = np.linspace(-3, 3, 1000)
predictions = [predict(i, x, exponential_cov, theta, sigma_1, y) for i in x_pred]
y_pred, sigmas = np.transpose(predictions)
plt.errorbar(x_pred, y_pred, yerr=sigmas, capsize=0)
plt.plot(x, y, "ro")
m, s = conditional([-0.7], x, y, theta)
y2 = np.random.normal(m, s)
y2
x.append(-0.7)
y.append(y2)
sigma_2 = exponential_cov(x, x, theta)
predictions = [predict(i, x, exponential_cov, theta, sigma_2, y) for i in x_pred]
y_pred, sigmas = np.transpose(predictions)
plt.errorbar(x_pred, y_pred, yerr=sigmas, capsize=0)
plt.plot(x, y, "ro")
x_more = [-2.1, -1.5, 0.3, 1.8, 2.5]
mu, s = conditional(x_more, x, y, theta)
y_more = np.random.multivariate_normal(mu, s)
y_more
x += x_more
y += y_more.tolist()
sigma_new = exponential_cov(x, x, theta)
predictions = [predict(i, x, exponential_cov, theta, sigma_new, y) for i in x_pred]
y_pred, sigmas = np.transpose(predictions)
plt.errorbar(x_pred, y_pred, yerr=sigmas, capsize=0)
plt.plot(x, y, "ro")
There are a number of libraries available for specifying and fitting GP models in a more automated way.
- scikit-learn (we will explore this)
- GPflow
- PyMC3
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel, ConstantKernel
x = np.linspace(0,10,100)
r = np.sin(x)
plt.plot(x,r)
plt.show()
y = r + 1*np.random.rand(100)
plt.plot(x,y,'.')
scikit-learn offers a library of covariance kernels to choose from. A flexible choice to start with is the Matern covariance.
kernel = ConstantKernel() + Matern(length_scale=2, nu=3/2) + WhiteKernel(noise_level=1)
X = x.reshape(-1,1)
gp = GaussianProcessRegressor(kernel = kernel)
gp.fit(X,y)
gp.kernel_
x_pred = np.linspace(0, 10, 100).reshape(-1,1)
y_pred, sigma = gp.predict(x_pred, return_std=True)
plt.plot(x_pred, y_pred)
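As a final sketch (not part of the original plot), the standard deviation returned by `gp.predict` can be used to shade an uncertainty band around the posterior mean, reusing the `x_pred`, `y_pred`, `sigma`, `X`, and `y` defined above:

```python
# Shade a +/- 2 sigma band around the posterior mean from gp.predict.
plt.plot(x_pred, y_pred, label="posterior mean")
plt.fill_between(x_pred.ravel(), y_pred - 2 * sigma, y_pred + 2 * sigma,
                 alpha=0.3, label="+/- 2 sigma")
plt.plot(X, y, ".", label="data")
plt.legend()
plt.show()
```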
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')