Gaussian Process tiny explorer
This is an ongoing project with many parts under construction; please expect frequent changes and sharp edges.
Note: parts of the project in italic font are under construction.
In this example, we use a Gaussian process to model the concentration of CO2 at Mauna Loa as a function of time.
# handcraft a composite kernel based on expert knowledge
# long-term trend
k1 = 30.0**2 * RBFKernel(l=200.0)
# seasonal variations
k2 = 3.0**2 * RBFKernel(l=200.0) * PeriodicKernel(p=1.0, l=1.0)
# medium-term irregularities
k3 = 0.5**2 * RationalQuadraticKernel(m=0.8, l=1.0)
# noise
k4 = 0.1**2 * RBFKernel(l=0.1) + 0.2**2 * WhiteKernel()
# composite kernel
kernel = k1 + k2 + k3 + k4
# train GPR on data
gpr = GaussianProcessRegressor(kernel=kernel)
gpr.fit(X, y)
In the plot, the scattered dots represent historical observations, and the shaded area shows the predictive interval (μ - σ, μ + σ) prophesied by a Gaussian process regressor trained on the historical data.
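One rough way to produce such a band, sketched here under the assumption that posterior_predictive returns the sampled function values as an array of shape (n_samples, n_points):
import numpy as np
import matplotlib.pyplot as plt
# draw many posterior samples at the query points (assumed shape: (n_samples, n_points))
samples = gpr.posterior_predictive(X, n_samples=200)
mu, sigma = samples.mean(axis=0), samples.std(axis=0)
# scatter the observations and shade the (mu - sigma, mu + sigma) band
plt.scatter(X.ravel(), y, s=5, label='observations')
plt.fill_between(X.ravel(), mu - sigma, mu + sigma, alpha=0.3, label='predictive interval')
plt.plot(X.ravel(), mu, label='predictive mean')
plt.legend()
plt.show()
Querying a denser grid of times than X would give a smoother band.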
Here we use a synthesized dataset for ease of illustration and investigate sampling inference techniques such as Markov chain Monte Carlo. As a Gaussian process defines the predictive distribution, we can get a sense of it by sampling from its prior distribution (before seeing the training set) and its posterior distribution (after seeing the training set).
# with the current hyperparameter configuration,
# ... what is the prior distribution p(y_test)
y_prior = gpr.prior_predictive(X, n_samples=6)
# ... what is the posterior distribution p(y_test|y_train)
y_posterior = gpr.posterior_predictive(X, n_samples=4)
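For instance, the sampled functions can be plotted side by side to see how observing data narrows the model down, again assuming each call returns one sampled function per row:
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
# prior samples reflect only the kernel's assumptions
for sample in y_prior:
    ax1.plot(X.ravel(), sample, alpha=0.6)
ax1.set_title('prior samples')
# posterior samples are pinned down near the training observations
for sample in y_posterior:
    ax2.plot(X.ravel(), sample, alpha=0.6)
ax2.scatter(X.ravel(), y, s=5, color='black')
ax2.set_title('posterior samples')
plt.show()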
We can also sample from the posterior distribution of a hyperparameter, which characterizes its uncertainty beyond a single point estimate such as MLE or MAP.
# invoke an MCMC sampler to draw hyperparameter values from their posterior distribution
hyper_posterior = gpr.hyper_posterior(n_samples=10000)
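The samples can then be summarized however you like. As a sketch, assuming hyper_posterior returns an array with one row per sample and one column per hyperparameter, a posterior mean and 95% credible interval might be computed as:
import numpy as np
# posterior mean and central 95% credible interval for each hyperparameter
mean = hyper_posterior.mean(axis=0)
lower, upper = np.percentile(hyper_posterior, [2.5, 97.5], axis=0)
for i, (m, lo, hi) in enumerate(zip(mean, lower, upper)):
    print(f'hyperparameter {i}: {m:.3f} (95% CI: {lo:.3f}, {hi:.3f})')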
We demonstrate a simple example of Bayesian optimization. It starts by exploring the objective function globally and shifts to exploiting “promising areas” as more observations are made.
# number of evaluations
n_evals = 10
# surrogate model (Gaussian process)
surrogate = GaussianProcessRegressor(1.0 * MaternKernel(d=5, l=1.0) +
                                     1.0 * WhiteKernel())
# bayesian optimizer
bayesopt = BayesianOptimizer(fun=f, bounds=b, x0=x0, n_evals=n_evals,
                             acquisition='lcb', surrogate=surrogate)
bayesopt.minimize(callback=callback)
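The snippet above assumes an objective f, bounds b, a starting point x0, and a callback are already defined. Purely for illustration (the exact bounds format and callback signature expected by BayesianOptimizer should be checked against GPie's documentation), a toy 5-dimensional setup might look like:
import numpy as np
# hypothetical objective: a noisy quadratic bowl in 5 dimensions
def f(x):
    return float(np.sum((x - 0.5)**2) + 0.01 * np.random.randn())
# hypothetical search box and starting point (one (lower, upper) pair per dimension)
b = [(-2.0, 2.0)] * 5
x0 = np.zeros(5)
# hypothetical callback: log each evaluation
def callback(x):
    print('evaluated at', x)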
GPie makes extensive use of de facto standard scientific computing packages in Python: numpy (linear algebra, stochastic sampling) and scipy (optimization, stochastic sampling).
GPie requires Python 3.10 or greater. The easiest way to install GPie is from a prebuilt wheel using pip:
pip install --upgrade gpie
You can also install from source to try out the latest features (requires build>=0.7.0):
pip install --upgrade git+https://github.com/zackxzhang/gpie