Project author: zackxzhang

Project description: Gaussian Process tiny explorer
Language: Python
Repository: git://github.com/zackxzhang/gpie.git
Created: 2020-09-29T20:37:10Z
Project community: https://github.com/zackxzhang/gpie

License: BSD 3-Clause "New" or "Revised" License



GPie

Gaussian Process tiny explorer

  • simple: an intuitive syntax inspired by scikit-learn
  • powerful: a compact core of expressive abstractions
  • extensible: a modular design for effortless composition
  • lightweight: a minimal set of dependencies (numpy and scipy only)

This is an ongoing project with many parts under construction; please expect frequent changes and sharp edges.
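
For a first taste of the scikit-learn-inspired syntax, here is a minimal regression workflow. This is only a sketch: the top-level import path and the input shapes (scikit-learn-style 2-D X) are assumptions and may differ in your installed version of gpie.

    # minimal sketch of the fit-then-sample workflow; import path and shapes assumed
    import numpy as np
    from gpie import RBFKernel, WhiteKernel, GaussianProcessRegressor

    # toy one-dimensional regression data
    X = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
    y = np.sin(X).ravel() + 0.1 * np.random.randn(50)

    # kernel engineering with plain arithmetic: a scaled RBF kernel plus white noise
    kernel = 1.0**2 * RBFKernel(l=1.0) + 0.1**2 * WhiteKernel()
    gpr = GaussianProcessRegressor(kernel=kernel)
    gpr.fit(X, y)

    # draw a few functions from the posterior predictive distribution
    y_samples = gpr.posterior_predictive(X, n_samples=4)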

Features

  • several avant-garde kernels, such as the spectral kernel and the neural kernel, allow for exploration of new ideas
  • each kernel comes in both isotropic and anisotropic versions to support automatic relevance determination
  • a full-fledged toolkit of kernel operators enables all sorts of kernel engineering, e.g.,
    handcrafting composite kernels based on expert knowledge or exploiting special structure of datasets
  • core computations, such as likelihood and gradient, are carefully optimized for speed and stability
  • sampling inference embraces a probabilistic perspective in learning and prediction to promote robustness
  • the Bayesian optimizer offers a principled strategy to optimize expensive, black-box objectives globally

Functionality

  • kernel functions
    • white kernel
    • constant kernel
    • radial basis function kernel
    • rational quadratic kernel
    • Matérn kernel
      • Ornstein-Uhlenbeck kernel
    • periodic kernel
    • spectral kernel
    • neural kernel
  • kernel operators (see the sketch after this list)
    • Hadamard: sum, product, exponentiation
    • Kronecker: sum, product
  • Gaussian process
    • regression
    • classification
  • t process
    • regression
    • classification
  • Bayesian optimizer
    • surrogate: Gaussian process, t process
    • acquisition: PI, EI, LCB, ES, KG
  • sampling inference
    • Markov chain Monte Carlo
      • Metropolis-Hastings
      • Hamiltonian + no-U-turn
    • simulated annealing
  • variational inference

Note: some parts of the functionality listed above are still under construction.
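
As a small illustration of the Hadamard operators above, kernels are combined with ordinary arithmetic. The sketch below sticks to sum and product (plus scaling by a squared constant, as used in the examples that follow); the top-level import path is an assumption.

    # sketch of Hadamard-style kernel composition; import path assumed
    from gpie import RBFKernel, PeriodicKernel, WhiteKernel

    # product: a locally periodic component (a periodic kernel modulated by an RBF envelope)
    locally_periodic = 2.0**2 * RBFKernel(l=10.0) * PeriodicKernel(p=1.0, l=1.0)
    # sum: add an independent white-noise component on top
    kernel = locally_periodic + 0.1**2 * WhiteKernel()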

Examples

Gaussian process regression on Mauna Loa CO2

In this example, we use a Gaussian process to model the concentration of CO2 at Mauna Loa as a function of time.

    # imports are assumed to come from the top-level gpie package;
    # adjust the import path to match your installed version
    from gpie import (WhiteKernel, RBFKernel, RationalQuadraticKernel,
                      PeriodicKernel, GaussianProcessRegressor)

    # handcraft a composite kernel based on expert knowledge
    # long-term trend
    k1 = 30.0**2 * RBFKernel(l=200.0)
    # seasonal variations
    k2 = 3.0**2 * RBFKernel(l=200.0) * PeriodicKernel(p=1.0, l=1.0)
    # medium-term irregularities
    k3 = 0.5**2 * RationalQuadraticKernel(m=0.8, l=1.0)
    # noise
    k4 = 0.1**2 * RBFKernel(l=0.1) + 0.2**2 * WhiteKernel()
    # composite kernel
    kernel = k1 + k2 + k3 + k4
    # train GPR on the data (X: observation times, y: CO2 concentrations)
    gpr = GaussianProcessRegressor(kernel=kernel)
    gpr.fit(X, y)

[figure: Mauna Loa CO2 observations with the fitted Gaussian process and its predictive interval]
In the plot, the scattered dots represent historical observations, and the shaded area shows the predictive interval (μ - σ, μ + σ) prophesied by a Gaussian process regressor trained on the historical data.
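
To recompute such a band yourself, one option is to summarize a batch of posterior draws with numpy, reusing the gpr fitted above. This is a sketch: it assumes posterior_predictive returns an array of sampled curves with one row per sample, and X_test is a hypothetical array of query inputs.

    # sketch: approximate the predictive band from posterior samples;
    # assumes a return shape of (n_samples, n_points)
    import numpy as np

    samples = gpr.posterior_predictive(X_test, n_samples=100)
    mu = samples.mean(axis=0)                # pointwise predictive mean
    sigma = samples.std(axis=0)              # pointwise predictive standard deviation
    lower, upper = mu - sigma, mu + sigma    # the (μ - σ, μ + σ) band shown in the plot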

Sampling inference for Gaussian process regression

Here we use a synthesized dataset for ease of illustration and investigate sampling inference techniques such as Markov chain Monte Carlo. Since a Gaussian process defines the predictive distribution, we can get a sense of it by sampling from its prior distribution (before seeing the training set) and posterior distribution (after seeing the training set).

    # with the current hyperparameter configuration,
    # ... what is the prior distribution p(y_test)
    y_prior = gpr.prior_predictive(X, n_samples=6)
    # ... what is the posterior distribution p(y_test|y_train)
    y_posterior = gpr.posterior_predictive(X, n_samples=4)

[figures: functions sampled from the prior predictive distribution and from the posterior predictive distribution]

We can also sample from the posterior distribution of a hyperparameter, which characterizes its uncertainty beyond a single point estimate such as MLE or MAP.

    # invoke the MCMC sampler to draw hyperparameter values from their posterior distribution
    hyper_posterior = gpr.hyper_posterior(n_samples=10000)

[figure: posterior distribution of a kernel hyperparameter]
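
The samples can then be summarized beyond a single point estimate. As a sketch (assuming hyper_posterior returns an array with one row per sample), here is a posterior mean and a central 95% credible interval per hyperparameter:

    # sketch: summarize MCMC samples of the hyperparameters;
    # assumes a return shape of (n_samples, n_hyperparameters)
    import numpy as np

    samples = gpr.hyper_posterior(n_samples=10000)
    posterior_mean = samples.mean(axis=0)                         # point summary per hyperparameter
    lower, upper = np.percentile(samples, [2.5, 97.5], axis=0)    # central 95% credible interval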

Bayesian optimization

We demonstrate a simple example of Bayesian optimization. It starts by exploring the objective function globally and shifts to exploiting “promising areas” as more observations are made.

    # imports are assumed to come from the top-level gpie package;
    # adjust the import path to match your installed version
    from gpie import MaternKernel, WhiteKernel, GaussianProcessRegressor, BayesianOptimizer

    # number of evaluations
    n_evals = 10
    # surrogate model (Gaussian process)
    surrogate = GaussianProcessRegressor(1.0 * MaternKernel(d=5, l=1.0) +
                                         1.0 * WhiteKernel())
    # Bayesian optimizer
    # f: black-box objective, b: search bounds, x0: initial design, callback: per-iteration hook
    bayesopt = BayesianOptimizer(fun=f, bounds=b, x0=x0, n_evals=n_evals,
                                 acquisition='lcb', surrogate=surrogate)
    bayesopt.minimize(callback=callback)

[figure: Bayesian optimization progress, from global exploration to local exploitation]
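
The snippet above leaves the objective and its search space to the reader. A hypothetical filler is sketched below; the names f, b, x0 and callback come from the snippet, but the exact formats gpie expects for bounds, initial points and callbacks are assumptions, not documented API.

    import numpy as np

    # a hypothetical 5-dimensional black-box objective (sphere function)
    def f(x):
        return float(np.sum(np.asarray(x) ** 2))

    # assumed formats: one (low, high) pair per dimension, plus an initial query point
    b = np.array([[-5.0, 5.0]] * 5)
    x0 = np.zeros(5)

    # assumed hook: called once per iteration with the optimizer's current state
    def callback(state):
        print(state)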

Backend

GPie makes extensive use of de facto standard scientific computing packages in Python:

  • numpy: linear algebra, stochastic sampling
  • scipy: gradient-based optimization, stochastic sampling

Installation

GPie requires Python 3.10 or greater. The easiest way to install GPie is from a prebuilt wheel using pip:

    pip install --upgrade gpie

You can also install from source to try out the latest features (requires build>=0.7.0):

    pip install --upgrade git+https://github.com/zackxzhang/gpie
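
Either way, a quick sanity check from Python confirms the install; this uses only the standard library and assumes nothing about gpie beyond its package name:

    # verify that gpie imports and report the installed version
    from importlib.metadata import version
    import gpie

    print(version("gpie"))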

Roadmap

  • implement Hamiltonian Monte Carlo and no-U-turn
  • implement Kronecker operators for scalable learning on grid data
  • add a demo on characteristics of different kernels
  • add a demo of quantified Occam’s razor