:rocket: Version 2.9.0 out now! Read the release notes here.

skpro is a library for supervised probabilistic prediction in Python.
It provides scikit-learn-like, scikit-base compatible interfaces to:

  • tabular supervised regressors for probabilistic prediction - interval, quantile and distribution predictions
  • tabular probabilistic time-to-event and survival prediction - instance-individual survival distributions
  • metrics to evaluate probabilistic predictions, e.g., pinball loss, empirical coverage, CRPS, survival losses
  • reductions to turn scikit-learn regressors into probabilistic skpro regressors, such as bootstrap or conformal
  • building pipelines and composite models, including tuning via probabilistic performance metrics
  • symbolic probability distributions with value domain of pandas.DataFrame-s and pandas-like interface
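
As a rough illustration of two of the metrics named in the list above, the sketch below computes pinball loss and empirical coverage by hand in plain Python. The helpers `pinball_loss` and `empirical_coverage` are hypothetical names written for this example, not skpro's API; skpro's own metric classes work on pandas objects and may aggregate differently.

```python
def pinball_loss(y_true, y_quantile, alpha):
    """Mean pinball (quantile) loss of predicted alpha-quantiles.

    Under-predictions are weighted by alpha, over-predictions by 1 - alpha.
    """
    losses = []
    for y, q in zip(y_true, y_quantile):
        if y >= q:
            losses.append(alpha * (y - q))
        else:
            losses.append((1 - alpha) * (q - y))
    return sum(losses) / len(losses)


def empirical_coverage(y_true, lower, upper):
    """Fraction of actual values that fall inside their predicted interval."""
    hits = [lo <= y <= up for y, lo, up in zip(y_true, lower, upper)]
    return sum(hits) / len(hits)


# made-up numbers, for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 0.0, 4.0, 3.0]
upper = [2.0, 1.0, 5.0, 5.0]

print(empirical_coverage(y_true, lower, upper))
print(pinball_loss(y_true, upper, alpha=0.95))
```

A well-calibrated 90% interval prediction should reach an empirical coverage close to 0.9 on held-out data; lower pinball loss indicates better quantile predictions.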

:dizzy: Features

Our objective is to enhance the interoperability and usability of the AI model ecosystem:

  • skpro is compatible with scikit-learn and sktime, e.g., an sktime proba forecaster can
    be built with an skpro proba regressor, which is an sklearn regressor with proba mode added by skpro

  • skpro provides a mini-package management framework for first-party implementations,
    and for interfacing popular second- and third-party components,
    such as cyclic-boosting, MAPIE, or ngboost packages.

skpro curates libraries of components of the following types:

| Module | Status | Links |
|---|---|---|
| Probabilistic tabular regression | maturing | Tutorial · API Reference · Extension Template |
| Time-to-event (survival) prediction | maturing | Tutorial · API Reference · Extension Template |
| Performance metrics | maturing | API Reference |
| Probability distributions | maturing | Tutorial · API Reference · Extension Template |

:hourglass_flowing_sand: Installing skpro

To install skpro, use pip:

    pip install skpro

or, with maximum dependencies,

    pip install skpro[all_extras]

Releases are available as source packages and binary wheels. You can see all available wheels here.

:zap: Quickstart

Making probabilistic predictions

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    from skpro.regression.residual import ResidualDouble

    # step 1: data specification
    X, y = load_diabetes(return_X_y=True, as_frame=True)
    # y_test is kept for the evaluation step further below
    X_train, X_new, y_train, y_test = train_test_split(X, y)

    # step 2: specifying the regressor - any compatible regressor is valid!
    # example - "squaring residuals" regressor
    # random forest for mean prediction
    # linear regression for variance prediction
    reg_mean = RandomForestRegressor()
    reg_resid = LinearRegression()
    reg_proba = ResidualDouble(reg_mean, reg_resid)

    # step 3: fitting the model to training data
    reg_proba.fit(X_train, y_train)

    # step 4: predicting labels on new data
    # probabilistic prediction modes - pick any or multiple

    # full distribution prediction
    y_pred_proba = reg_proba.predict_proba(X_new)

    # interval prediction
    y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)

    # quantile prediction
    y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])

    # variance prediction
    y_pred_var = reg_proba.predict_var(X_new)

    # mean prediction is same as "classical" sklearn predict, also available
    y_pred_mean = reg_proba.predict(X_new)
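
To make the "squaring residuals" idea from step 2 more concrete, here is a minimal standard-library sketch, under simplifying assumptions: one feature, a straight line for the mean, a second straight line fitted to the squared residuals as a variance estimate, and a normal predictive distribution. It is a simplified illustration only, not skpro's `ResidualDouble` implementation; `fit_line` and `predict_dist` are hypothetical helpers and the data is made up. The last lines show how a symmetric interval with `coverage=0.9` corresponds to the 0.05 and 0.95 quantiles.

```python
from statistics import NormalDist


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# toy data, made up for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.8, 3.4, 3.5, 5.9, 5.0]

# stage 1: model for the mean
slope_m, intercept_m = fit_line(xs, ys)

# stage 2: model for the variance, fitted to squared residuals of stage 1
sq_resid = [(y - (slope_m * x + intercept_m)) ** 2 for x, y in zip(xs, ys)]
slope_v, intercept_v = fit_line(xs, sq_resid)


def predict_dist(x):
    """Normal predictive distribution at x from the two fitted lines."""
    mean = slope_m * x + intercept_m
    var = max(slope_v * x + intercept_v, 1e-12)  # guard against var <= 0
    return NormalDist(mu=mean, sigma=var ** 0.5)


dist = predict_dist(4.5)

# a symmetric interval with coverage c spans quantiles 0.5 - c/2 and 0.5 + c/2
coverage = 0.9
lower = dist.inv_cdf(0.5 - coverage / 2)
upper = dist.inv_cdf(0.5 + coverage / 2)
print(dist.mean, lower, upper)
```

In skpro, the mean and residual models can be any compatible regressors, and the prediction methods return pandas-based objects for all rows of `X_new` at once rather than a single distribution.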

Evaluating predictions

    # step 5: specifying evaluation metric
    from skpro.metrics import CRPS

    metric = CRPS()  # continuous ranked probability score - any skpro metric works!

    # step 6: evaluate metric, compare predictions to actuals
    # y_test: actuals for X_new, i.e., the fourth return value of train_test_split
    metric(y_test, y_pred_proba)
    >>> 32.19
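
For intuition on what CRPS measures, the sketch below evaluates the well-known closed-form CRPS of a normal predictive distribution for a single observation, using only the standard library. The function `crps_normal` is a hypothetical helper for this illustration, not part of skpro. skpro's `CRPS` works on predicted distribution objects and aggregates over all test rows, so this sketch does not reproduce the number shown above.

```python
import math


def crps_normal(mu, sigma, y):
    """CRPS of the predictive distribution N(mu, sigma^2) at observation y.

    Closed form: sigma * (z * (2 * Phi(z) - 1) + 2 * phi(z) - 1 / sqrt(pi)),
    with z = (y - mu) / sigma, phi the standard normal pdf, Phi its cdf.
    """
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


# lower is better: CRPS rewards predictions that are both accurate and sharp
print(crps_normal(mu=0.0, sigma=1.0, y=0.0))  # accurate and sharp
print(crps_normal(mu=0.0, sigma=3.0, y=0.0))  # accurate but too wide
print(crps_normal(mu=0.0, sigma=1.0, y=5.0))  # sharp but far off
```

As the predicted standard deviation shrinks towards zero, CRPS approaches the absolute error between the observation and the predicted mean, which is why it is often read as a probabilistic generalization of mean absolute error.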

:wave: Citation

