This guide gives an overview of the available modules and functionality of the skpro package.
For further details, see the API documentation.
skpro follows many of scikit-learn’s design principles and conventions. If you are not familiar with the scikit-learn package, consider reading its basic tutorial first.
The figure below gives an overview of the central elements and concepts of skpro and how it extends the scikit-learn toolbox. To understand skpro, it helps to first review scikit-learn’s classical prediction workflow, particularly its central Estimator object. In scikit-learn, an Estimator object represents a prediction strategy (e.g. linear regression) that can be fitted using the fit(X, y) method. The fitted estimator can then be used to obtain predictions on new data using the predict(X) method. Finally, the predicted values can be compared with the actual targets using one of the available classical loss functions.
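The fit, predict, and loss steps described above can be sketched without any dependencies. The example below is only an illustration of the convention: SimpleLinearRegressor and mean_squared_error are hypothetical stand-ins written for this guide, not scikit-learn classes or functions.

```python
from statistics import fmean


class SimpleLinearRegressor:
    """Toy one-feature least-squares model following the fit/predict convention.

    Hypothetical stand-in for a scikit-learn estimator, not part of any library.
    """

    def fit(self, X, y):
        x_mean, y_mean = fmean(X), fmean(y)
        sxx = sum((x - x_mean) ** 2 for x in X)
        sxy = sum((x - x_mean) * (t - y_mean) for x, t in zip(X, y))
        # Learned parameters carry a trailing underscore, as in scikit-learn.
        self.slope_ = sxy / sxx
        self.intercept_ = y_mean - self.slope_ * x_mean
        return self  # returning self allows chaining fit(...).predict(...)

    def predict(self, X):
        return [self.intercept_ + self.slope_ * x for x in X]


def mean_squared_error(y_true, y_pred):
    """Classical loss: average squared difference between targets and predictions."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


X_train, y_train = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
X_test, y_test = [4.0, 5.0], [9.0, 11.0]

model = SimpleLinearRegressor().fit(X_train, y_train)  # 1. fit the strategy
y_pred = model.predict(X_test)                         # 2. predict on new data
loss = mean_squared_error(y_test, y_pred)              # 3. compare with targets
```

The three commented steps are the pattern that skpro replicates for probabilistic predictions.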
skpro seeks to replicate this general pattern and introduces the ProbabilisticEstimator class, which encapsulates probabilistic prediction models. Like the Estimator class, it offers a fit and a predict method, but it returns a probability distribution as its prediction (Distribution class). The returned distribution objects provide methods to obtain relevant distribution properties, for example the distribution’s probability density function.
The predictions obtained from skpro’s estimators are therefore genuinely probabilistic: they represent a predicted probability distribution for each data point. For example, if predictions for a vector X of length k are obtained, the returned y_pred object represents k predicted distributions. y_pred[i] therefore provides access to the point prediction (e.g. the mean) of the i-th distribution, y_pred.std() returns a vector of length k that contains the standard deviations of the predicted distributions, and so forth. In many cases, such as plotting and error calculation, the distribution objects can thus be handled like the prediction vectors that scikit-learn commonly returns.
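To make the vector-like behaviour concrete, here is a minimal sketch of an object that holds k predicted normal distributions. NormalPredictions and its pdf method are hypothetical names chosen for illustration; this is not skpro’s Distribution class, and the choice of normal distributions is an assumption.

```python
from statistics import NormalDist


class NormalPredictions:
    """Hypothetical container for k predicted normal distributions."""

    def __init__(self, means, stds):
        self._dists = [NormalDist(m, s) for m, s in zip(means, stds)]

    def __len__(self):
        return len(self._dists)

    def __getitem__(self, i):
        # Indexing yields the point prediction (here: the mean) of the
        # i-th distribution, so the object can be used like a plain vector.
        return self._dists[i].mean

    def std(self):
        # One standard deviation per predicted distribution.
        return [d.stdev for d in self._dists]

    def pdf(self, x):
        # Density of each predicted distribution at the matching value in x.
        return [d.pdf(v) for d, v in zip(self._dists, x)]


y_pred = NormalPredictions(means=[1.0, 2.0, 3.0], stds=[0.5, 1.0, 2.0])
first_point = y_pred[0]                  # point prediction of the first distribution
spreads = y_pred.std()                   # vector of length k
densities = y_pred.pdf([1.0, 2.0, 3.0])  # densities evaluated at the means
```

Because indexing returns plain numbers, code written for ordinary prediction vectors, such as plotting or error calculation, can consume such an object largely unchanged.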
To evaluate the accuracy of the predicted distributions, skpro provides probabilistic loss metrics. To calculate the loss between the predictions and the true target values, you can choose from a variety of functions in the skpro.metrics module. By default, all loss functions return the loss averaged over the sample. To obtain the point-wise loss instead, set sample=False. You can also obtain the confidence interval of the loss by setting the corresponding option to True. For detailed documentation of the metrics package, read the API documentation.
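The difference between the averaged and the point-wise loss can be illustrated with a small sketch. The log_loss function below is a hypothetical stand-in that mimics only the sample switch described above; it assumes normal predictive distributions and is not the implementation found in skpro.metrics.

```python
from math import log
from statistics import NormalDist, fmean


def log_loss(y_true, means, stds, sample=True):
    """Negative log-density of each true value under its predicted normal.

    Hypothetical helper: returns the average over the sample by default,
    or the list of point-wise losses when sample=False.
    """
    pointwise = [
        -log(NormalDist(m, s).pdf(t)) for t, m, s in zip(y_true, means, stds)
    ]
    return fmean(pointwise) if sample else pointwise


y_true = [1.0, 2.5]
means = [1.0, 2.0]   # predicted means (illustrative values)
stds = [1.0, 1.0]    # predicted standard deviations (illustrative values)

average_loss = log_loss(y_true, means, stds)                  # single number
pointwise_loss = log_loss(y_true, means, stds, sample=False)  # one loss per point
```

A lower value means the true targets fall in regions where the predicted distributions place more density.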
Available prediction strategies
How can models that predict probability distributions be learned? skpro offers a variety of strategies from both the frequentist and the Bayesian domain. Read on about the strategies of interest below:
- Baseline strategies, for instance a kernel density estimation on the labels
- Parametric estimation, which estimates the parameters of the predicted distributions
- Integrations with other vendor packages such as
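As an illustration of the first item in the list, a baseline that runs a kernel density estimation on the labels could look roughly like the sketch below. LabelDensityBaseline, its bandwidth parameter, and the Gaussian kernel are assumptions made for this example; they do not describe skpro’s own baseline classes.

```python
from statistics import NormalDist, fmean


class LabelDensityBaseline:
    """Hypothetical baseline that predicts the same label density for every input."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth

    def fit(self, X, y):
        # The features are ignored on purpose: one Gaussian kernel is
        # centred on each training label.
        self.kernels_ = [NormalDist(label, self.bandwidth) for label in y]
        return self

    def pdf(self, value):
        # Kernel density estimate: the average of all kernel densities at `value`.
        return fmean([kernel.pdf(value) for kernel in self.kernels_])


baseline = LabelDensityBaseline(bandwidth=0.5).fit(
    X=[[0], [1], [2]], y=[1.0, 1.2, 3.0]
)
density_near_labels = baseline.pdf(1.1)  # close to two of the training labels
density_far_away = baseline.pdf(10.0)    # far from every training label
```

Since such a baseline never looks at the features, it gives a reference score that any model using the features should be able to beat.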
The figure below shows an overview of skpro’s base API, which implements the different prediction strategies. For full documentation, read the respective module documentation.