# User Guide

This guide gives an overview of the modules and functionality available in the skpro package. For further details, see the API documentation.

Note

skpro follows many of scikit-learn’s design principles and conventions. If you are not familiar with the scikit-learn package, consider reading its basic tutorial first.

## Overview

The figure below gives an overview of the central elements and concepts of skpro and shows how it extends the scikit-learn toolbox. To understand skpro, it helps to first review scikit-learn’s classical prediction workflow, particularly its central Estimator object. In scikit-learn, an Estimator object represents a particular prediction strategy (e.g. linear regression) that can be fitted using the fit(X, y) method. The fitted estimator can then be used to obtain predictions on new data using the predict(X) method. Finally, the predicted values can be compared with the actual targets using one of the available classical loss functions.
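The fit, predict and loss steps described above can be sketched with a toy estimator. This is a minimal illustration of the convention only: `SimpleLinearRegressor` and the hand-written `mean_squared_error` are hypothetical stand-ins written for this example, not scikit-learn code.

```python
# Toy stand-in that mirrors the scikit-learn fit/predict convention.
# SimpleLinearRegressor is a hypothetical class, not part of scikit-learn.
class SimpleLinearRegressor:
    """One-feature least-squares regressor."""

    def fit(self, X, y):
        # X is a list of single-feature rows, e.g. [[1.0], [2.0]]
        xs = [row[0] for row in X]
        n = len(xs)
        x_mean = sum(xs) / n
        y_mean = sum(y) / n
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (t - y_mean) for x, t in zip(xs, y))
        self.coef_ = sxy / sxx
        self.intercept_ = y_mean - self.coef_ * x_mean
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        return [self.intercept_ + self.coef_ * row[0] for row in X]


def mean_squared_error(y_true, y_pred):
    """Classical loss: average squared difference of targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


X_train, y_train = [[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0]
X_test, y_test = [[4.0], [5.0]], [9.0, 11.0]

model = SimpleLinearRegressor().fit(X_train, y_train)  # 1. fit
y_pred = model.predict(X_test)                         # 2. predict
loss = mean_squared_error(y_test, y_pred)              # 3. compare with targets
```

Here each prediction is a single number per data point; the probabilistic workflow keeps the same three steps but changes what predict returns.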

skpro replicates this general pattern and introduces the ProbabilisticEstimator class, which encapsulates probabilistic prediction models. Like the Estimator class, it offers fit and predict methods, but predict returns a probability distribution (an instance of the Distribution class) rather than a point value. The returned distribution objects provide methods to obtain relevant distribution properties, for example the distribution’s probability density function (y_pred.pdf(x)).

The predictions obtained from skpro’s estimators are therefore genuinely probabilistic: they represent a predicted probability distribution for each data point. For example, if predictions are obtained for a vector X of length k, the returned y_pred object represents k predicted distributions. y_pred[i] provides access to the point prediction (e.g. the mean) of the i-th distribution, y_pred.std() returns a vector of length k that contains the standard deviations of the predicted distributions, and so forth. In many cases, such as plotting and error calculation, the distribution objects can thus be handled like the prediction vectors that scikit-learn commonly returns.
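To make the vectorized behaviour described above concrete, the following sketch models k predicted normal distributions in plain Python. `NormalPredictions` is a hypothetical class written for illustration; it is not skpro’s Distribution class, and evaluating `pdf` at a single scalar x for every distribution is an assumption of this example.

```python
import math


class NormalPredictions:
    """Hypothetical container holding k predicted normal distributions."""

    def __init__(self, means, stds):
        self.means = list(means)
        self.stds = list(stds)

    def __len__(self):
        return len(self.means)

    def __getitem__(self, i):
        # Point prediction (here: the mean) of the i-th distribution
        return self.means[i]

    def std(self):
        # One standard deviation per predicted distribution
        return list(self.stds)

    def pdf(self, x):
        # Density of every predicted distribution, evaluated at the scalar x
        return [
            math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for m, s in zip(self.means, self.stds)
        ]


# k = 3 predicted distributions
y_pred = NormalPredictions(means=[0.0, 2.0, 5.0], stds=[1.0, 0.5, 2.0])

first_point = y_pred[0]      # point prediction of the first distribution
spreads = y_pred.std()       # vector of length k
densities = y_pred.pdf(0.0)  # one density value per distribution
```

Because indexing yields plain numbers, such an object can be passed to code that expects an ordinary prediction vector, while the spread and density remain available for probabilistic evaluation.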

To evaluate the accuracy of the predicted distributions, skpro provides probabilistic loss metrics. To calculate the loss between the predictions and the true target values, you can choose from a variety of functions in the skpro.metrics module. By default, all loss functions return the loss averaged over the sample. To obtain the point-wise loss instead, set sample=False. You can also obtain the confidence interval of the loss by setting return_std=True. For detailed documentation of the metrics module, see the API documentation.
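The averaged and point-wise modes can be illustrated with a hand-written log-loss for normal predictions. This is a sketch under stated assumptions, not the skpro.metrics implementation: `normal_log_loss` is a hypothetical function, and reporting the uncertainty as the standard error of the mean loss when `return_std=True` is an assumption made for this example.

```python
import math


def normal_log_loss(y_true, means, stds, sample=True, return_std=False):
    """Negative log-density of each target under its predicted normal.

    Hypothetical helper; the parameter names mirror the options described
    in the text, but the behaviour is an assumption of this sketch.
    """
    losses = [
        0.5 * math.log(2 * math.pi * s ** 2) + (t - m) ** 2 / (2 * s ** 2)
        for t, m, s in zip(y_true, means, stds)
    ]
    if not sample:
        return losses  # point-wise losses, one per data point

    n = len(losses)
    mean_loss = sum(losses) / n
    if not return_std:
        return mean_loss  # default: loss averaged over the sample

    # Assumption: uncertainty reported as standard error of the mean loss
    variance = sum((loss - mean_loss) ** 2 for loss in losses) / n
    return mean_loss, math.sqrt(variance / n)


y_true = [0.0, 1.0]
means, stds = [0.0, 0.0], [1.0, 1.0]

averaged = normal_log_loss(y_true, means, stds)
pointwise = normal_log_loss(y_true, means, stds, sample=False)
averaged_with_err, std_err = normal_log_loss(y_true, means, stds, return_std=True)
```

The point-wise form is useful for inspecting which data points are predicted poorly, while the averaged form with an uncertainty estimate is better suited to comparing models.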

## Available prediction strategies

How can probabilistic prediction models, that is, strategies that predict probability distributions, be learned? skpro offers a variety of strategies from both the frequentist and the Bayesian domain. Continue reading about the strategies of interest below:

The figure below shows an overview of skpro’s base API, which implements the different prediction strategies. For full documentation, see the respective module documentation.