# skpro package¶

## skpro.base module¶

class skpro.base.BayesianVendorEstimator(model=None, adapter=None)[source]

Vendor estimator for Bayesian methods

ProbabilisticEstimator that interfaces a Bayesian vendor using a BayesianVendorInterface and a sample-based Adapter.

class Distribution(estimator, X, selection=slice(None, None, None), mode='elementwise')[source]

Bases: skpro.base.Distribution

X

Reference to the test features that correspond to the predictive distribution represented by the interface.

The interface methods (e.g. pdf) can use X to construct and exhibit the properties of the predictive distribution (e.g. construct the predicted pdf based on X).

Note that X automatically reflects the feature point for which the interface represents the distributional prediction. For given M x n features, X will thus represent a 1 x n vector that provides the basis for the predicted distribution. However, if the vectorvalued() decorator is applied, X will represent the full M x n matrix to allow for an efficient vectorized implementation.

Getter: Returns the test features based on the current subset selection
Setter: Sets the data reference array
cdf(x, *args, **kwargs)[source]
lp2
mean(*args, **kwargs)

Mean prediction

Returns: The mean prediction that corresponds to self.X
pdf(x, *args, **kwargs)[source]
point[source]
ppf(q, *args, **kwargs)

Percent point function (inverse of cdf — percentiles).

Parameters: q – float
replicate(selection=None, mode=None)

Replicates the distribution object

Parameters: selection (None | slice | int (optional)) – Subset point selection of the distribution copy; mode (str (optional)) – Interface mode (‘elementwise’ or ‘batch’)
Returns: skpro.base.ProbabilisticEstimator.Distribution
std[source]
class ImplementsEnhancedInterface(name, bases, clsdict)

Bases: abc.ABCMeta

Meta-class for distribution interface

Enhances the distribution interface behind the scenes with automatic caching and syntactic sugar for element-wise access of the distributions

mro() → list

Return a type’s method resolution order.

register(subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.
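The register method above comes from abc.ABCMeta. Virtual subclass registration can be illustrated with the standard library alone, independently of skpro (the class names below are hypothetical):

```python
import abc

class Serializer(abc.ABC):
    """Hypothetical ABC, used only for illustration."""
    @abc.abstractmethod
    def dumps(self, obj):
        ...

class JsonLike:
    """Note: does not inherit from Serializer."""
    def dumps(self, obj):
        return str(obj)

# register() makes JsonLike a *virtual* subclass: isinstance/issubclass
# checks now pass, but no methods or attributes are inherited.
Serializer.register(JsonLike)

print(issubclass(JsonLike, Serializer))    # True
print(isinstance(JsonLike(), Serializer))  # True
```

Because register returns the subclass, it can also be applied as a class decorator: `@Serializer.register`.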

fit(X, y)

Fits the vendor model

Parameters: X (numpy array or sparse matrix of shape [n_samples, n_features]) – Training data; y (numpy array of shape [n_samples, n_targets]) – Target values, will be cast to X’s dtype if necessary
Returns: self – Returns an instance of self.
get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values (mapping of string to any)
name()
predict(X)

Predicts using the vendor model

Parameters: X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.
Returns: Predicted distributions – a Distribution interface representing n_samples predictions
score(X, y, sample=True, return_std=False)

Returns the log-loss score

Parameters: X (np.array) – Features; y (np.array) – Labels; sample (boolean, default=True) – If true, loss will be averaged across the sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Log-loss score (mixed)
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
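The `<component>__<parameter>` convention for nested parameters can be sketched in plain Python (an illustrative resolver with hypothetical toy classes, not skpro’s or scikit-learn’s actual implementation):

```python
class Estimator:
    """Toy estimator for illustration."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

class Pipeline:
    """Toy nested object holding a component named 'model'."""
    def __init__(self, model):
        self.model = model

def set_nested_params(obj, **params):
    """Resolve keys like 'model__alpha' by descending into components."""
    for key, value in params.items():
        *path, attr = key.split('__')
        target = obj
        for name in path:          # walk down the component chain
            target = getattr(target, name)
        setattr(target, attr, value)
    return obj

pipe = Pipeline(Estimator())
set_nested_params(pipe, model__alpha=0.5)
print(pipe.model.alpha)  # 0.5
```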
class skpro.base.BayesianVendorInterface[source]

Abstract base class for a Bayesian vendor

Notes

Must implement the samples method that returns Bayesian posterior samples. The samples method should be cached using the functools.lru_cache decorator to increase performance.
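The caching recommendation can be sketched as follows (a hypothetical vendor class, not skpro’s actual interface; only the lru_cache usage is the point):

```python
import functools
import random

class GaussianVendor:
    """Hypothetical vendor that draws posterior samples once, then caches."""
    def __init__(self, mu, sigma, size=1000):
        self.mu, self.sigma, self.size = mu, sigma, size
        self.draws = 0  # counts how often sampling actually runs

    @functools.lru_cache()
    def samples(self):
        # Expensive posterior sampling happens only on the first call;
        # later calls return the cached tuple.
        self.draws += 1
        rng = random.Random(42)
        return tuple(rng.gauss(self.mu, self.sigma) for _ in range(self.size))

v = GaussianVendor(0.0, 1.0)
v.samples()
v.samples()
print(v.draws)  # 1
```

Note that lru_cache keys on the instance, so each vendor object gets its own cached sample.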

on_fit(X, y)

Implements vendor fit procedure

Parameters: X (np.array) – Training features; y (np.array) – Training labels
Returns: None
on_predict(X)

Implements vendor predict procedure

Parameters: X (np.array) – Test features
Returns: None
samples[source]

Returns the predictive posterior samples

Returns: Predictive posterior sample (np.array)
class skpro.base.ProbabilisticEstimator[source]

Abstract base class for probabilistic prediction models

Notes

All probabilistic estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

class Distribution(estimator, X, selection=slice(None, None, None), mode='elementwise')[source]

Bases: object

Abstract base class for the distribution interface returned by probabilistic estimators

Parameters: estimator (skpro.base.ProbabilisticEstimator) – Parent probabilistic estimator object; X (np.array) – Features; selection (slice | int (optional)) – Subset point selection of the features; mode (str) – Interface mode (‘elementwise’ or ‘batch’)
X

Reference to the test features that correspond to the predictive distribution represented by the interface.

The interface methods (e.g. pdf) can use X to construct and exhibit the properties of the predictive distribution (e.g. construct the predicted pdf based on X).

Note that X automatically reflects the feature point for which the interface represents the distributional prediction. For given M x n features, X will thus represent a 1 x n vector that provides the basis for the predicted distribution. However, if the vectorvalued() decorator is applied, X will represent the full M x n matrix to allow for an efficient vectorized implementation.

Getter: Returns the test features based on the current subset selection
Setter: Sets the data reference array
cdf(x, *args, **kwargs)[source]
lp2[source]
mean(*args, **kwargs)[source]

Mean prediction

Returns: The mean prediction that corresponds to self.X
pdf(x, *args, **kwargs)[source]
point[source]
ppf(q, *args, **kwargs)[source]

Percent point function (inverse of cdf — percentiles).

Parameters: q – float
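Since the ppf is the inverse of the cdf, one generic way to obtain it numerically is bisection on a monotone cdf. The sketch below is purely illustrative (it is not skpro’s implementation, and the search bounds are an assumption):

```python
import math

def ppf_by_bisection(cdf, q, lo=-100.0, hi=100.0, tol=1e-9):
    """Invert a monotone cdf numerically: find x with cdf(x) ~= q."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Standard normal cdf, expressed via the error function
std_normal_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(abs(ppf_by_bisection(std_normal_cdf, 0.5)) < 1e-6)   # True (median = 0)
print(round(ppf_by_bisection(std_normal_cdf, 0.975), 2))   # 1.96
```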
replicate(selection=None, mode=None)[source]

Replicates the distribution object

Parameters: selection (None | slice | int (optional)) – Subset point selection of the distribution copy; mode (str (optional)) – Interface mode (‘elementwise’ or ‘batch’)
Returns: skpro.base.ProbabilisticEstimator.Distribution
std[source]
class ImplementsEnhancedInterface(name, bases, clsdict)[source]

Bases: abc.ABCMeta

Meta-class for distribution interface

Enhances the distribution interface behind the scenes with automatic caching and syntactic sugar for element-wise access of the distributions

mro() → list

Return a type’s method resolution order.

register(subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

fit(X, y)[source]

Fits the model

Parameters: X (numpy array or sparse matrix of shape [n_samples, n_features]) – Training data; y (numpy array of shape [n_samples, n_targets]) – Target values, will be cast to X’s dtype if necessary
Returns: self – Returns an instance of self.
get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values (mapping of string to any)
name()[source]
predict(X)[source]

Predicts using the model

Parameters: X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.
Returns: Predicted distributions – a Distribution interface representing n_samples predictions
score(X, y, sample=True, return_std=False)[source]

Returns the log-loss score

Parameters: X (np.array) – Features; y (np.array) – Labels; sample (boolean, default=True) – If true, loss will be averaged across the sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Log-loss score (mixed)
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
class skpro.base.VendorEstimator(model=None, adapter=None)[source]

ProbabilisticEstimator that interfaces a vendor using a VendorInterface and an Adapter.

class Distribution(estimator, X, selection=slice(None, None, None), mode='elementwise')[source]

Bases: skpro.base.Distribution

X

Reference to the test features that correspond to the predictive distribution represented by the interface.

The interface methods (e.g. pdf) can use X to construct and exhibit the properties of the predictive distribution (e.g. construct the predicted pdf based on X).

Note that X automatically reflects the feature point for which the interface represents the distributional prediction. For given M x n features, X will thus represent a 1 x n vector that provides the basis for the predicted distribution. However, if the vectorvalued() decorator is applied, X will represent the full M x n matrix to allow for an efficient vectorized implementation.

Getter: Returns the test features based on the current subset selection
Setter: Sets the data reference array
cdf(x, *args, **kwargs)
lp2
mean(*args, **kwargs)

Mean prediction

Returns: The mean prediction that corresponds to self.X
pdf(x, *args, **kwargs)
point
ppf(q, *args, **kwargs)

Percent point function (inverse of cdf — percentiles).

Parameters: q – float
replicate(selection=None, mode=None)

Replicates the distribution object

Parameters: selection (None | slice | int (optional)) – Subset point selection of the distribution copy; mode (str (optional)) – Interface mode (‘elementwise’ or ‘batch’)
Returns: skpro.base.ProbabilisticEstimator.Distribution
std
class ImplementsEnhancedInterface(name, bases, clsdict)

Bases: abc.ABCMeta

Meta-class for distribution interface

Enhances the distribution interface behind the scenes with automatic caching and syntactic sugar for element-wise access of the distributions

mro() → list

Return a type’s method resolution order.

register(subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

fit(X, y)[source]

Fits the vendor model

Parameters: X (numpy array or sparse matrix of shape [n_samples, n_features]) – Training data; y (numpy array of shape [n_samples, n_targets]) – Target values, will be cast to X’s dtype if necessary
Returns: self – Returns an instance of self.
get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values (mapping of string to any)
name()
predict(X)[source]

Predicts using the vendor model

Parameters: X ({array-like, sparse matrix}, shape = (n_samples, n_features)) – Samples.
Returns: Predicted distributions – a Distribution interface representing n_samples predictions
score(X, y, sample=True, return_std=False)

Returns the log-loss score

Parameters: X (np.array) – Features; y (np.array) – Labels; sample (boolean, default=True) – If true, loss will be averaged across the sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Log-loss score (mixed)
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
class skpro.base.VendorInterface[source]

Bases: object

Abstract base class for a vendor interface

on_fit(X, y)[source]

Implements vendor fit procedure

Parameters: X (np.array) – Training features; y (np.array) – Training labels
Returns: None
on_predict(X)[source]

Implements vendor predict procedure

Parameters: X (np.array) – Test features
Returns: None
skpro.base.vectorvalued(f)[source]

Decorates a distribution function to disable automatic vectorization.

Parameters: f (callable) – The function to decorate
Returns: Decorated function
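One way such a decorator can work (an illustrative sketch; the flag name and dispatcher are hypothetical, not skpro’s actual machinery) is to tag the function so a dispatcher knows to pass it the whole batch instead of looping element-wise:

```python
def vectorvalued_sketch(f):
    """Mark f as vector-valued so the dispatcher skips element-wise looping."""
    f.already_vectorized = True  # hypothetical flag name
    return f

def dispatch(func, points):
    """Hand the full batch to vector-valued functions, loop otherwise."""
    if getattr(func, "already_vectorized", False):
        return func(points)
    return [func(p) for p in points]

@vectorvalued_sketch
def batch_mean(points):
    # Receives the full M x n batch at once (here: a plain list)
    return sum(points) / len(points)

print(dispatch(batch_mean, [1, 2, 3]))  # 2.0
```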

## skpro.density module¶

class skpro.density.DensityAdapter[source]

Abstract base class for density adapters that transform an input into a density cdf/pdf interface

cdf(x)[source]

Cumulative density function

Parameters: x (mixed) – Evaluation point
Returns: Cumulative density function evaluated at x
get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values (mapping of string to any)
pdf(x)[source]

Probability density function

Parameters: x (mixed) – Evaluation point
Returns: Density function evaluated at x
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
class skpro.density.EmpiricalDensityAdapter[source]

DensityAdapter that uses empirical cdf to transform samples

cdf(x)[source]

Cumulative density function

Parameters: x (mixed) – Evaluation point
Returns: Cumulative density function evaluated at x
get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values (mapping of string to any)
pdf(x)[source]

Probability density function

Parameters: x (mixed) – Evaluation point
Returns: Density function evaluated at x
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
class skpro.density.KernelDensityAdapter(estimator=KernelDensity(algorithm='auto', atol=0, bandwidth=1.0, breadth_first=True, kernel='gaussian', leaf_size=40, metric='euclidean', metric_params=None, rtol=0))[source]

DensityAdapter that uses kernel density estimation to transform samples

cdf(x)[source]

Cumulative density function

Parameters: x (mixed) – Evaluation point
Returns: Cumulative density function evaluated at x
get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values (mapping of string to any)
pdf(x)[source]

Probability density function

Parameters: x (mixed) – Evaluation point
Returns: Density function evaluated at x
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
skpro.density.ecdf(a)[source]

Returns the empirical distribution function of a sample

Parameters: a (array) – Input array representing a sample
Returns: xs, ys – The sorted sample values (array) and the corresponding empirical cdf values (array)
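The empirical cdf can be sketched in a few lines of numpy (an illustrative reimplementation under the assumption that ecdf returns sorted values and cumulative fractions, not skpro’s source):

```python
import numpy as np

def ecdf_sketch(a):
    """Empirical cdf of a sample: sorted values and cumulative fractions."""
    xs = np.sort(np.asarray(a, dtype=float))
    ys = np.arange(1, len(xs) + 1) / len(xs)  # i/n at the i-th sorted value
    return xs, ys

xs, ys = ecdf_sketch([3.0, 1.0, 2.0, 2.0])
print(xs.tolist())  # [1.0, 2.0, 2.0, 3.0]
print(ys.tolist())  # [0.25, 0.5, 0.75, 1.0]
```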
skpro.density.step_function(xs, ys)[source]

Returns a step function from x-y pair sample

Parameters: xs (array) – x values; ys (array) – corresponding y values
Returns: function – A step function through the (xs, ys) pairs
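A step function over x-y pairs can be built with np.searchsorted (an illustrative sketch; the convention that values left of xs[0] map to 0 is an assumption, chosen so that composing it with ecdf yields a proper empirical cdf):

```python
import numpy as np

def step_function_sketch(xs, ys):
    """Right-continuous step function through (xs, ys); 0 left of xs[0]."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)

    def f(x):
        # Index of the last step at or before x
        idx = np.searchsorted(xs, x, side='right') - 1
        return 0.0 if idx < 0 else float(ys[idx])

    return f

f = step_function_sketch([1.0, 2.0, 3.0], [0.25, 0.5, 1.0])
print(f(0.5), f(2.0), f(2.9))  # 0.0 0.5 0.5
```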

## skpro.ensemble module¶

class skpro.ensemble.BaggingRegressor(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)[source]

Bases: sklearn.ensemble.bagging.BaggingRegressor, skpro.base.ProbabilisticEstimator

class Distribution(estimator, X, distributions, n_estimators)[source]

Bases: skpro.base.Distribution

X

Reference to the test features that correspond to the predictive distribution represented by the interface.

The interface methods (e.g. pdf) can use X to construct and exhibit the properties of the predictive distribution (e.g. construct the predicted pdf based on X).

Note that X automatically reflects the feature point for which the interface represents the distributional prediction. For given M x n features, X will thus represent a 1 x n vector that provides the basis for the predicted distribution. However, if the vectorvalued() decorator is applied, X will represent the full M x n matrix to allow for an efficient vectorized implementation.

Getter: Returns the test features based on the current subset selection
Setter: Sets the data reference array
cdf(x, *args, **kwargs)
lp2
mean(*args, **kwargs)

Mean prediction

Returns: The mean prediction that corresponds to self.X
pdf(x, *args, **kwargs)
point
ppf(q, *args, **kwargs)

Percent point function (inverse of cdf — percentiles).

Parameters: q – float
replicate(selection=None, mode=None)

Replicates the distribution object

Parameters: selection (None | slice | int (optional)) – Subset point selection of the distribution copy; mode (str (optional)) – Interface mode (‘elementwise’ or ‘batch’)
Returns: skpro.base.ProbabilisticEstimator.Distribution
std
class ImplementsEnhancedInterface(name, bases, clsdict)

Bases: abc.ABCMeta

Meta-class for distribution interface

Enhances the distribution interface behind the scenes with automatic caching and syntactic sugar for element-wise access of the distributions

mro() → list

Return a type’s method resolution order.

register(subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

estimators_samples_

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

fit(X, y, sample_weight=None)

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters: X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator; y (array-like, shape = [n_samples]) – The target values (class labels in classification, real numbers in regression); sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
Returns: self (object)
get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators. params – Parameter names mapped to their values. mapping of string to any
name()
predict(X)[source]

Predict regression target for X.

The predicted regression target of an input sample is computed as the averaged predicted distributions of the estimators in the ensemble.

Parameters: X ({array-like, sparse matrix} of shape = [n_samples, n_features]) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns: y (skpro.base.Distribution, shape = [n_samples]) – The predicted bagged distributions.
score(X, y, sample=True, return_std=False)

Returns the log-loss score

Parameters: X (np.array) – Features; y (np.array) – Labels; sample (boolean, default=True) – If true, loss will be averaged across the sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Log-loss score (mixed)
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self

## skpro.metrics module¶

skpro.metrics.gneiting_loss(y_true, dist_pred, sample=True, return_std=False)[source]

Gneiting loss

Parameters: y_true (np.array) – The true labels; dist_pred (ProbabilisticEstimator.Distribution) – The predicted distribution; sample (boolean, default=True) – If true, loss will be averaged across the sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Loss (with standard deviation if return_std is True) (np.array)
skpro.metrics.linearized_log_loss(y_true, dist_pred, range=1e-10, sample=True, return_std=False)[source]

Linearized log loss

Parameters: y_true (np.array) – The true labels; dist_pred (ProbabilisticEstimator.Distribution) – The predicted distribution; range (float) – Threshold value of linearization; sample (boolean, default=True) – If true, loss will be averaged across the sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Loss (with standard deviation if return_std is True) (np.array)
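The idea behind linearization can be sketched as follows (an illustrative reimplementation operating on plain pdf values rather than a Distribution object, so its signature differs from skpro’s): below the threshold, the unbounded -log(p) is replaced by its tangent line at the threshold, which keeps the loss finite even when the predicted density at the true label is zero:

```python
import numpy as np

def linearized_log_loss_sketch(pdf_values, eps=1e-10, sample=True):
    """-log(p) for p >= eps; tangent of -log at eps for p < eps."""
    p = np.asarray(pdf_values, dtype=float)
    losses = np.where(
        p >= eps,
        -np.log(np.maximum(p, eps)),       # maximum() avoids log(0) warnings
        -np.log(eps) - (p - eps) / eps,    # tangent line: finite at p = 0
    )
    return losses.mean() if sample else losses

print(linearized_log_loss_sketch([1.0]))                     # 0.0
print(np.isfinite(linearized_log_loss_sketch([0.0])))        # True
```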
skpro.metrics.log_loss(y_true, dist_pred, sample=True, return_std=False)[source]

Log loss

Parameters: y_true (np.array) – The true labels; dist_pred (ProbabilisticEstimator.Distribution) – The predicted distribution; sample (boolean, default=True) – If true, loss will be averaged across the sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Loss (with standard deviation if return_std is True) (np.array)
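The log loss is the negative log density of each true label under the prediction. A minimal sketch, assuming the predicted distribution is represented by a pdf callable (skpro instead evaluates dist_pred.pdf):

```python
import numpy as np

def log_loss_sketch(y_true, pdf, sample=True, return_std=False):
    """Negative log density at each true label; optionally averaged."""
    losses = np.array([-np.log(pdf(y)) for y in y_true])
    if not sample:
        return losses
    return (losses.mean(), losses.std()) if return_std else losses.mean()

# Uniform density on [0, 1] as a toy predicted distribution
uniform_pdf = lambda y: 1.0 if 0.0 <= y <= 1.0 else 1e-12

print(log_loss_sketch([0.2, 0.7], uniform_pdf))  # 0.0
```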
skpro.metrics.make_scorer(score_func, greater_is_better=True)[source]

Make a scorer from a performance metric or loss function.

This factory function wraps scoring functions for use in GridSearchCV and cross_val_score. It takes a score function, such as log_loss, and returns a callable that scores an estimator’s output.

Parameters: score_func (callable) – Score function (or loss function) with signature score_func(y, y_pred, **kwargs); greater_is_better (boolean, default=True) – Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func; **kwargs (additional arguments) – Additional parameters to be passed to score_func
Returns: scorer (callable) – Callable object that returns a scalar score; greater is better.
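The sign-flipping wrapper can be sketched as follows (an illustrative factory with a hypothetical toy estimator, not skpro’s actual implementation):

```python
def make_scorer_sketch(score_func, greater_is_better=True, **kwargs):
    """Wrap a metric into a scorer(estimator, X, y) callable;
    losses are sign-flipped so that greater is always better."""
    sign = 1.0 if greater_is_better else -1.0

    def scorer(estimator, X, y):
        return sign * score_func(y, estimator.predict(X), **kwargs)

    return scorer

class ConstantModel:
    """Toy estimator that always predicts 0.0, for illustration."""
    def predict(self, X):
        return [0.0 for _ in X]

mse = lambda y, y_pred: sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)

scorer = make_scorer_sketch(mse, greater_is_better=False)
print(scorer(ConstantModel(), [None, None], [1.0, 3.0]))  # -5.0
```

Because the loss is negated, grid search can always maximize the scorer regardless of whether the underlying metric is a score or a loss.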
skpro.metrics.rank_probability_loss(y_true, dist_pred, sample=True, return_std=False)[source]

Rank probability loss

where F denotes the CDF of the predicted distribution

Parameters: y_true (np.array) – The true labels; dist_pred (ProbabilisticEstimator.Distribution) – The predicted distribution; sample (boolean, default=True) – If true, loss will be averaged across the sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Loss (with standard deviation if return_std is True) (np.array)
skpro.metrics.sample_loss(loss, return_std=False)[source]

Averages the loss of a sample

Parameters: loss (np.array) – Loss sample; return_std (boolean, default=False) – If true, the standard deviation of the loss sample will be returned
Returns: Sample loss (with standard deviation if return_std is True) (np.array)
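A minimal sketch of averaging a loss sample (illustrative only; skpro may report a different spread measure, e.g. the standard error of the mean rather than the raw standard deviation):

```python
import numpy as np

def sample_loss_sketch(loss, return_std=False):
    """Average a loss sample; optionally also return its standard deviation."""
    loss = np.asarray(loss, dtype=float)
    if return_std:
        return loss.mean(), loss.std()
    return loss.mean()

print(sample_loss_sketch([1.0, 2.0, 3.0]))  # 2.0

m, s = sample_loss_sketch([1.0, 3.0], return_std=True)
print(m, s)  # 2.0 1.0
```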

## skpro.model_selection module¶

class skpro.model_selection.RetrievesScores(scorer, score=True, std=False)[source]

Bases: object

skpro.model_selection.cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs')[source]

Evaluate a score using cross-validation

Parameters:
estimator (estimator object implementing ‘fit’) – The object to use to fit the data.
X (array-like) – The data to fit. Can be, for example, a list or an array.
y (array-like, optional, default: None) – The target variable to try to predict in the case of supervised learning.
groups (array-like, with shape (n_samples,), optional) – Group labels for the samples used while splitting the dataset into train/test set.
scoring (string, callable or None, optional, default: None) – A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).
cv (int, cross-validation generator or an iterable, optional) – Determines the cross-validation splitting strategy. Possible inputs for cv are: None, to use the default 3-fold cross-validation; an integer, to specify the number of folds in a (Stratified)KFold; an object to be used as a cross-validation generator; or an iterable yielding train/test splits. For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. Refer to the User Guide for the various cross-validation strategies that can be used here.
n_jobs (integer, optional) – The number of CPUs to use to do the computation. -1 means ‘all CPUs’.
verbose (integer, optional) – The verbosity level.
fit_params (dict, optional) – Parameters to pass to the fit method of the estimator.
pre_dispatch (int or string, optional) – Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned (use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs); an int, giving the exact number of total jobs that are spawned; or a string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’.
Returns: scores (numpy.array, shape=(len(list(cv)), 2)) – Array of scores of the estimator for each run of the cross-validation, with their corresponding uncertainty.
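The K-fold splitting that underlies cross-validation can be sketched in numpy (an illustrative, unshuffled split generator, not the scikit-learn machinery that cross_val_score actually uses):

```python
import numpy as np

def kfold_indices(n_samples, n_folds=3):
    """Yield (train, test) index arrays for simple, unshuffled K-fold CV."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    for i in range(n_folds):
        # The i-th fold is the test set; the rest form the training set
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

for train, test in kfold_indices(6, n_folds=3):
    print(test.tolist())
# [0, 1]
# [2, 3]
# [4, 5]
```

Each sample lands in the test set exactly once, so scoring each split yields one score per fold, matching the shape of the returned scores array.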

See also: skpro.metrics.make_scorer() – Make a scorer from a performance metric or loss function.

## skpro.utils module¶

skpro.utils.ensure_existence(f)[source]

Ensures that a method is not marked as non-existent

Parameters: f (method) – Method to check
Returns: Method f
Raises: NotImplementedError – if the method is marked as non-existent
skpro.utils.not_existing(f)[source]

Decorates an interface method to declare it theoretically non-existent

Parameters: f (method) – Method to decorate
Returns: Decorated method
skpro.utils.to_percent(value, return_float=True)[source]

Converts a value into a percent representation

Parameters: value (int/float) – Number representing a percentage; return_float (bool) – If true, a float representing the percentage is returned
Returns: int/float – A percentage
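A minimal sketch of such a conversion, under the assumption that the input is a fraction on the [0, 1] scale (the exact semantics of skpro’s to_percent may differ):

```python
def to_percent_sketch(value, return_float=True):
    """Convert a fraction (e.g. 0.875) into a percentage (87.5)."""
    pct = float(value) * 100.0
    # When a float is not requested, return the nearest whole percent
    return pct if return_float else int(round(pct))

print(to_percent_sketch(0.875))                      # 87.5
print(to_percent_sketch(0.875, return_float=False))  # 88
```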