Bootstrapping

Description

Bootstrapping estimates uncertainty by repeatedly resampling the original dataset with replacement to create many new training sets, training a model on each resample, and analysing the variation in the resulting predictions. This approach provides confidence intervals and stability measures without strong parametric assumptions about the data distribution. By showing how predictions change across different random samples of the data, it reveals how sensitive the model is to the specific training examples and provides robust uncertainty estimates.
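
The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up data, using a simple linear fit as the "model"; the dataset, the prediction point `x_new`, and the resample count are all arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (illustrative only): y depends linearly on x plus noise.
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=100)

n_resamples = 1000
x_new = 5.0  # point at which we want an uncertainty estimate
preds = np.empty(n_resamples)

for i in range(n_resamples):
    # Resample indices with replacement to form a bootstrap training set.
    idx = rng.integers(0, len(x), size=len(x))
    # Train a model on the resample (here, a degree-1 polynomial fit).
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    preds[i] = slope * x_new + intercept

# Percentile confidence interval from the spread of bootstrap predictions.
low, high = np.percentile(preds, [2.5, 97.5])
```

The width of the `[low, high]` interval is the stability measure: a model that is highly sensitive to which examples it saw will produce widely scattered bootstrap predictions.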

Example Use Cases

Reliability

Estimating uncertainty in financial risk models by resampling historical data to understand how predictions might vary under different historical scenarios.

Transparency

Providing confidence intervals for medical diagnosis predictions to help doctors understand the reliability of AI recommendations and make more informed treatment decisions.

Fairness

Assessing whether prediction uncertainty is consistent across different demographic groups in hiring algorithms, identifying if the model is systematically more uncertain for certain populations.
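
One hedged sketch of this fairness check: compute a bootstrap confidence interval for a model score separately per group and compare interval widths. The group score arrays below are synthetic stand-ins; in practice they would be per-group model outputs or accuracy values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-group scores (illustrative): group B is smaller
# and noisier, so we expect a wider uncertainty band for it.
scores_a = rng.normal(loc=0.7, scale=0.05, size=500)
scores_b = rng.normal(loc=0.7, scale=0.15, size=50)

def ci_width(scores, n_resamples=2000):
    """Width of the 95% percentile bootstrap CI for the mean score."""
    means = np.array([
        rng.choice(scores, size=len(scores), replace=True).mean()
        for _ in range(n_resamples)
    ])
    low, high = np.percentile(means, [2.5, 97.5])
    return high - low

width_a = ci_width(scores_a)
width_b = ci_width(scores_b)
# A much larger width for group B flags systematically higher
# uncertainty for that population, even if the mean scores match.
```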

Limitations

  • Computationally expensive as it requires training multiple models on resampled datasets.
  • Does not account for uncertainty in model structure or architecture choices.
  • Cannot detect systematically missing data patterns or biases present in the original dataset.
  • Assumes that the original dataset is representative of the population of interest.

Resources

Research Papers

Deterministic bootstrapping for a class of bootstrap methods
Thomas Pitschel, Mar 26, 2019

An algorithm is described that enables efficient deterministic approximate computation of the bootstrap distribution for any linear bootstrap method $T_n^*$, alleviating the need for repeated resampling from observations (resp. input-derived data). In essence, the algorithm computes the distribution function from a linear mixture of independent random variables each having a finite discrete distribution. The algorithm is applicable to elementary bootstrap scenarios (targeting the mean as parameter of interest), for block bootstrap, as well as for certain residual bootstrap scenarios. Moreover, the algorithm promises much broader applicability, e.g. in non-bootstrapped hypothesis testing.

Software Packages

scipy.stats.bootstrap
Jan 1, 2008
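
As a pointer to how this package is used, the sketch below computes a percentile bootstrap confidence interval for a sample mean with `scipy.stats.bootstrap`. The data are synthetic and the statistic (the mean) is chosen purely for illustration.

```python
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)  # illustrative data

# scipy.stats.bootstrap takes the data as a sequence of samples and a
# statistic to bootstrap; "percentile" is one of its interval methods.
res = bootstrap((sample,), np.mean, confidence_level=0.95,
                n_resamples=9999, method="percentile", random_state=rng)

ci = res.confidence_interval  # namedtuple-like, with .low and .high
```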

Tutorials

A Gentle Introduction to the Bootstrap Method
Jason Brownlee, May 24, 2018

Machine Learning: What is Bootstrapping?
Nate Rosidi, Jan 1, 2023

Documentation

Bootstrapping and bagging — modAL documentation
Jan 1, 2021
