Bootstrapping
Description
Bootstrapping estimates uncertainty by repeatedly resampling the original dataset with replacement to create many new training sets, training a model on each resample, and analysing the variation in the resulting predictions. This yields confidence intervals and stability measures without strong statistical assumptions. By showing how predictions change across different random samples of the data, it reveals how sensitive the model is to the specific training examples it was fitted on.
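The procedure above can be sketched as follows. This is a minimal illustration, not a production implementation: it assumes a simple linear model fitted with `np.polyfit` on synthetic data, and uses the percentile method to form a confidence interval for a prediction at a single query point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (illustrative): y = 2x + noise
x = np.linspace(0, 10, 50)
y = 2.0 * x + rng.normal(0, 1.0, size=x.size)

n_boot = 1000
x_query = 5.0            # point at which we want prediction uncertainty
preds = np.empty(n_boot)

for b in range(n_boot):
    # Resample (x, y) pairs with replacement to form a new training set
    idx = rng.integers(0, x.size, size=x.size)
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    preds[b] = slope * x_query + intercept

# Percentile confidence interval from the bootstrap distribution of predictions
lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"prediction at x={x_query}: {preds.mean():.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The width of the interval is the uncertainty estimate: a wide interval indicates that the prediction depends heavily on which training examples happened to be sampled.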
Example Use Cases
Reliability
Estimating uncertainty in financial risk models by resampling historical data to understand how predictions might vary under different historical scenarios.
Transparency
Providing confidence intervals for medical diagnosis predictions to help doctors understand the reliability of AI recommendations and make more informed treatment decisions.
Fairness
Assessing whether prediction uncertainty is consistent across different demographic groups in hiring algorithms, identifying if the model is systematically more uncertain for certain populations.
Limitations
- Computationally expensive as it requires training multiple models on resampled datasets.
- Does not account for uncertainty in model structure or architecture choices.
- Cannot detect systematically missing data patterns or biases present in the original dataset.
- Assumes that the original dataset is representative of the population of interest.
Resources
Research Papers
Deterministic bootstrapping for a class of bootstrap methods
An algorithm is described that enables efficient deterministic approximate computation of the bootstrap distribution for any linear bootstrap method $T_n^*$, alleviating the need for repeated resampling from observations (resp. input-derived data). In essence, the algorithm computes the distribution function of a linear mixture of independent random variables, each having a finite discrete distribution. The algorithm is applicable to elementary bootstrap scenarios (targeting the mean as the parameter of interest), to block bootstrap, as well as to certain residual bootstrap scenarios. Moreover, the algorithm promises much broader applicability in non-bootstrapped hypothesis testing.
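The core idea behind the deterministic approach can be illustrated for the simplest case, the sample mean. This sketch is an assumption-laden toy, not the paper's algorithm: it exploits the fact that the resampled sum is a sum of $n$ i.i.d. draws from the empirical distribution, so its exact distribution can be obtained by convolving $n$ copies of that distribution (restricted here to small integer-valued data so sums index a probability vector directly).

```python
import numpy as np

# Observed sample (small non-negative integers so sums index a vector directly)
data = np.array([1, 2, 2, 5])
n = data.size

# Empirical distribution of a single resampled draw: P(X* = v)
p_one = np.bincount(data, minlength=data.max() + 1) / n

# Convolve n copies to get the exact distribution of the resampled sum,
# with no Monte Carlo resampling at all
p_sum = np.array([1.0])
for _ in range(n):
    p_sum = np.convolve(p_sum, p_one)

# Bootstrap distribution of the mean lives on support sum / n
support = np.arange(p_sum.size) / n
boot_mean = (support * p_sum).sum()
print(boot_mean)  # expectation of the bootstrap mean equals the sample mean
```

Unlike Monte Carlo bootstrapping, this computes every possible resample outcome and its exact probability, which is what makes the result deterministic.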