Bootstrapping
Description
Bootstrapping estimates uncertainty by repeatedly resampling the original dataset with replacement to create many new training sets, training a model on each resample, and analysing the variation in the resulting predictions. This yields confidence intervals and stability measures without strong statistical assumptions. By showing how predictions change across different random samples of the data, it reveals how sensitive the model is to the specific training examples it was fitted on.
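The procedure above can be sketched as follows. This is a minimal illustration, not a production implementation: it assumes a simple linear model fitted with `np.polyfit` on synthetic data, and uses the percentile method to form a confidence interval for a prediction at a single query point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (illustrative): y = 2x + noise
x = np.linspace(0, 10, 50)
y = 2.0 * x + rng.normal(0, 1.0, size=x.size)

n_boot = 1000
x_query = 5.0            # point at which we want prediction uncertainty
preds = np.empty(n_boot)

for b in range(n_boot):
    # Resample (x, y) pairs with replacement to form a new training set
    idx = rng.integers(0, x.size, size=x.size)
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    preds[b] = slope * x_query + intercept

# Percentile confidence interval from the bootstrap distribution of predictions
lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"prediction at x={x_query}: {preds.mean():.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The width of the interval is the uncertainty estimate: a wide interval indicates that the prediction depends heavily on which training examples happened to be sampled.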
Example Use Cases
Reliability
Estimating uncertainty in financial risk models by resampling historical data to understand how predictions might vary under different historical scenarios.
Transparency
Providing confidence intervals for medical diagnosis predictions to help doctors understand the reliability of AI recommendations and make more informed treatment decisions.
Fairness
Assessing whether prediction uncertainty is consistent across different demographic groups in hiring algorithms, identifying if the model is systematically more uncertain for certain populations.
Limitations
- Computationally expensive as it requires training multiple models on resampled datasets.
- Does not account for uncertainty in model structure or architecture choices.
- Cannot detect systematically missing data patterns or biases present in the original dataset.
- Assumes that the original dataset is representative of the population of interest.
Resources
Research Papers
Deterministic bootstrapping for a class of bootstrap methods
An algorithm is described that enables efficient deterministic approximate computation of the bootstrap distribution for any linear bootstrap method $T_n^*$, alleviating the need for repeated resampling from observations (resp. input-derived data). In essence, the algorithm computes the distribution function of a linear mixture of independent random variables, each having a finite discrete distribution. The algorithm is applicable to elementary bootstrap scenarios (targeting the mean as the parameter of interest), to block bootstrap, as well as to certain residual bootstrap scenarios. Moreover, the algorithm promises much broader applicability in non-bootstrapped hypothesis testing.
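The core idea behind the deterministic approach can be illustrated for the simplest case, the sample mean. This sketch is an assumption-laden toy, not the paper's algorithm: it exploits the fact that the resampled sum is a sum of $n$ i.i.d. draws from the empirical distribution, so its exact distribution can be obtained by convolving $n$ copies of that distribution (restricted here to small integer-valued data so sums index a probability vector directly).

```python
import numpy as np

# Observed sample (small non-negative integers so sums index a vector directly)
data = np.array([1, 2, 2, 5])
n = data.size

# Empirical distribution of a single resampled draw: P(X* = v)
p_one = np.bincount(data, minlength=data.max() + 1) / n

# Convolve n copies to get the exact distribution of the resampled sum,
# with no Monte Carlo resampling at all
p_sum = np.array([1.0])
for _ in range(n):
    p_sum = np.convolve(p_sum, p_one)

# Bootstrap distribution of the mean lives on support sum / n
support = np.arange(p_sum.size) / n
boot_mean = (support * p_sum).sum()
print(boot_mean)  # expectation of the bootstrap mean equals the sample mean
```

Unlike Monte Carlo bootstrapping, this computes every possible resample outcome and its exact probability, which is what makes the result deterministic.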