Cross-validation
Description
Cross-validation evaluates model performance and robustness by systematically partitioning data into multiple subsets (folds) and repeatedly training and testing on different combinations. Common approaches include k-fold (splitting into k equal parts), stratified (preserving class distributions), and leave-one-out variants. By testing on multiple disjoint holdout sets, it reveals how performance varies across different data subsamples, provides robust estimates of generalisation ability, and helps detect overfitting or model instability that single train-test splits might miss.
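The k-fold procedure can be sketched with scikit-learn as follows (a minimal illustration; the synthetic dataset and logistic regression model are arbitrary choices, not part of the technique):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Synthetic stand-in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    # Train on k-1 folds, then test on the held-out fold
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(scores)  # one estimate per fold; their mean and spread summarise performance
```

Each observation appears in exactly one test fold, so the k per-fold scores together cover the whole dataset.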
Example Use Cases
Reliability
Using 10-fold cross-validation to estimate a healthcare prediction model's true accuracy and detect overfitting, ensuring robust performance estimates that generalise beyond the specific training sample to new patient populations.
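A sketch of this use case with scikit-learn's `cross_val_score` (synthetic data stands in for patient records, and the random forest model is an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for a healthcare dataset
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

model = RandomForestClassifier(random_state=42)
scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation

# Report mean and spread rather than a single split's accuracy
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# A training accuracy far above the cross-validated mean suggests overfitting
train_acc = model.fit(X, y).score(X, y)
```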
Transparency
Providing transparent model evaluation in regulatory submissions by showing consistent performance across multiple validation folds, demonstrating to auditors that model performance claims are not cherry-picked from a single favourable test set.
Fairness
Ensuring fair model evaluation across demographic groups by using stratified cross-validation that maintains representative proportions of protected classes in each fold, revealing whether performance is consistent across different population segments.
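Note that scikit-learn's `StratifiedKFold` stratifies on whatever labels are passed as `y`, so a fairness check would pass the protected attribute (or a label combined with it) as the stratification target. A minimal sketch of the proportion-preserving behaviour on imbalanced labels:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels with a 9:1 imbalance; in a fairness check these could be
# the protected attribute, or the target combined with it
y = np.array([0] * 90 + [1] * 10)
X = np.random.RandomState(0).randn(100, 3)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fractions = [y[test_idx].mean() for _, test_idx in skf.split(X, y)]
print(fractions)  # each test fold keeps the overall 10% minority proportion
```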
Limitations
- Computationally expensive for large datasets or complex models, requiring multiple training runs that scale linearly with the number of folds.
- Can provide overly optimistic performance estimates when data has dependencies or structure (e.g., time series, grouped observations) that violate independence assumptions.
- May not reflect real-world performance if the training data distribution differs significantly from future deployment conditions or population shifts.
- Choice of fold number (k) involves a bias-variance trade-off: fewer folds are cheaper to compute but train on smaller subsets, which can bias performance estimates pessimistically, whilst more folds reduce this bias at greater computational cost and can increase the variance of the estimate.
- Standard cross-validation doesn't account for temporal ordering in sequential data, potentially leading to data leakage where information from the future ends up in the training folds used to predict the past.
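Dependency structure of this kind (time series, grouped observations) can be handled with splitters designed for it; a sketch using scikit-learn's `TimeSeriesSplit` and `GroupKFold` (the group structure here is a hypothetical stand-in, e.g. one group per patient):

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 ordered observations

# TimeSeriesSplit: forward-chaining folds, so each test set lies strictly
# after its training data and no future information leaks backwards
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    assert train_idx.max() < test_idx.min()

# GroupKFold: all observations from one group (e.g. one patient) stay in the
# same fold, avoiding leakage between dependent observations
groups = np.repeat([0, 1, 2, 3], 3)
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, groups=groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```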
Resources
scikit-learn Cross-validation User Guide
Comprehensive guide to cross-validation methods and implementations in scikit-learn
Cross-validation: what does it estimate and how well does it do it?
Theoretical analysis of what cross-validation estimates and its accuracy in practice