Permutation Importance

Description

Permutation Importance quantifies a feature's contribution to a trained model's performance by randomly shuffling that feature's values and measuring the resulting drop in a chosen performance metric (e.g., accuracy or R²). If shuffling a feature substantially degrades performance, the model relies on that feature and it is considered important; if performance is unchanged, the model is not using it. Because the technique only needs model predictions, it is model-agnostic and requires no retraining, and it helps identify which inputs genuinely drive the model's predictions rather than inputs that are merely correlated with the outcome.
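
A minimal sketch of this procedure follows; the toy dataset, Ridge model, and R² metric are illustrative assumptions, not part of the technique itself:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    # Toy regression problem standing in for real data (an assumption).
    X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                           random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = Ridge().fit(X_train, y_train)

    baseline = r2_score(y_test, model.predict(X_test))
    rng = np.random.default_rng(0)

    for j in range(X_test.shape[1]):
        X_perm = X_test.copy()
        # Shuffle one column to break its link with the target.
        X_perm[:, j] = X_test[rng.permutation(len(X_test)), j]
        dropped = r2_score(y_test, model.predict(X_perm))
        print(f"feature {j}: importance = {baseline - dropped:.3f}")

The importance of each feature is simply the baseline score minus the score on the shuffled copy; averaging over several shuffles reduces noise.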

Example Use Cases

Explainability

Assessing which patient characteristics (e.g., age, blood pressure, cholesterol) are most critical for a medical diagnosis model by observing the performance drop when each characteristic's values are randomly shuffled, ensuring the model relies on clinically relevant factors.

Reliability

Validating the robustness of a fraud detection model by permuting features such as transaction amount or location and confirming that detection performance drops substantially only when genuinely important features are shuffled, thereby improving confidence in the model's reliability (a sketch of such a check follows below).
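
A hedged sketch of such a reliability check using scikit-learn's permutation_importance; the synthetic transaction stand-in, classifier, and F1 metric are assumptions for illustration:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Imbalanced synthetic data standing in for fraud transactions.
    X, y = make_classification(n_samples=2000, n_features=6, n_informative=3,
                               weights=[0.9, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    # Repeat each shuffle to separate real signal from permutation noise.
    result = permutation_importance(model, X_test, y_test,
                                    scoring="f1", n_repeats=20, random_state=0)
    for i in np.argsort(result.importances_mean)[::-1]:
        print(f"feature {i}: {result.importances_mean[i]:.3f} "
              f"+/- {result.importances_std[i]:.3f}")

Reporting the standard deviation alongside the mean drop makes it easier to tell whether an apparent importance exceeds shuffle-to-shuffle variability.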

Limitations

  • Can be misleading when features are highly correlated: shuffling one feature breaks its correlation with the others and produces unrealistic data points, which can inflate the apparent importance of features that carry no information beyond their correlated counterparts (see the sketch after this list).
  • Computationally expensive for large datasets, wide feature sets, or slow-to-score models, since the model must be re-evaluated once per feature per shuffle repetition (although no retraining is required).
  • Does not report interactions between features separately; it measures each feature's marginal contribution while the other features are left unchanged, folding any interaction effects into the individual scores.
  • The choice of performance metric (e.g., accuracy, F1-score, AUC) influences the importance scores and can change the ranking of features, so the metric should match the model's intended use.
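
The correlated-feature pitfall in the first bullet can be demonstrated directly. In this sketch (the synthetic data, random forest, and feature names are all assumptions), x2 is a near-duplicate of x1 and carries no information beyond it, yet it can still receive a substantial score:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    n = 2000
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)   # near-duplicate of x1
    x3 = rng.normal(size=n)                    # genuinely irrelevant
    X = np.column_stack([x1, x2, x3])
    y = 3 * x1 + rng.normal(scale=0.1, size=n) # only x1 drives the target

    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

    # The forest splits its reliance between the near-duplicates, so x2 can
    # score well above x3 even though it adds nothing beyond x1.
    for name, mean in zip(["x1", "x2", "x3"], result.importances_mean):
        print(f"{name}: {mean:.3f}")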

Resources

Research Papers

Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models
Burim Ramosaj and Markus Pauly, Dec 5, 2019

Variable selection in sparse regression models is an important task as applications ranging from biomedical research to econometrics have shown. Especially for higher dimensional regression problems, for which the link function between response and covariates cannot be directly detected, the selection of informative variables is challenging. Under these circumstances, the Random Forest method is a helpful tool to predict new outcomes while delivering measures for variable selection. One common approach is the usage of the permutation importance. Due to its intuitive idea and flexible usage, it is important to explore circumstances, for which the permutation importance based on Random Forest correctly indicates informative covariates. Regarding the latter, we deliver theoretical guarantees for the validity of the permutation importance measure under specific assumptions and prove its (asymptotic) unbiasedness. An extensive simulation study verifies our findings.

Statistically Valid Variable Importance Assessment through Conditional Permutations
Ahmad Chamma, Denis A. Engemann, and Bertrand Thirion, Sep 14, 2023

Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that CPI overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, CPI consistently showed top accuracy across benchmarks. An experiment on real-world data analysis in a large-scale medical dataset showed that CPI provides a more parsimonious selection of statistically significant variables. Our results suggest that CPI can be readily used as a drop-in replacement for permutation-based methods.

Software Packages

random-forest-importances
Mar 22, 2018

Code to compute permutation and drop-column importances for scikit-learn models in Python.
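
A usage sketch for this package, assuming its importances helper behaves as described in the repository's README (the breast-cancer dataset and random forest are illustrative):

    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from rfpimp import importances  # from the random-forest-importances repo

    data = load_breast_cancer()
    X = pd.DataFrame(data.data, columns=data.feature_names)
    y = pd.Series(data.target)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)

    # importances() permutes each column of the validation frame and reports
    # the mean drop in the model's score as a sorted DataFrame.
    imp = importances(rf, X_val, y_val)
    print(imp)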

Documentation

eli5.permutation_importance — ELI5 0.15.0 documentation
ELI5 Developers, Jan 1, 2016
Permutation Importance — PermutationImportance 1.2.1.5 ...
PermutationImportance Developers, Jan 1, 2019
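
For the ELI5 entry above, a minimal sketch using its scikit-learn wrapper; the dataset and estimator are assumptions, and cv="prefit" scores an already-fitted model:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from eli5.sklearn import PermutationImportance

    X, y = make_classification(n_samples=500, n_features=5, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # cv="prefit" scores the fitted model on the data passed to fit().
    perm = PermutationImportance(model, n_iter=10, random_state=0, cv="prefit")
    perm.fit(X_val, y_val)
    print(perm.feature_importances_)   # mean score drop per feature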
