Mean Decrease Impurity

Description

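Mean Decrease Impurity (MDI) quantifies a feature's importance in tree-based models (e.g., Random Forests, Gradient Boosting Machines) by summing the reduction in impurity (e.g., Gini impurity or entropy for classification, variance for regression) over every split that uses the feature, with each reduction weighted by the fraction of training samples that reach the split node. In an ensemble, these per-tree totals are averaged across trees and are typically normalized to sum to one. Features whose splits produce larger impurity reductions, at more nodes and at nodes reached by more samples, receive higher scores. This indicates that they are effective at creating homogeneous child nodes on the training data, which often, but not always, corresponds to greater predictive value.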
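The computation described above can be sketched in plain Python. This is a minimal illustration, assuming a hand-built binary tree in which each node stores the class counts of the training samples that reach it; the Node class, the toy tree, and the helper names are hypothetical and not part of any library. For every internal node t split on feature j, the sketch adds (N_t/N)*G(t) - (N_l/N)*G(l) - (N_r/N)*G(r) to feature j's score, where G is Gini impurity and N_t, N_l, N_r and N are the sample counts at the node, its left child, its right child and the root. Per-tree scores are normalized to sum to one and then averaged over the ensemble, roughly mirroring the convention scikit-learn uses for its impurity-based feature_importances_.

```python
from dataclasses import dataclass
from typing import List, Optional


def gini(counts: List[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


@dataclass
class Node:
    counts: List[int]                # training-sample class counts at this node
    feature: Optional[int] = None    # index of the split feature; None for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def mdi_tree(root: Node, n_features: int) -> List[float]:
    """Per-feature sum of sample-weighted impurity decreases, normalised to sum to 1."""
    n_total = sum(root.counts)
    scores = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:       # leaves contribute nothing
            continue
        n_t = sum(node.counts)
        n_l = sum(node.left.counts)
        n_r = sum(node.right.counts)
        # (N_t/N) * [G(t) - (N_l/N_t) G(l) - (N_r/N_t) G(r)], expanded
        decrease = (n_t * gini(node.counts)
                    - n_l * gini(node.left.counts)
                    - n_r * gini(node.right.counts)) / n_total
        scores[node.feature] += decrease
        stack.extend([node.left, node.right])
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


def mdi_forest(trees: List[Node], n_features: int) -> List[float]:
    """Average the per-tree scores over the ensemble, then renormalise."""
    per_tree = [mdi_tree(t, n_features) for t in trees]
    avg = [sum(col) / len(per_tree) for col in zip(*per_tree)]
    total = sum(avg)
    return [a / total for a in avg] if total > 0 else avg


# Hypothetical toy tree: root split on feature 0, right child split on feature 1.
tree = Node([50, 50], feature=0,
            left=Node([40, 5]),
            right=Node([10, 45], feature=1,
                       left=Node([8, 5]), right=Node([2, 40])))
print(mdi_tree(tree, n_features=3))
```

For this toy tree, feature 0 receives about 0.79 of the total importance because its split happens at the root, where all 100 samples are involved; feature 1 receives about 0.21, and the unused feature 2 receives 0.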

Example Use Cases

Explainability

Determining the most influential genetic markers in a decision tree model predicting disease susceptibility, by identifying which markers consistently lead to the purest splits between healthy and diseased patient groups.

Assessing the key factors driving customer purchasing decisions in an e-commerce random forest model, revealing which product attributes or customer demographics are most effective in segmenting buyers.
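In practice, MDI scores are usually read directly off a fitted model rather than computed by hand. The sketch below assumes scikit-learn, whose tree ensembles expose impurity-based importances through the feature_importances_ attribute. The synthetic data from make_classification is a stand-in for a real dataset such as genetic markers or customer attributes, and the feature names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a tabular dataset: with shuffle=False, columns 0-1 are
# informative and columns 2-5 are pure noise.
X, y = make_classification(n_samples=500, n_features=6, n_informative=2,
                           n_redundant=0, n_repeated=0, shuffle=False,
                           random_state=0)
feature_names = [f"x{i}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ holds scikit-learn's impurity-based (MDI) scores,
# averaged over the trees and normalised to sum to 1.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important first
for idx in ranking:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")
```

With this configuration the two informative columns should rank at the top, while the noise columns still receive small non-zero scores, a symptom of the bias discussed under Limitations.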

Limitations

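  • MDI is biased towards features with many candidate split points, such as continuous or high-cardinality categorical features, and can overestimate their true importance; because the scores are computed on the training data, even features unrelated to the target can receive non-zero importance.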
  • It is only applicable to tree-based models and cannot be directly used with other model architectures.
  • The importance scores can be unstable, varying significantly with small changes in the training data, the hyperparameters, or (for randomized ensembles) the random seed.
  • MDI does not separate a feature's individual contribution from the contribution it makes through interactions with other features, so it may not accurately reflect the importance of features that are only relevant in combination with others.
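The cardinality bias in the first limitation can be illustrated with a small simulation. This is a hypothetical, stdlib-only sketch, not taken from any of the resources below: the labels are drawn independently of two noise features, one binary (a single candidate split) and one continuous (roughly n - 1 candidate splits), and the script records the best Gini decrease a single split can achieve on each. Because the continuous feature offers many more thresholds, it tends to find a split that reduces training impurity by chance, which is the mechanism that inflates MDI for such features.

```python
import random
from statistics import mean


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split_decrease(x, y):
    """Largest weighted Gini decrease over all thresholds 'x <= t' on one feature."""
    n = len(y)
    parent = gini(y)
    best = 0.0
    for t in sorted(set(x))[:-1]:          # every distinct value except the max
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        decrease = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        best = max(best, decrease)
    return best


def simulate(n_samples=50, n_trials=200, seed=0):
    """Both features are pure noise: the labels are drawn independently of them."""
    rng = random.Random(seed)
    binary_gains, continuous_gains = [], []
    for _ in range(n_trials):
        y = [rng.randint(0, 1) for _ in range(n_samples)]
        x_binary = [rng.randint(0, 1) for _ in range(n_samples)]   # 1 candidate split
        x_continuous = [rng.random() for _ in range(n_samples)]    # ~n-1 candidate splits
        binary_gains.append(best_split_decrease(x_binary, y))
        continuous_gains.append(best_split_decrease(x_continuous, y))
    return mean(binary_gains), mean(continuous_gains)


binary_avg, continuous_avg = simulate()
print(f"mean best-split Gini decrease, binary noise:     {binary_avg:.4f}")
print(f"mean best-split Gini decrease, continuous noise: {continuous_avg:.4f}")
```

Neither feature carries any signal, yet the continuous one consistently shows the larger spurious impurity decrease, and a tree-based model would credit that decrease to it under MDI.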

Resources

Trees, forests, and impurity-based variable importance
Research Paper by Erwan Scornet (Jan 13, 2020)
A Debiased MDI Feature Importance Measure for Random Forests
Research Paper by Xiao Li et al. (Jun 26, 2019)
Variable Importance in Random Forests | Towards Data Science
Tutorial
Interpreting Deep Forest through Feature Contribution and MDI Feature Importance
Research Paper by Yi-Xiao He, Shen-Huan Lyu, and Yuan Jiang (May 1, 2023)
optuna.importance.MeanDecreaseImpurityImportanceEvaluator ...
Documentation
