Mean Decrease Impurity
Description
Mean Decrease Impurity (MDI) quantifies a feature's importance in tree-based models (e.g., Random Forests, Gradient Boosting Machines) by measuring the total reduction in impurity (e.g., Gini impurity, entropy) across all splits where the feature is used, typically weighted by the fraction of samples reaching each split and averaged over all trees in the ensemble. Features that produce larger cumulative reductions in impurity are considered more important, indicating their effectiveness in creating homogeneous child nodes and improving predictive accuracy.
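In scikit-learn, the MDI scores described above are exposed through the `feature_importances_` attribute of fitted tree ensembles. A minimal sketch using the built-in Iris dataset (the dataset and model settings here are illustrative choices, not part of the method itself):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Fit a random forest; feature_importances_ holds the MDI score per feature,
# averaged over all trees and normalized to sum to 1.
data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

for name, score in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {score:.3f}")
```

Because the scores are normalized, they can be read as each feature's share of the total impurity reduction achieved by the forest.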
Example Use Cases
Explainability
- Determining the most influential genetic markers in a decision tree model predicting disease susceptibility, by identifying which markers consistently lead to the purest splits between healthy and diseased patient groups.
- Assessing the key factors driving customer purchasing decisions in an e-commerce random forest model, revealing which product attributes or customer demographics are most effective in segmenting buyers.
Limitations
- MDI is inherently biased towards high-cardinality features (e.g., continuous variables or identifiers with many unique values), since these offer more candidate split points, potentially overestimating their true importance.
- It is only applicable to tree-based models and cannot be directly used with other model architectures.
- The importance scores can be unstable, varying significantly with small changes in the training data or model parameters.
- MDI does not account for feature interactions, meaning it might not accurately reflect the importance of features that are only relevant when combined with others.
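The cardinality bias in the first limitation can be made concrete by appending a purely random continuous column to a real dataset and comparing MDI with permutation importance computed on held-out data. This is a sketch under illustrative assumptions (the dataset, the appended "noise" column, and all model settings are our own choices):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Append a random, high-cardinality "noise" feature that carries no signal.
X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)
X = np.hstack([X, rng.uniform(size=(X.shape[0], 1))])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# MDI is computed from training-time splits, so the noise column can still
# accumulate impurity reductions; permutation importance on held-out data
# measures actual predictive contribution and should stay near zero.
mdi_noise = model.feature_importances_[-1]
perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
perm_noise = perm.importances_mean[-1]

print(f"MDI for noise feature:        {mdi_noise:.4f}")
print(f"Permutation imp. for noise:   {perm_noise:.4f}")
```

When MDI and a held-out measure such as permutation importance disagree sharply on a feature, the MDI score is likely inflated by the cardinality bias rather than reflecting genuine predictive value.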