Description

RuleFit builds interpretable models that can serve as surrogates explaining complex black-box models or stand alone as transparent alternatives. It learns a sparse linear model over two kinds of terms: decision rules extracted automatically from a tree ensemble, and the original features themselves. The technique first fits a tree ensemble to generate candidate rules, then uses LASSO regression to select the most predictive rules and features. The resulting model provides global explanations through human-readable rules (e.g., 'IF age > 50 AND income < 30k THEN ...') combined with linear feature weights, making complex model behaviour transparent and auditable.
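A minimal, self-contained sketch of the two stages using scikit-learn alone is given below; the helper names extract_rules and rule_matrix are illustrative rather than library APIs, and the sketch omits refinements from Friedman and Popescu's formulation such as rule scaling and winsorised linear terms.

    import numpy as np
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LassoCV

    X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

    # Stage 1: shallow boosted trees act as the rule generator.
    gb = GradientBoostingRegressor(n_estimators=30, max_depth=3, random_state=0)
    gb.fit(X, y)

    def extract_rules(tree):
        """Return every root-to-node path in one tree as a rule:
        a list of (feature_index, threshold, go_left) conditions."""
        rules = []
        def recurse(node, conditions):
            if conditions:                      # skip the empty root path
                rules.append(list(conditions))
            left, right = tree.children_left[node], tree.children_right[node]
            if left != -1:                      # internal node: keep splitting
                f, t = tree.feature[node], tree.threshold[node]
                recurse(left, conditions + [(f, t, True)])    # x[f] <= t
                recurse(right, conditions + [(f, t, False)])  # x[f] >  t
        recurse(0, [])
        return rules

    rules = [r for est in gb.estimators_.ravel()
             for r in extract_rules(est.tree_)]

    def rule_matrix(X, rules):
        """Binary matrix: entry (i, j) is 1 where sample i satisfies rule j."""
        M = np.ones((X.shape[0], len(rules)))
        for j, rule in enumerate(rules):
            for f, t, go_left in rule:
                M[:, j] *= (X[:, f] <= t) if go_left else (X[:, f] > t)
        return M

    # Stage 2: sparse linear model over [rule indicators | raw features].
    Z = np.hstack([rule_matrix(X, rules), X])
    lasso = LassoCV(cv=3, random_state=0).fit(Z, y)
    print(f"{len(rules)} candidate rules, "
          f"{np.count_nonzero(lasso.coef_)} terms kept by LASSO")

Rules with non-zero coefficients can then be rendered as IF-THEN statements by mapping feature indices back to feature names.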

Example Use Cases

Explainability

Building customer churn prediction models with rules like 'IF contract_length < 12_months AND support_calls > 5 THEN churn_risk = high', allowing marketing teams to understand and act on the key drivers of customer attrition.

Creating credit scoring models that combine traditional linear factors (income, age) with interpretable rules ('IF recent_missed_payments = 0 AND account_age > 2_years THEN creditworthy'), providing transparent lending decisions.

Transparency

Developing regulatory-compliant medical diagnosis models where treatment recommendations combine clinical measurements with clear decision rules ('IF blood_pressure > 140 AND diabetes = true THEN high_risk'), enabling audit trails for healthcare decisions.

Limitations

  • Can generate large numbers of rules even with regularisation, potentially overwhelming users and reducing practical interpretability.
  • Performance may be inferior to complex ensemble methods when rule complexity is constrained for interpretability.
  • Rule extraction quality depends heavily on the underlying tree ensemble, which may miss important feature interactions if not properly configured.
  • Requires careful hyperparameter tuning to balance model complexity against interpretability, with no universally optimal setting.

Resources

Research Papers

Tree Ensembles with Rule Structured Horseshoe Regularization
Malte Nalenz and Mattias Villani, Feb 16, 2017

We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu (2008) where rules from decision trees and linear terms are used in an L1-regularized regression. We modify RuleFit by replacing the L1-regularization by a horseshoe prior, which is well known to give aggressive shrinkage of noise predictors while leaving the important signal essentially untouched. This is especially important when a large number of rules are used as predictors as many of them only contribute noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are only satisfied by a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in Friedman and Popescu (2008) with an additional set of trees from random forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and random forest on 16 datasets. The model and its interpretation are demonstrated on the well-known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.

Safe RuleFit: Learning Optimal Sparse Rule Model by Meta Safe Screening
Hiroki Kato, Hiroyuki Hanada, and Ichiro Takeuchi, Oct 3, 2018

We consider the problem of learning a sparse rule model, a prediction model in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyper-rectangle in the input space. Since the number of all possible such rules is extremely large, it has been computationally intractable to select the optimal set of active rules. In this paper, to solve this difficulty for learning the optimal sparse rule model, we propose Safe RuleFit (SRF). Our basic idea is to develop meta safe screening (mSS), which is a non-trivial extension of well-known safe screening (SS) techniques. While SS is used for screening out one feature, mSS can be used for screening out multiple features by exploiting the inclusion-relations of hyper-rectangles in the input space. SRF provides a general framework for fitting sparse rule models for regression and classification, and it can be extended to handle more general sparse regularizations such as group regularization. We demonstrate the advantages of SRF through intensive numerical experiments.

Software Packages

rulefit
Oct 16, 2015

Python implementation of the RuleFit algorithm
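A hedged usage sketch follows; the fit and get_rules calls, including the feature_names argument and the coef and importance columns of the returned DataFrame, are assumed from the project's README rather than confirmed here.

    from rulefit import RuleFit
    from sklearn.datasets import make_friedman1

    X, y = make_friedman1(n_samples=500, random_state=0)
    feature_names = [f"x{i}" for i in range(X.shape[1])]

    rf = RuleFit()
    rf.fit(X, y, feature_names=feature_names)   # builds ensemble + LASSO

    rules = rf.get_rules()                      # pandas DataFrame of terms
    rules = rules[rules.coef != 0]              # keep terms the LASSO retained
    print(rules.sort_values("importance", ascending=False).head())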

imodels
Jul 4, 2019

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
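Since the package is sklearn-compatible, its RuleFit estimator can be used like any other regressor; a minimal sketch, with the RuleFitRegressor class name assumed from the package documentation:

    from imodels import RuleFitRegressor        # assumed top-level export
    from sklearn.datasets import make_friedman1
    from sklearn.model_selection import train_test_split

    X, y = make_friedman1(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RuleFitRegressor()   # rule generation + sparse linear fit inside
    model.fit(X_train, y_train)
    print("held-out R^2:", model.score(X_test, y_test))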

Tutorials

Getting More From Regression Models with RuleFit | Towards Data Science
Casey Whorton, Nov 28, 2020
