16 RuleFit

RuleFit is an algorithm for regression and classification that combines gradient boosting and the LASSO to train a model that is both accurate and interpretable.

Given a dataset X and an outcome y:

  1. Train a gradient boosting model on the raw inputs X to predict y.
  2. Take all decision tree base learners from step 1 and convert them to a list of rules R by following every path from the root node to a leaf node. The rules represent a transformation of the raw input features.
  3. Train a LASSO model on the rule set R to predict y.
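Step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the rtemis implementation: the nested-dict tree format, the helper names `tree_to_rules` and `ensemble_to_rules`, and the toy thresholds are all hypothetical. Each root-to-leaf path becomes one rule, and a rule produced by more than one tree is kept only once.

```python
# Hypothetical tree format: internal nodes have "feature", "threshold",
# "left" (cases with feature <= threshold) and "right" (feature > threshold);
# leaves only have a "value".

def tree_to_rules(node, conditions=()):
    """Return one rule string per root-to-leaf path."""
    if "value" in node:  # leaf: the conditions collected so far form a rule
        return [" & ".join(conditions)]
    f, t = node["feature"], node["threshold"]
    left = tree_to_rules(node["left"], conditions + (f"{f}<={t}",))
    right = tree_to_rules(node["right"], conditions + (f"{f}>{t}",))
    return left + right


def ensemble_to_rules(trees):
    """Collect rules from all trees, keeping the first copy of duplicates."""
    rules = [r for tree in trees for r in tree_to_rules(tree)]
    return list(dict.fromkeys(rules))


# Two toy trees using Parkinsons feature names with made-up thresholds
tree1 = {
    "feature": "PPE", "threshold": 0.18,
    "left": {"feature": "spread1", "threshold": -6.4,
             "left": {"value": -0.5}, "right": {"value": 0.2}},
    "right": {"value": 0.8},
}
tree2 = {
    "feature": "PPE", "threshold": 0.18,
    "left": {"value": -0.3},
    "right": {"value": 0.6},
}

rules = ensemble_to_rules([tree1, tree2])
for r in rules:
    print(r)
```

The two trees yield five paths, and "PPE>0.18" appears in both, so four unique rules remain. With `interaction.depth = 5`, each gbm tree has up to 5 splits and therefore up to 6 leaves, which is consistent with the 600 rules extracted from 100 trees in the log further down.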

Thanks to the LASSO's variable selection, step 3 usually reduces the large number of rules in R to a much smaller subset, typically with little or no loss of accuracy. RuleFit may even outperform the gradient boosting model it started from.
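The selection in step 3 comes from the L1 penalty. The sketch below fits a LASSO by coordinate descent on a 0/1 rule matrix, assuming a squared-error loss for simplicity; the classification example in this chapter fits a penalized logistic model with glmnet instead. The function names and toy data are hypothetical. The soft-thresholding step sets a rule's coefficient to exactly zero whenever the rule's correlation with the current residual is smaller than the penalty `lam`.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything in [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Squared-error LASSO fitted by cyclic coordinate descent.

    Minimizes (1/(2n)) * sum_i (y_i - b0 - x_i . b)^2 + lam * sum_j |b_j|.
    X is a list of rows of 0/1 rule indicators; the intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals of the current fit
        r = [y[i] - b0 - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        # Intercept update: shift by the mean residual
        shift = sum(r) / n
        b0 += shift
        r = [ri - shift for ri in r]
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(v * v for v in col) / n
            if scale == 0:
                continue  # rule matches no case; its coefficient stays 0
            # Correlation of rule j with the residual, adding back its own term
            rho = sum(col[i] * (r[i] + col[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / scale
            delta = new_bj - b[j]
            for i in range(n):
                r[i] -= col[i] * delta  # keep residuals in sync with b
            b[j] = new_bj
    return b0, b


X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 0]  # y is exactly the indicator of rule 0
b0, b = lasso_cd(X, y, lam=0.01)
print(round(b0, 3), [round(v, 3) for v in b])
```

With `lam=0.01`, the coefficients of rules 1 and 2 are exactly zero and the coefficient of rule 0 is shrunk from 1 to about 0.96. With `lam=1.0`, every rule is dropped and only the intercept (the mean of y) remains.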


Figure 16.1: RuleFit Summary

16.1 Data

Let’s grab the Parkinsons dataset from the UCI repository:

  Dataset: parkinsons 

  [[ Summary ]]
  195 cases with 23 features: 
  * 22 continuous features 
  * 0 integer features 
  * 1 categorical feature, which is not ordered
  * 0 constant features 
  * 0 features include 'NA' values

  [[ Recommendations ]]
  * Everything looks good

16.2 RuleFeat

Since RuleFit is trademarked, the function in rtemis is called s.RULEFEAT. Here it is trained on 146 of the 195 cases and tested on the remaining 49:

[2019-08-02 17:31:08 s.RULEFEAT] Hello, egenn 
[2019-08-02 17:31:11 s.RULEFEAT] Running Gradient Boosting... 
[2019-08-02 17:31:11] Hello, egenn 

[2019-08-02 17:31:11 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 146 x 22 
    Training outcome: 146 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:31:11] Running Gradient Boosting Classification with a bernoulli loss function... 

[[ Parameters ]]
             n.trees: 100 
   interaction.depth: 5 
           shrinkage: 0.001 
        bag.fraction: 0.5 
           mFeatures: NULL 
      n.minobsinnode: 5 
             weights: NULL 

[2019-08-02 17:31:11] Training GBM3 on full training set... 
[2019-08-02 17:31:11] ###   Caught gbm.fit error; retrying...   ### 
Warning in (function (x, y = NULL, x.test = NULL, y.test = NULL, weights =
NULL, : Caught gbm.fit error: retraining last model and continuing
[2019-08-02 17:31:11] ###   Caught gbm.fit error; retrying...   ### 
Warning in (function (x, y = NULL, x.test = NULL, y.test = NULL, weights =
NULL, : Caught gbm.fit error: retraining last model and continuing
[2019-08-02 17:31:11] ###   Caught gbm.fit error; retrying...   ### 
Warning in (function (x, y = NULL, x.test = NULL, y.test = NULL, weights =
NULL, : Caught gbm.fit error: retraining last model and continuing

[[ GBM3 Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  109   1
                0    1  35

                   Overall  
      Sensitivity  0.9909 
      Specificity  0.9722 
Balanced Accuracy  0.9816 
              PPV  0.9909 
              NPV  0.9722 
               F1  0.9909 
         Accuracy  0.9863 
              AUC  0.9957 

  Positive Class:  1 
[2019-08-02 17:31:11] Calculating relative influence of variables... 

[2019-08-02 17:31:11] Run completed in 0.01 minutes (Real: 0.66; User: 0.45; System: 0.01) 
[2019-08-02 17:31:11 s.RULEFEAT] Collecting Gradient Boosting Rules (Trees)... 
600 rules (length<=5) were extracted from the first 100 trees.
[2019-08-02 17:31:12 s.RULEFEAT] Extracted 600 rules... 
[2019-08-02 17:31:12 s.RULEFEAT] ...and kept 583 unique rules 
[2019-08-02 17:31:12 matchCasesByRules] Matching 583 rules to 146 cases... 
[2019-08-02 17:31:12 s.RULEFEAT] Running LASSO on GBM rules... 
[2019-08-02 17:31:12] Hello, egenn 

[2019-08-02 17:31:12 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 146 x 583 
    Training outcome: 146 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:31:12 gridSearchLearn] Running grid search... 

[[ Resampling Parameters ]]
    n.resamples: 5 
      resampler: kfold 
   stratify.var: y 
   strat.n.bins: 4 
[2019-08-02 17:31:12 kfold] Using max n bins possible =  2 
[2019-08-02 17:31:12 resample] Created 5 independent folds 

[[ Search parameters ]]
    grid.params:  
                 alpha: 1 
   fixed.params:  
                             .gs: TRUE 
                 which.cv.lambda: lambda.1se 
[2019-08-02 17:31:12 gridSearchLearn] Tuning Elastic Net by exhaustive grid search: 
[2019-08-02 17:31:12 gridSearchLearn] 5 resamples; 5 models total; running on 4 cores (x86_64-apple-darwin15.6.0)
 

[[ Best parameters to maximize Balanced Accuracy ]]
   best.tune:  
              lambda: 0.0766964784937517 
               alpha: 1 

[2019-08-02 17:31:16 gridSearchLearn] Run completed in 0.06 minutes (Real: 3.83; User: 0.08; System: 0.04) 

[[ Parameters ]]
    alpha: 1 
   lambda: 0.0766964784937517 

[2019-08-02 17:31:16] Training elastic net model... 

[[ GLMNET Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  108   0
                0    2  36

                   Overall  
      Sensitivity  0.9818 
      Specificity  1.0000 
Balanced Accuracy  0.9909 
              PPV  1.0000 
              NPV  0.9474 
               F1  0.9908 
         Accuracy  0.9863 
              AUC  1.0000 

  Positive Class:  1 

[2019-08-02 17:31:16] Run completed in 0.07 minutes (Real: 4.04; User: 0.32; System: 0.06) 

[[ RULEFEAT Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  108   0
                0    2  36

                   Overall  
      Sensitivity  0.9818 
      Specificity  1.0000 
Balanced Accuracy  0.9909 
              PPV  1.0000 
              NPV  0.9474 
               F1  0.9908 
         Accuracy  0.9863 
              AUC  1.0000 

  Positive Class:  1 
[2019-08-02 17:31:17 predict.ruleFeat] Matching newdata to rules... 
[2019-08-02 17:31:17 matchCasesByRules] Matching 583 rules to 49 cases... 

[[ RULEFEAT Classification Testing Summary ]]
                   Reference 
        Estimated  1   0  
                1  34  5
                0   3  7

                   Overall  
      Sensitivity  0.9189 
      Specificity  0.5833 
Balanced Accuracy  0.7511 
              PPV  0.8718 
              NPV  0.7000 
               F1  0.8947 
         Accuracy  0.8367 
              AUC  0.8863 

  Positive Class:  1 

[2019-08-02 17:31:17 s.RULEFEAT] Run completed in 0.15 minutes (Real: 8.80; User: 3.86; System: 0.33) 
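Every metric in these summaries except AUC follows from the 2 x 2 confusion matrix (AUC also needs the predicted probabilities). As a quick check, this Python sketch recomputes the testing metrics from the counts above, taking class 1 as the positive class: 34 true positives, 5 false positives, 3 false negatives and 7 true negatives. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics, with class 1 as the positive class."""
    sensitivity = tp / (tp + fn)  # recall of class 1
    specificity = tn / (tn + fp)  # recall of class 0
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    return {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Balanced Accuracy": (sensitivity + specificity) / 2,
        "PPV": ppv,
        "NPV": npv,
        "F1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "Accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Testing confusion matrix: rows = Estimated, columns = Reference
test = classification_metrics(tp=34, fp=5, fn=3, tn=7)
for name, value in test.items():
    print(f"{name:>17}  {value:.4f}")
```

This reproduces the RULEFEAT testing summary, and the same function applied to the training counts (108, 0, 2, 36) reproduces the training summary. The drop in balanced accuracy from 0.9909 on training to 0.7511 on testing comes mostly from specificity: only 7 of the 12 class-0 test cases are classified correctly.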

16.2.1 RuleFeat Output

Let’s explore the algorithm output:

16.2.2 R-readable rules

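We can also access the R-readable rules directly. These are the 18 rules retained by the LASSO out of the 583 unique rules extracted from the boosted trees: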

 [1] "Shimmer.APQ3>0.00907 & PPE>0.178862"                                              
 [2] "spread1>-6.428419 & spread2>0.199116"                                             
 [3] "spread1>-5.618097"                                                                
 [4] "MDVP.Fhi.Hz.<=202.621 & spread1<=-6.1894575 & spread2<=0.21515"                   
 [5] "MDVP.Fhi.Hz.<=204.61 & MDVP.Fhi.Hz.>151.1155 & PPE<=0.1848235"                    
 [6] "Shimmer.APQ5<=0.01244 & MDVP.APQ<=0.017785 & Shimmer.DDA>0.02542 & PPE<=0.1808325"
 [7] "MDVP.Fhi.Hz.<=230.16 & DFA<=0.7283775 & PPE<=0.1634365"                           
 [8] "MDVP.Fhi.Hz.<=227.467 & DFA<=0.730486 & spread1<=-6.011769"                       
 [9] "MDVP.Fhi.Hz.<=209.9585 & DFA<=0.73104 & PPE<=0.184526"                            
[10] "MDVP.Fhi.Hz.<=209.9585 & MDVP.Jitter.Abs.>2.5e-05 & DFA>0.73104 & PPE<=0.184526"  
[11] "MDVP.Fhi.Hz.<=203.929 & MDVP.Fhi.Hz.>148.816 & PPE<=0.184526"                     
[12] "MDVP.Fo.Hz.<=192.848 & MDVP.Fhi.Hz.<=223.896 & D2>2.0310275"                      
[13] "MDVP.Fhi.Hz.>202.5025 & spread2<=0.1804995"                                       
[14] "MDVP.Fhi.Hz.>124.159 & PPE>0.1808325"                                             
[15] "D2>2.0536745 & PPE>0.150139"                                                      
[16] "MDVP.Fhi.Hz.<=204.61 & MDVP.Fhi.Hz.>147.238 & PPE<=0.1848675"                     
[17] "Shimmer.DDA>0.027625 & spread1>-6.428419"                                         
[18] "MDVP.Fo.Hz.>139.5125 & spread1<=-6.009603 & spread2>0.1786265"                    
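Because the rules are plain strings, building the case-by-rule matrix that the LASSO is trained on (146 x 583 in the log above) only requires evaluating each condition for each case. The following is a hypothetical Python analogue of that matching step, not the rtemis `matchCasesByRules` code. It assumes, as in the rules listed, that every condition uses `<=` or `>` and that conditions are joined by ` & `.

```python
import operator

# The two comparison operators that appear in the rule strings
OPS = {"<=": operator.le, ">": operator.gt}


def parse_rule(rule):
    """Split 'A<=1 & B>2' into [('A', '<=', 1.0), ('B', '>', 2.0)]."""
    conditions = []
    for cond in rule.split(" & "):
        op = "<=" if "<=" in cond else ">"
        feature, threshold = cond.split(op)
        conditions.append((feature, op, float(threshold)))
    return conditions


def matches(rule, case):
    """True if the case (a dict of feature -> value) satisfies every condition."""
    return all(OPS[op](case[f], t) for f, op, t in parse_rule(rule))


def rule_matrix(rules, cases):
    """0/1 matrix with one row per case and one column per rule."""
    return [[int(matches(rule, case)) for rule in rules] for case in cases]


# Rules [1] and [3] from the list above; the two cases are made up
rules = ["Shimmer.APQ3>0.00907 & PPE>0.178862", "spread1>-5.618097"]
cases = [
    {"Shimmer.APQ3": 0.02, "PPE": 0.26, "spread1": -5.19},
    {"Shimmer.APQ3": 0.005, "PPE": 0.12, "spread1": -6.50},
]
print(rule_matrix(rules, cases))  # [[1, 1], [0, 0]]
```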

16.2.3 Format rules

We can convert the rules to a more human-readable format. Instead of the split thresholds used by the decision trees, each condition can show the median (for continuous features) or mode (for categorical features) of the cases matching the rule, along with their range:

[2019-08-02 17:31:18 matchCasesByRules] Matching 18 rules to 146 cases... 
[2019-08-02 17:31:18 rules2medmod] Converting rules... 
[2019-08-02 17:31:18 rules2medmod] Done 
 [1] "Shimmer.APQ3 = 0.02 (0.01-0.06) & PPE = 0.26 (0.18-0.53)"                                                                        
 [2] "spread1 = -5.19 (-6.37--2.43) & spread2 = 0.26 (0.20-0.45)"                                                                      
 [3] "spread1 = -4.91 (-5.62--2.43)"                                                                                                   
 [4] "MDVP.Fhi.Hz. = 163.05 (123.72-198.35) & spread1 = -6.52 (-7.11--6.25) & spread2 = 0.18 (0.06-0.21)"                              
 [5] "MDVP.Fhi.Hz. = 163.43 (154.61-202.45) & PPE = 0.14 (0.09-0.18)"                                                                  
 [6] "Shimmer.APQ5 = 0.01 (0.01-0.01) & MDVP.APQ = 0.01 (0.01-0.02) & Shimmer.DDA = 0.03 (0.03-0.04) & PPE = 0.12 (0.08-0.17)"         
 [7] "MDVP.Fhi.Hz. = 161.47 (123.72-220.31) & DFA = 0.71 (0.65-0.73) & PPE = 0.14 (0.09-0.16)"                                         
 [8] "MDVP.Fhi.Hz. = 162.82 (123.72-220.31) & DFA = 0.71 (0.65-0.73) & spread1 = -6.47 (-7.11--6.11)"                                  
 [9] "MDVP.Fhi.Hz. = 161.47 (123.72-208.31) & DFA = 0.71 (0.65-0.73) & PPE = 0.14 (0.09-0.18)"                                         
[10] "MDVP.Fhi.Hz. = 129.54 (113.60-163.44) & MDVP.Jitter.Abs. = 3e-05 (3e-05-4e-05) & DFA = 0.76 (0.73-0.79) & PPE = 0.16 (0.10-0.18)"
[11] "MDVP.Fhi.Hz. = 163.43 (154.61-202.45) & PPE = 0.14 (0.09-0.18)"                                                                  
[12] "MDVP.Fo.Hz. = 129.34 (88.33-188.62) & MDVP.Fhi.Hz. = 157.76 (107.72-216.81) & D2 = 2.44 (2.03-3.41)"                             
[13] "MDVP.Fhi.Hz. = 243.71 (203.52-581.29) & spread2 = 0.13 (0.01-0.18)"                                                              
[14] "MDVP.Fhi.Hz. = 165.36 (124.39-586.57) & PPE = 0.24 (0.18-0.53)"                                                                  
[15] "D2 = 2.54 (2.06-3.67) & PPE = 0.24 (0.15-0.53)"                                                                                  
[16] "MDVP.Fhi.Hz. = 163.43 (154.61-202.45) & PPE = 0.14 (0.09-0.18)"                                                                  
[17] "Shimmer.DDA = 0.05 (0.03-0.17) & spread1 = -5.25 (-6.42--2.43)"                                                                  
[18] "MDVP.Fo.Hz. = 157.83 (148.79-208.52) & spread1 = -6.34 (-7.00--6.05) & spread2 = 0.21 (0.18-0.34)"
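The conversion can be approximated as follows: keep the cases that satisfy a rule, then summarize each feature in the rule by its median and range among those cases. This Python sketch handles continuous features only (no mode for categorical ones), rounds to a fixed number of decimals rather than following R's number formatting, and uses hypothetical helper names; it is not the rtemis `rules2medmod` implementation.

```python
import operator
import statistics

OPS = {"<=": operator.le, ">": operator.gt}


def parse_rule(rule):
    """Split 'A<=1 & B>2' into [('A', '<=', 1.0), ('B', '>', 2.0)]."""
    out = []
    for cond in rule.split(" & "):
        op = "<=" if "<=" in cond else ">"
        feature, threshold = cond.split(op)
        out.append((feature, op, float(threshold)))
    return out


def rule_to_median_range(rule, cases, digits=2):
    """Describe each feature in the rule by its median (range) among matching cases."""
    conditions = parse_rule(rule)
    matched = [c for c in cases
               if all(OPS[op](c[f], t) for f, op, t in conditions)]
    if not matched:
        return None  # the rule matches no cases
    parts = []
    # Each feature is summarized once, even if the rule uses it twice
    for feature in dict.fromkeys(f for f, _, _ in conditions):
        values = [c[feature] for c in matched]
        med = statistics.median(values)
        parts.append(f"{feature} = {med:.{digits}f} "
                     f"({min(values):.{digits}f}-{max(values):.{digits}f})")
    return " & ".join(parts)


# Made-up cases for illustration
cases = [
    {"spread1": -5.0, "PPE": 0.30},
    {"spread1": -4.0, "PPE": 0.20},
    {"spread1": -6.0, "PPE": 0.10},
    {"spread1": -3.0, "PPE": 0.40},
]
print(rule_to_median_range("spread1>-5.618097", cases))
print(rule_to_median_range("spread1>-5.618097 & PPE>0.25", cases))
```

A feature that appears twice in a rule, such as MDVP.Fhi.Hz. in rule [5], is summarized once, as in the output above. Rules with slightly different thresholds can also produce identical summaries when they match the same cases, which is likely why rules [5], [11] and [16] look the same.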