19 Building More Accurate Decision Trees with the Additive Tree

  .:rtemis 0.79: Welcome, egenn
  [x86_64-apple-darwin15.6.0 (64-bit): Defaulting to 4/4 available cores]
  Online documentation & vignettes: https://rtemis.netlify.com

The Additive Tree walks like CART, but learns like Gradient Boosting: it builds a single decision tree, similar to CART, but trains it in a manner similar to boosting stumps (a stump is a tree of depth 1). This results in increased accuracy without sacrificing interpretability (Luna et al. 2019). As with all supervised learning functions in rtemis, you can either provide a feature matrix / data frame, x, and an outcome vector, y, separately, or provide a combined dataset, x, alone, in which case the last column should be the outcome.
For classification, the outcome should be a factor where the first level is the ‘positive’ case.
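
For example (a sketch; x, y, and dat are placeholder objects, not defined here):

  mod <- s.ADDTREE(x, y)   # features x and outcome vector y passed separately
  mod <- s.ADDTREE(dat)    # or: a single data frame whose last column is the factor outcome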

19.1 Train AddTree

Let’s load a dataset from the UCI ML repository:

  • We convert the outcome variable “status” to a factor,
  • move it to the last column,
  • and set levels appropriately
  • We then use the checkData function to examine the dataset, as sketched below.
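
A minimal sketch of these steps, assuming the standard UCI parkinsons.data URL and its 'name' and 'status' columns:

  library(rtemis)
  parkinsons <- read.csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data")
  parkinsons$name <- NULL                              # drop the subject ID column
  parkinsons$Status <- factor(parkinsons$status,
                              levels = c(1, 0))        # first level = 'positive' class
  parkinsons$status <- NULL                            # the factor outcome is now the last column
  checkData(parkinsons)
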
  Dataset: parkinsons 

  [[ Summary ]]
  195 cases with 23 features: 
  * 22 continuous features 
  * 0 integer features 
  * 1 categorical feature, which is not ordered
  * 0 constant features 
  * 0 features include 'NA' values

  [[ Recommendations ]]
  * Everything looks good

Let’s train an Additive Tree model on the full sample:
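
A sketch of the training call, using the default hyperparameters:

  mod <- s.ADDTREE(parkinsons)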

[2019-08-02 17:49:52 s.ADDTREE] Hello, egenn 

[2019-08-02 17:49:54 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 195 x 22 
    Training outcome: 195 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:49:55 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  145   1
                0    2  47

                   Overall  
      Sensitivity  0.9864 
      Specificity  0.9792 
Balanced Accuracy  0.9828 
              PPV  0.9932 
              NPV  0.9592 
               F1  0.9898 
         Accuracy  0.9846 

  Positive Class:  1 
[2019-08-02 17:49:55 s.ADDTREE] Traversing tree by preorder... 
[2019-08-02 17:49:55 s.ADDTREE] Converting paths to rules... 
[2019-08-02 17:49:55 s.ADDTREE] Converting to data.tree object... 
[2019-08-02 17:49:55 s.ADDTREE] Pruning tree... 

[2019-08-02 17:49:56 s.ADDTREE] Run completed in 0.06 minutes (Real: 3.55; User: 2.66; System: 0.18) 

19.1.1 Plot AddTree

AddTree trees are saved as data.tree objects. We can plot them using mplot3.addtree, which creates HTML output using graphviz.
The first line of each node shows the rule, followed by the number of cases that match the rule, and lastly by the percentage of those cases that are outcome-positive.
By default, leaf nodes with an estimate of 1 (the positive class) are orange, and those with an estimate of 0 are teal.
You can mouse over nodes, edges, and the plot background for popup info. (The font may appear slightly too large for the node boxes in this HTML render; it displays correctly in RStudio.)
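
A sketch of the plotting call:

  mplot3.addtree(mod)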

19.1.3 Predict

To get predicted values, use the predict S3 generic with the familiar syntax
predict(mod, newdata). If newdata is not supplied, it returns the training set predictions (which we call the ‘fitted’ values):
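
A sketch of the call that produces the fitted values printed below:

  predicted <- predict(mod)          # no newdata: returns the fitted values
  # predict(mod, newdata = x.test)   # would return predictions on new data
  predicted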

  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0
 [36] 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
 [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[106] 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[141] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
[176] 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0
Levels: 1 0

19.1.4 Training and testing

  • Create resamples of our data
  • Visualize them (white is testing, teal is training)
  • Split data into training and test sets, as sketched below
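
A sketch of these steps; the fold element names ("Fold_1", ...) and the resample-plotting call are assumptions:

  res <- resample(parkinsons, n.resamples = 10, resampler = "kfold")
  # mplot3.res(res)                        # visualize the resamples (assumed function name)
  train.index <- res$Fold_1                # training-case indices of the first fold
  dat.train <- parkinsons[train.index, ]
  dat.test  <- parkinsons[-train.index, ]
  mod <- s.ADDTREE(dat.train, x.test = dat.test)
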
[2019-08-02 17:49:58 resample] Input contains more than one columns; will stratify on last 

[[ Resampling Parameters ]]
    n.resamples: 10 
      resampler: kfold 
   stratify.var: y 
   strat.n.bins: 4 
[2019-08-02 17:49:58 kfold] Using max n bins possible =  2 
[2019-08-02 17:49:58 resample] Created 10 independent folds 

[2019-08-02 17:49:58 s.ADDTREE] Hello, egenn 

[2019-08-02 17:49:58 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 175 x 22 
    Training outcome: 175 x 1 
    Testing features: 20 x 22 
     Testing outcome: 20 x 1 

[2019-08-02 17:49:58 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  129   1
                0    3  42

                   Overall  
      Sensitivity  0.9773 
      Specificity  0.9767 
Balanced Accuracy  0.9770 
              PPV  0.9923 
              NPV  0.9333 
               F1  0.9847 
         Accuracy  0.9771 

  Positive Class:  1 

[[ ADDTREE Classification Testing Summary ]]
                   Reference 
        Estimated  1   0  
                1  15  1
                0   0  4

                   Overall  
      Sensitivity  1.0000 
      Specificity  0.8000 
Balanced Accuracy  0.9000 
              PPV  0.9375 
              NPV  1.0000 
               F1  0.9677 
         Accuracy  0.9500 

  Positive Class:  1 
[2019-08-02 17:49:59 s.ADDTREE] Traversing tree by preorder... 
[2019-08-02 17:49:59 s.ADDTREE] Converting paths to rules... 
[2019-08-02 17:49:59 s.ADDTREE] Converting to data.tree object... 
[2019-08-02 17:50:00 s.ADDTREE] Pruning tree... 

[2019-08-02 17:50:00 s.ADDTREE] Run completed in 0.02 minutes (Real: 1.48; User: 1.06; System: 0.04) 

19.1.5 Hyperparameter tuning

rtemis supervised learners, like s.ADDTREE, support automatic hyperparameter tuning. When more than one value is passed to a tunable argument, grid search with internal resampling is performed automatically, using all available cores (threads).
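
For example, passing a vector of gamma values triggers tuning (a sketch; all other arguments keep their defaults):

  mod.tuned <- s.ADDTREE(dat.train, x.test = dat.test,
                         gamma = seq(0.6, 0.9, by = 0.1))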

[2019-08-02 17:50:01 s.ADDTREE] Hello, egenn 

[2019-08-02 17:50:01 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 175 x 22 
    Training outcome: 175 x 1 
    Testing features: 20 x 22 
     Testing outcome: 20 x 1 

[2019-08-02 17:50:01 gridSearchLearn] Running grid search... 

[[ Resampling Parameters ]]
    n.resamples: 5 
      resampler: kfold 
   stratify.var: y 
   strat.n.bins: 4 
[2019-08-02 17:50:01 kfold] Using max n bins possible =  2 
[2019-08-02 17:50:01 resample] Created 5 independent folds 

[[ Search parameters ]]
    grid.params:  
                         gamma: 0.6, 0.7, 0.8, 0.9 
                     max.depth: 30 
                 learning.rate: 0.1 
                   min.hessian: 0.001 
   fixed.params:  
                 catPredictors: NULL 
                           ipw: TRUE 
                      ipw.type: 2 
                      upsample: FALSE 
                 resample.seed: NULL 
[2019-08-02 17:50:01 gridSearchLearn] Tuning Additive Tree by exhaustive grid search: 
[2019-08-02 17:50:01 gridSearchLearn] 5 resamples; 20 models total; running on 4 cores (x86_64-apple-darwin15.6.0)
 

[[ Best parameters to maximize Balanced Accuracy ]]
   best.tune:  
                      gamma: 0.7 
                  max.depth: 30 
              learning.rate: 0.1 
                min.hessian: 0.001 

[2019-08-02 17:50:11 gridSearchLearn] Run completed in 0.17 minutes (Real: 9.90; User: 0.04; System: 0.05) 

[2019-08-02 17:50:11 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  128   2
                0    4  41

                   Overall  
      Sensitivity  0.9697 
      Specificity  0.9535 
Balanced Accuracy  0.9616 
              PPV  0.9846 
              NPV  0.9111 
               F1  0.9771 
         Accuracy  0.9657 

  Positive Class:  1 

[[ ADDTREE Classification Testing Summary ]]
                   Reference 
        Estimated  1   0  
                1  14  3
                0   1  2

                   Overall  
      Sensitivity  0.9333 
      Specificity  0.4000 
Balanced Accuracy  0.6667 
              PPV  0.8235 
              NPV  0.6667 
               F1  0.8750 
         Accuracy  0.8000 

  Positive Class:  1 
[2019-08-02 17:50:12 s.ADDTREE] Traversing tree by preorder... 
[2019-08-02 17:50:12 s.ADDTREE] Converting paths to rules... 
[2019-08-02 17:50:12 s.ADDTREE] Converting to data.tree object... 
[2019-08-02 17:50:12 s.ADDTREE] Pruning tree... 

[2019-08-02 17:50:12 s.ADDTREE] Run completed in 0.19 minutes (Real: 11.12; User: 1.12; System: 0.11) 

We can define the tuning resampling parameters with the grid.resampler.rtset argument. The rtset.resample convenience function helps build the list expected by grid.resampler.rtset, and provides auto-completion.
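
A sketch, assuming rtset.resample accepts the resampler type and number of resamples:

  mod.tuned2 <- s.ADDTREE(dat.train, x.test = dat.test,
                          gamma = seq(0.6, 0.9, by = 0.1),
                          grid.resampler.rtset = rtset.resample(resampler = "kfold",
                                                                n.resamples = 5))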

[2019-08-02 17:50:13 s.ADDTREE] Hello, egenn 

[2019-08-02 17:50:13 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 175 x 22 
    Training outcome: 175 x 1 
    Testing features: 20 x 22 
     Testing outcome: 20 x 1 

[2019-08-02 17:50:13 gridSearchLearn] Running grid search... 

[[ Resampling Parameters ]]
    n.resamples: 5 
      resampler: kfold 
   stratify.var: y 
   strat.n.bins: 4 
[2019-08-02 17:50:13 kfold] Using max n bins possible =  2 
[2019-08-02 17:50:13 resample] Created 5 independent folds 

[[ Search parameters ]]
    grid.params:  
                         gamma: 0.6, 0.7, 0.8, 0.9 
                     max.depth: 30 
                 learning.rate: 0.1 
                   min.hessian: 0.001 
   fixed.params:  
                 catPredictors: NULL 
                           ipw: TRUE 
                      ipw.type: 2 
                      upsample: FALSE 
                 resample.seed: NULL 
[2019-08-02 17:50:13 gridSearchLearn] Tuning Additive Tree by exhaustive grid search: 
[2019-08-02 17:50:13 gridSearchLearn] 5 resamples; 20 models total; running on 4 cores (x86_64-apple-darwin15.6.0)
 

[[ Best parameters to maximize Balanced Accuracy ]]
   best.tune:  
                      gamma: 0.9 
                  max.depth: 30 
              learning.rate: 0.1 
                min.hessian: 0.001 

[2019-08-02 17:50:25 gridSearchLearn] Run completed in 0.19 minutes (Real: 11.67; User: 0.02; System: 0.02) 

[2019-08-02 17:50:25 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1    0   
                1  129   1
                0    3  42

                   Overall  
      Sensitivity  0.9773 
      Specificity  0.9767 
Balanced Accuracy  0.9770 
              PPV  0.9923 
              NPV  0.9333 
               F1  0.9847 
         Accuracy  0.9771 

  Positive Class:  1 

[[ ADDTREE Classification Testing Summary ]]
                   Reference 
        Estimated  1   0  
                1  15  1
                0   0  4

                   Overall  
      Sensitivity  1.0000 
      Specificity  0.8000 
Balanced Accuracy  0.9000 
              PPV  0.9375 
              NPV  1.0000 
               F1  0.9677 
         Accuracy  0.9500 

  Positive Class:  1 
[2019-08-02 17:50:26 s.ADDTREE] Traversing tree by preorder... 
[2019-08-02 17:50:26 s.ADDTREE] Converting paths to rules... 
[2019-08-02 17:50:26 s.ADDTREE] Converting to data.tree object... 
[2019-08-02 17:50:26 s.ADDTREE] Pruning tree... 

[2019-08-02 17:50:26 s.ADDTREE] Run completed in 0.21 minutes (Real: 12.61; User: 0.88; System: 0.09) 

Let’s look at the tuning results (this is a small dataset and tuning may not be very accurate):

  gamma max.depth learning.rate min.hessian Sensitivity Specificity
1   0.6        30           0.1       0.001   0.8629630   0.7000000
2   0.7        30           0.1       0.001   0.8931624   0.6277778
3   0.8        30           0.1       0.001   0.8860399   0.6944444
4   0.9        30           0.1       0.001   0.8937322   0.6944444
  Balanced Accuracy       PPV       NPV        F1  Accuracy param.id
1         0.7814815 0.8971339 0.6310101 0.8794356 0.8224463        1
2         0.7604701 0.8829769 0.6350000 0.8873931 0.8273389        2
3         0.7902422 0.9002621 0.6621212 0.8928775 0.8392530        3
4         0.7940883 0.9008319 0.6835498 0.8969414 0.8451354        4

19.1.6 Nested resampling: Cross-validation and hyperparameter tuning

We now use the core rtemis supervised learning function elevate, which performs nested resampling for cross-validation and hyperparameter tuning:
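
A sketch of the call; the mod argument name and the pass-through of gamma to s.ADDTREE are assumptions:

  mod.cv <- elevate(parkinsons, mod = "addtree",
                    gamma = seq(0.6, 0.9, by = 0.1))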

[2019-08-02 17:50:27 elevate] Hello, egenn 

[[ Classification Input Summary ]]
   Training features: 195 x 22 
    Training outcome: 195 x 1 

[2019-08-02 17:50:27 resLearn] Training Additive Tree on 10 stratified subsamples... 
[2019-08-02 17:50:27 kfold] Using max n bins possible =  2 
[2019-08-02 17:50:49 kfold] Using max n bins possible =  2 
[2019-08-02 17:51:09 kfold] Using max n bins possible =  2 
[2019-08-02 17:51:28 kfold] Using max n bins possible =  2 
[2019-08-02 17:51:46 kfold] Using max n bins possible =  2 
[2019-08-02 17:52:04 kfold] Using max n bins possible =  2 
[2019-08-02 17:52:28 kfold] Using max n bins possible =  2 
[2019-08-02 17:52:59 kfold] Using max n bins possible =  2 
[2019-08-02 17:53:20 kfold] Using max n bins possible =  2 
[2019-08-02 17:53:42 kfold] Using max n bins possible =  2 

[[ elevate ADDTREE ]]
   N repeats = 1 
   N resamples = 10 
   Resampler = strat.sub 
   Mean Balanced Accuracy of 10 test sets in each repeat = 0.80

[2019-08-02 17:54:01 elevate] Run completed in 3.57 minutes (Real: 214.38; User: 23.37; System: 1.38) 

We can get a summary of the cross-validation by printing the elevate object:

.:rtemis Cross-Validated Model
ADDTREE (Additive Tree)
                 Algorithm: ADDTREE (Additive Tree)
                Resampling: n = 10, type = strat.sub
              N of repeats: 1 
 Average Balanced Accuracy across repeats = 0.80

19.2 Bagging the Additive Tree (AddTree Random Forest)

You can use the rtemis bag function to build a random forest with AddTree base learners. First, we train a single AddTree on a training/testing split for comparison:

[2020-03-25 23:03:55 resample] Input contains more than one columns; will stratify on last 

[[ Resampling Parameters ]]
    n.resamples: 10 
      resampler: strat.sub 
   stratify.var: y 
        train.p: 0.75 
   strat.n.bins: 4 
[2020-03-25 23:03:55 strat.sub] Using max n bins possible = 2 
[2020-03-25 23:03:55 resample] Created 10 stratified subsamples 
[2020-03-25 23:03:55 s.ADDTREE] Hello, egenn 

[2020-03-25 23:03:55 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 155 x 60 
    Training outcome: 155 x 1 
    Testing features: 53 x 60 
     Testing outcome: 53 x 1 

[2020-03-25 23:03:56 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  M   R   
                M  82   2
                R   1  70

                   Overall  
      Sensitivity  0.9880 
      Specificity  0.9722 
Balanced Accuracy  0.9801 
              PPV  0.9762 
              NPV  0.9859 
               F1  0.9820 
         Accuracy  0.9806 

  Positive Class:  M 

[[ ADDTREE Classification Testing Summary ]]
                   Reference 
        Estimated  M   R   
                M  19   4
                R   9  21

                   Overall  
      Sensitivity  0.6786 
      Specificity  0.8400 
Balanced Accuracy  0.7593 
              PPV  0.8261 
              NPV  0.7000 
               F1  0.7451 
         Accuracy  0.7547 

  Positive Class:  M 
[2020-03-25 23:03:58 s.ADDTREE] Traversing tree by preorder... 
[2020-03-25 23:03:58 s.ADDTREE] Converting paths to rules... 
[2020-03-25 23:03:58 s.ADDTREE] Converting to data.tree object... 
[2020-03-25 23:03:58 s.ADDTREE] Pruning tree... 


[2020-03-25 23:03:58 s.ADDTREE] Run completed in 0.05 minutes (Real: 3.11; User: 2.89; System: 0.12) 

Now let's train a random forest using the AddTree base learner (with just 20 trees for this example):
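
A sketch of the bagging call, assuming dat.train and dat.test hold the split shown above; the argument for the number of bagged resamples (k) is an assumption:

  mod.bag <- bag(dat.train, x.test = dat.test,
                 mod = "addtree",
                 k = 20)                    # number of bagged AddTree learners (assumed argument name)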

[2020-03-25 23:28:43 bag] Hello, egenn 

[2020-03-25 23:28:43 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 155 x 60 
    Training outcome: 155 x 1 
    Testing features: 53 x 60 
     Testing outcome: 53 x 1 

[[ Parameters ]]
          mod: ADDTREE 
   mod.params: (empty list) 
[2020-03-25 23:28:44 bag] Bagging 20 Additive Tree... 
[2020-03-25 23:28:44 strat.sub] Using max n bins possible = 2 

[2020-03-25 23:28:44 resLearn] Training Additive Tree on 20 stratified bootstraps... 
[2020-03-25 23:28:44 resLearn] Parallelizing by forking on 4 cores... 

[[ Classification Training Summary ]]
                   Reference 
        Estimated  M   R   
                M  82   1
                R   1  71

                   Overall  
      Sensitivity  0.9880 
      Specificity  0.9861 
Balanced Accuracy  0.9870 
              PPV  0.9880 
              NPV  0.9861 
               F1  0.9880 
         Accuracy  0.9871 

  Positive Class:  M 

[[ Classification Testing Summary ]]
                   Reference 
        Estimated  M   R   
                M  21   4
                R   7  21

                   Overall  
      Sensitivity  0.7500 
      Specificity  0.8400 
Balanced Accuracy  0.7950 
              PPV  0.8400 
              NPV  0.7500 
               F1  0.7925 
         Accuracy  0.7925 

  Positive Class:  M 


[2020-03-25 23:29:06 bag] Run completed in 0.39 minutes (Real: 23.33; User: 3.52; System: 0.54) 

19.3 More example datasets

19.3.1 OpenML: sleep

Let's grab a dataset from the massive OpenML repository.
(We can read its .arff files as CSVs.)
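
A sketch; the OpenML file id in the URL is a placeholder, not the actual value:

  sleep <- read.csv("https://www.openml.org/data/get_csv/<file_id>/sleep.arff",
                    na.strings = "?")       # OpenML serves ARFF data as CSV; '?' marks missing values
  checkData(sleep)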

  Dataset: sleep 

  [[ Summary ]]
  62 cases with 8 features: 
  * 4 continuous features 
  * 3 integer features 
  * 1 categorical feature, which is not ordered
  * 0 constant features 
  * 0 duplicated cases 
  * 2 features include 'NA' values; 8 'NA' values total
    ** Max percent missing in a feature is 6.45% (max_life_span)
    ** Max percent missing in a case is 25% (case #13)

  [[ Recommendations ]]
  * Consider imputing missing values or use complete cases only
  * Check the 3 integer features and consider if they should be converted to factors

We can impute missing data with preprocess:
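
A sketch, assuming the impute argument of preprocess:

  sleep <- preprocess(sleep, impute = TRUE)   # imputes missing values (missRanger, per the log below)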

[2019-08-02 17:55:00 preprocess] Imputing missing values using missRanger... 

Missing value imputation by random forests

  Variables to impute:      max_life_span, gestation_time
  Variables used to impute: body_weight, brain_weight, max_life_span, gestation_time, predation_index, sleep_exposure_index, danger_index, binaryClass
iter 1: ..
iter 2: ..
iter 3: ..
iter 4: ..
[2019-08-02 17:55:00 preprocess] Done 

Train and plot AddTree:
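
A sketch of the two calls:

  mod.sleep <- s.ADDTREE(sleep)
  mplot3.addtree(mod.sleep)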

[2019-08-02 17:55:00 s.ADDTREE] Hello, egenn 

[2019-08-02 17:55:00 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 62 x 7 
    Training outcome: 62 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:55:00 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  N   P   
                N  31   2
                P   2  27

                   Overall  
      Sensitivity  0.9394 
      Specificity  0.9310 
Balanced Accuracy  0.9352 
              PPV  0.9394 
              NPV  0.9310 
               F1  0.9394 
         Accuracy  0.9355 

  Positive Class:  N 
[2019-08-02 17:55:01 s.ADDTREE] Traversing tree by preorder... 
[2019-08-02 17:55:01 s.ADDTREE] Converting paths to rules... 
[2019-08-02 17:55:01 s.ADDTREE] Converting to data.tree object... 
[2019-08-02 17:55:01 s.ADDTREE] Pruning tree... 

[2019-08-02 17:55:01 s.ADDTREE] Run completed in 0.02 minutes (Real: 1.03; User: 0.90; System: 0.04) 

19.3.2 PMLB: chess

Let's load a dataset from the Penn ML Benchmarks GitHub repository.
R allows us to read a gzipped file and unzip it on the fly:

  • We open a remote connection to a gzipped tab-separated file,
  • read it in R with read.table,
  • set the target levels,
  • and check the data, as sketched below
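
A minimal sketch; the repository path and the outcome column name ("target") are assumptions:

  chess.url <- "https://raw.githubusercontent.com/EpistasisLab/penn-ml-benchmarks/master/datasets/chess/chess.tsv.gz"
  chess <- read.table(gzcon(url(chess.url)), header = TRUE, sep = "\t")
  chess$target <- factor(chess$target, levels = c(1, 0))   # first level = positive class
  checkData(chess)
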
  Dataset: chess 

  [[ Summary ]]
  3196 cases with 37 features: 
  * 0 continuous features 
  * 36 integer features 
  * 1 categorical feature, which is not ordered
  * 0 constant features 
  * 0 features include 'NA' values

  [[ Recommendations ]]
  * Everything looks good

[2019-08-02 17:55:04 s.ADDTREE] Hello, egenn 

[2019-08-02 17:55:04 dataPrepare] Imbalanced classes: using Inverse Probability Weighting 

[[ Classification Input Summary ]]
   Training features: 3196 x 36 
    Training outcome: 3196 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:55:04 s.ADDTREE] Training ADDTREE... 

[[ ADDTREE Classification Training Summary ]]
                   Reference 
        Estimated  1     0     
                1  1623    28
                0    46  1499

                   Overall  
      Sensitivity  0.9724 
      Specificity  0.9817 
Balanced Accuracy  0.9771 
              PPV  0.9830 
              NPV  0.9702 
               F1  0.9777 
         Accuracy  0.9768 

  Positive Class:  1 
[2019-08-02 17:55:10 s.ADDTREE] Traversing tree by preorder... 
[2019-08-02 17:55:10 s.ADDTREE] Converting paths to rules... 
[2019-08-02 17:55:10 s.ADDTREE] Converting to data.tree object... 
[2019-08-02 17:55:11 s.ADDTREE] Pruning tree... 

[2019-08-02 17:55:11 s.ADDTREE] Run completed in 0.12 minutes (Real: 7.14; User: 6.69; System: 0.27) 

References

Luna, José Marcio, Efstathios D. Gennatas, Lyle H. Ungar, Eric Eaton, Eric S. Diffenderfer, Shane T. Jensen, Charles B. Simone, Jerome H. Friedman, Timothy D. Solberg, and Gilmer Valdes. 2019. “Building More Accurate Decision Trees with the Additive Tree.” Proceedings of the National Academy of Sciences 116 (40): 19887–93.