18 Regression with Splines

  .:rtemis 0.79: Welcome, egenn
  [x86_64-apple-darwin15.6.0 (64-bit): Defaulting to 4/4 available cores]
  Online documentation & vignettes: https://rtemis.netlify.com

18.1 Synthetic data

Let’s create some synthetic data:
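The code that generated the data is not shown in this section. A minimal sketch follows, with only the sample size of 500 taken from the input summaries in the logs; the seed, the cubic signal, and the noise level are illustrative assumptions, not the original values.

```r
# Illustrative data only: seed, signal shape, and noise level are assumptions
set.seed(2019)
n <- 500               # matches the 500 cases reported in the input summaries
x <- rnorm(n)          # a single numeric feature
y <- x^3 + rnorm(n)    # nonlinear signal plus Gaussian noise

# Quick look at the structure of the data
str(data.frame(x = x, y = y))
```

Any smooth nonlinear function of x would serve the same purpose: it gives a straight-line fit something to miss and a spline basis something to recover.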

18.2 GLM

Let’s regress y on x:
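The call that produced this log is not shown. A minimal sketch, assuming that `s.GLM` takes the features and the outcome as its first two arguments and returns a model object with `fitted` and `error.train` fields. The data are illustrative (assumed seed and cubic signal), so the metrics will not match the log.

```r
library(rtemis)

# Illustrative data (assumptions, not the original values)
set.seed(2019)
x <- rnorm(500)
y <- x^3 + rnorm(500)

# Fit a GLM of y on the single feature x; s.GLM prints its own summaries
mod.glm <- s.GLM(data.frame(x = x), y)

# Training-set error metrics stored on the returned model object
mod.glm$error.train
```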

[2019-08-02 17:31:44 s.GLM] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 500 x 1 
    Training outcome: 500 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:31:46 s.GLM] Training GLM... 

[[ GLM Regression Training Summary ]]
    MSE = 6.02 (59.53%)
   RMSE = 2.45 (36.38%)
    MAE = 1.66 (20.02%)
      r = 0.77 (p = 7e-100)
    rho = 0.68 (p = 0)
   R sq = 0.60

[2019-08-02 17:31:46 s.GLM] Run completed in 0.02 minutes (Real: 1.46; User: 1.10; System: 0.11) 

As expected, a fit that is linear in x is poor here: it explains only about 60% of the variance in y (R sq = 0.60).

18.3 B-splines

Let’s build B-splines for x and their first derivatives and plot them against x:
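The code and plot for this step are not shown. A minimal sketch, assuming the splines2 package: `bSpline()` builds the basis and its `deriv()` method returns the first derivatives. The choice of `degree = 3` with no internal knots and no intercept column is an assumption made to obtain the three columns reported in the input summary; the data are illustrative.

```r
library(splines2)

# Illustrative feature (assumed, not the original values)
set.seed(2019)
x <- rnorm(500)

# Cubic B-spline basis, no internal knots, no intercept column: 3 columns
x.bs <- bSpline(x, degree = 3, intercept = FALSE)
# First derivative of each basis function, evaluated at x
x.dbs <- deriv(x.bs)

# Plot basis functions (left) and their derivatives (right) against sorted x
idx <- order(x)
op <- par(mfrow = c(1, 2))
matplot(x[idx], x.bs[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "B-spline basis")
matplot(x[idx], x.dbs[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "First derivative")
par(op)
```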

Now let’s regress y on the B-splines we built from x:
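A sketch of this regression, under the same assumptions as the rest of the examples: illustrative data, a three-column cubic basis from `splines2::bSpline()`, and `s.GLM` called with a data frame of features and an outcome vector. Converting the basis to a plain data frame is a precaution taken here, not something the original is known to do.

```r
library(rtemis)
library(splines2)

# Illustrative data (assumptions, not the original values)
set.seed(2019)
x <- rnorm(500)
y <- x^3 + rnorm(500)

# Three-column cubic B-spline basis as a plain data frame of features
x.bs <- data.frame(unclass(bSpline(x, degree = 3, intercept = FALSE)))

# Same learner as before; only the features changed from x to its basis
mod.bs <- s.GLM(x.bs, y)
mod.bs$error.train
```

The model is still linear in its coefficients; the nonlinearity comes entirely from the basis expansion of x.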

[2019-08-02 17:31:46 s.GLM] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 500 x 3 
    Training outcome: 500 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:31:46 s.GLM] Training GLM... 

[[ GLM Regression Training Summary ]]
    MSE = 1.16 (92.19%)
   RMSE = 1.08 (72.06%)
    MAE = 0.86 (58.77%)
      r = 0.96 (p = 5.9e-278)
    rho = 0.68 (p = 0)
   R sq = 0.92

[2019-08-02 17:31:46 s.GLM] Run completed in 2.4e-03 minutes (Real: 0.15; User: 0.09; System: 0.01) 

We get a much better fit by regressing y on the B-splines of x: MSE drops from 6.02 to 1.16 and R sq rises from 0.60 to 0.92.

18.4 C-splines

Let’s build C-splines for x and their first derivatives and plot them against x:
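The code and plot for this step are not shown. A minimal sketch, assuming `splines2::cSpline()` and its `deriv()` method, with `degree = 3` and `intercept = FALSE` chosen (as an assumption) to give the three columns reported in the input summary. C-splines are integrals of I-splines, so each basis function is convex and its first derivative is non-negative.

```r
library(splines2)

# Illustrative feature (assumed, not the original values)
set.seed(2019)
x <- rnorm(500)

# C-spline basis and its first derivatives, evaluated at x
x.cs <- cSpline(x, degree = 3, intercept = FALSE)
x.dcs <- deriv(x.cs)

# Plot basis functions (left) and their derivatives (right) against sorted x
idx <- order(x)
op <- par(mfrow = c(1, 2))
matplot(x[idx], x.cs[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "C-spline basis")
matplot(x[idx], x.dcs[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "First derivative")
par(op)
```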

Now let’s regress y on the C-splines we built from x:

[2019-08-02 17:31:47 s.GLM] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 500 x 3 
    Training outcome: 500 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:31:47 s.GLM] Training GLM... 

[[ GLM Regression Training Summary ]]
    MSE = 3.58 (75.90%)
   RMSE = 1.89 (50.91%)
    MAE = 1.25 (40.17%)
      r = 0.87 (p = 5.4e-156)
    rho = 0.68 (p = 0)
   R sq = 0.76

[2019-08-02 17:31:47 s.GLM] Run completed in 3.9e-03 minutes (Real: 0.23; User: 0.11; System: 0.02) 

18.5 I-splines

Let’s build I-splines for x and their first derivatives and plot them against x:
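The code and plot for this step are not shown. A minimal sketch, assuming `splines2::iSpline()` and its `deriv()` method, again with `degree = 3` and `intercept = FALSE` as assumed settings. I-splines are integrals of M-splines, so each basis function rises monotonically from 0 to 1 and its first derivative is non-negative.

```r
library(splines2)

# Illustrative feature (assumed, not the original values)
set.seed(2019)
x <- rnorm(500)

# I-spline basis and its first derivatives, evaluated at x
x.is <- iSpline(x, degree = 3, intercept = FALSE)
x.dis <- deriv(x.is)

# Plot basis functions (left) and their derivatives (right) against sorted x
idx <- order(x)
op <- par(mfrow = c(1, 2))
matplot(x[idx], x.is[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "I-spline basis")
matplot(x[idx], x.dis[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "First derivative")
par(op)
```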

Now let’s regress y on the I-splines we built from x:

[2019-08-02 17:31:48 s.GLM] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 500 x 3 
    Training outcome: 500 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:31:48 s.GLM] Training GLM... 

[[ GLM Regression Training Summary ]]
    MSE = 2.37 (84.04%)
   RMSE = 1.54 (60.04%)
    MAE = 1.08 (47.95%)
      r = 0.92 (p = 1.5e-200)
    rho = 0.68 (p = 0)
   R sq = 0.84

[2019-08-02 17:31:48 s.GLM] Run completed in 2.7e-03 minutes (Real: 0.16; User: 0.08; System: 0.01) 

18.6 M-splines

Let’s build M-splines for x and their first derivatives and plot them against x:
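The code and plot for this step are not shown. A minimal sketch, assuming `splines2::mSpline()` and its `deriv()` method, with `degree = 3` and `intercept = FALSE` as assumed settings. M-splines are non-negative and are B-splines rescaled so that each basis function integrates to one.

```r
library(splines2)

# Illustrative feature (assumed, not the original values)
set.seed(2019)
x <- rnorm(500)

# M-spline basis and its first derivatives, evaluated at x
x.ms <- mSpline(x, degree = 3, intercept = FALSE)
x.dms <- deriv(x.ms)

# Plot basis functions (left) and their derivatives (right) against sorted x
idx <- order(x)
op <- par(mfrow = c(1, 2))
matplot(x[idx], x.ms[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "M-spline basis")
matplot(x[idx], x.dms[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "First derivative")
par(op)
```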

Now let’s regress y on the M-splines we built from x:

[2019-08-02 17:31:48 s.GLM] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 500 x 3 
    Training outcome: 500 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:31:48 s.GLM] Training GLM... 

[[ GLM Regression Training Summary ]]
    MSE = 1.16 (92.19%)
   RMSE = 1.08 (72.06%)
    MAE = 0.86 (58.77%)
      r = 0.96 (p = 5.9e-278)
    rho = 0.68 (p = 0)
   R sq = 0.92

[2019-08-02 17:31:49 s.GLM] Run completed in 2.4e-03 minutes (Real: 0.14; User: 0.09; System: 0.01) 
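These metrics are identical to those of the B-spline fit. That is what one would expect when both bases use the same degree and knots: each M-spline is a constant multiple of the corresponding B-spline, so the two bases span the same column space and least squares returns the same fitted values. A small check, using base R's `lm()` as a stand-in for `s.GLM` and illustrative data (both are assumptions of this sketch):

```r
library(splines2)

# Illustrative data (assumptions, not the original values)
set.seed(2019)
x <- rnorm(500)
y <- x^3 + rnorm(500)

# Same degree, knots, and intercept setting for both bases
x.bs <- bSpline(x, degree = 3, intercept = FALSE)
x.ms <- mSpline(x, degree = 3, intercept = FALSE)

# Ordinary least squares on each basis
fit.bs <- lm(y ~ x.bs)
fit.ms <- lm(y ~ x.ms)

# Coefficients differ by the scaling, but the fitted values agree
all.equal(unname(fitted(fit.bs)), unname(fitted(fit.ms)))
```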

18.7 Natural cubic splines

Let’s build natural cubic splines for x and plot them against x:
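The code and plot for this step are not shown. A minimal sketch, assuming `ns()` from the base R splines package with `df = 3`, which yields the three columns reported in the input summary; the original may have used a different function or settings. A natural cubic spline is constrained to be linear beyond the boundary knots.

```r
library(splines)

# Illustrative feature (assumed, not the original values)
set.seed(2019)
x <- rnorm(500)

# Natural cubic spline basis with 3 degrees of freedom: 3 columns,
# with internal knots placed at quantiles of x
x.ns <- ns(x, df = 3)

# Plot the basis functions against sorted x
idx <- order(x)
matplot(x[idx], x.ns[idx, ], type = "l", lty = 1,
        xlab = "x", ylab = "Natural cubic spline basis")
```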

Now let’s regress y on the natural cubic splines we built from x:

[2019-08-02 17:31:49 s.GLM] Hello, egenn 

[[ Regression Input Summary ]]
   Training features: 500 x 3 
    Training outcome: 500 x 1 
    Testing features: Not available
     Testing outcome: Not available

[2019-08-02 17:31:49 s.GLM] Training GLM... 

[[ GLM Regression Training Summary ]]
    MSE = 1.70 (88.58%)
   RMSE = 1.30 (66.21%)
    MAE = 1.00 (51.79%)
      r = 0.94 (p = 8.7e-237)
    rho = 0.62 (p = 0)
   R sq = 0.89

[2019-08-02 17:31:49 s.GLM] Run completed in 2.6e-03 minutes (Real: 0.16; User: 0.09; System: 0.01)