# 18 Regression with Splines

library(rtemis)
  .:rtemis 0.79: Welcome, egenn
[x86_64-apple-darwin15.6.0 (64-bit): Defaulting to 4/4 available cores]
Online documentation & vignettes: https://rtemis.netlify.com
library(splines)
library(splines2)

## 18.1 Synthetic data

Let’s create some synthetic data, where y is a cubic function of x plus Gaussian noise:

set.seed(2018)
x <- rnorm(500)
y <- x ^ 3 + 4 + rnorm(500)

## 18.2 GLM

Let’s regress y on x:

mod.glm <- s.GLM(x, y)
[2019-08-02 17:31:44 s.GLM] Hello, egenn

[[ Regression Input Summary ]]
Training features: 500 x 1
Training outcome: 500 x 1
Testing features: Not available
Testing outcome: Not available

[2019-08-02 17:31:46 s.GLM] Training GLM...

[[ GLM Regression Training Summary ]]
MSE = 6.02 (59.53%)
RMSE = 2.45 (36.38%)
MAE = 1.66 (20.02%)
r = 0.77 (p = 7e-100)
rho = 0.68 (p = 0)
R sq = 0.60
[2019-08-02 17:31:46 s.GLM] Run completed in 0.02 minutes (Real: 1.46; User: 1.10; System: 0.11)



As expected, a straight line is a poor fit for a cubic relationship (R sq = 0.60).
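The weakness of the linear fit can be checked with base R alone. The sketch below assumes that `s.GLM` reduces to ordinary least squares for a continuous outcome, and uses `lm()` in its place; the names `fit.lin`, `fit.cub`, and `r2` are illustrative only. It regenerates data from the same recipe, so the numbers will not match the log above exactly.

```r
# Regenerate synthetic data with the same recipe (illustrative seed)
set.seed(2018)
x <- rnorm(500)
y <- x^3 + 4 + rnorm(500)

# Straight-line fit vs. a fit on the true cubic term
fit.lin <- lm(y ~ x)
fit.cub <- lm(y ~ I(x^3))

# Training R-squared for each model
r2 <- c(linear = summary(fit.lin)$r.squared,
        cubic  = summary(fit.cub)$r.squared)
print(round(r2, 2))
```

The gap between the two values is the variance the straight line leaves unexplained because it cannot bend.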

## 18.3 B-splines

Let’s build B-splines for x and their first derivatives and plot them against x:

x.bs <- bSpline(x, 3)
dx.bs <- deriv(x.bs)
mplot3.xy(x, x.bs, type = 'l', lwd = 3)
mplot3.xy(x, dx.bs, type = 'l', lwd = 3)

Now let’s regress y on the B-splines we built from x:

mod.glm.bs <- s.GLM(x.bs, y)
[2019-08-02 17:31:46 s.GLM] Hello, egenn

[[ Regression Input Summary ]]
Training features: 500 x 3
Training outcome: 500 x 1
Testing features: Not available
Testing outcome: Not available

[2019-08-02 17:31:46 s.GLM] Training GLM...

[[ GLM Regression Training Summary ]]
MSE = 1.16 (92.19%)
RMSE = 1.08 (72.06%)
MAE = 0.86 (58.77%)
r = 0.96 (p = 5.9e-278)
rho = 0.68 (p = 0)
R sq = 0.92
[2019-08-02 17:31:46 s.GLM] Run completed in 2.4e-03 minutes (Real: 0.15; User: 0.09; System: 0.01)



We get a much better fit by regressing y on the B-splines of x: R sq rises from 0.60 to 0.92.
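One way to see why three B-spline columns are enough here: with 3 degrees of freedom and the default cubic degree, no interior knots are placed, so the basis together with the model intercept spans the cubic polynomials on the range of x. The sketch below illustrates this with `splines::bs()` standing in for `splines2::bSpline()` (assumed here to build the same basis) and `lm()` standing in for `s.GLM`; `fit.bs`, `fit.poly`, and `same.fit` are illustrative names.

```r
library(splines)

set.seed(2018)
x <- rnorm(500)
y <- x^3 + 4 + rnorm(500)

# Cubic B-spline basis with df = 3: no interior knots
fit.bs <- lm(y ~ bs(x, df = 3))

# Plain cubic polynomial regression
fit.poly <- lm(y ~ poly(x, 3))

# Both models span the same column space, so fitted values agree
# up to numerical tolerance
same.fit <- isTRUE(all.equal(unname(fitted(fit.bs)),
                             unname(fitted(fit.poly))))
print(same.fit)
```

Requesting more degrees of freedom would add interior knots, at which point the spline fit is no longer a single cubic polynomial.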

## 18.4 C-splines

Let’s build C-splines for x and their first derivatives and plot them against x:

x.cs <- cSpline(x, 3)
dx.cs <- deriv(x.cs)
mplot3.xy(x, x.cs, type = 'l', lwd = 3)
mplot3.xy(x, dx.cs, type = 'l', lwd = 3)

Now let’s regress y on the C-splines we built from x:

mod.glm.cs <- s.GLM(x.cs, y)
[2019-08-02 17:31:47 s.GLM] Hello, egenn

[[ Regression Input Summary ]]
Training features: 500 x 3
Training outcome: 500 x 1
Testing features: Not available
Testing outcome: Not available

[2019-08-02 17:31:47 s.GLM] Training GLM...

[[ GLM Regression Training Summary ]]
MSE = 3.58 (75.90%)
RMSE = 1.89 (50.91%)
MAE = 1.25 (40.17%)
r = 0.87 (p = 5.4e-156)
rho = 0.68 (p = 0)
R sq = 0.76
[2019-08-02 17:31:47 s.GLM] Run completed in 3.9e-03 minutes (Real: 0.23; User: 0.11; System: 0.02)



## 18.5 I-splines

Let’s build I-splines for x and their first derivatives and plot them against x:

x.is <- iSpline(x, 3)
dx.is <- deriv(x.is)
mplot3.xy(x, x.is, type = 'l', lwd = 3)
mplot3.xy(x, dx.is, type = 'l', lwd = 3)

Now let’s regress y on the I-splines we built from x:

mod.glm.is <- s.GLM(x.is, y)
[2019-08-02 17:31:48 s.GLM] Hello, egenn

[[ Regression Input Summary ]]
Training features: 500 x 3
Training outcome: 500 x 1
Testing features: Not available
Testing outcome: Not available

[2019-08-02 17:31:48 s.GLM] Training GLM...

[[ GLM Regression Training Summary ]]
MSE = 2.37 (84.04%)
RMSE = 1.54 (60.04%)
MAE = 1.08 (47.95%)
r = 0.92 (p = 1.5e-200)
rho = 0.68 (p = 0)
R sq = 0.84
[2019-08-02 17:31:48 s.GLM] Run completed in 2.7e-03 minutes (Real: 0.16; User: 0.08; System: 0.01)



## 18.6 M-splines

Let’s build M-splines for x and their first derivatives and plot them against x:

x.ms <- mSpline(x, 3)
dx.ms <- deriv(x.ms)
mplot3.xy(x, x.ms, type = 'l', lwd = 3)
mplot3.xy(x, dx.ms, type = 'l', lwd = 3)

Now let’s regress y on the M-splines we built from x:

mod.glm.ms <- s.GLM(x.ms, y)
[2019-08-02 17:31:48 s.GLM] Hello, egenn

[[ Regression Input Summary ]]
Training features: 500 x 3
Training outcome: 500 x 1
Testing features: Not available
Testing outcome: Not available

[2019-08-02 17:31:48 s.GLM] Training GLM...

[[ GLM Regression Training Summary ]]
MSE = 1.16 (92.19%)
RMSE = 1.08 (72.06%)
MAE = 0.86 (58.77%)
r = 0.96 (p = 5.9e-278)
rho = 0.68 (p = 0)
R sq = 0.92
[2019-08-02 17:31:49 s.GLM] Run completed in 2.4e-03 minutes (Real: 0.14; User: 0.09; System: 0.01)
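The M-spline metrics match the B-spline metrics from section 18.3 to every reported digit. This is expected: M-splines are B-splines rescaled so that each basis function integrates to one, and rescaling the columns of a design matrix leaves least-squares fitted values unchanged. The sketch below demonstrates only the invariance, using base R. The scale factors are hypothetical, not the actual M-spline normalization, and `lm()` stands in for `s.GLM`.

```r
library(splines)

set.seed(2018)
x <- rnorm(500)
y <- x^3 + 4 + rnorm(500)

# Plain numeric matrix holding the three B-spline basis columns
B <- matrix(as.vector(bs(x, df = 3)), ncol = 3)

# Hypothetical positive scale factors, one per column
scale.factors <- c(0.5, 2, 10)
B.scaled <- B %*% diag(scale.factors)

fit.orig   <- lm(y ~ B)
fit.scaled <- lm(y ~ B.scaled)

# Coefficients differ, but the fitted values do not
same.fit.scaled <- isTRUE(all.equal(unname(fitted(fit.orig)),
                                    unname(fitted(fit.scaled))))
print(same.fit.scaled)
```

Only the coefficients absorb the rescaling; the error metrics, which depend on fitted values, stay the same.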



## 18.7 Natural cubic splines

Let’s build natural cubic splines for x and plot them against x:

x.ns <- ns(x, 3)
mplot3.xy(x, x.ns, type = 'l', lwd = 3)

Now let’s regress y on the natural cubic splines we built from x:

mod.glm.ns <- s.GLM(x.ns, y)
[2019-08-02 17:31:49 s.GLM] Hello, egenn

[[ Regression Input Summary ]]
Training features: 500 x 3
Training outcome: 500 x 1
Testing features: Not available
Testing outcome: Not available

[2019-08-02 17:31:49 s.GLM] Training GLM...

[[ GLM Regression Training Summary ]]
MSE = 1.70 (88.58%)
RMSE = 1.30 (66.21%)
MAE = 1.00 (51.79%)
r = 0.94 (p = 8.7e-237)
rho = 0.62 (p = 0)
R sq = 0.89
[2019-08-02 17:31:49 s.GLM] Run completed in 2.6e-03 minutes (Real: 0.16; User: 0.09; System: 0.01)
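The natural cubic spline fit is close to, but not the same as, the B-spline fit, even though both use three columns. A likely reason is that the two bases spend their degrees of freedom differently: a natural spline is constrained to have zero second derivative at the boundary knots, so it cannot reproduce a pure cubic exactly, and it places interior knots instead. The sketch below inspects the knots chosen by `splines::bs()` and `splines::ns()` for the same input; `knots.bs` and `knots.ns` are illustrative names.

```r
library(splines)

set.seed(2018)
x <- rnorm(500)

# Interior knots chosen automatically when only df is given
knots.bs <- attr(bs(x, df = 3), "knots")
knots.ns <- attr(ns(x, df = 3), "knots")

# Number of interior knots in each basis
print(c(bs = length(knots.bs), ns = length(knots.ns)))

# Location of the natural spline's interior knots (quantiles of x)
print(knots.ns)
```

With `df = 3`, `bs()` uses no interior knots while `ns()` places two at quantiles of x.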