21 Demo: Simpson’s Paradox
.:rtemis 0.78.9003: Welcome, egenn
[x86_64-apple-darwin15.6.0 (64-bit): Defaulting to 4/4 available cores]
Need help? Online documentation & vignettes: https://rtemis.netlify.com
We are given a dataset of just two variables and are asked to explore it. Now, suppose we are quite new to this. We might start by checking whether the two variables are correlated:
We see that they are negatively correlated. We always hear people talking about p-values, so after a quick Google search, we come back and get a p-value for our correlation.
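The two steps so far can be sketched in base R. The chapter's dataset is not shown, so the `x` and `y` below are simulated stand-ins (an assumption): two groups, each with a positive within-group slope, offset so that the pooled trend is negative. The numbers this prints will not match the output reported in this chapter.

```r
# Hypothetical data: two groups of 100 points each.
# Within each group y rises with x; the second group sits lower and to the right.
set.seed(2019)
n <- 100
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))
y <- c(6, 0)[rep(1:2, each = n)] + 0.5 * x + rnorm(2 * n, sd = 0.5)

r <- cor(x, y)        # Pearson correlation of the pooled data
ct <- cor.test(x, y)  # test of the null hypothesis that the true correlation is 0
print(r)
print(ct)
```

`cor.test` reports the t statistic, degrees of freedom (number of points minus 2), p-value, and a 95 percent confidence interval for the correlation.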
Pearson's product-moment correlation

data:  x and y
t = -7.0313, df = 198, p-value = 3.242e-11
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5515318 -0.3286346
sample estimates:
       cor
-0.4469948
The correlation is highly significant!
What have we done wrong so far?
We should always begin by plotting data, if possible.
So let’s have a look. Let’s look at each variable by itself, first.
We don’t see much on the univariate plots. Or maybe we do. That first one looks like there are two separate groups of points. Let’s look at the density plot of x:
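A minimal sketch of these univariate views, using base R graphics in place of the rtemis plotting calls and the same kind of simulated stand-in data (an assumption, since the chapter's dataset is not shown):

```r
# Hypothetical data: two groups of 100 points, stored one after the other
set.seed(2019)
n <- 100
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))
y <- c(6, 0)[rep(1:2, each = n)] + 0.5 * x + rnorm(2 * n, sd = 0.5)

op <- par(mfrow = c(1, 3))      # three panels side by side
plot(x, main = "x by index")    # values in the order they are stored
plot(y, main = "y by index")
d <- density(x)                 # kernel density estimate of x
plot(d, main = "Density of x")
par(op)                         # restore the previous layout
```

With data like these, the index plot of `x` shows two bands and the density estimate shows two modes, which is the hint that the sample mixes two groups.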
Clearly, we want to plot them against each other.
OK - well, so what was that super significant negative correlation?
Let’s plot and draw the linear fit:
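As a rough base R equivalent, assuming simulated stand-in data rather than the chapter's dataset, a pooled least-squares line can be drawn with `lm` and `abline`:

```r
# Hypothetical data: two groups with positive within-group slopes
set.seed(2019)
n <- 100
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))
y <- c(6, 0)[rep(1:2, each = n)] + 0.5 * x + rnorm(2 * n, sd = 0.5)

fit <- lm(y ~ x)                    # one linear model for all points
plot(x, y, pch = 16, col = "grey40")
abline(fit, col = "red", lwd = 2)   # pooled fit line
print(coef(fit))
```

The single line slopes downward because it mostly tracks the offset between the two clouds of points, not the trend inside either one.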
Well, sure, but that looks wrong.
Maybe we can cluster the dataset and plot again, grouping by cluster:
[2019-06-29 00:47:53 u.KMEANS] Hello, egenn
[2019-06-29 00:47:53 u.KMEANS] Performing K-means Clustering with k = 2...
[2019-06-29 00:47:53 u.KMEANS] Run completed in 4.8e-03 minutes (Real: 0.29; User: 0.23; System: 0.02)
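The clustering step can be sketched with `stats::kmeans`, used here as a stand-in for the rtemis wrapper named in the log above. The data are simulated (an assumption), and `nstart = 10` is an illustrative choice to make the result less dependent on the random starting centers.

```r
# Hypothetical data: two groups with positive within-group slopes
set.seed(2019)
n <- 100
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))
y <- c(6, 0)[rep(1:2, each = n)] + 0.5 * x + rnorm(2 * n, sd = 0.5)

# K-means with k = 2 on both variables
km <- kmeans(cbind(x, y), centers = 2, nstart = 10)

# Scatter plot colored by assigned cluster
cols <- c("steelblue", "orange")
plot(x, y, pch = 16, col = cols[km$cluster])
print(table(km$cluster))
```

K-means works well here only because the two simulated groups are compact and well separated; with overlapping groups the recovered clusters need not match the real ones.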
We could have done this all with a single mplot3.xy command too:
OK, now let’s draw separate fit lines. If you specify a grouping variable and request a fit, mplot3.xy will draw a separate line for each group:
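The same idea in base R, as a sketch on simulated stand-in data (an assumption): fit one linear model per cluster and overlay each line on the grouped scatter plot.

```r
# Hypothetical data: two groups with positive within-group slopes
set.seed(2019)
n <- 100
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))
y <- c(6, 0)[rep(1:2, each = n)] + 0.5 * x + rnorm(2 * n, sd = 0.5)
dat <- data.frame(x = x, y = y)
dat$cluster <- kmeans(dat[, c("x", "y")], centers = 2, nstart = 10)$cluster

cols <- c("steelblue", "orange")
plot(dat$x, dat$y, pch = 16, col = cols[dat$cluster], xlab = "x", ylab = "y")

slopes <- numeric(2)
for (k in 1:2) {
  fit_k <- lm(y ~ x, data = dat[dat$cluster == k, ])  # fit within cluster k only
  abline(fit_k, col = cols[k], lwd = 2)               # line spans the full plot width
  slopes[k] <- coef(fit_k)[["x"]]
}
print(slopes)
```

In this simulation both within-cluster slopes are positive even though the pooled correlation is negative, which is the reversal that defines Simpson’s Paradox.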
That’s pretty nice.
Now we suddenly get all fancy and want to color our points by their distance to the cluster centroid and play around with colors in mplot3.xy:
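One way to do this in base R is sketched below, on simulated stand-in data (an assumption). The palette and the number of color bins are arbitrary illustrative choices, and `kmeans` again stands in for the rtemis clustering call.

```r
# Hypothetical data: two groups with positive within-group slopes
set.seed(2019)
n <- 100
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 5))
y <- c(6, 0)[rep(1:2, each = n)] + 0.5 * x + rnorm(2 * n, sd = 0.5)
xy <- cbind(x, y)
km <- kmeans(xy, centers = 2, nstart = 10)

# Euclidean distance of each point to the centroid of its assigned cluster
d_centroid <- sqrt(rowSums((xy - km$centers[km$cluster, ])^2))

# Map distances onto a 100-step color ramp (near = navy, far = gold)
pal <- colorRampPalette(c("navy", "gold"))(100)
col_idx <- cut(d_centroid, breaks = 100, labels = FALSE)
plot(xy, pch = 16, col = pal[col_idx])
points(km$centers, pch = 4, cex = 2, lwd = 2)  # mark the two centroids
```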
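Simpson’s Paradox, in which a trend seen in pooled data weakens, vanishes, or reverses once the data are split into groups, is often encountered in the biomedical and social sciences. Read more about it here.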