Social Scientists Need to Stop Talking About "Statistical Significance" and Bivariate Correlations
We're still communicating a lot of nonsense to the public.
Over the past decade and a half, psychology has reckoned with the “replication crisis,” in which bad scholarly practices resulted in the promulgation of many false beliefs. The famous Stanford Prison Experiment, now under serious question, is one example, but many other psychological hypotheses were sold to the general public as “true” only to prove unreplicable under more rigorous methods.
One further concern, at least as bad in my view, that psychology is going to have to reckon with is “crud.” Put a bit simply, crud consists of “statistically significant” effects that are typically tiny in size and often due to methodological noise rather than real relationships. Crud can occur even in studies designed to fix replication-crisis issues, so rigor in that sense does not protect against it. It is also more likely to appear in large-sample studies, since such studies have more “power” to detect tiny effects, whether real or crud. The problem for social science is that, even though we know “statistical significance” is a bad measuring stick for hypotheses, scholars still lean on it because a.) it requires no actual thought, b.) it remains harder to publish non-significant results than significant ones, and c.) as human beings we tend to get attached to our little theories and quite defensive of them.
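To see how this plays out, here is a minimal simulation sketch (all numbers are invented for illustration): two variables that share nothing of substance, only a small methodological artifact, still produce a “statistically significant” correlation once the sample gets large enough.

```python
# A minimal sketch of "crud" (hypothetical numbers): a tiny shared
# methodological artifact, in a large enough sample, yields a
# "statistically significant" correlation that explains almost nothing.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000  # a large, epidemiology-sized sample

# Two traits with no substantive relationship, plus a small shared
# artifact (e.g., a response style on self-report scales).
artifact = rng.normal(size=n)
x = rng.normal(size=n) + 0.2 * artifact
y = rng.normal(size=n) + 0.2 * artifact

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.1e}, variance explained = {r**2:.4f}")
# r typically lands near .04 -- under 0.2% of the variance -- yet p is
# astronomically small purely because n is huge.
```

Nothing in that simulation is a real effect; the “significance” is doing all the rhetorical work.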
This is particularly a problem for meta-analyses, which, more than individual studies, are probably most at fault for exaggerating the confidence we should have in psychological theories. This is for two reasons. First, owing to massive pooled power, almost all meta-analyses come out “statistically significant” even when the evidence they pool is absurdly weak, and as such they promote “crud.” Second, most meta-analyses, particularly of correlational studies, unfortunately continue to rely on bivariate correlations even when there is no good reason to.
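The first problem is easy to demonstrate with a toy fixed-effect pooling exercise (all numbers invented, using the standard Fisher r-to-z approach): pool enough participants and a trivial average correlation is all but guaranteed to clear the significance bar.

```python
# Toy fixed-effect meta-analysis (invented numbers) using Fisher r-to-z pooling.
import numpy as np

rng = np.random.default_rng(1)
k, n = 30, 500          # 30 hypothetical studies of 500 people each
true_r = 0.04           # a crud-sized average "effect"

# Simulate each study's observed correlation around the tiny true value.
z_true = np.arctanh(true_r)
se = 1 / np.sqrt(n - 3)
z_obs = rng.normal(z_true, se, size=k)

# Fixed-effect pooled estimate and 95% confidence interval.
w = np.full(k, n - 3)
z_pooled = np.average(z_obs, weights=w)
se_pooled = 1 / np.sqrt(w.sum())
lo = np.tanh(z_pooled - 1.96 * se_pooled)
hi = np.tanh(z_pooled + 1.96 * se_pooled)
print(f"pooled r = {np.tanh(z_pooled):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
# With ~15,000 pooled participants the CI almost always excludes zero:
# "statistically significant," yet an r around .04 is practically meaningless.
```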
The assumption behind this was that bivariate correlations, as compared with standardized regression coefficients that control for other “third” variables, are more homogeneous and therefore more comparable across studies. That turns out not to be true. Worse, bivariate correlations exaggerate our confidence in a hypothesis precisely because they include no control variables, which artificially inflates effect sizes.
Consider the hypothesis that eating peaches prevents lung disease (defined as any negative lung symptoms). In a number of large, correlational, epidemiological studies, we find that the number of peaches a person eats per year correlates a tiny bit with reduced symptoms of lung disease. A meta-analysis of these bivariate effects concludes that the correlation is “statistically significant.” However, perhaps women eat more peaches and also tend to experience less lung disease. Or perhaps people who eat a lot of peaches are also less likely to smoke, more likely to exercise, or more likely to engage in any number of other healthy behaviors that have nothing to do with eating peaches specifically.
Now, let’s go back to those epidemiological studies. It turns out that in those studies, once sex, smoking habits, exercise, and so on are controlled for, peach quantity no longer predicts lung health. This is an important thing to know! We want to know whether peaches are related to lung health above and beyond factors we already know promote lung health, and the answer here would be “no.” Yet a meta-analysis of such studies relying on bivariate correlations would falsely declare that the answer is “yes.”
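A quick simulation of this hypothetical peach scenario (all values invented) shows the mechanism: a lifestyle confounder manufactures a “significant” bivariate correlation that collapses toward zero once the confounder is controlled.

```python
# A sketch of the hypothetical peach example (invented numbers): a lifestyle
# confounder creates a bivariate peach-lung correlation that vanishes once
# the confounder is included in the regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50_000

healthy_lifestyle = rng.normal(size=n)                    # exercise, not smoking, etc.
peaches = 0.3 * healthy_lifestyle + rng.normal(size=n)    # healthier people eat more peaches
symptoms = -0.3 * healthy_lifestyle + rng.normal(size=n)  # ...and report fewer lung symptoms
# Note: peaches have NO direct effect on symptoms in this simulation.

r, p = stats.pearsonr(peaches, symptoms)
print(f"bivariate r = {r:.3f}, p = {p:.1e}")              # small negative r, "significant"

# Regress symptoms on peaches while controlling for lifestyle.
X = np.column_stack([np.ones(n), peaches, healthy_lifestyle])
beta, *_ = np.linalg.lstsq(X, symptoms, rcond=None)
print(f"peach coefficient with lifestyle controlled: {beta[1]:.3f}")  # ~0
```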
As such, meta-analyses often promote false confidence in many social science theories. So do some individual studies with large samples when scholars focus on “statistical significance” and ignore the possibility that a.) tiny effects may simply be crud and b.) even if they are not crud, they may have no particular practical value because they are just too tiny to be noticed by real people in real life.
Unfortunately, scholars have concocted a number of self-serving rationales for why “small is big” (e.g., the claim that tiny effect sizes, when sprinkled over large populations like pixie dust, magically transform into important findings). It’s time we rejected those.
Thus, I have two basic recommendations for any social science scholars reading this.
First, no effect size below r = .10 should be used to support a hypothesis, even if “statistically significant.” The probability that such findings are mere noise is simply too high, and a correlation of .10 amounts to only 1% of shared variance in any case. Effects from .10 to .20 should be interpreted only with caution, and r = .20 is probably a reasonable threshold for practical significance (i.e., something people in the real world should at least minimally care about).
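For anyone who wants the back-of-the-envelope arithmetic behind those thresholds (my numbers, not gospel), here it is:

```python
# Back-of-the-envelope arithmetic for the proposed thresholds.
r_small, r_modest = 0.10, 0.20
print(f"r = {r_small}: shares {r_small**2:.0%} of the variance")    # 1%
print(f"r = {r_modest}: shares {r_modest**2:.0%} of the variance")  # 4%

# Roughly how large a sample makes r = .10 "statistically significant"
# (two-tailed p < .05): t = r * sqrt(n - 2) / sqrt(1 - r**2) must exceed ~1.96.
n_needed = 2 + 1.96**2 * (1 - r_small**2) / r_small**2
print(f"n needed for r = .10 to clear p < .05: about {n_needed:.0f}")  # ~382
```

In other words, by the time a sample is large enough to make such an effect “significant,” significance has stopped telling you anything about whether the effect matters.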
Second, stop using bivariate correlations in most meta-analyses. This has been a bad and misleading practice. It really should stop.
My recommendation to the general public is, sadly, that increased skepticism toward findings in social science remains warranted.
Unfortunately, even as some parts of psychology have wrestled with the replication crisis, there’s been little serious effort to take honest stock of how the overcommunication of trivial, often noise-driven effects misinforms the public and does real harm to our credibility. Until we take that stock, we will remain a “soft” science at best.