Control Variables are Good!
A study suggesting black babies benefit from black doctors may have failed to include proper controls.
Back at the height of America’s 2020 panic over race, it seemed like bad news was coming from everywhere. Perhaps the US was a much more racist place than we had imagined! In the four years since, much of that has been rolled back…not that the modern US is an absolute utopia, but things aren’t actually all that bad, and even where group disparities exist, they may be due to factors other than widespread systemic racism.
One study that got a lot of attention at the time was an analysis of infant mortality by Greenwood and colleagues. They observed a racial disparity in infant mortality, such that black babies die at higher rates than white babies do. More critically, they observed that this mortality penalty was substantially reduced when black babies had black doctors as opposed to white doctors. In other words, there might be a benefit to black babies being matched to black doctors. Put more pessimistically, perhaps white doctors are more callous with black babies than with white babies. This study was correlational, but naturally that stopped people from making causal attributions for about a nanosecond.
The original study got widespread news media attention. Justice Ketanji Brown Jackson referred to it in Students for Fair Admissions v. Harvard, the US Supreme Court affirmative action decision.
Only it turns out the story is more complicated. A recent reanalysis of these data by Borjas and VerBruggen, in the same journal, came to the opposite conclusion: that in fact, race of the doctor had no impact on black babies’ mortality. How did they do this? They simply added a control variable to the analysis, in this case babies’ birthweight. It turns out that the black babies cared for by white doctors have, on average, lower birthweights. Lower birthweight babies have higher mortality. Thus, the true issue for babies is lower birthweight, not racial concordance with their doctors. A third variable appears to have explained the relationship between baby-doctor racial concordance and mortality. It had nothing to do with race at all. The racial concordance correlation seems to have been, in effect, a false positive result.
This is the common Third Variable Problem. Very often third variables (otherwise called covariates or control variables) explain away an observed correlation. In my own major field of violent video games, this was quite common. For instance, boys both play more violent video games than girls do and are more physically aggressive. Thus, it’s important to control for biological sex when examining links between gameplay and aggression. There are other theoretically relevant control variables as well: mental health, prior aggressive personality, peer relations, family relations, and genetics if you can manage it. It turns out that when you control for these other things, correlations between violent gameplay and aggression are pretty much zero.
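The logic of a third variable can be illustrated with a small simulation. This is only a sketch with made-up numbers, not an analysis of any real dataset: the variable names, the effect sizes (0.7) and the helper functions `residualize` and `partial_corr` are all assumptions chosen for illustration. It builds a toy world where a third variable drives both a predictor and an outcome, then compares the plain bivariate correlation with the correlation that remains once the third variable is controlled.

```python
import numpy as np


def residualize(v, controls):
    """Return the part of v that a linear fit on the controls cannot explain."""
    design = np.column_stack([np.ones(len(v)), controls])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


def partial_corr(x, y, controls):
    """Correlate x and y after removing the linear influence of the controls."""
    return np.corrcoef(residualize(x, controls), residualize(y, controls))[0, 1]


rng = np.random.default_rng(0)
n = 20_000

# Assumed toy world: the third variable drives both measures,
# and the predictor has no direct effect on the outcome.
third = rng.normal(size=n)
predictor = 0.7 * third + rng.normal(size=n)
outcome = 0.7 * third + rng.normal(size=n)

bivariate_r = np.corrcoef(predictor, outcome)[0, 1]
controlled_r = partial_corr(predictor, outcome, third)

print(f"bivariate r  = {bivariate_r:.3f}")   # clearly positive
print(f"controlled r = {controlled_r:.3f}")  # close to zero
```

In this toy setup the bivariate correlation looks like a real effect, yet it vanishes once the third variable is in the model, which is the pattern described above.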
Unfortunately, too many of the correlations we talk about in public discourse are bivariate correlations…that is to say, correlations without control variables. In my own field, I regularly come across arguments against using theoretically appropriate control variables. These are unfortunate. Such arguments only serve to keep the public misinformed, and they come across as defensive, as though scholars do not want to admit their theories are weaker than they had hoped. Of course this issue is not unique to video game violence, as I’ve heard similar comments dismissing the value of proper third variables in other fields as well.
To be fair, it is possible to use bad covariates. A study could pretend to control for other issues, but not really, simply by including irrelevant controls. For instance, suppose I were to correlate video game habits with aggression and control for the number of fish in the pond nearest each participant’s house. Number of fish is theoretically irrelevant. Such a study gives the illusion of being a controlled study, but actually is still a dreaded bivariate correlation. We can think of this as an undersaturated model.
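A quick simulation shows why an irrelevant control buys nothing. Again, this is a hedged sketch with invented data: `sex`, `fish`, `games` and `aggression` are hypothetical variables, the effect sizes are arbitrary, and `residualize` and `partial_corr` are illustrative helpers rather than anyone's published method. Sex is set up as the real third variable, while the fish count is unrelated to everything.

```python
import numpy as np


def residualize(v, controls):
    """Return the part of v that a linear fit on the controls cannot explain."""
    design = np.column_stack([np.ones(len(v)), controls])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


def partial_corr(x, y, controls):
    """Correlate x and y after removing the linear influence of the controls."""
    return np.corrcoef(residualize(x, controls), residualize(y, controls))[0, 1]


rng = np.random.default_rng(1)
n = 20_000

# Assumed toy world: sex (coded 0/1) raises both game play and aggression;
# the fish count in the nearest pond is unrelated to either.
sex = rng.integers(0, 2, size=n).astype(float)
fish = rng.poisson(30, size=n).astype(float)
games = 1.0 * sex + rng.normal(size=n)
aggression = 1.0 * sex + rng.normal(size=n)

raw_r = np.corrcoef(games, aggression)[0, 1]
fish_controlled_r = partial_corr(games, aggression, fish)
sex_controlled_r = partial_corr(games, aggression, sex)

print(f"no controls        r = {raw_r:.3f}")
print(f"controlling fish   r = {fish_controlled_r:.3f}")  # essentially unchanged
print(f"controlling sex    r = {sex_controlled_r:.3f}")   # close to zero
```

Controlling for fish leaves the correlation where it started, so the "controlled" estimate is the bivariate one in disguise. Only the theoretically relevant control changes the answer.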
By contrast, some control variables truly are bad because they are, in effect, the exact same thing as the main predictor we’re interested in. For instance, let’s say we want to see if race is associated with success in a particular field of employment. Although real but subtle population genetics differences exist, race as we define it is pretty crude and, in the main, based on skin color and other superficial traits. So, if we were to include skin color as a control variable, this would be silly, as skin color is a major component of how we define race. If we then found that race didn’t predict employment, this would be a false negative, because our control variable was bad. We can think of this as an oversaturated model.
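The opposite failure can be sketched the same way. In this made-up example, the predictor really does affect the outcome, but the control is almost a copy of the predictor. All names and numbers here are assumptions for illustration (`near_copy` is the predictor plus a little noise), and the helpers are the same illustrative ones, not a standard library API.

```python
import numpy as np


def residualize(v, controls):
    """Return the part of v that a linear fit on the controls cannot explain."""
    design = np.column_stack([np.ones(len(v)), controls])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


def partial_corr(x, y, controls):
    """Correlate x and y after removing the linear influence of the controls."""
    return np.corrcoef(residualize(x, controls), residualize(y, controls))[0, 1]


rng = np.random.default_rng(2)
n = 20_000

# Assumed toy world: the predictor has a genuine effect on the outcome,
# and the control is nearly the same measurement as the predictor.
predictor = rng.normal(size=n)
near_copy = predictor + 0.1 * rng.normal(size=n)
outcome = 0.5 * predictor + rng.normal(size=n)

raw_r = np.corrcoef(predictor, outcome)[0, 1]
overcontrolled_r = partial_corr(predictor, outcome, near_copy)

print(f"no controls             r = {raw_r:.3f}")             # sizeable
print(f"controlling near-copy   r = {overcontrolled_r:.3f}")  # shrinks toward zero
```

Because the control absorbs nearly all of the predictor's variance, a real effect is made to look like nothing, which is the false negative described above.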
As such, it is important that control variables are theoretically relevant and based on sound prior science. But we certainly should be using them (including in meta-analyses) and communicating them honestly to the public. Very likely a significant proportion of public misinformation based on actual social and medical science is due to the failure to both use and communicate appropriate third variables.