Does Tylenol Cause Autism?
Enforcing poor standards of rigor in science has real-world consequences
This week, the Trump administration, and Health and Human Services Secretary RFK Jr. specifically, made a surprise announcement linking Tylenol/acetaminophen use in pregnancy to autism in children. I say surprise because there doesn’t appear to be any medical consensus on this issue. Because everything today is culture war, where people fall on this topic seems to correlate highly with a general left/right political axis, as one might expect.
I’ll note upfront that I don’t have any particular dog in this fight. I’m open to the idea that some cases of autism may be related to environmental or medication-related causes. But I also think the standard of evidence needs to be high. And I think there’s a lot of confusion about autism rates, given the expanded definitions currently used for the disorder. Increasingly, anyone mildly socially awkward can now claim1 to have what was once limited to a severely debilitating disorder. I suspect that has led to the false narrative that there’s a massive epidemic of autism. This is also probably a consequence of the American Psychiatric Association’s critically bad decision to collapse other disorders such as Asperger’s Disorder into one big Autism Spectrum.
Nonetheless, is there evidence that Tylenol/acetaminophen can cause autism? One of the studies people are pointing to is a systematic review from Harvard. Is it convincing? I’ve had a look.
Overall, I’m not crazy about systematic reviews. They have all the weaknesses of meta-analyses, with few of the benefits. Basically, there’s just too much subjectivity in these reviews for them to be taken particularly seriously. And that shows in this one.
This review is being billed by some as comprehensive proof of Tylenol’s risk. However, I’m not so sure. I’ll summarize the review’s findings from the abstract: “We identified 46 studies for inclusion in our analysis. Of these, 27 studies reported positive associations (significant links to NDDs), 9 showed null associations (no significant link), and 4 indicated negative associations (protective effects). Higher-quality studies were more likely to show positive associations.” Those numbers don’t add up to 46 (27 + 9 + 4 = 40), and I had a bit of trouble definitively figuring out what happened to the missing 6, but I think they were eliminated as low quality.
Basically, that’s a 2-to-1 ratio in favor of the hypothesis. However, given publication bias issues, I’m not sure I find that terribly surprising, nor is it necessarily a great indication of consistency. The authors try to bolster the case by noting that higher-quality studies were more likely to produce positive findings. I’m not an expert in this field, so it’s hard for me to gauge whether that is true, but it’s not the sort of thing I’ve learned to take at face value. Given that the corresponding author has (ethically) disclosed conflicts of interest on this issue, there may be researcher bias effects. In at least one court case, a judge criticized this author for cherry-picking his analyses2.
More fundamentally, this systematic review appears to be largely vote-counting studies that are “statistically significant.” The authors note that they didn’t do a meta-analysis because of conflicting between-study results (which is an odd argument against doing a meta-analysis3), yet they try to synthesize those results narratively to come to a definitive conclusion. Odder still, they include meta-analyses among the studies they synthesize, which seems atypical to me and which I don’t think is warranted. And vote-counting is a poor way to synthesize research results.
Vote-counting is yet another example of prioritizing p-values over effect sizes. In large-sample studies, it’s entirely possible to get a “statistically significant” result that is simply due to methodological noise or is clinically meaningless. Unfortunately, too many people ignore me when I shake my fist at the sky over this issue and over scholars’ general reluctance to interpret effect sizes conservatively and rigorously. But here’s a perfect example of how the “every effect size is sacred” approach can do real harm to public policy.
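To make that concrete, here’s a minimal simulation of my own (nothing here comes from the review): with a large enough sample, a correlation small enough to be pure methodological noise still comes back “statistically significant.”

```python
# Minimal illustration (my own, not from the review): a correlation small
# enough to be pure methodological noise still clears p < .05 at large n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 50_000            # large epidemiological-style sample
true_r = 0.03         # "effect" explaining roughly 0.1% of variance

x = rng.standard_normal(n)
y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal(n)

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, r^2 = {r**2:.4f}, p = {p:.1e}")
# Typical result: r around 0.03, r^2 around 0.001, p well below .05.
# A vote-counting review would chalk this up as a "positive association,"
# even though the effect is clinically meaningless.
```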
Overall, I find this systematic review a weak piece of evidence, one that tries to turn weak, inconsistent correlational results into something causal and definitive. I suspect the study will get a lot of scrutiny because of its use by the Trump administration, and that’s fair. But it’s the predictable result of a research field that’s been loosey-goosey with data for a long time.
1. Often self-diagnosed, though I suspect there are a fair number of credulous mental health professionals also using loose criteria. To be fair, I suspect a number of these individuals really do benefit from some level of help, even if what they’re experiencing isn’t really the same as profound autism. Though I suspect there’s also a kind of TikTok grifter autism as well, capitalizing on a trendy diagnosis. I see parallel issues both in the APA’s trend toward broadening criteria for various mental illnesses and, in the case of autism, in jamming multiple conditions under one broad heading, which has caused significant confusion.
2. It’s always difficult to balance ad hominem attacks against reasonable assessment of conflicts of interest. It’s tricky, too, because legal cases often call on expert witnesses, and then that work gets viewed as a conflict of interest for the scholar in question. As such, I don’t think this means we should ignore anything this author says. The judge’s criticism about cherry-picking may be more crucial than serving as an expert witness per se.
3. Study heterogeneity does, indeed, mean we should be cautious about interpreting meta-analytically compiled effect sizes as indicative of population effect sizes. However, that doesn’t mean we can’t conduct a meta-analysis. In fact, meta-analysis can be used to try to assess why studies differ in outcome (which is better than the “average effect size wins!” approach anyway). But arguing that meta-analysis can’t be used to come to a conclusion, and then doing exactly that with a narrative systematic review, is weird.
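For what it’s worth, here’s a rough sketch of the kind of analysis I mean, with entirely made-up numbers rather than anything from the review: a simple inverse-variance-weighted meta-regression can test whether a moderator (say, a study quality rating) explains why effects differ across studies.

```python
# Rough sketch with made-up numbers (nothing here comes from the review):
# an inverse-variance-weighted meta-regression asks *why* effects differ
# across studies, rather than refusing to pool them at all.
import numpy as np
import statsmodels.api as sm

# Hypothetical per-study effect sizes (log odds ratios), their variances,
# and a moderator such as a study quality rating.
effects = np.array([0.10, 0.25, 0.05, 0.30, -0.02, 0.18])
variances = np.array([0.01, 0.04, 0.02, 0.05, 0.01, 0.03])
quality = np.array([3.0, 7.0, 4.0, 8.0, 2.0, 6.0])

# Weighted least squares with inverse-variance weights: a basic
# (fixed-effect) meta-regression of effect size on the moderator.
X = sm.add_constant(quality)
fit = sm.WLS(effects, X, weights=1.0 / variances).fit()
print(fit.params)    # slope on quality: does quality predict effect size?
print(fit.pvalues)
# A claim like "higher-quality studies show larger effects" can be tested
# formally this way instead of asserted narratively.
```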


