Six lies you can tell with statistics
A talk my evil twin gave on statistical fallacies and how to use them to your advantage
I gave a talk at a grad student seminar last week titled “How to Lie to Your Friends and Family” illustrating a common statistical fallacy (“collider bias”) by posing as my evil twin who wanted to use it to deceive people.
My evil twin presented six[1] lies, showing how each could be supported by plausible-but-misleading numbers:
(Warmup) Eating caviar makes you live longer
Preschool and child labor are equally good for young kids
Smoking makes you more likely to survive a collapsed lung
American companies do not discriminate against mothers
COVID vaccines are dangerous
Social Media App TikTok prevents lung cancer
(Note: these are illustrations of statistical fallacies, and I have not actually run the numbers on most of them. So, for example, I don’t know whether conditioning on college attendance actually makes the preschool → adult earnings connection disappear. But it’s plausible that it could, and I think it’s a good illustration of why this comparison is a bad one to make!)
(Warmup) Eating caviar makes you live longer
You’re home alone one day when a door-to-door caviar salesman (you know the type!) stops by to sell you some fish eggs.
“Did you know that people who eat caviar tend to live about ten years longer than people who don’t?” he asks, handing you a flow chart:
As you know, caviar salesmen follow a strict code of ethics: they will say anything they can to make the sale, but they will never explicitly lie to you. It is true that caviar eaters live much longer than caviar non-eaters.
But as you have probably guessed, this disparity has nothing to do with caviar. You pull out a sharpie and make a few edits:
Caviar eaters didn’t live longer because they were eating caviar. They lived longer because the average caviar eater is much richer than the average non-caviar eater.
To really see whether caviar had any effect, you’d need to control for wealth: that is, you would compare caviar eaters and caviar non-eaters who made the same amount of money.
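To make the sharpie edits concrete, here is a minimal simulation sketch. Every number in it is invented (the share of rich people, how often each group eats caviar, the lifespans), and by construction caviar has no effect on anyone. The names `simulate_person` and `caviar_gap` are just labels for this illustration.

```python
import random
from statistics import fmean

random.seed(0)

def simulate_person():
    # Assumption: wealth drives both caviar eating and lifespan.
    # Caviar itself never enters the lifespan calculation.
    rich = random.random() < 0.2
    eats_caviar = random.random() < (0.5 if rich else 0.02)
    lifespan = random.gauss(85 if rich else 75, 8)
    return rich, eats_caviar, lifespan

people = [simulate_person() for _ in range(200_000)]

def caviar_gap(rows):
    """Mean lifespan of caviar eaters minus mean lifespan of non-eaters."""
    eaters = [life for _, eats, life in rows if eats]
    others = [life for _, eats, life in rows if not eats]
    return fmean(eaters) - fmean(others)

naive_gap = caviar_gap(people)                               # the salesman's comparison
gap_among_rich = caviar_gap([p for p in people if p[0]])     # controlling for wealth
gap_among_poor = caviar_gap([p for p in people if not p[0]])

print(f"naive gap: {naive_gap:.1f} years")
print(f"gap among the rich: {gap_among_rich:.1f} years")
print(f"gap among everyone else: {gap_among_poor:.1f} years")
```

With these made-up inputs the naive gap comes out at several years, while the gap inside each wealth group hovers around zero.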
(I suspect that a large fraction of studies that take the form “drinking one glass of red wine per day reduces anxiety” or “people who own oil paintings tend to be happier” are secretly making this kind of error.)
This problem is called omitted variable bias, and it’s important. But this talk is actually about the opposite: how things can go terribly wrong if you control for too many factors.
Preschool and child labor are equally good for young kids
You ask the caviar salesman if you can get in on his “misleading people for profit” scam, and he indignantly explains that it isn’t a scam, and it definitely isn’t a pyramid scheme.
Instead you find work with a company that specializes in child labor, which has been struggling to attract employees since people in your town started sending their kids to preschool.
“All these parents keep telling me they think preschool will put their kids on track to be successful. I need you to prove this isn’t true”.
This is a challenge, because most research shows that preschool does set kids up for success. But you press on, undeterred by the facts, and soon enough you’ve written a nice little memo containing the phrase:
“Some studies claim that preschool attendees earn more as adults than non-preschool attendees. But these studies are biased: once you control for college attendance the effects disappear completely! In other words, it’s not preschool that’s helping children succeed, it’s college!”
Months later, you receive an angry letter in the mail explaining your error. Attending preschool, the letter points out, makes children more likely to attend college. So by comparing only children with the same college outcome, you’re ignoring one of the main routes by which preschool helps kids succeed!
You throw the letter away without reading it.
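For anyone who would have read the letter, its argument can be sketched as a small simulation. The numbers are made up, and I am assuming the extreme case where preschool helps only by making college more likely, so that earnings depend on college alone.

```python
import random
from statistics import fmean

random.seed(1)

def simulate_child():
    # Assumption: preschool doubles the chance of college,
    # and adult earnings depend only on college (plus noise).
    preschool = random.random() < 0.5
    college = random.random() < (0.6 if preschool else 0.3)
    earnings = random.gauss(60_000 if college else 35_000, 10_000)
    return preschool, college, earnings

children = [simulate_child() for _ in range(200_000)]

def preschool_gap(rows):
    """Mean earnings of preschool attendees minus non-attendees."""
    attended = [earn for pre, _, earn in rows if pre]
    skipped = [earn for pre, _, earn in rows if not pre]
    return fmean(attended) - fmean(skipped)

total_gap = preschool_gap(children)                               # real overall effect
gap_college = preschool_gap([c for c in children if c[1]])        # "controlled" for college
gap_no_college = preschool_gap([c for c in children if not c[1]])

print(f"overall gap: {total_gap:,.0f}")
print(f"gap among college attendees: {gap_college:,.0f}")
print(f"gap among non-attendees: {gap_no_college:,.0f}")
```

In this toy world preschool is worth thousands of dollars a year overall, yet the gap shrinks to roughly nothing inside each college group, which is exactly the sentence from the memo.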
Smoking makes you more likely to survive a collapsed lung
Your work dismantling the preschool system has caught the eye of a new client: a cigarette company we’ll call Big Tobacco.
Big Tobacco is a fitting employer, since a disturbing amount of modern statistical methodology was invented by people trying (unsuccessfully) to prove that smoking isn’t dangerous. But that’s not enough: your employer wants you to show that cigarettes are actively beneficial to the people who smoke them.
You try some simple comparisons, but unsurprisingly you find that cigarettes are harmful.
So you start to make more and more comparisons until you notice something weird: a smaller fraction of smokers with collapsed lungs are dying than of non-smokers with collapsed lungs.
How can this be?
Most collapsed lungs are “spontaneous”, and are relatively safe with proper medical treatment. (I have had two, and the second time they told me to just go home and rest for a few days!)
But some collapsed lungs are extremely dangerous: namely, those caused by some sort of external trauma (e.g. being stabbed). If the “additional” collapsed lungs caused by smoking tend to be spontaneous, then a smaller fraction of smokers with collapsed lungs will die than non-smokers with collapsed lungs. This is true even though smoking makes it more likely that you will get a collapsed lung and then die with it.
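Here is the arithmetic with invented numbers (I have not looked up real collapsed-lung statistics). Assume that, per 100,000 people, smoking adds 80 spontaneous collapsed lungs and leaves traumatic ones unchanged, and that traumatic ones are far deadlier.

```python
# All figures are hypothetical, per 100,000 people in each group.
DEATH_RATE = {"spontaneous": 0.01, "traumatic": 0.30}
CASES = {
    "non-smokers": {"spontaneous": 10, "traumatic": 10},
    "smokers": {"spontaneous": 90, "traumatic": 10},
}

def summarize(cases):
    """Return (total cases, expected deaths, share of cases that die)."""
    total_cases = sum(cases.values())
    expected_deaths = sum(n * DEATH_RATE[kind] for kind, n in cases.items())
    return total_cases, expected_deaths, expected_deaths / total_cases

summary = {group: summarize(cases) for group, cases in CASES.items()}

for group, (n, deaths, fatality) in summary.items():
    print(f"{group}: {n} cases, {deaths:.1f} expected deaths, "
          f"{fatality:.1%} of cases fatal")
```

Under these assumptions smokers with a collapsed lung die far less often than non-smokers with one (3.9% against 15.5%), even though more smokers per 100,000 die with a collapsed lung (3.9 against 3.1).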
Your stats-based ad campaign is successful, and you are now swimming in money.
American companies don’t discriminate against mothers
Unfortunately, your tobacco company has bigger issues. It turns out the CEO has a policy of discriminating against pregnant women and new mothers, and a number of former employees have filed a lawsuit. (“It’s not a sexism thing”, he assures you.)
As company statistician you’re called to testify, and you point out that the average mother at the company makes more than the average employee. You produce pages and pages of tables showing the same pattern holds when you control for age, experience, or qualifications. The judge has no statistics training and therefore accepts your argument and dismisses the lawsuit.
Unfortunately for you, the mothers appeal. And during the appeal, their expert witness picks up your flow chart:
and adds another bubble:
“Your honor, these charts suffer from what we call collider bias or post-treatment bias.”
“In English, please.”
“You’re only seeing the wages of mothers who stayed at the company.”
“So?”
“Let me explain. It costs money to be employed as a parent — for example, if you’re a single parent or your partner works, you need to pay for childcare while you’re at work. And if your goal is to make money, you aren’t going to stay at a job that pays you less than you pay for childcare!
So lots of people (particularly women) quit their jobs when they become parents, and the mothers who quit tend to make less money than the mothers who stay, because there’s less motivation to stay at a job that earns you $100/week after childcare costs than a job that earns you $1000/week.
This company discriminates pretty explicitly — it pays mothers about 10% less for the same work than non-mothers. But because childcare costs disproportionately cause low-paid mothers to quit, the average mother who didn’t need to quit makes about 10% more than the average non-mother!”
The company faces steep fines and you are unceremoniously fired. Judge Selya’s opinion refers to you by name as a “meretriciously mendacious miscreant.”
I learned this example from Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable, which cites papers by Gronau and Heckman.
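The expert witness’s argument can be sketched as a simulation. This is my own toy version, not the model from those papers: the wage range, the childcare cost, and the quitting rule are all assumptions picked to make the mechanism visible.

```python
import random
from statistics import fmean

random.seed(2)

CHILDCARE = 400   # hypothetical weekly childcare cost in dollars
PENALTY = 0.10    # mothers get 10% less than non-mothers for the same job

def simulate_employee():
    is_mother = random.random() < 0.3
    job_wage = random.uniform(300, 1500)   # weekly pay a non-mother gets in this job
    wage = job_wage * (1 - PENALTY) if is_mother else job_wage
    if is_mother:
        # Assumed quitting rule: nobody stays for less than childcare costs,
        # and the chance of staying grows with the pay left after childcare.
        net = wage - CHILDCARE
        stays = random.random() < max(0.0, min(1.0, net / 1000))
    else:
        stays = True
    return is_mother, job_wage, wage, stays

# The company's payroll only contains the people who stayed.
staff = [e for e in (simulate_employee() for _ in range(200_000)) if e[3]]

mothers_avg = fmean([wage for is_mother, _, wage, _ in staff if is_mother])
others_avg = fmean([wage for is_mother, _, wage, _ in staff if not is_mother])
same_job_ratio = fmean([wage / job for is_mother, job, wage, _ in staff if is_mother])

print(f"average weekly wage, mothers still employed: {mothers_avg:.0f}")
print(f"average weekly wage, non-mothers: {others_avg:.0f}")
print(f"mothers' pay relative to the same job: {same_job_ratio:.2f}")
```

Every mother in this simulation is paid 90 cents on the dollar for her job, and the mothers who remain on the payroll still out-earn the non-mothers on average, because the low-paid mothers are the ones who left.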
COVID vaccines make you more likely to die of heart problems
You’ve established your bona fides as one of the world’s worst people but are too incompetent to work in a real industry, so you are immediately hired by the state of Florida.
“We’d like you to prove that the COVID vaccine is killing people.”
“Is it?”
“No, it’s one of the most effective lifesaving interventions of all time.”
(You can tell this will be your most challenging job yet.)
You poke around the literature, and notice a potential link:
Unfortunately, it looks like the link is significantly smaller than the “COVID → Heart Disease” link, so your “proof” is likely to fail even the laziest cost-benefit analysis.
[At this stage of the talk I asked the audience for ideas on how to fake a study, and they came up with some interesting ideas! But none of them were as profoundly incompetent and unethical as what the state of Florida, and Surgeon General Joseph A. Ladapo in particular, actually did.]
You’re going to limit your study to people who didn’t get COVID.
This means you’re controlling away any actual benefits of vaccines, since you’ve simply decided to ignore anybody who died from COVID or COVID-related complications. So if three people die of vaccine side effects and hundreds of thousands of times that number of people die of COVID (including heart conditions), your analysis will say “see, three people died from the vaccines!”
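The trick is simple enough to show with a few lines of arithmetic. The counts here are entirely made up to illustrate the mechanism and are not estimates of anything; in this toy setup, dropping everyone who got COVID is the same as dropping the COVID deaths.

```python
# Made-up deaths per 1,000,000 people, chosen only to show the mechanism.
DEATHS = {
    "vaccinated": {"vaccine side effects": 3, "covid": 100},
    "unvaccinated": {"vaccine side effects": 0, "covid": 1_000},
}

def death_count(group, include_covid):
    """Total deaths in a group, optionally ignoring everyone who died of COVID."""
    return sum(
        n for cause, n in DEATHS[group].items()
        if include_covid or cause != "covid"
    )

restricted = {g: death_count(g, include_covid=False) for g in DEATHS}
complete = {g: death_count(g, include_covid=True) for g in DEATHS}

print("restricted to people who never got COVID:", restricted)
print("counting everyone:", complete)
```

In the restricted sample the vaccinated group looks worse (3 deaths against 0). Counting everyone, it looks far better (103 against 1,000). The restriction throws away exactly the rows where the benefit shows up.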
Then, you’re going to just straight-up lie when you announce your results to the public. You’re going to say
Based on currently available data, patients should be informed of the possible cardiac complications that can arise after receiving a mRNA COVID-19 vaccine. With a high level of global immunity to COVID-19, the benefit of vaccination is likely outweighed by this abnormally high risk of cardiac-related death among men in this age group. [emphasis mine]
which definitely gives the impression that you compared the benefits to the risks, even though you literally threw away all the data about benefits of vaccination and refused to look at it.
(It’s actually way worse than that: the “study” doesn’t even estimate the number of vaccine side effect problems correctly, because it’s fundamentally based on the assumption that people didn’t change their behavior once they were fully vaccinated in 2021. This is an absolutely insane assumption to make and invalidates pretty much every number in the entire study. I talked about this very briefly during the talk, but the actual study design is kind of complicated and I don’t think it’s worth getting into detail here.)
At this point you are literally lying to kill people in pursuit of power, which makes you relatively unpopular at parties. You leave Florida and pray for a new career opportunity. God does not listen to your prayer.
Social Media App TikTok prevents lung cancer
But TikTok does! (TikTok is a social media app that I’m told is popular with the youth.)
TikTok has faced a lot of bad press recently for things like “contributing to a mental health crisis”, “being a haven for Nazis”, and “potentially giving your data to an authoritarian government.” They’re hiring you to improve their image by demonstrating that there are health benefits to using TikTok.
You check if TikTok improves mental health, but it does not. You check if it improves physical health, but it does not. At this point you start conditioning on every variable imaginable: age? weather? eye color?
Finally, you restrict your sample to people who’ve experienced heart disease, and voila! Among them, the TikTok users have lower rates of lung cancer than the non-users! You publish your results, make a boatload of money, and move to wherever con artists and grifters retire to.
(It’s Florida.)
But a part of you always wonders what was up with your TikTok paper. Why would TikTok use be linked to lower rates of lung cancer? What did heart disease have to do with anything? What was actually happening behind the scenes?
One night, you wake up in a cold sweat and draw a diagram that clarifies everything:
Depressed people use TikTok. Depressed people also have higher rates of heart disease. Smokers have higher rates of heart disease, and also higher rates of lung cancer.
If you look at a group of people with heart disease, there will be some depressed people, some smokers, and a bunch of other people. Since depression and TikTok use are correlated, the group of TikTok users with heart disease will have a higher fraction of depressed people than the non-TikTok users with heart disease. Therefore there will be a lower fraction of smokers, and therefore less lung cancer!
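A minimal simulation sketch of that diagram, with probabilities I made up. TikTok has no effect on lung cancer anywhere in this code, and depression and smoking are drawn independently of each other.

```python
import random

random.seed(3)

def simulate_person():
    # Assumed structure: depression -> TikTok, depression -> heart disease,
    # smoking -> heart disease, smoking -> lung cancer. Nothing else.
    depressed = random.random() < 0.3
    smoker = random.random() < 0.3
    tiktok = random.random() < (0.8 if depressed else 0.2)
    heart_disease = random.random() < 0.05 + 0.3 * depressed + 0.3 * smoker
    lung_cancer = random.random() < (0.2 if smoker else 0.01)
    return tiktok, heart_disease, lung_cancer

people = [simulate_person() for _ in range(300_000)]

def cancer_rate(rows):
    return sum(lung for _, _, lung in rows) / len(rows)

def tiktok_gap(rows):
    """Lung cancer rate of TikTok users minus that of non-users."""
    users = [r for r in rows if r[0]]
    non_users = [r for r in rows if not r[0]]
    return cancer_rate(users) - cancer_rate(non_users)

overall_gap = tiktok_gap(people)                          # everyone
heart_gap = tiktok_gap([p for p in people if p[1]])       # heart disease patients only

print(f"gap in the whole population: {overall_gap:+.3f}")
print(f"gap among people with heart disease: {heart_gap:+.3f}")
```

In the whole population the gap is about zero, as it should be. Restrict to heart disease patients and TikTok users show a clearly lower lung cancer rate, purely because conditioning on heart disease makes depression and smoking compete as explanations.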
At last, the puzzle has been resolved, and the only thing keeping you from sleep is the faint memory of a conscience.
This example is inspired by Implications of M Bias in Epidemiologic Studies: A Simulation Study, although I replaced SSRIs with TikTok.
[1] During the actual talk I skipped #2 to save time.