How to be wrong with statistics
The "post-treatment bias" fallacy, or "how my Nobel prize-winning professor justifies his racism"
Assuming you don't live under a rock, you have presumably seen misleading statistics. 4 out of 5 dentists seem to recommend every brand of toothpaste. The economy is doing either great or terribly, whichever fits the writer's narrative. 37% of statistics are just made up on the spot by people who can't think of a third example.
One of the ways my mathematical education has served me best in life is learning to recognize the particular ways people lie (intentionally or unintentionally) with statistics, which helps in sorting out which facts actually illuminate the world and which are simply there to mislead.
In this post, I want to talk about something called "post-treatment bias", because I learned about it over the summer and since then it's popped up OVER AND OVER AGAIN in my life, so it seems like something that's worth knowing about when you see numbers show up in the news! This post is intended to be accessible to anybody, without assuming any background in statistics.
We'll quickly review what people mean when they say a statistic is "controlled for" something else, and then explain the error.
"Controlling for things"
Suppose you were interested in the health benefits of prune juice. One (extremely crude) way to ask the question is "will drinking prune juice keep me from going to the hospital?" You could go out on a particular day and ask a thousand people if they drink prune juice, and come back a year later to see how many of the people you asked had been to the hospital in the past year.
You would probably find some pretty stark numbers. Maybe 20% of the prune juice drinkers would have been to the hospital, while only 5% of the non-prune-juice drinkers had. If you were some sort of anti-prune warrior, you might put out YouTube videos saying that prune juice increases your chances of hospitalization by FOUR TIMES and everyone should stay far away from the stuff.
What went wrong? There was a confounding variable: something related both to prune juice drinking and to hospitalization that you didn't take into account. In this case, an obvious candidate is age: nobody under the age of seventy-five enjoys the taste of prune juice, so instead of comparing two otherwise-identical people who differ only in their prune juice habits, you've compared an eighty-year-old prune juice drinker to a seventeen-year-old who isn't sure which fruit prune juice comes from.
In theory, this problem is not too hard to solve. Instead of just comparing drinkers to non-drinkers, we should "control for age". To do this, we pick an age where we have lots of drinkers and non-drinkers, and restrict ourselves to that age. So, for example, I might be able to tell you that 22% of 80-year-olds who do not drink prune juice had to go to the hospital in a given year, while only 18% of 80-year-olds who do drink it did. That would be (extremely weak) evidence for a health benefit of prune juice!
(In real life I would presumably ask for more information. Instead of just telling me the number for 80-year-olds, I would ask you to tell me the number for every age where there were enough prune juice drinkers and non-drinkers to make the comparison! Maybe I'd worry that there are things other than age that differentiate the drinkers from the non-drinkers, and I'd ask you to control for those too, or I'd at least be a little skeptical unless you convinced me you had controlled for literally everything that mattered.)
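If you like seeing this kind of thing in code, here's a minimal simulation in Python. Every number in it is made up purely for illustration; the point is just that the raw comparison and the age-controlled comparison can point in opposite directions.

```python
# A minimal sketch of the prune juice story with made-up numbers.
# Age is a confounder: it drives both prune juice drinking and
# hospitalization, so the raw comparison makes the juice look harmful.
import random

random.seed(0)

people = []
for _ in range(100_000):
    age = random.choice(["young", "old"])
    # Made-up assumption: old people usually drink prune juice, young rarely do.
    drinks = random.random() < (0.80 if age == "old" else 0.05)
    # Made-up assumption: age drives hospitalization; the juice helps a bit.
    base_risk = 0.22 if age == "old" else 0.04
    hospital = random.random() < base_risk * (0.8 if drinks else 1.0)
    people.append((age, drinks, hospital))

def rate(group):
    return sum(hosp for _, _, hosp in group) / len(group)

drinkers = [p for p in people if p[1]]
abstainers = [p for p in people if not p[1]]
# Raw comparison: drinkers look far WORSE (mostly because they're older).
print(f"raw: {rate(drinkers):.3f} vs {rate(abstainers):.3f}")

# Controlling for age: within each age group, drinkers do slightly BETTER.
for age in ("young", "old"):
    d = rate([p for p in people if p[0] == age and p[1]])
    a = rate([p for p in people if p[0] == age and not p[1]])
    print(f"{age}: {d:.3f} vs {a:.3f}")
```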
Post-Treatment Bias, or "How to be Wrong With Statistics"
Controlling for things is a powerful idea, and so it's not surprising that it pops up all over the place. Sometimes, however, things can go a little bit awry.
Example (these numbers are made up): I would like to know if preschool is a good investment, so I pay for a random group of children to go to preschool and, thirty years later, compare their incomes to those of a group of similar children who did not go to preschool. Recognizing that people with college degrees tend to have higher incomes, I decide to restrict my analysis to children who attended college. I find that the children who attended preschool make approximately the same amount of money as the children who didn't, and conclude that preschool is a waste of time.
What went wrong here? One of the effects of preschool is that it makes a child more likely to attend college. Even if preschool doesn't change the amount of money a college graduate makes, or the amount a non-graduate makes, there is still a chance that preschool really does increase lifetime income, by moving people from the non-college group into the college group.
(If you enjoy thinking about numbers and are having trouble internalizing this idea, suppose I observed 3 preschool students and 3 non-preschool students. Pretend that everyone who went to college makes 100k/year, and everyone who didn't makes 50k/year. If 2 preschool students went to college and only 1 non-preschool student did, then preschool legitimately did help someone make an extra 50k/year, but you will never see this if you only compare within the college group!)
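To spell that arithmetic out, here is the same 3-versus-3 thought experiment as a few lines of Python (the incomes and group sizes are the made-up ones from the parenthetical above):

```python
# The 3-vs-3 thought experiment above, spelled out.
# Incomes and group sizes are the made-up ones from the text.
preschool    = [100_000, 100_000, 50_000]  # 2 of 3 went to college
no_preschool = [100_000, 50_000, 50_000]   # 1 of 3 went to college

def mean(incomes):
    return sum(incomes) / len(incomes)

# Comparing the whole groups: preschool kids really do earn more on average.
print(mean(preschool) - mean(no_preschool))  # ~16,666.67/year: a real benefit

# "Controlling for college" compares 100k earners only to 100k earners and
# 50k earners only to 50k earners, so both within-group gaps are zero and
# the benefit of moving someone into the college group becomes invisible.
print(100_000 - 100_000, 50_000 - 50_000)  # 0 0
```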
The statistical problem here is called post-treatment bias, and it's easy to avoid by following a simple rule: if you're interested in the effects of something on something else, you aren't allowed to control for anything that takes place after the first thing.
(We'll leave it as a rule, but if you'd like a technical explanation of why you can't do that, you can read one here.)
Here are three examples of this mistake I have seen in the past week, with explanations of the errors in parentheses:
Sometimes people cite direct evidence that GRE scores don't predict success in grad school as a reason not to use the GRE. (Trying to test this directly implicitly controls for "being accepted into a given grad school", which comes after the GRE scores and therefore can't be controlled for.)*
My econ professor keeps pointing out that there is little evidence for racial discrimination in employment if you control for education. (A person's access to education is decided after their race is assigned at birth, so you can't control for it.)**
I keep seeing attempts to argue that police use-of-force is not racially biased by measuring, for each race, how many police stops end in violence. (This implicitly controls for being stopped in the first place, and stops generally happen after the officer or a community member observes a person's race, so you can't control for them.)
*In particular, somebody who had low GRE scores but was nonetheless admitted is presumably unusually great in other ways to have convinced the committee to accept them! So maybe they'll be more successful in grad school than the average applicant with their GRE scores. (Note that the GRE is a terrible exam and probably doesn't predict success in grad school. The point is that this particular way of arguing it is bad, as opposed to the other two examples, where there is strong evidence against the claims.)
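If you want to watch that mechanism happen, here's a sketch of a purely hypothetical model: GRE scores and "other strengths" are independent, success depends on both, and the committee admits anyone whose combined strength clears a bar. None of these numbers come from real admissions data; the point is only that conditioning on admission weakens the measured GRE-success relationship even though the GRE genuinely matters by construction.

```python
# A hypothetical model of the admissions collider in footnote *.
# None of these numbers come from real admissions or GRE data.
import random

random.seed(0)

# Each applicant: (gre, other) strengths, independent standard normals.
applicants = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50_000)]

# The committee admits anyone whose TOTAL strength clears a bar. Studying
# only admitted students implicitly controls for admission, which happens
# after the GRE is taken.
admitted = [(g, o) for (g, o) in applicants if g + o > 1.5]

def corr(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = (sum((x - mx) ** 2 for x, _ in pairs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for _, y in pairs) / n) ** 0.5
    return cov / (sx * sy)

# Success = gre + other, so the GRE genuinely matters by construction.
print(corr([(g, g + o) for g, o in applicants]))  # ~0.71 among all applicants
print(corr([(g, g + o) for g, o in admitted]))    # ~0.4: much weaker among admitted
```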
**Sometimes people argue that this is okay: we're not interested in the total effect of racism, just in whether employers are discriminating. This is a misunderstanding of the statistical issues involved. The error is not that educational differences create disparate impacts, although that is true. The error is that, because education is itself affected by race, "whether there is discrimination conditional on education" does not actually tell you even the amount of pure, direct discrimination by the employer unless you can somehow control for literally everything the employer knows about the employee. If you are skeptical, I can send you a more technical explanation!