Avoiding statistical fallacies in writing


Sorry it’s been so long since my last post. I’ve moved, and my cat Carl died.

I’ve noticed a lot of misuse of statistics on the Web, especially in connection with the COVID epidemic and vaccines. Sometimes writers make big errors even when trying to correct other people’s mistakes. It’s a huge and difficult subject, so I’ll approach it by linking to some good articles and commenting briefly on each one. It takes time to digest all the information, so you might want to bookmark some of the pieces you find interesting and come back to them later.

The base rate fallacy is one of the most important errors. It means judging a positive result without accounting for how rare the underlying condition is; when the condition is rare enough, false positives can far outnumber true ones. When it exaggerates the likelihood that people are criminals or terrorists, it can badly mess up people’s lives. When it exaggerates the risks of a medical treatment, it can scare people into making bad choices. I like Bruce Schneier’s article on how data mining can turn up more mistakenly identified terrorists than real ones.
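To make the base rate effect concrete, here is a minimal sketch using Bayes’ theorem. The numbers are hypothetical, chosen only for illustration: one real threat per million people, a screen that catches 99% of real threats, and a 1% false alarm rate. The function name is my own.

```python
from fractions import Fraction

def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Probability that a flagged person is a true case (Bayes' theorem)."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers: 1 in 1,000,000 people is a real threat,
# the screen catches 99% of them, and it wrongly flags 1% of everyone else.
ppv = positive_predictive_value(
    base_rate=Fraction(1, 1_000_000),
    sensitivity=Fraction(99, 100),
    false_positive_rate=Fraction(1, 100),
)
print(f"Chance a flagged person is a real threat: 1 in {1 / ppv}")
```

Under these assumptions, a screen that sounds 99% accurate still flags roughly ten thousand innocent people for every real threat, because real threats are so rare to begin with.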

That fallacy is a special case of the general error of misapplying or misinterpreting statistics. The broader issue includes letting them seem to mean something they don’t, comparing incommensurable numbers, and using psychological tricks to make data appear alarming or reassuring. The article “Understanding Uncertainty: The Many Ways of Spinning Risk” covers some of the ways that the presentation of accurate statistics can mislead.

The gambler’s fallacy or Monte Carlo fallacy is the notion that if a random event has had a run in one direction, it’s more likely than usual to compensate in subsequent trials. For example, if a roulette wheel comes up on black several times in a row, the fallacy says that the odds are greater than even that it will come up red next time to make up for it. But unless the event has some “memory” of the previous events, this doesn’t happen. The odds of either color on an honest wheel are slightly less than 1 in 2 (since 0 is neither red nor black), regardless of how the wheel has come up in the past. Here’s a good piece: “Everything about the Gambler’s Fallacy with Examples”. Regression to the mean is real, but it doesn’t alter the probability of any single event.
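A quick simulation shows the wheel’s lack of memory. This is a sketch under stated assumptions: a European wheel with a single green zero (18 red, 18 black, 1 green), a fixed random seed so the run is repeatable, and a helper function whose name is my own.

```python
import random

def red_rate_after_black_streak(spins, streak_length):
    """Share of reds among spins that immediately follow a run of blacks."""
    followers = 0
    reds = 0
    run = 0  # current count of consecutive blacks before this spin
    for color in spins:
        if run >= streak_length:
            followers += 1
            reds += color == "red"
        run = run + 1 if color == "black" else 0
    return reds / followers

# Assumed European wheel: 18 red, 18 black, one green zero.
pockets = ["red"] * 18 + ["black"] * 18 + ["green"]
rng = random.Random(2021)  # fixed seed so the run is repeatable
spins = [rng.choice(pockets) for _ in range(200_000)]

overall = spins.count("red") / len(spins)
after_streak = red_rate_after_black_streak(spins, streak_length=4)
print(f"Red overall:        {overall:.3f}")
print(f"Red after 4 blacks: {after_streak:.3f}")
print(f"Theoretical 18/37:  {18 / 37:.3f}")
```

Both observed rates should land close to 18/37. A streak of blacks does nothing to make red “due.”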

Cherry picking can produce misleading statistical results. By picking your start and end dates and locations to fit your conclusions, you can give the impression that global warming isn’t happening or that it’s much worse than it really is. It’s a great tool for advertising and sensationalism. The people who use it consciously are out to mislead, but it’s easy to be fooled by someone else’s cherry-picked numbers. Here’s an academic article on the subject, with a focus on how it produces bad science.
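Here is a small sketch of how choosing the window changes the story. The series is synthetic, not real climate data: a steady rise of 0.02 per year plus a ten-year cycle, both numbers invented for the example.

```python
import math

# Synthetic series, not real measurements: a steady rise of 0.02 per year
# plus a 10-year cycle that makes individual stretches go up or down.
years = list(range(1980, 2021))
values = [0.02 * i + 0.3 * math.sin(2 * math.pi * i / 10)
          for i in range(len(years))]

def change(start, end):
    """Difference between the value at index end and the value at index start."""
    return values[end] - values[start]

full_change = change(0, len(years) - 1)

# Cherry picking: search every 5-year window for the most convenient one.
window = 5
starts = range(len(years) - window)
drop = min(starts, key=lambda s: change(s, s + window))
rise = max(starts, key=lambda s: change(s, s + window))

print(f"{years[0]}-{years[-1]}: {full_change:+.2f}")
print(f"{years[drop]}-{years[drop + window]}: {change(drop, drop + window):+.2f}")
print(f"{years[rise]}-{years[rise + window]}: {change(rise, rise + window):+.2f}")
```

The same data supports “it’s falling” and “it’s rising far faster than the long-term trend,” depending only on which five years you choose to show.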

The probability of events occurring together often throws people. If two events are independent of each other (let’s say getting an ace when drawing a card and getting heads when flipping a coin), then the probability of both happening is the separate probabilities multiplied together. If the separate probabilities are low, the joint probability is much lower. In this case, the chances of getting an ace and heads in one try of each would be 1/13 * 1/2, or 1/26. An article which I read recently said that “two cases of thrombosis have resulted from administration of the Moderna vaccine out of 328 million doses worldwide, which would be roughly the probability of getting struck by lightning 300 times in one year.” The National Weather Service estimates the odds of being struck by lightning in a year at 1 in 222,000. The odds of being struck 300 times, assuming independent events, are that probability raised to the 300th power, which is less than the odds that the ceiling will cave in and kill you when you get your shot. (I’m assuming you haven’t gotten some deity really, really mad at you.) By contrast, two cases in 328 million doses works out to about 1 in 164 million, which is rarer than a single lightning strike in a year but nowhere near as rare as 300 of them.
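The arithmetic is easy to check. This sketch uses only the figures quoted in the post (1 in 222,000 for lightning, 2 cases in 328 million doses) and assumes the strikes are independent; the variable names are mine. Since 222,000 to the 300th power is too large for ordinary floating point, it works with base-10 logarithms.

```python
from fractions import Fraction
import math

# Independent events: multiply the separate probabilities.
p_ace = Fraction(4, 52)   # four aces in a 52-card deck
p_heads = Fraction(1, 2)
p_both = p_ace * p_heads
print(p_both)  # 1/26

# Lightning figure quoted in the post: 1 in 222,000 per year.
lightning_odds = 222_000
# 222,000 ** 300 overflows a float, so use base-10 logarithms instead.
log10_odds_300_strikes = 300 * math.log10(lightning_odds)
print(f"300 strikes: about 1 in 10^{log10_odds_300_strikes:.0f}")

# The quoted thrombosis figure: 2 cases in 328 million doses.
thrombosis_odds = 328_000_000 / 2
# How many independent strikes would be equally unlikely?
equivalent_strikes = math.log10(thrombosis_odds) / math.log10(lightning_odds)
print(f"2 in 328 million is as unlikely as about {equivalent_strikes:.1f} strikes")
```

The odds against 300 strikes have more than 1,600 digits, while the quoted thrombosis rate is about as unlikely as one and a half strikes.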

“Common Statistical Fallacies and Paradoxes” on RealClear Science covers a grab bag of errors: Simpson’s paradox, the base rate fallacy, the Will Rogers paradox, Berkson’s paradox, and the multiple comparisons fallacy. It’s not easy going. I had to think hard about the discussion of Simpson’s paradox, which says that sometimes the overall tendency for a group can be the opposite of the tendency in every subgroup it’s divided into.
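Simpson’s paradox is easier to believe once you see numbers. The counts below are made up for illustration (they are not from the RealClear Science article): two hypothetical treatments, A and B, with patients split into mild and severe cases.

```python
from fractions import Fraction

# Hypothetical counts of (recovered, treated) for two treatments,
# split by how severe the case was. These numbers are invented.
results = {
    "A": {"mild": (18, 20), "severe": (48, 80)},
    "B": {"mild": (64, 80), "severe": (10, 20)},
}

def rate(recovered, treated):
    """Recovery rate for one subgroup."""
    return Fraction(recovered, treated)

def overall_rate(groups):
    """Recovery rate with all subgroups pooled together."""
    recovered = sum(r for r, _ in groups.values())
    treated = sum(t for _, t in groups.values())
    return Fraction(recovered, treated)

for severity in ("mild", "severe"):
    a = rate(*results["A"][severity])
    b = rate(*results["B"][severity])
    print(f"{severity:>7}: A {float(a):.0%}  B {float(b):.0%}")

print(f"overall: A {float(overall_rate(results['A'])):.0%}  "
      f"B {float(overall_rate(results['B'])):.0%}")
```

In this example A does better than B among mild cases and among severe cases, yet B looks better when the groups are pooled. The reason is that A was given mostly to severe cases, which recover less often under either treatment.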

Finally, there are plain fake and made-up statistics. Dig into the source, especially if you get your data from Facebook or Twitter.

Watch out for these fallacies in your research sources, and try not to commit them in your own writing.

Just by the way, I’ve discontinued my business relationship with Verblio. I don’t like to air dirty laundry concerning dealings with past customers, so I’ll leave it at that.


2 thoughts on “Avoiding statistical fallacies in writing”

  • Phil Mills

    I’m very sorry to hear that you lost Carl. I got the impression that he was an important friend.

    • Gary McGath Post author

      Thanks. He’d been doing poorly for a while, losing the appetite that had earned him the title of “world’s hungriest cat” on the MRFRS calendar. The vet couldn’t find anything. When he didn’t eat anything for a whole weekend and spent almost all his time sitting on one cushion, it was obvious the end was near. The vet found a tumor that must have been growing rapidly, since it went unnoticed a couple of months ago. There would have been no use in prolonging it for him, especially with the added stress of a weekend in cat boarding and then a new house. Mokka is still doing fine.
