Every day, we are bombarded with numbers. Statistics and data give us useful information about the world, and as we head toward another presidential election, we can expect plenty of talk about poll results, margins of error, and the like. Stats tell us the batting average of our favorite baseball player and the likelihood of getting struck by a meteorite. But misunderstanding of statistics is also rife, and it can lead to manipulation of the facts, whether deliberate or inadvertent. Here’s a look at ten ways statistics can deceive us.
10. “Regression to the Moon”
This is the fine art of taking a trend and extrapolating it to absurd lengths, a term coined in Charles Seife’s 2010 book Proofiness as a play on “regression to the mean,” a concept straight out of Statistics 101. Trace a trend out far enough and you can find grounds for nearly any wacky claim, such as the Dow Jones Industrial Average hitting 1 million, or distance runners breaking the speed of sound. In 2002, for example, many major newspapers bought into a study claiming that blondes would “go extinct” by 2201, a claim attributed to the World Health Organization, which later said no such study existed.
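The trick is easy to reproduce: fit a straight line to a few decades of data, then follow it off the edge of the chart. Here is a toy sketch in Python using made-up marathon record times (illustrative numbers, not the real progression):

```python
import numpy as np

# Illustrative (invented) men's marathon record times in minutes, by year.
years = np.array([1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020])
records = np.array([147, 135, 129, 128, 127, 126, 124, 121])

# Fit a straight line to the downward trend...
slope, intercept = np.polyfit(years, records, 1)

# ...then extrapolate to the absurd: the year a marathon takes zero minutes.
year_of_instant_marathon = -intercept / slope
print(round(year_of_instant_marathon))
```

The line fits the recent data tolerably well, yet taken literally it predicts an instantaneous marathon within a few centuries; the same move underlies the Dow-at-1-million and supersonic-runner claims.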
9. Black Swan Events
If you had only ever seen white swans, would you believe that black swans exist? In every set of statistics there are extreme anomalies: events that are rare but have a huge impact. The 9/11 attacks were a prominent black swan event; another was the once-in-a-millennium tsunami and nuclear disaster in Japan in 2011. Such events are nearly impossible to predict, but the extremely rare does occasionally happen, and may even look inevitable in retrospect, at which point political observers argue, with the benefit of hindsight, that it could have been prevented.
8. Gerrymandering
This is the political tradition of redrawing districts in your party’s favor, a scheme perfected by American politicians. The original “gerrymander” was a salamander-shaped district approved by Massachusetts Gov. Elbridge Gerry in a successful bid to hold power in 1812. The idea is that by diluting an opposing majority across districts, a minority can retain power. Although the Voting Rights Act of 1965 made gerrymandering along strictly racial lines illegal, redrawing districts for political purposes (known in political circles as “packing” and “cracking”) remains legal.
7. Potemkin Numbers
These are statistics that look convincing but are often nothing but a façade. Remember those old ads claiming “Cigarette brand X is 90% smoother than Y”? That is a pure Potemkin number: a logical smoke screen that assigns a specific value to something that can’t be measured. The term comes from an 18th-century Russian legend, in which Prince Grigory Potemkin supposedly built fake villages in Crimea to fool and impress Empress Catherine II during her 1787 tour.
6. Correlation Versus Causation
Almost every day, news headlines promote this fallacy. Humans are wonderful at recognizing patterns, but that ability can be used against us, causing us to see connections where none exist. If a study found that tennis players suffer higher rates of skin cancer, should we conclude that tennis causes skin cancer? Obviously not; tennis players simply spend more time in the sun, which plausibly increases their skin cancer risk. Indeed, “X causes/cures cancer” is a common version of this fallacy we see in the headlines.
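The tennis example can be simulated directly. In this toy model (all numbers invented), hours of sun exposure drive both the chance of playing tennis and the chance of skin damage, while tennis itself has no effect at all; a correlation between the two still appears:

```python
import random

random.seed(42)

# Confounder simulation: "sun" drives BOTH tennis-playing and skin damage.
n = 100_000
tennis, damage = [], []
for _ in range(n):
    sun = random.random()                        # time outdoors (arbitrary units)
    tennis.append(random.random() < sun * 0.5)   # sunnier people play more tennis
    damage.append(random.random() < sun * 0.1)   # sun, not tennis, causes damage

rate_players = sum(d for t, d in zip(tennis, damage) if t) / sum(tennis)
rate_others = sum(d for t, d in zip(tennis, damage) if not t) / (n - sum(tennis))
print(rate_players > rate_others)  # True: correlation with zero causation
```

Tennis players show a visibly higher damage rate than non-players, even though the code never connects tennis to damage; the hidden third variable does all the work.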
5. Cherry-Picking Data
Politicians, prosecutors, and even scientists are often guilty of skewing statistics for their own ends. It is very tempting to select the data that supports your point of view, especially when careers or funding are at stake. Keep in mind that statisticians refer to anomalous data points as “outliers,” and their existence may be the result of method bias or built-in systematic error (see No. 2 below). Insurance companies may “cherry-pick” low-risk, healthier clients who promise a better profit margin. Another fruit-based statistical metaphor is comparing apples and oranges: evaluating two distinct and separate subsets of data as if they were the same.
4. Risk Miscalculation
We fear rare events such as terrorist attacks and plane crashes, but tend to ignore everyday dangers such as stroke or texting while driving, which are far more likely to do us in. Here’s another example of risk miscalculation: most people would be understandably upset to be told they had a fatal illness that afflicts one person in a billion … but what if the test used in the diagnosis had a false-positive rate of 1 in a million? Though that still sounds insignificant, the chance that the test result is wrong is far greater than the chance that the person has the disease. This is also sometimes known as the false-positive paradox or the prosecutor’s fallacy.
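Bayes’ rule makes the paradox concrete. Using the numbers above, one-in-a-billion illness and a one-in-a-million false-positive rate, and assuming for simplicity that the test never misses a real case:

```python
# Bayes' rule with the numbers from the paragraph above.
prevalence = 1e-9           # 1 person in a billion has the illness
false_positive_rate = 1e-6  # 1 healthy person in a million tests positive anyway
sensitivity = 1.0           # simplifying assumption: the test never misses a case

# P(positive) = P(positive | sick) * P(sick) + P(positive | healthy) * P(healthy)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# P(sick | positive) via Bayes' rule
p_sick_given_positive = sensitivity * prevalence / p_positive
print(p_sick_given_positive)  # roughly 0.001: the positive is ~99.9% likely wrong
```

Even with a seemingly tiny error rate, roughly a thousand healthy people test positive for every genuinely sick one, so a positive result alone is overwhelmingly likely to be a false alarm.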
3. Randomness
While Apple executives found they had a runaway hit with the introduction of the iPod in 2001, users soon began reporting a perceived problem: occasionally, the player’s Shuffle mode would play the same song twice in a row. Yet that is exactly what you would expect from true randomness, which is actually quite hard to replicate. Toss a coin long enough and eventually you’ll get 10 heads in a row, purely at random. Apple eventually built an exception into its random song generator ensuring that the next song you hear can be anything … except the previous one.
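The coin-toss claim is easy to check: simulate a long run of fair flips and true randomness reliably produces streaks that look anything but random.

```python
import random

random.seed(1)

# Flip a fair coin 100,000 times and find the longest run of heads.
flips = [random.random() < 0.5 for _ in range(100_000)]

longest, current = 0, 0
for heads in flips:
    current = current + 1 if heads else 0
    longest = max(longest, current)

print(longest >= 10)  # a 10-head run is all but guaranteed at this length
```

On average a run of 10 heads starts about once every couple of thousand flips, so in 100,000 flips such streaks appear dozens of times, exactly the kind of “non-random-looking” pattern iPod users complained about.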
2. Systematic Error
A systematic error typically happens when a hidden bias works its way into the data. Take a company running an instant poll of people on the Internet: the pollsters only reach the slice of the population that happens to be online at the time. Systematic errors are hard to eliminate entirely. A classic example is the famous “Dewey Defeats Truman” headline printed by the Chicago Tribune after the 1948 presidential election; the paper rushed its prediction to press on the strength of polls whose samples skewed away from Truman’s voters.
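Here is a toy version of the Internet-poll problem (all numbers invented): the full electorate favors candidate A, but the pollster can only sample the online slice, where B happens to lead. No amount of sample size fixes this, because the bias is built into who gets asked.

```python
import random

random.seed(7)

# True electorate: 52% for A, 48% for B (illustrative numbers only).
population = (["A"] * 52 + ["B"] * 48) * 1000
# The reachable slice (people online right now) skews the other way: 40/60.
online = (["A"] * 40 + ["B"] * 60) * 1000

poll = random.sample(online, 2000)          # a big, carefully drawn sample...
polled_share_A = poll.count("A") / len(poll)  # ...of the wrong population
true_share_A = population.count("A") / len(population)

print(true_share_A)    # 0.52: A actually wins
print(polled_share_A)  # near 0.40: the poll confidently calls it for B
```

The poll’s internal margin of error is tiny (about ±1 point at this sample size), which is exactly what makes systematic error so dangerous: the result looks precise while being wrong.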
1. Estimation of Error
In 2008, a nasty legal battle erupted between Al Franken and Norm Coleman following Minnesota’s U.S. Senate election. Like the higher-profile battle between Al Gore and George Bush in 2000, the Coleman/Franken fight came down to a few hundred votes, with both candidates proclaiming “let every vote count!” while legally maneuvering to throw out votes for their opponent. Political nastiness aside, hand-counting millions of ballots carries a margin of error much larger than the gap between the two opponents: have 10 individuals each count 10,000 items, and you may well get 10 different answers. Ideas have been proposed to combat counting errors, such as wider use of electronic voting, but error and fraud will never be entirely eliminated from the electoral process.
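The hand-count claim can be simulated: give ten counters the same 10,000 ballots and a small per-ballot misread rate (0.1% here, purely an assumed figure), and their tallies will almost never agree.

```python
import random

random.seed(0)

# Ten people each hand-count the same 10,000 ballots; each counter misreads
# any given ballot with probability 0.1% (an illustrative assumption).
true_total = 10_000
error_rate = 0.001

counts = []
for _counter in range(10):
    miscounted = sum(1 for _ in range(true_total) if random.random() < error_rate)
    # each misread randomly adds or drops one ballot from the running tally
    drift = sum(random.choice([-1, 1]) for _ in range(miscounted))
    counts.append(true_total + drift)

print(sorted(counts))  # ten tallies, almost surely not all identical
```

Each individual count lands within a handful of ballots of the truth, but when an election margin is itself only a few hundred votes out of millions, that per-counter drift swamps the gap between candidates.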