When you learn new things, you can develop misconceptions. Maybe it’s the result of something you didn’t understand correctly. Maybe it’s the way the instructor explains something. Or maybe, it’s something unspoken, something you assume or infer from what was said. Here are six misconceptions about statistics you might have gotten from Stats 101.
Misconception 1: “Statistics is Math”
How could you not come to believe this? Even before you took Stats 101, you learned you had to take the course to fulfill a math requirement. It was taught by the Math Department. Then when you took the course, it was all numbers. Homework and exams were almost all about calculations. Stats 101 was all math. Statistics must be all math too.
Statistics uses numbers but numbers are not the primary focus of statistics, at least to most practitioners. Applied statistics is a form of inductive reasoning that uses math as one of its tools. It also uses sorting for ranks, filtering for classification, and all kinds of graphics. The point of using statistics is to discover new knowledge and solve problems through the use of inductive reasoning involving numbers. It’s not just about doing calculations. That’s why it’s required for college majors in business, social sciences, and many other disciplines. That’s why it’s taught by professors in all those disciplines, too. Yes, it’s required for math degrees and is taught by math professors at many schools. That’s so there will be mathematical statisticians who will invent statistical tools for the applied statisticians to use. You can love statistics and be good at statistical thinking even if you think you hate math.
Misconception 2: “Statistics Requires a Lot of Data”
Stats 101 doesn’t teach you how to work with individual pieces of information, like a solitary measurement, or a picture, or eyewitness testimony. Statistics uses data, lots of data, the more data the better. The number of samples is a term in almost every equation. And anyway, that’s what the law of large numbers says: the more data, the better the results.
The number of samples you really need for a statistical analysis is contingent on how much resolution you want. Think of the resolving power of a telescope or a microscope, or the number of pixels in a computer image. The greater the resolution, the more detail you’ll see. It’s the same way with statistics (http://statswithcats.wordpress.com/2010/07/17/purrfect-resolution/).
What’s more important than the number of data points is the quality of the data points. In statistics, the quality of a set of data points is how well the data points represent the population from which they are drawn. But representative data can be incredibly difficult to generate. How do you decide which registered voters are actually likely to vote in the next election? How do you decide who might use a product you might want to sell?
The number of samples is easy to determine. The quality of the samples is virtually impossible to determine. Nevertheless, what you should remember is that more data may be better but better data are always best.
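To make the resolution analogy concrete, here’s a small Python sketch. The population standard deviation is an invented value, purely for illustration; the point is how the half-width of a 95% confidence interval for a mean, a reasonable stand-in for the “resolution” of an estimate, shrinks as the sample size grows:

```python
import math

def ci_half_width(sigma, n, z=1.96):
    """Half-width of a 95% confidence interval for a mean:
    roughly the 'resolution' of the estimate."""
    return z * sigma / math.sqrt(n)

sigma = 10.0  # assumed population standard deviation, for illustration only
for n in (25, 100, 400):
    print(n, round(ci_half_width(sigma, n), 2))  # 3.92, 1.96, 0.98
```

Notice that quadrupling the sample size only halves the interval width. Piling on data sharpens resolution with diminishing returns, and it says nothing at all about whether the samples represent the population.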
Misconception 3: “Data are Dependable”
In Stats 101, you do a lot of number crunching. You use small datasets and big datasets, real data and fake data, but you were never told to delete data. You figured that data are like facts. You don’t delete them for any reason or you will bias your results.
Data are messy. Most newly generated datasets have errors, missing observations, and unrepresentative samples. Some population properties may be under-represented or over-represented. There may be samples that should not be included in the analysis, like replicates, QA samples, and metadata. All these problems with data require a lot of processing before an analysis can begin (http://statswithcats.wordpress.com/2010/10/17/the-data-scrub-3/). In fact, data scrubbing often consumes the majority of a project budget and schedule, but you have to do it anyway.
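As a hypothetical illustration (the records, field names, and flags below are all invented), a first pass at scrubbing might drop duplicate records, missing measurements, and QA samples before any analysis begins:

```python
# Invented field measurements: ids, values, and QA flags are for illustration only.
raw = [
    {"id": 1, "value": 4.2, "qc": False},
    {"id": 1, "value": 4.2, "qc": False},   # accidental duplicate record
    {"id": 2, "value": None, "qc": False},  # missing measurement
    {"id": 3, "value": 4.5, "qc": True},    # QA/QC sample, not part of the population
    {"id": 4, "value": 3.9, "qc": False},
]

seen, clean = set(), []
for row in raw:
    if row["value"] is None or row["qc"] or row["id"] in seen:
        continue  # drop missing, QA, and duplicate records
    seen.add(row["id"])
    clean.append(row)

print(len(clean))  # 2 usable observations out of 5 raw records
```

Even in this toy case, three of five records never reach the analysis, which is why scrubbing eats so much of a project budget.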
Misconception 4: “Statistics Provides Unique Solutions”
In all the problems your Stats 101 instructor solved in class, and all the homework assignments you did, and all the exams you took, there was only one “right answer” to a question. So, any statistical analysis should provide the same results no matter who does it.
Even if two statisticians start with identical data sets, they may not arrive at identical results or, sometimes, even identical conclusions. This is because they may make different assumptions and scrub the data differently. Furthermore, there may be more than one way, even many ways, to approach a problem (http://statswithcats.wordpress.com/2010/08/22/the-five-pursuits-you-meet-in-statistics/). There may also be different statistical analysis techniques that can be used, or even different options within the same technique (http://statswithcats.wordpress.com/2010/08/27/the-right-tool-for-the-job/). It would probably be more surprising for two statisticians to calculate the same results from a dataset than for them to have some differences. Just like most problems in the real world, there may be more than one right answer from a statistical analysis.
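For instance, suppose two statisticians receive the same five readings (invented numbers, for illustration), one of which looks suspicious. One keeps every point; the other judges the odd value a recording error and drops it. Both choices are defensible, and they summarize the data differently:

```python
import statistics

data = [2.1, 2.3, 2.2, 2.4, 9.8]  # invented readings with one suspicious value

# Analyst A keeps every point and reports the mean.
summary_a = statistics.mean(data)       # 3.76

# Analyst B judges 9.8 a recording error, drops it, and reports the mean.
summary_b = statistics.mean(data[:-1])  # 2.25

print(summary_a, summary_b)
```

Neither analyst did anything wrong; they made different, documented assumptions about the same dataset and got different right answers.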
Misconception 5: “Statistics Provides Unambiguous Results”
Results are either significant or they’re not. That’s pretty unambiguous.
Statistical results are based on data and assumptions about the data. Change the number of samples and you change the resolution of the statistical procedure. Change the data or the assumptions and you change the estimates of variability. Change the resolution or the estimates of variability and you have different results. There is indeed uncertainty in uncertainty. Sometimes uncertainty brings with it ambiguity.
Is there really a difference between Type I error rates of 0.049 and 0.051? Many decision makers who never got past Stats 101 think so. But interpretations of these results are based on the assumptions and biases a statistician brings to the analysis. One statistician might take a firm stance and say “significant” and another might say, “maybe not.” Results have uncertainty, interpretations have ambiguity, and decisions have risks. That’s statistics.
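To see how thin that line is, here’s a sketch using the normal approximation for a two-sided p-value. Two test statistics that differ by only 0.02 land on opposite sides of the conventional 0.05 cutoff:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic, via the standard normal CDF."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Two nearly identical test statistics straddle the 0.05 cutoff.
print(round(two_sided_p(1.97), 3))  # 0.049, "significant"
print(round(two_sided_p(1.95), 3))  # 0.051, "maybe not"
```

A tiny shift in the data, one sample more or less, could push a result from one side of the line to the other, which is why a rigid 0.05 cutoff reads more certainty into the numbers than is really there.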
Misconception 6: “It’s Easy to Lie with Statistics”
Darrell Huff wrote “How to Lie with Statistics” in 1954 (http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728/ref=pd_sim_b_2). Michael Wheeler wrote “Lies, Damn Lies, and Statistics: The Manipulation of Public Opinion in America” in 1976 (http://www.amazon.com/Lies-Damn-Statistics-Manipulation-Opinion/dp/0393331490/ref=sr_1_17?ie=UTF8&qid=1298231730&sr=8-17). John Allen Paulos wrote “Innumeracy: Mathematical Illiteracy and Its Consequences” in 1988 (http://www.amazon.com/Innumeracy-Mathematical-Illiteracy-Its-Consequences/dp/0809058405/ref=ntt_at_ep_dpi_1). Joel Best wrote “Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists” in 2001 (http://www.amazon.com/Damned-Lies-Statistics-Untangling-Politicians/dp/0520219783/ref=sr_1_3?ie=UTF8&qid=1298231253&sr=8-3).
So it must be pretty easy to lie with statistics since everybody is doing it.
It’s hard to do statistics right, but it’s a lot of work to do them wrong, too. You have to collect data, crunch the numbers, and cook up your story, or perhaps more correctly, cook up your story, make up the data, and call the press conference. But if you’re going to mislead an audience, it’s much easier to use made-up facts, phony anecdotes, and illogical conjectures. So why do so many people, particularly politicians, even bother lying with statistics? It’s because numbers provide credibility. If you have little credibility yourself, using numbers can confer the illusion of expertise. And that is why people use statistics in the first place.
Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.