Statistics are like power tools. If you know how to use them, they are incredibly valuable and fun to use. They help you do your job better, more thoroughly, and more quickly. But if you are careless, they can cause great damage.
Think of an expert carpenter like Norm Abram on This Old House. Norm has a different tool for every possible job he might need to do in his workshop. Statistical methods are like that. There are many different types of statistical analysis. Some perform a single function, and some perform many. In the same way that there are several different types of saws, there are different statistical methods for doing exactly the same thing. And just as Norm knows when to use his table saw and when to use his band saw, a statistician knows when to use different types of statistical analysis.
If you haven’t been trained in statistics, selecting which technique to use may seem bewildering. You can usually get in the right ballpark, though, if you understand your variables and your objectives. Consider the hierarchy for selecting a statistical analysis method summarized in this flowchart. The flowchart has five major decision points:
- How many variables do you have?
- What is your statistical objective?
- What scales are the variables measured with?
- Is there a distinction between dependent and independent variables?
- Are the samples autocorrelated?
By the time you get to this point in planning your statistical analysis, you should already have determined the answers to the first four questions.
The first decision is a no-brainer. How many variables do you have—one or more than one? It doesn’t get any simpler than that. But say you have many variables, more than you can easily remember. Then it might be advantageous to use cluster analysis to select representative variables or use a data reduction technique to create new, more efficient variables.
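Here is a minimal sketch of the "select representative variables" idea. It is not a full cluster analysis. It uses a simpler stand-in: keep a variable only if it isn't strongly correlated with one you have already kept. The function names, the 0.9 cutoff, and the little data set are all made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pick_representatives(variables, threshold=0.9):
    """Greedy pass: keep a variable only if |r| with every kept variable is below threshold."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Invented example data: length_in is just length_cm in different units,
# so it carries no new information.
data = {
    "length_cm": [10, 20, 30, 40, 50],
    "length_in": [3.9, 7.9, 11.8, 15.7, 19.7],
    "weight_kg": [5, 3, 6, 2, 4],
}

print(pick_representatives(data))
```

The order of the variables matters in a greedy pass like this, so treat the result as a starting point rather than the final word on which variables to keep.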
The second decision is “what is your statistical objective?” There are five choices: description, identification or classification, comparison or testing, prediction, or explanation. There’s more information on these objectives in The Five Pursuits You Meet in Statistics. Once again, this should be a fairly easy decision to make.
The third decision is “what scales are the variables measured with?” This decision is a bit tougher because you have to know something about measurement scales. You might be able to get away with distinguishing among just a few scales, like nominal (i.e., groups or categories), ordinal (i.e., ordered ranks), and continuous scales. The more you know about the quirks of the scales, the better able you will be to avoid problems. The quirks of time scales, for example, are formidable. Read Time Is On My Side and you’ll see what I mean.
The fourth decision is “is there a distinction between dependent and independent variables?” Once again, this decision is a bit more sophisticated because you have to know something about statistical modeling. In particular, you have to understand why one variable might be the focus of your analysis efforts while the others would be used for support. If your objective involves prediction, you have to have separate dependent and independent variables.
The fifth decision is “are the samples autocorrelated?” There are three ways observations or samples can be autocorrelated: by time, by location, and by sequence. If it’s important that the dependent variable is measured at a particular location or time, your data are probably autocorrelated. The autocorrelation may not be large but it will be present. There are sophisticated ways of detecting spatial and temporal autocorrelation, but this rule of thumb will work most of the time. Measurements can also be autocorrelated by sequence, that is, the order in which they were taken. Say a measurement device is drifting slowly out of calibration. Each subsequent measurement would have an increasing bias independent of the time or location of the measurement. Sequential autocorrelation isn’t necessarily harder to detect. You just have to know to look for it.
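If you want a quick look at sequential autocorrelation, one simple check is the lag-1 autocorrelation: how strongly each measurement is related to the one taken just before it. The sketch below assumes the readings are stored in the order they were taken. The function name and the drifting readings are invented for illustration, and this is only one of several ways to check.

```python
def lag_autocorrelation(values, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom

# Invented readings from a device drifting out of calibration:
# each measurement is biased 0.1 units higher than the one before it.
drifting = [10.0 + 0.1 * i for i in range(20)]

r1 = lag_autocorrelation(drifting)
print(round(r1, 2))
```

A value near zero suggests neighboring measurements are unrelated. A value near +1 or -1, like the one this drifting series produces, is a hint that the order of measurement matters and that methods assuming independent samples may mislead you.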
As with most generalized flowcharts for decision-making, there are exceptions. Variables based on cyclic scales, like orientations and months of the year, are an example. There are two options for treating these types of scales. You can either transform the variable into a non-repeating linear scale or use specialized techniques. The first option is usually easier but the second option usually provides better results. Also, if you have more than one dependent variable and you want to analyze all the dependent variables simultaneously, you have to use multivariate statistics. Multivariate statistics are a quantum leap more complex than univariate (i.e., one dependent variable) statistics, and are probably best left to experienced statisticians.
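To see why cyclic scales need special handling, consider averaging compass orientations. The sketch below uses the circular mean, which is one example of the specialized techniques mentioned above. It treats each angle as a point on a circle and averages the points instead of the raw numbers. The function name and the three orientations are made up for illustration.

```python
import math

def circular_mean_degrees(angles):
    """Mean direction of angles in degrees, returned in the range [0, 360)."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

# Invented orientations, all pointing roughly north.
orientations = [350, 10, 30]

arithmetic = sum(orientations) / len(orientations)  # 130 degrees, roughly southeast
circular = circular_mean_degrees(orientations)      # about 10 degrees, roughly north

print(arithmetic, round(circular, 1))
```

The ordinary average lands nowhere near any of the measurements because it doesn't know that 350 and 10 are only 20 degrees apart. The same trap applies to months of the year, where December and January are neighbors.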
So if you have some notion of what statistical techniques you might apply, read more about them on the Internet and go from there. Just remember, describing all the statistical techniques you might use in an analysis would be like trying to describe all the tools used in carpentry. There are some very common tools, such as saws and hammers, as well as very specialized tools, the ones that aren’t likely to be on the shelves at your local Home Depot. Don’t worry about the very specialized tools. You can accomplish quite a lot with the common, off-the-shelf statistical techniques. The other thing to bear in mind is that method selection guides such as the one presented here can help you decide what you could use but not what you should use. You can use a sledgehammer to drive a nail, but you’d probably be better off using a smaller hammer. That’s a matter of experience, or at least, trial and error. Good luck!