When people think about statistical analyses, they often think only of mind-numbing number crunching that creates yet more numbers. But that’s like touring a cabinetmaker’s shop and seeing only the sawdust. A talented cabinetmaker can create beauty and function in his products. In the same way, a creative statistician can create enlightenment and utility if he or she has vision and purpose.
Statistical analyses usually aim at achieving one of five objectives:
- Describe — characterizing populations and samples using descriptive statistics, statistical intervals, correlation coefficients, graphics, and maps.
- Identify or Classify — classifying and identifying a known or hypothesized entity or group of entities using descriptive statistics, statistical intervals and tests, graphics, and multivariate techniques such as cluster analysis.
- Compare or Test — detecting differences between statistical populations or reference values using simple hypothesis tests, and analysis of variance and covariance.
- Predict — predicting measurements using regression and neural networks, forecasting using time-series modeling techniques, and interpolating spatial data.
- Explain — explaining latent aspects of phenomena using regression, cluster analysis, discriminant analysis, factor analysis, and other data mining techniques.
The following table provides some examples of data analysis tools that can be used for addressing the objectives.
Examples of Tools and Uses of Statistical Objectives
|Objectives|Commonly Used Tools|Examples of Applications|
|---|---|---|
|Describe|Text and Images; Graphs; Descriptive statistics|Opinion surveys; Demographic surveys|
|Compare or Test|Text and Images; Graphs; Descriptive statistics; Statistical tests|Pharmaceutical effectiveness; Educational methods|
|Identify or Classify|Visual scans; Filters, queries & sorts; Graphs; Discriminant analysis; Association rules; Classification trees; Data Mining|Biological species; Tax return audits; Possible criminals or terrorists|
|Predict|Graphs; Regression; Neural Networks; Data Mining|Credit worthiness; Student success in College|
|Explain|Regression; Analysis of Variance (ANOVA); Other multivariate statistics|Academic research|
There are other classification schemes that describe other statistical pursuits, so don’t feel constrained by these five categories. But this classification of statistical aims is a reasonable place to start. It has three features. First, it’s easy to figure out so non-statisticians can decide in which category their project fits. Second, the major statistical techniques tend to be used primarily in just one of the classifications. And third, the scheme can be thought of as an index of the professional peril a statistician could face in doing the analysis. Here’s why.
Description is relatively straightforward. You can do the calculations on spreadsheet software. All you have to be aware of are measurement scales, distributions, sampling schemes, measures of central tendency and dispersion, and methods for dealing with outliers and missing data.
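To make the description checklist a little more concrete, here is a minimal sketch in Python using only the standard library. The function name `describe`, the made-up `weights` data, and the choice of the 1.5 × IQR rule for flagging outliers are all assumptions for illustration; they are one reasonable option among several, not a prescribed method.

```python
import statistics

def describe(values):
    """Summarize a numeric sample, skipping missing entries (None)."""
    clean = sorted(v for v in values if v is not None)
    # Quartiles from the standard library (default "exclusive" method)
    q1, median, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    # Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(clean),
        "missing": len(values) - len(clean),
        "mean": statistics.mean(clean),      # central tendency
        "median": median,                    # central tendency, robust
        "stdev": statistics.stdev(clean),    # dispersion
        "outliers": [v for v in clean if v < low or v > high],
    }

# Hypothetical data: cat weights in kg, with two missing values
weights = [4.1, 3.8, None, 4.4, 4.0, 3.9, 9.5, 4.2, None, 4.3]
summary = describe(weights)
for name, value in summary.items():
    print(name, value)
```

Comparing the mean with the median in the output is a quick way to see how much a single extreme value can pull the mean around.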
Identification and classification range from simple visual recognition to the exploration of arcane mathematical dimensions where only bold number crunchers venture. It’s like finding Waldo. At a convention of funeral directors, one look would be all you needed. If he were making American flags in a candy cane forest, you might need some non-visual clues. You can determine a person’s sex by looking at him or her but not from a table of eye and hair color. On the other hand, you couldn’t tell who the best players were on a sports team from their pictures, but you could from their performance statistics. However you do it, identification is the gateway to classification. If you can do one, you can probably do the other.
Comparison is tougher even though there is ample software available for most analyses. You need to know what test to run or ANOVA design to use as well as understand probability, effect size, and violations of assumptions. There’s a much greater chance of something going wrong.
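As a rough illustration of what a simple comparison involves, the sketch below computes a test statistic and an effect size for two groups using only Python's standard library. It assumes Welch's t-test (which does not require equal variances) and Cohen's d with a pooled standard deviation; the function names and the scores are hypothetical, and turning the statistic into a p-value would still need a t-distribution from a statistics package.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

# Hypothetical test scores for two teaching methods
method_a = [78, 85, 82, 90, 74, 88]
method_b = [72, 80, 75, 83, 70, 79]

t, df = welch_t(method_a, method_b)
d = cohens_d(method_a, method_b)
print(f"t = {t:.2f}, df = {df:.1f}, Cohen's d = {d:.2f}")
```

Reporting the effect size alongside the test statistic matters because a difference can be statistically detectable and still too small to care about.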
Prediction is next. In addition to all the description and comparison techniques, you’ll need to know how to use a variety of model-building and assessment methods and understand the morass of prediction error. It’s easy to make a prediction. It’s hard to make an accurate prediction. It’s damn near impossible to make an accurate prediction that is also precise. Even if you did nothing wrong statistically, it’s easy to produce a poor prediction, and a poor prediction will eventually be noticed. One really good prediction and a psychic is famous; one really bad prediction and a statistician is relegated to selling insurance.
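One way to see the difference between an accurate prediction and a precise one is to look at prediction errors on data the model never saw. The sketch below is only an illustration under stated assumptions: it fits a straight line by ordinary least squares, treats the average error (bias) as a stand-in for accuracy and the spread of the errors as a stand-in for precision, and uses made-up numbers. The function names are hypothetical.

```python
import math
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def prediction_errors(slope, intercept, x, y):
    """Return (bias, spread, rmse) of predictions against observed values."""
    errors = [(slope * xi + intercept) - yi for xi, yi in zip(x, y)]
    bias = statistics.mean(errors)      # average miss: accuracy
    spread = statistics.stdev(errors)   # scatter of the misses: precision
    rmse = math.sqrt(statistics.mean([e * e for e in errors]))  # overall error
    return bias, spread, rmse

# Hypothetical data: fit on the first five points, check on three new ones
train_x, train_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
test_x, test_y = [6, 7, 8], [12.2, 13.8, 16.1]

slope, intercept = fit_line(train_x, train_y)
bias, spread, rmse = prediction_errors(slope, intercept, test_x, test_y)
print(f"bias = {bias:.3f}, spread = {spread:.3f}, RMSE = {rmse:.3f}")
```

A model can have a bias near zero and still scatter widely around the truth, which is why both numbers are worth checking before anyone relies on the predictions.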
Finally, explanation is the toughest of all objectives. Not only do you need to understand some of the more esoteric statistical methods, like factor analysis and canonical correlation, but you also have to understand the conceptual framework of the systems the data come from. Then, you have to have the talent to apply the knowledge creatively. You can’t explain your statistical model of stream contamination without knowing something about stream hydraulics, hydrogeology, meteorology, and environmental geochemistry. You can’t explain customer satisfaction without knowing something about demographics, marketing, business, and psychology. You’ll also probably have to integrate the information and think of it in ways that have never been thought of before. Explanation can create fundamental wisdom, although most of the time, your results will be humdrum. If you do come up with something truly consequential, though, some people will believe your results are erroneous, coincidental, or faked. Some people will claim that your finding is old news, having discovered it themselves years before, but then post it on Reddit for the karma. Most people, though, will just ignore you.
Creating a finished statistical analysis from raw data requires knowledge, experience, and often a bit of artistry. So when you conduct or review a statistical analysis, don’t let all the numbers obscure the craftsmanship and functionality of the products. And accordingly, don’t neglect to appreciate the talent and the artistry of the numbermaker.