Even if you’re not a statistician, you may one day find yourself in the position of reviewing a statistical analysis that was done by someone else. It may be an associate, someone who works for you, or even a competitor. Don’t panic. Critiquing someone else’s work has got to be one of the easiest jobs in the world. After all, your boss does it all the time (http://statswithcats.wordpress.com/2010/11/14/you-can-lead-a-boss-to-data-but-you-can%e2%80%99t-make-him-think/). Doing it in a constructive manner is another story.
Don’t expect to find a major flaw in a multivariate analysis of variance, or a neural network, or a factor analysis. Look for the simple and fundamental errors of logic and performance. It’s probably what you are best suited for and will be most useful to the report writer who can no longer see the statistical forest for the numerical trees.
So here’s the deal. I’ll give you some bulletproof leads on what criticisms to level at that statistical report you’re reading. In exchange, you must promise to be gracious, forgiving, understanding, and, above all, constructive in your remarks. If you don’t, you will be forever cursed to receive the same manner of comments that you dish out.
With that said, here are some things to look for.
The Red-Face Test
Start with an overall look at the calculations and findings. Not infrequently, there is a glaring error that is invisible to all the poor folks who have been living with the analysis 24/7 for the last several months. The error is usually simple, obvious once detected, very embarrassing, and enough to send them back to their computers. Look for:
- Wrong number of samples. Either samples were unintentionally omitted or replicates were included when they shouldn’t have been.
- Unreasonable means. Calculated means look too high or low, sometimes by a lot. The cause may be a mistaken data entry, an incorrect calculation, or an untreated outlier.
- Nonsensical conclusions. A stated conclusion seems counterintuitive or unlikely given known conditions. This may be caused by a lost sign on a correlation or regression coefficient, a misinterpreted test probability, or an inappropriate statistical design or analysis.
Nobody Expects the Sample Inquisition
Start with the samples. If you can cast doubt on the representativeness of the samples, everything else done after that doesn’t matter. If you are reviewing a product from a mathematically trained statistician, probably the only place to look for difficulties is in the samples. There are a few reasons for this. First, a statistician may not be familiar with some of the technical complexities of sampling the medium or population being investigated. Second, he or she may have been handed the dataset with little or no explanation of the methods used to generate the data. Third, he or she will probably get everything else right. Focus on what the data analyst knows the least about.
Data Alone Do Not an Analysis Make
Unless you see the report writer counting on his or her fingers, don’t worry about the calculations being correct. There’s so much good statistical software available that getting the calculations right shouldn’t be a problem (http://statswithcats.wordpress.com/2010/06/27/weapons-of-math-production/). It should be sufficient to simply verify that he or she used tested statistical software. Likewise, don’t bother asking for the data unless you plan to redo the analysis. You won’t be able to get much out of a quick look at a database, especially if it is large. Even if you redo the analysis, you may not make the same decisions about outliers and other data issues that will lead to slightly different results (http://statswithcats.wordpress.com/2010/10/17/the-data-scrub-3/). Waste your time on other things.
Descriptive statistics are usually the first place you might notice something amiss in a dataset. Be sure the report provides means, variances, minimums, maximums, and numbers of samples. Anything else is gravy. Look for obvious data problems like a minimum that’s way too low or a maximum that’s way too high. Be sure the sample sizes are correct. Watch out for the analysis that claims to have a large number of samples but also a large number of grouping factors. The total number of samples might be sufficient, but the number in each group may be too small to be analyzed reliably.
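If you want to automate that first sanity check, a few lines of code will do it. Here’s a minimal sketch in plain Python; the group names, data, and group-size threshold are all made up for illustration:

```python
from statistics import mean, variance

def describe_groups(data, min_group_size=5):
    """Return basic descriptive statistics per group and flag groups
    too small to analyze reliably (threshold is illustrative)."""
    summary = {}
    for group, values in data.items():
        summary[group] = {
            "n": len(values),
            "mean": mean(values),
            "var": variance(values),
            "min": min(values),
            "max": max(values),
            "too_small": len(values) < min_group_size,
        }
    return summary

stats = describe_groups({
    "control":   [4.1, 3.9, 4.4, 4.0, 4.2, 4.3],
    "treatment": [5.0, 5.2, 4.8],  # total n looks fine; this group doesn't
})
for group, s in stats.items():
    print(group, s)
```

Scanning output like this won’t replace judgment, but it makes the too-few-samples-per-group problem jump out.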
You might be provided a matrix with dozens of correlation coefficients (http://statswithcats.wordpress.com/2010/11/28/secrets-of-good-correlations/). For any correlation that is important to the analysis in the report, be sure you get a t-test to determine whether the correlation coefficient is different from zero, and a plot of the two correlated variables to verify that the relationship between the two variables is linear and there are no outliers.
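If the report omits that t-test, the statistic is easy to compute yourself: t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. Here’s a sketch with invented data; in a real review you’d use a stats package that also reports the probability:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient, computed directly."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_t(x, y):
    """t statistic for H0: the population correlation is zero."""
    r, n = pearson_r(x, y), len(x)
    return r * sqrt((n - 2) / (1 - r ** 2))

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
r = pearson_r(x, y)
t = correlation_t(x, y)  # compare |t| to the critical value at n-2 df
```

A |t| much larger than about 2 suggests the correlation is real; but remember, the test says nothing about linearity or outliers, which is why you still want the plot.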
Regression models are one of the most popular types of statistical analyses conducted by non-statisticians. Needless to say, there are usually quite a few areas that can be critiqued. Here are probably the most common errors.
- Data—If the ratio of data points to predictor variables isn’t at least 10 to 1, the model will be unstable (http://statswithcats.wordpress.com/2010/07/17/purrfect-resolution/).
- Intercept—There should be an intercept term in the model unless there is a compelling theoretical reason not to include it. When an intercept is omitted, the coefficient of determination is artificially inflated and the model will look better than it really is.
- Variation—Look at the variation of the predictions, usually expressed as the standard error of estimate (http://statswithcats.wordpress.com/2010/12/19/you%e2%80%99re-off-to-be-a-wizard/). You might have an accurate predictive model that lacks enough precision to be useful.
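The first and third bullets are easy to check by hand for a one-predictor model. Here’s a rough sketch with made-up data; the formulas are the standard least-squares ones:

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x (intercept included)."""
    mx, my = mean(x), mean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

def standard_error_of_estimate(x, y, b0, b1):
    """Typical size of a prediction error, in the units of y.
    Divides by n - 2 because two parameters were estimated."""
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return sqrt(sse / (len(x) - 2))

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # 10 points, 1 predictor: 10-to-1
y = [2.2, 3.9, 6.1, 8.0, 9.8, 12.1, 14.2, 15.9, 18.1, 19.8]
b0, b1 = fit_line(x, y)
see = standard_error_of_estimate(x, y, b0, b1)
```

Ask yourself whether a typical prediction error the size of `see` is small enough for the decisions the model will support; that’s the precision question the third bullet raises.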
Statistical tests are often done by report writers with no notion of what they mean. Look for some description of the null hypothesis (the assumption the test is trying to disprove) for the test. It doesn’t matter if it is in words or mathematical shorthand. Does it make sense? For example, if the analysis is trying to prove that a pharmaceutical is effective, the null hypothesis should be that the pharmaceutical is not effective. After that, look for the test statistics and probabilities. If you don’t understand what they mean, just be sure they were reported. If you want to take it to the next step, look for violations of statistical assumptions (http://statswithcats.wordpress.com/2010/10/03/assuming-the-worst/).
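To make the pharmaceutical example concrete, here’s a sketch of the test statistic behind it, using Welch’s two-sample t formula and invented data; a real analysis would also report degrees of freedom and the probability:

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """Welch's t statistic for H0: the two group means are equal
    (e.g., 'the pharmaceutical is not effective')."""
    na, nb = len(a), len(b)
    se = sqrt(variance(a) / na + variance(b) / nb)
    return (mean(a) - mean(b)) / se

placebo = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]   # symptom scores, invented
treated = [10.2, 10.8, 10.1, 10.5, 10.4, 10.6]
t = two_sample_t(treated, placebo)
# a |t| well above ~2 argues for rejecting the null hypothesis
```

The point for a reviewer isn’t to recompute this, but to confirm that the null hypothesis was set up this way around, and that the statistic and probability actually appear in the report.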
Analysis of Variance
An ANOVA is like a testosterone-induced, steroid-driven, rampaging horde of statistical tests. There are many, many ways the analysis can be misspecified, miscalculated, misinterpreted, and misapplied. You’ll probably never find most kinds of ANOVA flaws unless you’re a professional statistician, so stick with the simple stuff.
A good ANOVA will include the traditional ANOVA summary table, an analysis of deviations from assumptions, and a power analysis. You hardly ever get the last two items. Not getting the ANOVA table in one form or another is cause for suspicion. It might be that there was something wrong with the analysis, or that the data analyst didn’t know the table should be included.
If the ANOVA design doesn’t have the same number of samples in each cell, the design is termed unbalanced. That’s not a fatal flaw, but violations of assumptions are more serious for unbalanced designs.
If the sample sizes are very small, only large differences can be detected in the means of the parameter being investigated. In this case, be suspicious of finding no significant differences when there should be some.
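You can gauge roughly how large a difference a small sample can detect with a back-of-the-envelope power calculation. This sketch uses the usual normal approximation for a two-group comparison at the 5% significance level with 80% power; the inputs are illustrative:

```python
from math import sqrt

def min_detectable_difference(sigma, n_per_group,
                              z_alpha=1.96, z_beta=0.84):
    """Approximate smallest true difference in means a two-group
    comparison can detect at alpha = 0.05 with 80% power.
    Normal approximation; fine as a back-of-envelope check."""
    return (z_alpha + z_beta) * sigma * sqrt(2.0 / n_per_group)

# With only 5 samples per group, differences smaller than about
# 1.8 standard deviations will likely go undetected.
small = min_detectable_difference(sigma=1.0, n_per_group=5)
large = min_detectable_difference(sigma=1.0, n_per_group=50)
```

If the differences the analysis was hoping to find are smaller than this threshold, "no significant difference" tells you more about the sample size than about the population.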
Assumptions Giveth and Assumptions Taketh Away
Statistical models usually make at least four assumptions: the model is linear, and the errors (residuals) from the model are independent, Normally distributed, and have the same variance for all groups. A first-class analysis will include some mention of violations of assumptions. Violating an assumption does not necessarily invalidate a model but may require that some caveats be placed on the results.
The independence assumption is the most critical. This is usually addressed by using some form of randomization to select samples. If you’re dealing with spatial or temporal data, you probably have a problem unless some additional steps were taken to compensate for autocorrelation.
Equality of variances is a bit trickier. There are tests to evaluate this assumption, but they may not have been cited by the report writer. Here’s a rule of thumb. If the largest variance in an ANOVA group or regression level is twice as big as the smallest variance, you might have a problem. If the difference is a factor of five or more, you definitely have a problem.
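That rule of thumb is easy to apply if the report lists the group variances, or if you have the data. A sketch, with invented numbers:

```python
from statistics import variance

def variance_ratio_check(groups):
    """Apply the rule of thumb: ratio of the largest to the smallest
    group variance, with rough verdicts at the 2x and 5x thresholds."""
    variances = [variance(g) for g in groups.values()]
    ratio = max(variances) / min(variances)
    if ratio >= 5:
        verdict = "definitely a problem"
    elif ratio >= 2:
        verdict = "might be a problem"
    else:
        verdict = "probably fine"
    return ratio, verdict

ratio, verdict = variance_ratio_check({
    "A": [5.1, 5.3, 4.9, 5.0, 5.2],   # tightly clustered
    "B": [4.0, 6.5, 3.2, 7.1, 4.7],   # all over the place
})
```

It’s only a screening check, but a wildly lopsided ratio is exactly the sort of simple, defensible comment a reviewer can make.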
The Normality of the residuals may be important although it is sometimes afforded too much attention. The most serious problems are associated with sample distributions that are truncated on one side. If the analysis used a one-sided statistical test on the same side as the truncated end of the distribution, you have a problem. Distributions that are too peaked or flat can result in slightly higher rates of false negative or false positive tests, but it would be hard to tell without a closer look than just a review.
Look at a few scatter plots of correlations with the dependent variable, then forget the linearity assumption. It’s most likely not an issue. If the report goes into nonlinear models, you’re probably in over your head.
We’re Gonna Need a Bigger Report
There are scores of ways that data analysts mislead their readers and themselves with graphs (http://statswithcats.wordpress.com/2010/09/26/it-was-professor-plot-in-the-diagram-with-a-graph/). Here’s the first hint. If most of the results appear as pie charts or bar graphs, you’re probably dealing with a statistical novice. These charts are simple and commonly used, but they are notorious for distorting reality. Also, be sure to check the scales of the axes to be sure they’re reasonable for displaying the data across the graphic. If comparisons are being made between graphics, the scales of the graphics should be the same. Make sure everything is labeled appropriately.
As with graphs, there are so many things that can make a map invalid that critiquing them is almost no challenge at all. Start by making sure the basics—north arrow, coordinates, scale, contours, and legend—are correct and appropriate for the information being depicted. Compare extreme data points with their depiction. Most interpolation algorithms smooth the data, so the contours won’t necessarily honor individual points. But if the contour and a nearby datum are too different, some correction may be needed. Check the actual locations of data points to ensure that contours don’t extend (too far) into areas with no samples. Be sure the northing and easting scales are identical, easily done if there is an overlay of some physical features. Finally, step back and look for contour artifacts. These generally appear as sharp bends or long parallel lines, but they may take other forms.
It’s always handy in a review to say that not all the documentation was included. But let’s be realistic. Even an average statistical analysis can generate a couple of inches of paper. A good statistician will provide what’s relevant to the final results. If you’re not going to look at it, probably no one else will either. Again, waste your time on other things. On the other hand, if you really need some information that was omitted, you can’t be faulted for making the comment.
You’ve Got Nothing
If, after reading the report cover-to-cover, you can’t find anything to comment on, you can sit back and relax. Just make sure you haven’t also missed a fatal flaw (http://statswithcats.wordpress.com/2010/11/07/ten-fatal-flaws-in-data-analysis/).
If you’re the suspicious sort, though, there is another thing you can try. This ploy requires some acting skills. Tell the data analyst/report writer that you are concerned that the samples may not fairly represent the population being analyzed.
Expressing concern over the representativeness of a sample is like questioning whether a nuclear power plant is safe. No matter how much you try, there is no absolute certainty. Even experienced statisticians will gasp at the implications of a comment concerning the sample not being representative of the population. That one problem could undermine everything they’ve done.
Here’s what to look for in a response. If the statistician explains the measures that were used to ensure representativeness, prevent bias, and minimize extraneous variation, the sample is probably all right. If the statistician mumbles about not being able to tell if the sample is representative and talks only about the numbers and not about the population, there may be a problem. If the statistician ignores the comment or tries to dismiss it with a stream of meaningless generalities and unintelligible jargon (http://statswithcats.wordpress.com/2010/07/03/it%e2%80%99s-all-greek/), there is a problem and the statistician probably knows it. If he or she won’t look you in the eyes, you’ve definitely got something. If you get an open-mouth, big-eye vacant stare, he or she knows less about statistics than you do. Be gentle!
Now It’s Up to You
So that’s my quick-and-dirty guide to critiquing statistical analyses. Sure there’s a lot more to it, but you should be able to find something in these tips that you could apply to almost any statistical report you have to review. At a minimum, you should be able to provide at least some constructive feedback that will benefit both the writer and the report. Maybe you’ll even be able to prevent a catastrophe. If nothing else, you’ll have earned your day’s pay, and if you critique constructively, the respect of the report writer as well.
Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.