Of the over two million college degrees that are granted in the U.S. every year, including those earned at accredited online colleges nationwide, probably two-thirds require completion of a statistics class. That’s over a million and a half students taking Statistics 101, even more when you consider that some don’t complete the course.
Everybody who has completed high school has learned some statistics. There are good reasons for that. Your class grades were averages of scores you received for tests and other efforts. Most of your classes were graded on a curve, requiring the concepts of the Normal distribution, standard deviations, and confidence limits. Your scores on standardized tests, like the SAT, were presented in percentiles. You learned about pie and bar charts, scatter plots, and maybe other ways to display data. You might even have learned about equations for lines and some elementary curves. So by the time you got to prom, you were exposed to at least enough statistics to read USA Today.
Faced with taking Statistics 101, you may be filled with excitement, ambivalence, trepidation, or just plain terror. Your instructor may intensify those feelings with his or her teaching style and class requirements. So to make things just a bit easier, here are a few concepts to remember.
Everything is Uncertain
The fundamental difference between statistics and most other types of data analysis is that in statistics, everything is uncertain. Input data always vary; if they didn’t, they would be of no interest. As a consequence, results are always expressed in terms of probabilities.
Every data measurement is variable and consists of several parts:
- Characteristic of Population: This is the part of a data value that you would measure if there were no variability. It’s the portion of a data value that is the same between a sample and the population the sample is from.
- Natural Variability: This part of a data value is the uncertainty or variability in population patterns. It’s the inherent difference between a sample and the population. In a completely deterministic world, there would be no natural variability.
- Sampling Variability: This is the difference between a sample and the population that is attributable to how uncharacteristic (non-representative) the sample is of the population.
- Measurement Variability: This is the difference between a sample and the population that is attributable to how data were measured or otherwise generated.
- Environmental Variability: This is the difference between a sample and the population that is attributable to extraneous factors.
The goal of most statistical procedures is to estimate the characteristic of the population, characterize the natural variability, and control and minimize the sampling, measurement, and environmental variability. Minimizing these sources of variability can be difficult because there are so many causes and because the causes are often impossible to anticipate or control. So if you’re going to conduct a statistical analysis, you’ll need to understand the three fundamentals of variance control: Reference, Replication, and Randomization.
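To make the components of a data value concrete, here is a minimal Python simulation. It assumes, purely for illustration, that each component is independent and Normally distributed, and the constants (`TRUE_VALUE`, `NATURAL_SD`, and so on) are made-up values rather than anything from real data. It shows one effect of replication: the scatter of sample means around the population characteristic shrinks roughly as one over the square root of the sample size.

```python
import random
import statistics

# Hypothetical values chosen for illustration only.
TRUE_VALUE = 50.0      # characteristic of the population
NATURAL_SD = 5.0       # natural variability among individuals
MEASUREMENT_SD = 2.0   # measurement variability (e.g., instrument noise)
ENVIRONMENT_SD = 1.0   # environmental variability (extraneous factors)


def measure_one(rng: random.Random) -> float:
    """Simulate one data value as the population characteristic plus
    independent natural, measurement, and environmental deviations."""
    natural = rng.gauss(0.0, NATURAL_SD)
    measurement = rng.gauss(0.0, MEASUREMENT_SD)
    environment = rng.gauss(0.0, ENVIRONMENT_SD)
    return TRUE_VALUE + natural + measurement + environment


def estimate_mean(n: int, rng: random.Random) -> float:
    """Estimate the population characteristic from n replicate values."""
    return statistics.fmean(measure_one(rng) for _ in range(n))


def spread_of_estimates(n: int, trials: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the estimated mean across many repeated samples
    of size n; this scatter is the sampling variability of the estimate."""
    rng = random.Random(seed)
    estimates = [estimate_mean(n, rng) for _ in range(trials)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(f"n={n:3d}  spread of estimates = {spread_of_estimates(n):.2f}")
```

With these assumed values, the spread for n = 1 should be near the square root of 5² + 2² + 1² (about 5.5) and should roughly halve each time n is quadrupled. Averaging replicates narrows random scatter, but it would not remove a systematic bias such as a miscalibrated instrument, which this simulation does not model.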
Statistics ♥ Models
Statistics and models are closely intertwined. Models serve as both inputs and outputs of statistical analyses. Statistical analyses begin and end with models.
Statistics uses distribution models (equations) to describe what the frequency distribution of data would look like if the data were a perfect representation of the population. If data follow a particular distribution model, like the Normal distribution, the model can be used as a template for the data to represent data frequencies and error rates. This is the basis of parametric statistics; you evaluate your data as if they came from a population described by the model.
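As a sketch of using a distribution model as a template, the snippet below fits a Normal model to a small set of invented scores with the standard library’s `statistics.NormalDist`, then compares what the model expects with what the data show and computes the error rate implied by the model. Whether a Normal model actually fits is an assumption that would need checking with real data.

```python
from statistics import NormalDist

# Hypothetical sample data, invented for illustration.
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 77, 84, 69, 91, 75, 80]

# Fit a Normal model using the sample mean and sample standard deviation.
model = NormalDist.from_samples(scores)

# Use the model as a template: expected vs. observed fraction of scores below 70.
expected_below_70 = model.cdf(70)
observed_below_70 = sum(s < 70 for s in scores) / len(scores)

# Error rate: if the model holds, about 5% of values fall outside mean ± 1.96 SD.
z = NormalDist().inv_cdf(0.975)  # about 1.96 for the standard Normal
lower = model.mean - z * model.stdev
upper = model.mean + z * model.stdev
outside_rate = 1 - (model.cdf(upper) - model.cdf(lower))

print(f"model: mean={model.mean:.1f}, sd={model.stdev:.1f}")
print(f"below 70: expected {expected_below_70:.2f}, observed {observed_below_70:.2f}")
print(f"expected fraction outside [{lower:.1f}, {upper:.1f}]: {outside_rate:.3f}")
```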
Statistical techniques are also used to build models from data. Statistical analyses estimate the mathematical coefficients (parameters) for the terms (variables) in the model, and include an error term to incorporate the effects of variation. The resulting statistical model, then, provides an estimate of the measure being modeled along with the probability that the model might have occurred by chance, based on the distribution model.
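A minimal sketch of building a model from data: ordinary least-squares estimation of the two coefficients of a straight line, with the residuals playing the role of the error term. The function name `fit_line` and the example data are hypothetical. The sketch stops at the coefficients and the residual standard error; attaching a probability that the fit arose by chance would additionally require the t or F distribution, which is not shown here.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x + error.
    Returns (intercept b0, slope b1, residual standard error)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope coefficient
    b0 = mean_y - b1 * mean_x    # intercept coefficient
    # The error term: whatever the fitted line does not explain.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Divide by n - 2 because two parameters were estimated from the data.
    rse = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return b0, b1, rse


if __name__ == "__main__":
    hours = [1, 2, 3, 4, 5]             # hypothetical predictor
    score = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical response
    b0, b1, rse = fit_line(hours, score)
    print(f"score = {b0:.2f} + {b1:.2f} * hours  (residual SE {rse:.2f})")
```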
Measurement Scales Shape Analyses
You may not hear very much about measurement scales in Statistics 101, but you should at least be aware of the difference between nominal scales, ordinal scales, and continuous scales. Nominal scales, also called grouping or categorical scales, are like stepping stones; each value of the scale is different from the other values, but neither higher nor lower. Ordinal (discrete) scales are like steps; each value of the scale has a distinct break from the next value, which is either higher or lower. Continuous scales are like ramps; each value of the scale is just a little bit higher or lower than the next value. There are many more types of scales, especially for time, but that’s enough for Statistics 101.
The reason measurement scales are important is that they will help guide which graph or statistical procedure is most appropriate for an analysis. In some situations, you can’t even conduct a particular statistical procedure if the data scales are not appropriate.
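One common rule of thumb, from the classic levels-of-measurement framework, is that the scale limits which summary statistic is meaningful: a mode for nominal data, a median for ordinal data, and a mean for continuous data. The sketch below encodes that rule; the `Scale` enum and `central_tendency` function are hypothetical names, and a real analysis weighs more than the scale alone.

```python
import statistics
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"        # unordered categories (stepping stones)
    ORDINAL = "ordinal"        # ordered, distinct steps
    CONTINUOUS = "continuous"  # a smooth ramp of values


def central_tendency(values, scale: Scale):
    """Return a measure of central tendency the scale can support.
    Nominal: only counting is meaningful, so return the mode.
    Ordinal: order is meaningful but distances are not, so return the median.
    Continuous: arithmetic is meaningful, so return the mean."""
    if scale is Scale.NOMINAL:
        return statistics.mode(values)
    if scale is Scale.ORDINAL:
        # median_low always returns an actual observed category.
        return statistics.median_low(values)
    return statistics.mean(values)


if __name__ == "__main__":
    eye_colors = ["brown", "blue", "brown", "green", "brown"]  # nominal
    satisfaction = [1, 2, 2, 4, 5]  # ordinal codes, 1 = very poor, 5 = excellent
    heights_cm = [162.0, 175.5, 168.2, 181.0]  # continuous
    print(central_tendency(eye_colors, Scale.NOMINAL))
    print(central_tendency(satisfaction, Scale.ORDINAL))
    print(central_tendency(heights_cm, Scale.CONTINUOUS))
```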
Everything Starts with a Matrix
You may not realize it in Statistics 101, but all statistical procedures involve a matrix. Matrices are convenient ways to assemble data so that computers can perform mathematical calculations. If you go beyond Statistics 101, you’ll learn a lot about matrix algebra. But for Statistics 101, all you have to know is that a matrix is very much like a spreadsheet. In a spreadsheet you have rows and columns that define rectangular areas, called cells.

In statistics, the rows of the spreadsheet represent individual samples, cases, records, observations, entities that you’re making measurements on, sample collection points, survey respondents, organisms, or any other point or object on which information is collected. The columns represent variables, the measurements or the conditions or the types of information you’re recording. The columns can correspond to instrument readings, survey responses, biological parameters, meteorological data, economic or business measures, or any other types of information. You usually have several sets of variables for a given set of samples.

Together, the rows and the columns of the spreadsheet define the cells, where the data are stored. Samples (rows), variables (columns), and data (cells) are the matrix that goes into a statistical analysis. If you understand data matrices, you’ll be able to conduct statistical analyses even without your Statistics 101 instructor to help you.
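The row/column/cell layout can be sketched with plain Python lists. The variable names and values are invented; the point is only that each row is a sample, each column is a variable, and summaries such as a mean are computed down a column.

```python
import statistics

# Column headers: the variables being recorded (hypothetical).
header = ["age", "hours_studied", "exam_score"]

# Rows: one per sample (here, hypothetical students); each cell holds one datum.
matrix = [
    [19, 5.0, 72],
    [22, 8.5, 88],
    [20, 3.0, 65],
    [21, 6.5, 80],
]


def column(cols, rows, name):
    """Return all cells for one variable, i.e. one column of the matrix."""
    j = cols.index(name)
    return [row[j] for row in rows]


if __name__ == "__main__":
    n_rows, n_cols = len(matrix), len(header)
    print(f"{n_rows} samples x {n_cols} variables = {n_rows * n_cols} cells")
    # Summaries run down each column (variable) across all rows (samples).
    for name in header:
        values = column(header, matrix, name)
        print(f"{name:>14}: mean = {statistics.mean(values):.2f}")
```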
Statistics is More than Description and Testing
In Statistics 101, you learn about probability, distribution models, populations, and samples. Eventually, this knowledge will enable you to describe the statistical properties of a population and to test whether it differs from other populations. But these capabilities, formidable though they are, don’t reveal the truly mind-boggling analyses you can do with statistics. You can:
- Describe: characterizing populations and samples using descriptive statistics, statistical intervals, correlation coefficients, and graphics.
- Compare and Test: detecting differences between statistical populations or reference values using simple hypothesis tests and analysis of variance and covariance.
- Identify and Classify: identifying known or hypothesized entities or classifying groups of entities using descriptive statistics, statistical tests, graphics, and multivariate techniques such as cluster analysis and data mining.
- Predict: predicting measurements using regression and neural networks, forecasting using time-series modeling techniques, and interpolating spatial data.
- Explain: explaining latent aspects of phenomena using regression, cluster analysis, discriminant analysis, factor analysis, and other data mining techniques.
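As one concrete instance of Compare and Test, the sketch below runs a permutation test for a difference between two group means. This is a swapped-in technique chosen because it needs only the standard library; it is not the t-test or analysis of variance usually taught in Statistics 101, and the function name, defaults, and data are hypothetical. It also shows randomization at work: the p-value comes from repeatedly shuffling the group labels.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b.
    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point round-off
            extreme += 1
    # Adding one to numerator and denominator keeps the estimate above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    group_a = [72, 75, 78, 71, 74, 77]  # hypothetical scores
    group_b = [80, 83, 79, 85, 82, 81]  # hypothetical scores
    print(f"estimated p-value: {permutation_test(group_a, group_b):.4f}")
```

A small p-value says that random relabeling rarely produces a difference as large as the one observed, which is the same logic, in simulated form, behind the hypothesis tests in Statistics 101.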
So don’t get discouraged if you can’t see how statistics will help you in your career based on Statistics 101. There’s a lot more out there. You just have to take the first step.
Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.