I’m working all out
Deadline is near
Model’s in doubt
Dooming my career.
Sta-tis-tics will chill my meltdown.
I’m adding new vars
Testing them twice
Trying to find out which ones’ll suffice
Sta-tis-tics will give the lowdown.
I see the best predictors.
I know what steps come next
I clean up my dataset and
Regress my y on my x.
My work is all through
My deadline was met
My client paid up
Now I’m out of debt.
Sta-tis-tics helped thwart my shutdown.
Sing to the tune of “Santa Claus Is Coming to Town”
Make a list. Check it twice. That’s sage advice from an old fat guy with a beard. Here’s what that means if you’re analyzing data.
What a Phenomenal Concept
The first step in assembling a set of variables for your analysis is to identify the concepts or aspects of the phenomenon you want to investigate. By concepts, I mean to include hypotheses and theories as well as ideas, suppositions, beliefs, assertions, and premises, which may be less definitive or accepted. These concepts will come from the relationships known and supposed about the phenomenon. This step matters because concepts can be multifaceted and linked to other concepts, creating a framework of relationships underlying the phenomenon. In traditional research, this is what a literature search is for. Literature searches, though, are considered by some to be an academic activity not applicable to analyses done on the job. Not true. The process of thinking through what you want to measure is necessary.
Once you have specific ideas you want to explore, identify ways they could be measured. Start with conventional measures, the ones everyone would recognize and whose method of determination everyone would understand. Then consider whether there are any other ways to measure the concept directly. From there, establish whether there are any indirect measures or surrogates that could be used in lieu of a direct measurement. Finally, if there are no other options, explore whether it would be feasible to develop a new measure based on theory. Keep in mind that developing a new measure or a new scale of measurement is more difficult for the experimenter and less understandable for reviewers than using an established measure.
On a Scale of ½ to VIII
Of the possible measures you identify, select scales of measurement and consider how difficult it would be to generate the data. For example:
- Qualities are usually more difficult to measure accurately and consistently than quantities because there are more complex judgments involved.
- Counts are straightforward when they involve simple judgments as to what to count. Some judgments, such as species counts, can be relatively complex because you have to be able to identify the species before you can count it. Counts have no decimals and no negative numbers.
- Amounts are usually more difficult to measure than counts because the judgment process is more complex. Amounts have decimals but no negative numbers unless losses are admissible.
- Ratio measures, such as concentrations, rates, and percentages, are usually more difficult to measure than amounts because they involve two or more amounts. Ratio measures have decimals and can have negative numbers.
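The decimal and negative-number rules in the list above can be turned into a quick sanity check on recorded data. The following is only a minimal sketch: the function name `check_scale`, the scale labels, and the `losses_admissible` flag are assumptions made for illustration, not part of any established library.

```python
from typing import Iterable, List


def check_scale(values: Iterable[float], scale: str,
                losses_admissible: bool = False) -> List[str]:
    """Return a list of values that break the rules for the declared scale.

    Hypothetical helper: 'count', 'amount', and 'ratio' are labels chosen
    for this sketch, following the rules of thumb in the list above.
    """
    if scale not in ("count", "amount", "ratio"):
        raise ValueError(f"unknown scale: {scale}")

    problems = []
    for v in values:
        if scale == "count":
            # Counts have no decimals and no negative numbers.
            if not float(v).is_integer():
                problems.append(f"{v}: counts should have no decimals")
            if v < 0:
                problems.append(f"{v}: counts should not be negative")
        elif scale == "amount":
            # Amounts may have decimals; negatives only if losses are allowed.
            if v < 0 and not losses_admissible:
                problems.append(f"{v}: negative amount but losses not admissible")
        # Ratio measures may have decimals and negatives, so nothing to flag.
    return problems


# Made-up example values, purely for illustration.
print(check_scale([3, 2.5, -1], "count"))
```

A check like this will not tell you whether a measurement is accurate, only whether the recorded values are compatible with the scale you said you were using.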
Once you know what you might measure, evaluate the sources of measurement variability (benchmark, process, and judgment described in https://statswithcats.wordpress.com/2010/09/12/the-measure-of-a-measure/) in each measure.
Finally, take into account your objective and the ultimate use of your statistics (https://statswithcats.wordpress.com/2010/08/22/the-five-pursuits-you-meet-in-statistics/). For example, if you want to predict some dependent variable, quantitative independent variables would usually be preferable to qualitative variables because they would provide more scale resolution. Furthermore, you could dumb down a quantitative variable you measured to a less finely divided scale or even a qualitative scale. You usually can’t go in the other direction. If you want your prediction model to be simple and inexpensive to use, don’t select predictors that are expensive and time-consuming to measure.
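To illustrate why you can coarsen a quantitative variable but usually can't go back, here is a small sketch. The function name `coarsen`, the cut points, and the labels are all hypothetical choices made for this example.

```python
from bisect import bisect_right
from typing import Sequence


def coarsen(value: float, cut_points: Sequence[float],
            labels: Sequence[str]) -> str:
    """Map a quantitative value onto an ordered qualitative scale.

    cut_points must be sorted ascending; there must be one more label
    than cut points. A value equal to a cut point goes in the upper class.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


# Hypothetical cut points and labels, for illustration only.
cuts = [18.0, 25.0]
names = ["low", "medium", "high"]

for measurement in [12.3, 17.9, 18.0, 30.0]:
    print(measurement, "->", coarsen(measurement, cuts, names))
```

Both 12.3 and 17.9 end up as "low", so once only the label is recorded there is no way to recover the original measurement. That is the scale resolution you give up by starting with a qualitative variable.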
Consider building some redundancy into your variables if there is more than one way to measure a concept. Sometimes one variable will display a higher correlation with your model’s dependent variable or help explain analogous measurements in a related measure. For example, redundant measures are often included in opinion surveys by using differently worded questions to solicit the same information. One question might ask “Did you like [something]?” and a later question might ask “Would you recommend [something] to your friends?” or “Would you use [something] again in the future?” to assess consistency in a respondent’s opinion about a product. Redundant variables can be a good check on data quality (https://statswithcats.wordpress.com/2010/09/19/it%E2%80%99s-all-in-the-technique/).
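One way redundant survey questions might be used as a data-quality check is sketched here. The respondent IDs, the question keys `liked` and `would_recommend`, and the answers are all made up for illustration, and treating any disagreement as a flag is a simplifying assumption.

```python
from typing import Dict, List


def inconsistent_respondents(responses: Dict[str, Dict[str, bool]],
                             question_a: str, question_b: str) -> List[str]:
    """Return IDs of respondents whose answers to two redundant
    questions disagree."""
    return [
        respondent_id
        for respondent_id, answers in responses.items()
        if answers[question_a] != answers[question_b]
    ]


# Hypothetical survey data: True means "yes".
survey = {
    "r01": {"liked": True, "would_recommend": True},
    "r02": {"liked": True, "would_recommend": False},
    "r03": {"liked": False, "would_recommend": False},
}

print(inconsistent_respondents(survey, "liked", "would_recommend"))
```

A flagged respondent isn't necessarily wrong. Someone can like a product and still not recommend it. But a high rate of disagreement suggests that the questions, or the data entry, deserve a second look.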
The Santa Claus Strategy
So make a list and check it twice.
Here’s a checklist you can use to help you think about your variables. Complete a checklist for each variable you plan to record. This may seem like a formidable amount of work, but it’s worth the effort. The checklist will help you think about your measurements, visualize how they will be generated, and ultimately produce results with less bias and variability. The checklists also provide concise documentation that can be added to a report appendix or project file. Furthermore, if you work with the same data often, you’ll find that completing such a checklist becomes much easier once you have thought through the process the first time. If this checklist doesn’t meet your needs, use it as a starting point to create your own. The important point is to think about what you plan to do.
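If you'd rather keep your checklists in code than on paper, a sketch along these lines might serve as a starting point. The class name and every field in it are assumptions drawn from points raised earlier in this post (concept, type of measure, scale, sources of variability, cost); this is not a reproduction of the checklist itself.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class VariableChecklist:
    """One checklist per variable. All field names are hypothetical."""
    name: str
    concept: str = ""            # what idea the variable is meant to capture
    measure_type: str = ""       # e.g. "conventional", "surrogate", "new"
    scale: str = ""              # e.g. "quality", "count", "amount", "ratio"
    variability_notes: str = ""  # benchmark, process, and judgment sources
    costly_to_measure: Optional[bool] = None

    def unanswered(self) -> List[str]:
        """List the fields that still need to be thought through."""
        return [
            f.name for f in fields(self)
            if getattr(self, f.name) in ("", None)
        ]


# Made-up example variable.
item = VariableChecklist(name="satisfaction",
                         concept="customer opinion",
                         scale="quality")
print(item.unanswered())
```

The `unanswered` method is the "check it twice" part: it lists the questions you haven't thought through yet for that variable.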