The Data Dozen

Data can take a variety of forms. Some are readily amenable to statistical analysis and some are better suited to other methods of analysis. When you’re trying to solve a problem or answer a research question, though, you need to use whatever is available that fits. Here are twelve types of data to think about using in your next analysis.

| Data Type | Description | Generation | Examples |
| --- | --- | --- | --- |
| Automatic Measurements | Information generated by devices, usually electronic or mechanical, that operate without human involvement (other than calibration and sample introduction). | Experimenter-Device | Thermocouples, strain-gage scales, electronic meters |
| Manual Measurements | Information generated by devices that require human involvement to carry out the measurement. | Experimenter-Device | Rulers, calipers, thermometers, balance-beam scales |
| Archived Records | Information generated by an identifiable person or organization. | Known individual or organization | Government records, financial data, personal diaries, logs, notes |
| Directed Responses | Information received as the result of a specific direct inquiry. | Experimenter-Subject | Surveys, focus groups, interrogations |
| Electronic Recordings | Information stored on audiovisual devices. | Experimenter-Device | Videos, audio recordings, photos, false-color images |
| Metadata | Data about data: their origins, qualities, scales, and so on. | Data Generator | Time, location, and method of data generation |
| Transformations | Information created from other information. | Data Analyst | Percentages, sums, z-scores, ratios, and so on |
| Analog Data | Information from a source that resembles in some respect a phenomenon under investigation. | Experimenter | Experimental lab animals, models |
| First Person Reports | Descriptive, qualitative information derived from a first-person encounter. | Individual | Eyewitness accounts |
| Secondhand Reports | Information summarized or retold by a second party based on first-person accounts. | Known individual or organization | News stories |
| Unverified Reports | Information, written or retold, which cannot be disproven or verified. | Unknown individual or organization | Anecdotes, stories, legends |
| Conjectures | Information created from thought experiments rather than physical experiments. | Known individual or organization | Expert opinions |

Automatic and manual measurements are commonly used in statistical analysis when they can be generated in large numbers at reasonable cost. Furthermore, they are often measured on continuous, or at least quantitative, scales. These measurements are usually easy to reproduce but may be time- or location-dependent.

Data? I thought you said tuna.

Archived records are also commonly used in statistical analyses, usually as government records and financial data, when they are measured on quantitative scales. These data are often considered “official” because they have been verified, even though they are not reproducible. Archived records may also provide qualitative information, usually in small amounts, such as personal diaries, logs, and notes. These can be used to support statistical analyses and are a mainstay of scientific investigations.

Directed responses, information received as the result of specific questions, include the results of surveys and focus groups, which are commonly analyzed with statistics. Directed response data are also generated by direct and cross-examinations in court and by military and law enforcement interrogations. Because directed response data come from individuals, the responses may not always be true or consistent.

Two types of data that are used in almost all data analyses are metadata and transformations. Metadata are data about data, such as descriptions of their origins, qualities, scales, and so on. Transformations are data created from other data, which includes percentages, z-scores, sums, ratios, mathematical functions and so on (http://statswithcats.wordpress.com/2010/11/21/fifty-ways-to-fix-your-data/).
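
As a rough illustration of how transformations and metadata sit alongside raw data, here is a minimal Python sketch. The measurements, the variable names, and the metadata fields are all made up for the example; only the idea (z-scores and percentages computed from other data, plus a small record describing the data) comes from the text above.

```python
from statistics import mean, stdev

# Hypothetical raw measurements (illustrative values only).
weights_kg = [4.1, 3.8, 5.2, 4.6, 4.3]

# Metadata: data about the data (all fields here are assumptions).
metadata = {
    "units": "kg",
    "method": "manual measurement on a balance-beam scale",
    "n": len(weights_kg),
}

def z_scores(values):
    """Transform values to z-scores using the sample mean and standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def percentages(values):
    """Express each value as a percentage of the total."""
    total = sum(values)
    return [100 * v / total for v in values]

print(metadata)
print(z_scores(weights_kg))
print(percentages(weights_kg))
```

By construction the z-scores average to zero with a standard deviation of one, and the percentages sum to 100, which makes for a handy sanity check on any transformation step.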

Analogs are data sources that substitute for the actual phenomenon of interest. Models are a type of analog as are animals used in medical experiments (much to their and my displeasure). Statistics is all about models (http://statswithcats.wordpress.com/2010/08/08/the-zen-of-modeling/), from basing test probabilities on the Normal distribution to creating regression models from data.

Electronic recordings, like videos and audio recordings, would seem to be a good type of data to analyze. Recordings have great data density, though it can be laborious to extract individual data elements from the qualitative recording source. They can be faked, but so can all the other types of data.

Reports come from witnesses. First-person reports come from eyewitnesses. The information is typically descriptive and qualitative. It may be verifiable, but it typically isn’t reproducible and may not even be true. Secondhand reports are eyewitness reports that are summarized or retold by a second party, such as a news agency. Unverified reports (anecdotes, stories, and legends, whether written or retold) come from unknown sources. These reports usually cannot be disproven or verified. Reports don’t often provide data elements for statistical analyses but may provide supporting evidence or metadata.

Finally, conjectures are data produced by experts through thought experiments rather than physical experiments. The Delphi process (http://en.wikipedia.org/wiki/Delphi_method) is a good example of the use of conjecture. Usually conjecture is used in situations in which data cannot be collected, such as forecasting the future.

Did you see that?

Data analysts use all these data types. Statisticians want to use data types that provide many observations so they can assess variability. Scientists and engineers may be satisfied with the results of a single, albeit well controlled, experiment. They are truly deterministic breeds. Courts want every piece of evidence to be attested to by an individual, whether an eyewitness or an expert witness. They want to be able to cross-examine witnesses. Historians don’t usually have eyewitnesses so they rely on reports, especially secondhand and even unverified reports. They’ll use whatever they can find.

Certainly, this classification is not the only way to look at data. For example, the U.S. legal system defines courtroom evidence as either:

  • Real: physical objects, like a weapon.
  • Demonstrative: illustrations of evidence, like a map of the crime scene.
  • Documentary: items that contain human language, like contracts and newspaper articles.
  • Testimonial: oral or written evidence from witnesses.

(http://people.howstuffworks.com/inadmissible-evidence1.htm). To be admissible in court, these types of evidence have to be relevant (i.e., prove or disprove a fact), material (i.e., essential to the case), and competent (i.e., proven to be reliable). Trial lawyers use witnesses to tell compelling stories that will keep judges and juries attentive, which non-testimonial evidence may not. In contrast, noted scientist and lecturer Neil deGrasse Tyson says that “In courts, eyewitness testimony is considered great evidence. In science it’s considered worthless.” But that’s not quite true if the observation can be witnessed by others, as with astronomical observations and replicated experiments. UFO eyewitnesses don’t fare so well with scientists. Statisticians want more, though. Our analyses aren’t based on cause and effect; association works just fine. But whatever your perspective on data, be sure you understand the pluses and minuses of what you’re working with.

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.

About statswithcats

Charlie Kufs has been crunching numbers for over thirty years. He currently works as a statistician.