Ten Ways Statistical Models Can Break Your Heart


Models are beautiful. The ways their features are combined set them apart from each other. Each has its own personality, sometimes pleasant, sometimes not, and often not what you would expect.

Here are ten ways your love affair with statistical models can end up on the rocks.

Relationship-Building Disasters

Modeling is more than just meeting a dataset on the internet and jumping into some R code together. You have to develop relationships with the data and everyone associated with them. For example:

  • Miscommunications. There are often quite a few people who have some stake in the model. They usually have different experiences and levels of understanding of modeling and, of course, different agendas for how the model will be treated. They won’t necessarily trust you. You have to try to keep them all happy and on the same page.
  • Interference. You may be doing all the heavy lifting with the data and the modeling but there are often individuals, like a boss, the client, or independent reviewers, who poke their fingers into your efforts.
  • Delays. You may feel under the gun to complete a modeling project but that doesn’t mean everyone associated with the project will share your constraints. You may be asked to redo the model every time new data become available, attend meetings, make presentations, and wait for decisions from upper management.
  • Skepticism. Not everyone is driven to make decisions after a careful analysis of relevant data. Some people prefer to rely on their gut feel. They may look at your model but then ignore those results and use their own intuition.
  • Indifference. On occasion, you might create a model, even what you consider a groundbreaking model, but nobody pays attention to it. Your model may be passed over for an inferior model, like an undrafted football player being benched in favor of a million-dollar bust. Or, people just don’t appreciate the importance of the model the way you do. You’ll still need to get their acceptance.

Unrequited Models

You put your heart and soul into modeling the dataset but you get … NOTHING. No love in return. No matter how much you’ve planned, you can’t find a collection of independent variables that will adequately model your dependent variable. It happens to data analysts everywhere, all the time, for a variety of reasons. There may be non-linear relationships, outliers, or excessive uncontrolled variance. The variables may be inappropriate or inefficient.

What can you do?

First, you should reexamine the theory behind your model. Are your hypothesis and assumptions valid? Are your data suspect? Are the metrics you’re using as variables problematical? Are there latent concepts you could explore in a Factor Analysis? Do your samples need to be categorized in some way? Might conducting a Cluster Analysis provide insight?

Second, examine your correlations thoroughly. See if there are any transformations that might be helpful.
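One quick way to run that screen is to compare each candidate transformation’s linear correlation with the dependent variable. A minimal sketch on simulated data (the hidden log relationship is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 100, 200)
y = 3 * np.log(x) + rng.normal(0, 0.5, 200)  # hidden logarithmic relationship

# Correlate y with each candidate transformation of x
candidates = {"x": x, "log(x)": np.log(x), "sqrt(x)": np.sqrt(x), "1/x": 1 / x}
correlations = {name: np.corrcoef(t, y)[0, 1] for name, t in candidates.items()}
best = max(correlations, key=lambda name: abs(correlations[name]))
print(best)  # the log transformation should stand out here
```

A screen like this only detects relationships that a transformation can make linear; it won’t rescue a dataset with excessive uncontrolled variance.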

Third, if you have appropriate software, consider looking into nonlinear statistical regression, neural networks, and data mining solutions. Finally, there may be ways to construct probabilistic models, or models based on optimization procedures, or relative solutions from experts using a Delphi Method.

In the end:

Some models were not meant to be. If you can’t fit the model to the data, you have to be prepared to call it quits. In a way, this is equivalent to a Do Not Resuscitate order in medicine, and likewise, it can be a sensitive subject. It’s usually easier to create new variables or try some other statistical manipulation than it is to give the bad news, and the bill, to the client.

Muddled Models

Sometimes models go wrong right out of the box because they are improperly specified. You may not be pursuing the relationship for the right reasons or in the right ways. For example:

  • The dependent or independent variables may be too expensive to collect. The model may even cost more to run than addressing the problem is worth.
  • The dependent variable may not be actionable, at least not within the limits set by the client.
  • An independent variable might incorporate part of the dependent variable, if one or the other is a ratio.
  • The structure of the model may be wrong, for example, the model might be better as a multiplicative or other non-linear form instead of linear.
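That last point can be checked directly: a multiplicative model y = a·x^b becomes linear after taking logs. A hedged sketch with simulated data (the coefficients 2.5 and 0.7, and the noise level, are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 50, 150)
y = 2.5 * x**0.7 * rng.lognormal(0, 0.1, 150)  # multiplicative errors, not additive

# A straight-line fit in log space recovers the multiplicative structure:
# log(y) = log(a) + b*log(x)
b, log_a = np.polyfit(np.log(x), np.log(y), 1)
print(round(np.exp(log_a), 2), round(b, 2))  # approximately 2.5 and 0.7
```

If the log-space fit is much tighter than the raw-scale fit, the linear specification was probably wrong to begin with.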

Wandering Eye Models

There are many different types of models, like fish in the sea. Some people are always looking for something better, even if what they have is pretty good.

For example, you might have a good model but it’s not what the client expected. Perhaps the results are not what the client wanted to hear, or the model may look good for general trends but not adequately represent the phenomenon for extreme or special cases. The client wants you to try again and bring back something better.

One concept that often confuses novice model builders is the difference between models aimed at prediction versus explanation. Explanatory models are based on theory. They need to incorporate independent variables that make theoretical or logical sense to be associated with the dependent variable. Prediction models don’t rely on theory. They need independent variables that produce large values of the Coefficient of Determination (r2) and small values of the Standard Error of Estimate (sxy or SEE). Explanatory models assume (or hope) that there are cause-effect relationships between the dependent variable and the independent variables; prediction models do not.

That’s where some clients balk: the model doesn’t have the variables they feel should be in a prediction model. It usually doesn’t matter if the model produces excellent predictions; they feel it would be better if their favorite variables were there … even though it wouldn’t be.

It’s not just clients, though. There are times when model builders, especially young professionals, want to try out some new analytical breakthrough. The tried-and-true regression approach may produce results that are nearly as good, but the cutting edge model looks and sounds so much sexier. It’s seductive, and for some, hard to resist.

Deceptive Profile Models

Don’t you just hate it when you see something that isn’t at all the way it was described? “Hey, you should try analyzing this dataset. It’s a perfect match for you.” But then when you meet up, it’s nothing like you expected.

Maybe the expected population from which the data are drawn doesn’t really exist. Maybe the quality of the data is questionable or needs a lot of cleanup. Maybe the samples are biased or misleading.

And it’s not just what goes into a model that might be disappointing but also what comes out of modeling activities. The regression model itself might be improperly specified or misleading. Sometimes correctly specified models are poorly calibrated. Fortunately, there are also a variety of statistical diagnostics and plots that can be used to identify the problems.

Mercurial Models

Every measurement of a phenomenon includes characteristics of the population and natural variability as well as unwanted sampling variability, measurement variability, and environmental variability. You can’t understand your data unless you control extraneous variance attributable to the way you select samples, the way you measure variable values, and any influences of the environment in which you are working. If you plan to conduct a statistical analysis, you need to understand the three fundamental Rs of variance control — Reference, Replication, and Randomization. Using the concepts of reference, replication and randomization, you can control, minimize, or at least be able to assess the effects of extraneous variability using: procedural controls; quality samples and measurements; sampling controls; experimental controls; and statistical controls.

Even after spending considerable effort trying to control extraneous variance in data collection, though, sometimes the models produced from the data don’t share the precision. The models may have good accuracy, shown by large values of the Coefficient of Determination (r2), but low precision, shown by a large Standard Error of Estimate (sxy or SEE). You might have an accurate predictive model that lacks enough precision to be useful. This is a surprisingly common occurrence. Some data analysts don’t seem to look past the r2. The sxy is ignored.
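To see how a model can score well on r2 while its predictions are too imprecise to act on, consider this simulated sketch (the slope and noise level are invented, not from any real study):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 100, 200)
y = 2.0 * x + rng.normal(0, 25, 200)  # strong trend, noisy measurements

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

r2 = 1 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
see = np.sqrt(np.sum(residuals**2) / (len(y) - 2))  # standard error of estimate

# r2 looks impressive, yet an individual prediction is only good
# to within roughly plus-or-minus 2*SEE, about 50 units here
print(f"r2 = {r2:.2f}, SEE = {see:.1f}")
```

If a prediction needs to be good to within, say, 10 units, this model fails no matter how large its r2 is.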

Look at any studies you can that involve predictive modeling. Do they discuss the uncertainty in the predictions? What do you think?

Run-away Models

Sometimes you spend months and even longer getting to know your data and building a relationship only to have the model taken away. Maybe it’s a boss or more senior co-worker. Maybe it’s the client. You can chase after your model, keep up to speed with what’s happening in the model’s life, but that’s about it. There’s not much else you can do. It’s somebody else’s responsibility now.

Irreconcilable Difference Models

You and your model may reach a point where you might want to go to the next level in your relationship only to find there are differences you did not expect and can’t overcome. When you try to extend the relationship to new situations, everything fails. There are several possible reasons. Maybe you have a multi-level model. What worked for the samples you used doesn’t work when they are aggregated into higher level associations. Maybe you’re a victim of Simpson’s Paradox. What worked for the samples you used doesn’t work when they are separated into component groups. Then again, maybe it’s something you did. Maybe your model is overfit. Perhaps you capitalized on chance and found associations that weren’t pervasive and lasting. The only thing you can do is reexamine the relationship and either start over or move on.
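Simpson’s Paradox is easy to reproduce with simulated data: the trend within each group is the reverse of the pooled trend. A sketch (the groups and slopes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Within each group, y DECREASES as x increases, but group B sits
# higher on both axes, so the pooled trend INCREASES with x.
xa = rng.uniform(0, 5, 100)
ya = 10 - xa + rng.normal(0, 0.5, 100)
xb = rng.uniform(10, 15, 100)
yb = 25 - xb + rng.normal(0, 0.5, 100)

r_a = np.corrcoef(xa, ya)[0, 1]        # negative within group A
r_b = np.corrcoef(xb, yb)[0, 1]        # negative within group B
r_pooled = np.corrcoef(np.r_[xa, xb], np.r_[ya, yb])[0, 1]  # positive pooled
print(f"{r_a:+.2f} {r_b:+.2f} {r_pooled:+.2f}")
```

A model fit to the pooled data would get the sign of the relationship exactly backward for every individual group.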

Marry or Break Up

There comes a time when you have to decide whether to commit to the effort to build a relationship or back out of the commitment. Maybe you don’t have enough samples. Maybe your goals don’t fit what the model needs. Perhaps the model is being asked to do something it wasn’t designed for. What works for describing a population may not be suited to describing individuals in the population. Then there might also be ethical issues to consider. But statisticians rarely get to make these decisions. If they accepted the assignment, the product belongs to the client.

Happily Never After

Deploying a model can sometimes change the behaviors of the population the model is based on. This is especially true when humans are involved; humans just love to game the rules. For example, if you develop a model for allocating resources, you can be assured that the potential recipients will do whatever it takes to increase their advantage. Once they do that, the model is no longer useful. That’s why models are often kept secret.


Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at amazon.com,  barnesandnoble.com, or other online booksellers.


Looking for Insight through a Window

At a press briefing on February 12, 2002, then Secretary of Defense Donald Rumsfeld addressed the absence of evidence linking the government of Iraq with weapons of mass destruction:

There are known knowns. There are things we know that we know. There are known unknowns. That is to say, there are things that we now know we don’t know. But there are also unknown unknowns. There are things we do not know we don’t know.

Now, despite being a transparently irresponsible attempt to cover up a monumental failure in the collection and analysis of information, or just a REALLY BIG LIE, the statement actually makes some sense. Similar words have been attributed to Confucius and others. But whether he realized it or not, Mr. Rumsfeld was describing a type of data analysis window.

Analytical windows are a type of matrix plot. Matrix plots are just grids for organizing information. The cells of a matrix plot can contain data, tables, graphs, or text. Windows consist of two criteria, or dimensions, defined by rows and columns. Each dimension usually has two categories, or levels, resulting in four cells, or panes. Rumsfeld’s window would look like this:


                 | Things that We Know                | Things that We Don’t Know
We Know          | Things that we know we know.       | Things that we don’t know we know.
We Don’t Know    | Things that we know we don’t know. | Things that we don’t know we don’t know.

So, for example, a Rumsfeld Window could be used for planning a statistical study.

  • Things that we know we know would be things like background information on the study environment, the underlying theory on the phenomenon being explored, and the statistical characteristics of the population.
  • Things that we don’t know we know would be things like the statistical assumptions we make to perform the analysis — independence of observations, normality and homoscedasticity of errors.
  • Things that we know we don’t know would be things like the results of the research questions and test hypotheses we plan to focus on.
  • Things that we don’t know we don’t know would be things like the causes of outliers and other data and analysis anomalies.

The beauty of a window is the way it can organize sometimes complex information into simple binary categories. As a consequence, windows are used in many ways to analyze data.


Johari Windows

A Johari Window is a tool used by psychologists to help individuals and groups evaluate interpersonal communications. Its name comes from the first names of Joseph Luft and Harry Ingham, who created it in 1955. To use the window, subjects are told to pick five or six adjectives they feel describe their own personality from a standard list of 56 adjectives. Peers of the subject are then given the same standard list of 56 adjectives, and each picks five or six adjectives that describe the subject. These adjectives are then placed in the appropriate pane of the Johari Window.


                     | Known to Self    | Not Known to Self
Known to Others      | Arena (open)     | Blind Spot
Not Known to Others  | Façade (hidden)  | Unknown
Johari windows were featured on a 2010 episode of the television series Fringe, which was seen by six million viewers, most of whom probably had no idea what they are.

Variance Windows

Windows can also be applied to planning how to control extraneous variance in the process of collecting data. If you plan to conduct a statistical analysis, you’ll need to understand the three fundamental Rs of variance control — Reference, Replication, and Randomization. Every measurement of a phenomenon includes characteristics of the population and natural variability as well as unwanted sampling variability, measurement variability, and environmental variability. You can’t understand your data unless you control extraneous variance attributable to the way you select samples, the way you measure variable values, and any influences of the environment in which you are working. Using the concepts of reference, replication and randomization, you can control, minimize, or at least be able to assess the effects of extraneous variability using: procedural controls; quality samples and measurements; sampling controls; experimental controls; and statistical controls.


Sources of Variance that we …

               | Understand                        | Don’t Understand
Control        | Sampling and measurement variance | Sampling and measurement variance, environmental variance
Don’t Control  | Natural variance                  | Sampling and measurement variance, environmental variance

To use a window to plan a variance control program, fill the panes of the window with all the sources of variability you can think of, categorized by how well you understand the source and think you can control it. Then identify a control measure for each source of variation.
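That bookkeeping can be organized by keying each pane by its two categories and pairing every source with a planned control. The sources and control measures below are hypothetical examples, not a checklist:

```python
# Hypothetical sources of variability, keyed by (understood?, controlled?) pane
window = {
    ("understand", "control"): ["sampling variance", "measurement variance"],
    ("understand", "don't control"): ["natural variance"],
    ("don't understand", "control"): ["environmental variance"],
    ("don't understand", "don't control"): ["observer effects"],
}

# A planned control measure for each source (also hypothetical)
controls = {
    "sampling variance": "randomized sample selection",
    "measurement variance": "instrument calibration and replicate measurements",
    "natural variance": "reference (background) samples",
    "environmental variance": "procedural controls",
    "observer effects": "blind measurement protocol",
}

for pane, sources in window.items():
    for source in sources:
        print(pane, "->", source, ":", controls[source])
```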

Pick Charts

A Pick Chart is a Lean Six Sigma tool for comparing difficulty of implementation (in terms of costs, effort, complexity, or time) to possible results (paybacks, returns, impacts, or improvements) for actions being considered. These two concepts serve as the axes of a data analysis window having four quadrants:

  • Possible: ideas considered “low-hanging fruit.” The effort to implement is low, but the impact is also low. These should only be implemented after everything in the “Implement” quadrant.
  • Implement: ideas that should be implemented because they will have a high impact and require low effort.
  • Challenge: ideas that should be considered for implementation after everything in the “Implement” quadrant. The impact is high, but the effort is also high.
  • Kill: ideas that should be “killed,” or not implemented. The effort is high and the impact is low.
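The four quadrants boil down to a two-threshold decision rule. A minimal sketch, assuming each idea has already been scored for impact and effort on a 0–1 scale (the 0.5 threshold and the example ideas are assumptions):

```python
def pick_quadrant(impact: float, effort: float, threshold: float = 0.5) -> str:
    """Assign an idea to a PICK quadrant from its impact and effort scores."""
    if impact >= threshold:
        return "Implement" if effort < threshold else "Challenge"
    return "Possible" if effort < threshold else "Kill"

# Hypothetical scored ideas: (impact, effort)
ideas = {
    "new intake form": (0.8, 0.2),
    "replace ERP system": (0.9, 0.9),
    "tidy shared drive": (0.2, 0.1),
    "rewrite all reports": (0.1, 0.8),
}
for name, (impact, effort) in ideas.items():
    print(f"{name}: {pick_quadrant(impact, effort)}")
```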

Here’s an example involving the federal Employee Viewpoint Survey (EVS). In this pick chart, eighteen EVS question areas are compared according to:

  • Payoff from the actions being considered to improve EVS scores
  • Difficulty anticipated in successfully undertaking the actions.

Payoff was calculated (after scale adjustments) as the product of the score for a question and the decline in the scores from 2012 to 2014. Difficulty was based on: (1) who would have to be involved in implementing the change (i.e., many or few staff; in the main office or satellite offices; at staff, supervisor, or senior leader levels); (2) if existing programs or policies would be used or if they would have to be created; and (3) the funding required to implement the change. Payoff is based on actual EVS data so there is not much uncertainty. Difficulty is based on judgments concerning what generic actions might be taken to improve job satisfaction, so there is considerable uncertainty. Thus, the positions of the icons representing the EVS question areas are likely to shift horizontally, depending on the nature of specific projects being considered, but not vertically.

Performance Windows

A performance window is a way to convey the results of a statistical test or classification. It is a table with two rows and two columns that summarizes the number of correct classifications (true positives and true negatives) and the number of misclassifications (false positives and false negatives). This type of window is also called a confusion matrix, an error matrix, or a matching matrix.

Here are performance windows for classifications and statistical tests.

Classification

                 | Predicted: A            | Predicted: B
Actual: A        | Correct Classification  | Misclassification
Actual: B        | Misclassification       | Correct Classification

Statistical Test

                                   | Null hypothesis is not rejected | Null hypothesis is rejected
Null hypothesis is actually true   | Correct Inference               | False Positive – Type I Error
Null hypothesis is actually false  | False Negative – Type II Error  | Correct Inference
A contingency table is a type of matrix plot, frequently with more than two levels per dimension or even more than two dimensions, that summarizes the occurrence of data. Contingency tables are also called cross-tabulation tables.
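Either kind of table can be tallied directly from paired observations. A sketch using Python’s standard library, with made-up class labels:

```python
from collections import Counter

# Made-up actual and predicted class labels for ten observations
actual    = ["A", "A", "B", "B", "A", "B", "A", "B", "B", "A"]
predicted = ["A", "B", "B", "B", "A", "A", "A", "B", "B", "A"]

counts = Counter(zip(actual, predicted))
for a in ("A", "B"):  # rows = actual, columns = predicted
    print(a, [counts[(a, p)] for p in ("A", "B")])
# prints: A [4, 1]
#         B [1, 4]
```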

Windows on Scatter Plots

The concept of dividing areas of information into more understandable parts can be extended to scatter plots. Plots can be divided into quadrants, for example, using the means (or medians) of the data points for each axis. In essence, the window is overlain on the scatter plot. The window can be subdivided further by standard deviations (or quartiles).
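The counts in each pane of such a window come straight from comparing every point to the axis means. A sketch with simulated grades (the data are invented, not taken from the example plot):

```python
import numpy as np

rng = np.random.default_rng(7)
english = rng.normal(75, 10, 51)  # simulated grades for 51 students
math = rng.normal(70, 12, 51)

# Overlay the window: split each axis at its mean
above_e = english >= english.mean()
above_m = math >= math.mean()

counts = {
    ("English above avg", "Math above avg"): int(np.sum(above_e & above_m)),
    ("English above avg", "Math below avg"): int(np.sum(above_e & ~above_m)),
    ("English below avg", "Math above avg"): int(np.sum(~above_e & above_m)),
    ("English below avg", "Math below avg"): int(np.sum(~above_e & ~above_m)),
}
print(counts)
```

Using medians instead of means would put exactly half the points on each side of every split; quartiles or standard deviations subdivide the window further.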

(Scatter plot of English grades versus Math grades, divided into quadrants at the average for each subject.)

The performance window for this scatter plot would be:

                        | Math Below Average | Math Above Average | Total
English Above Average   |          9         |         18         |  27
English Below Average   |         16         |          8         |  24
Total                   |         25         |         26         |  51









It’s Hard to be a Data-Driven Organization

Why is it so Hard?

Do you work for a data-driven organization, or one that claims to be a data-driven organization, or one that wants to be a data-driven organization? You probably do, whether you work for a big retailer or a small service provider. Every organization wants to believe that it uses information to make decisions in an unbiased manner, although not every organization actually does. It’s definitely not easy to become a real data-driven organization. At a minimum, an organization has to address five issues:

  • Funding. Being data-driven is a top-down decision because it must be supported by adequate funding. Without funding, all you can do is talk about how you’re data-driven. Talk is cheap; funding is commitment.
  • Data. Organizations should have standard processes that generate relevant business data of appropriate granularity and quality. There should be owners for each type of data who are responsible for the data quality, availability, and security. Small organizations can implement these concepts in less elaborate ways than large organizations. For example, one person may oversee all data operations in a small organization compared to a department of experts in a large organization. Even micro-sized organizations can have ready access to data. All it takes is an internet connection that allows searching for data and analyses others have posted.
  • IT Support. Generating, storing, accessing, analyzing, and reporting on data requires software and hardware resources, connectivity technologies, and communications capabilities. Again, one person can do everything or there can be a whole department of technicians supported by vendors and contractors. An organization just has to have enough consistently available support that it can rely on.
  • User Skillset. To be of any use, data has to be converted into information, and information into knowledge. One person can do everything but it’s better if there is a team of data scientists because no individual is likely to be familiar with all the different types of data analysis that might be appropriate. In an ideal situation, all employees would have some knowledge of data analysis techniques, even if it’s just a required statistics course they took in college. It’s easier to run a data-driven organization if everyone understands the roles data and business analytics have in their daily work and the organization’s objectives.
  • Decision-making Culture. The most important aspect of successful data-driven organizations is the attitudes of the individuals making decisions. If they would prefer to rely exclusively on their intuition to run their organizations, the organization won’t be data-driven no matter how much funding, data, support, and employee skills there are.

Why Do Some Individuals Avoid Data?

It may seem counterintuitive that some people avoid using data for their decision-making. They will guess, speculate, make assumptions, and argue for hours about matters that could be resolved quickly and convincingly by using data. They’ll follow hunches to decide what they want to do and then claim success based on little more than a few cherry-picked anecdotes. If you suggest looking at data, you might be asked “what do we need data for?” They’ll caution you against “information overload” and “paralysis by analysis.” They might tell you “that’s not what the big boss wants.” They’ll find all sorts of excuses. In the end, you can lead your boss to data but you can’t make him think.

Why do these people avoid collecting and analyzing data to address problems, especially in the current age of pervasive technological connectivity? There are a few possibilities.


Some people actually have a fear of information, possibly related to a fear of numbers (arithmophobia), technology (technophobia), computers (logizomechanophobia or cyberphobia), ideas (ideophobia), truth (alethephobia or veritaphobia), novelty (kainolophobia or kainophobia), or change (metathesiophobia). More likely, they might fear that they are incompetent to make a decision, perhaps associated with the Peter Principle. They might say “Let’s do it the way we did it before,” or “let’s not rock the boat.”


Some people just aren’t comfortable with numbers. Artists, for example, tend to be more comfortable with creative spatial and visual thinking compared to engineers, who tend to be more comfortable with logical and quantitative thinking. Perhaps it’s a right-brain versus left-brain phenomenon, perhaps not. Think of how you make a major purchase. If you compare specifications and unit prices for each possible brand or model, going back and forth and back and forth, you’re what is called an analytical buyer. If you just buy the product in the red box because it has a picture of a cat on it that looks like one you own, you’re what is called an intuitive buyer. The same goes for decision-making. Some people trust their hunches more than they trust numbers.


Some people aren’t accustomed to solving problems with data. They don’t know how to collect and analyze data. They wouldn’t even know where to start. They might talk to a few co-workers for anecdotal information but wouldn’t know how to generate representative data. They don’t know that data may already exist. They don’t understand how readily available some information is on the Internet. Even then, they wouldn’t know how to use data to make decisions. They might defend themselves by saying available information is not actionable.


Some people just want to control everything they can. They might already have a preferred decision and don’t want any information that might call their hunch into question. Or, they may not know what they want to do but they don’t want any information that might limit their options or prevent them from controlling the debate. They may be control freaks. They may be subject to biases attributable to illusory superiority like the Dunning–Kruger effect.

How Can Reluctant Decision-Makers be Encouraged to be Data-Driven?

If you’re in an organization that is making the journey to being data-driven, changing the culture of decision-making will be your most formidable obstacle. The easiest problem to fix is ignorance. Training, encouragement, coaching and mentoring, and peer support combine to enlighten. The fears and inherent natures of some decision-makers are harder to address. Again, encouragement and personal support will encourage change. Control freaks are the most problematic. They are intransigent, as any of their exes will affirm. Don’t make them a focus of your efforts to change your decision-making culture. You’ll be disappointed.

Here are some actions you can take to support the adjustment.

If you work in upper management, the most important thing you can do is communicate your expectations and lead by example. Recognize that not every decision must be based on data. Sometimes data is just the starting point for a visionary leader’s intuition. Make funds available for actions that will support the initiative, like training in data analysis and decision-making. Require managers to at least bring data with them to the table when arguing their points. Challenge speculation. Help them through the process of incorporating information into their decision-making process by coaching and mentoring. Finally, recognize and reward staff members who take the lead in using data.

If you work in middle management, you’re probably the primary focus of the cultural change your company is trying to make. The most important thing you can do is accept the inevitability of the change and recognize you don’t have to do it all yourself. Communicate to your staff what things they can do to support the new decision-making strategy, like collecting and analyzing data. Approve funds for staff training and data collection/analysis activities. And again, recognize and reward staff members who take the lead in providing you with data.

If you work as a member of the staff, the most important thing you can do is collaborate with your co-workers in collecting and analyzing data. Help each other. Congratulate those who provide good examples of data collection, analysis, and reporting. And of course, take as much training as you can and use your initiative to interject data into activities you are working on.

Be Patient

Changing an organization’s culture from intuition-based decision-making to data-driven decision-making is a long evolutionary process. It won’t happen by the end of next quarter, or next fiscal year, or for that matter, maybe ever. You won’t necessarily even know when you’ve achieved the goal. But, if you start to see that decisions work out better and are more defensible than in the past, you’re probably there. That’ll make everyone in the organization happier.



There’s a reason analysis begins with anal. Always evaluate the validity of your assumptions, your data scrubbing, and your interpretations. If you don’t, someone else will.


Share Your Career with Students

I got a special request from my daughter in Hawaii that I hope you will read.

Aloha. I teach 5th grade special education in a resource room setting. My students are currently researching careers they are interested in as part of our expository writing unit. I’d love to have guest speakers come in and talk about jobs, but that’s tough to arrange, especially since there is so much confidentiality involved with the setting I’m in. Instead, I’d love to share letters written to them from people in different careers. My students are researching careers such as veterinarian, robotic engineer, biologist, Navy, Air Force, musician, fashion designer, and teacher, but I’d love a variety of careers to share with them.

If you are willing to type up a message to them, please include the following information

  1. Introduce yourself and your career.
  2. Explain the type of education/training you went through (you could mention what obstacles you encountered and how you overcame them (cost of school, a difficult class, etc.).
  3. Explain how the use of reading, writing, and math factors into your job and/or daily life.
  4. Close with what you enjoy about your career and some words of wisdom (optional)

Send your message to me at: mirandameow87@gmail.com  Include a picture you don’t mind me showing to my students when I read it (optional). THANKS! You will receive my eternal gratitude!

If you think you might want to share your career but are looking for ideas for starting, here’s what I wrote:

Aloha, my name is Charlie Kufs and I work as a Statistician for the United States government. My job is to take information, which we call data, and figure out how to use it to help the government run better. Statisticians also work for many other places like schools and companies. Most of the data statisticians work with are numbers that describe the things you buy in stores, the medicines you might take, the sports you play, and many more things. To be a statistician you have to love working with numbers.

To become a statistician, I had to complete elementary school, then four years of high school, and four years of college. I also studied two more years after college to learn more about math and statistics. As much as I loved learning about how to work with numbers, I also had to learn about reading and writing. Reading is very important to me because that’s how I learn new things. Even after going to school for almost twenty years, there are still many things to learn. I learn new things by reading books and articles on the Internet about statistics. Writing is just as important because I have to explain the work I’ve done to people who aren’t statisticians and don’t like numbers as much as I do. I’ve even written a book to help people work with statistics.

I really like working with numbers. Using math and statistics, I can solve very difficult problems at work and also have fun at home studying data about how I spend money, what foods I eat, what exercise I do, and my favorite sports teams. If you like math and working with numbers, you might like to be a statistician when you get older.



How to Tell if Correlation Implies Causation

You’ve probably heard the admonition:

Correlation Does Not Imply Causation.

Everyone agrees that correlation is not the same as causation. However, those two words — correlation and causation — have generated quite a bit of discussion.

Why Causality Matters

No one gets perturbed if you say two conditions or events are correlated, but even suggest that causation is possible and you’ll get the clichéd admonition, perhaps with even harsher criticism. It’s not easy to prove causality, though, so there must be a reason for putting in the effort. For example, if you can figure out what causes a condition or event, you can:

  • Promote the relationship to reap benefits, such as between agricultural methods and crop production or pharmaceuticals and recovery from illnesses.
  • Prevent the cause to avoid harmful consequences, such as airline crashes and manufacturing defects.
  • Prepare for unavoidable harmful consequences, such as natural disasters, like floods.
  • Prosecute the perpetrator of the cause, as in law, or lay blame, as in politics.
  • Pontificate about what might happen in the future if the same relationship occurs, such as in economics.
  • Probe for knowledge based on nothing more than curiosity, such as how cats purr.

So how can you tell if correlation does in fact imply causation?


Criteria for Causality

Sometimes it’s next to impossible to convince skeptics of a causal relationship. Sometimes it’s even tough to convince your supporters. Developing criteria for causality has been a topic of concern in medicine for centuries. Several sets of criteria have been proffered over the years, the most widely cited being those described in 1965 by Austin Bradford Hill, a British medical statistician. Hill’s criteria for causation specify the minimal conditions necessary to accept the likelihood of a causal relationship between two measures:

  1. Strength: A relationship is more likely to be causal if the correlation coefficient is large and statistically significant.
  2. Consistency: A relationship is more likely to be causal if it can be replicated.
  3. Specificity: A relationship is more likely to be causal if there is no other likely explanation.
  4. Temporality: A relationship is more likely to be causal if the effect always occurs after the cause.
  5. Gradient: A relationship is more likely to be causal if a greater exposure to the suspected cause leads to a greater effect.
  6. Plausibility: A relationship is more likely to be causal if there is a plausible mechanism between the cause and the effect.
  7. Coherence: A relationship is more likely to be causal if it is compatible with related facts and theories.
  8. Experiment: A relationship is more likely to be causal if it can be verified experimentally.
  9. Analogy: A relationship is more likely to be causal if there are proven relationships between similar causes and effects.

These criteria are sound principles for establishing whether some condition or event causes another condition or event. No individual criterion is foolproof, however. That’s why it’s important to meet as many of the criteria as possible. Still, sometimes causality is unprovable.

Three Steps to Decide if Correlation Implies Causation

Hill’s criteria can be thought of as aspects of critical thinking, considerations in the scientific method, or a model for deciding if a relationship involves causation. The criteria don’t all have to be met to suggest causality, and some may not even be possible to meet in every case. The important point is to consider the criteria in a careful and unbiased process.

Step 1 — Check the Metrics

The admonition that correlation does not imply causation is used to remind everyone that a correlation coefficient may actually be characterizing a non-causal influence or association rather than a causal relationship. A large correlation coefficient does not necessarily indicate that a relationship is causal. On the other hand, saying that correlation is a necessary but not sufficient condition for causality, or in other words, causation cannot occur without correlation, is also not necessarily true. There are quite a few reasons for a lack of correlation.

So, before you get too excited about some causal relationship, make sure the correlation is statistically legitimate. You can’t assess the relationship’s gradient (i.e., the sign of the correlation coefficient) and strength (i.e., the value of the correlation coefficient) if the correlation is erroneous. Make sure to:

  • Use metrics (variables) that are appropriate for quantifying the relationship. For example, don’t use an index that is a ratio of the other metric in the relationship.
  • Use an appropriate correlation coefficient based on the scales of the relationship metrics.
  • Confirm that the samples are representative of the population being analyzed and that the relationship is linear (or you are using non-linear methods for analysis).
  • Make sure that there are no outliers or excessive uncontrolled variance.

The gradient of most causal relationships is positive. Inverse relationships will have a negative gradient. The strength of causal relationships could be almost anything; it depends on what you expect. If you don’t know what to expect, look at the square of the correlation coefficient, called the coefficient of determination, R-square, or R2. R-square is an estimate of the proportion of variance shared by two variables. It is used commonly to interpret the strength of the relationship between variables. Be aware, though, that even causal relationships may show smaller than expected correlations.

Step 2 — Explain the Relationship

If you are comfortable with the gradient and strength of the correlation coefficient, the next step is to define the pattern of the relationship. The correlation may not be of any help in exploring the pattern of the relationship because data plots for different patterns can look similar. Nonetheless, there’s no sense expending more effort if the correlation is in any manner suspect.

First, check for temporality in the data. If the cause doesn’t always precede the effect, then the relationship is either a feedback relationship or not causal. If cause and effect are not measured simultaneously, temporality may be obscured.

Next, try to determine what pattern of relationship is likely. This is not easy but it’s also not a permanent determination. If you are uncertain, start with either a direct or an inverse relationship, which can be determined from data plots. Then as you study the relationship further, you can assess whether the relationship may be based on feedback, common-source, mediation, stimulation, suppression, threshold, or multiple complexities.

Consider your relationship in terms of Hill’s criteria of Plausibility, Coherence, Analogy, and Specificity. Plausibility and Coherence are perhaps the easiest of the criteria to meet because it is all too easy to rationalize explanations for observed phenomena. They may also rely on related facts and theories that can change over time. Analogy is a bit more difficult to meet but not impossible for a fertile mind. However, analogous relationships may appear to be similar but in fact be attributable to very different underlying mechanisms. Narrow-minded people rely on Specificity in their arguments. Then again, relationships may have no other likely explanation simply because a phenomenon is not well understood.

Step 3 — Validate the Explanation

Perhaps the most important of Hill’s criteria are Experiment and Consistency. If you’re serious about proving there is a causal relationship between two conditions or events, you have to verify the relationship using an effective research design. Such an experiment usually requires a model of the relationship, a testable hypothesis based on the model, incorporation of variance control measures, collection of suitable metrics for the relationship, and an appropriate analysis. An appropriate analysis may be statistical (using multiple samples from a well-defined population and analyses like ANOVA to assess effects) or deterministic (using a representative example of a component of the relationship to demonstrate the effect). If the experiment verifies the relationship, especially if it can be consistently replicated by independent parties, there will be solid proof of causality and any spurious relationships will be disproved. The two problems are that this validation can involve considerable effort and that not every relationship can be verified experimentally.
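As a small taste of the statistical route, here’s a one-way ANOVA F statistic computed from scratch in Python for a hypothetical treatment-versus-control experiment (a real analysis would use a statistics package and report a p-value from the F distribution):

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic: between-group variance over within-group variance."""
    grand = mean(x for g in groups for x in g)
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total observations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical experiment: response measured in controls vs. treated subjects
control = [5.1, 4.8, 5.3, 5.0, 4.9]
treated = [6.2, 6.5, 5.9, 6.4, 6.1]

f_stat = one_way_anova_f(control, treated)
print(f"F = {f_stat:.1f}")
```

A large F says the groups differ far more than chance variation within groups would suggest; with a sound experimental design, that difference is evidence the treatment caused the effect.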

There are two types of research studies — experimental and observational. In an experimental study, researchers decide what conditions the subjects (the entities being experimented on) will be exposed to and then measure variables of interest. In an observational study, researchers observe subjects that possess the conditions being assessed and then measure variables of interest. Both types of studies have their challenges. Researchers may not be able to manipulate the conditions under study in an experiment because of cost, logistical, or ethical issues. Observational studies may be subject to confounding, conditions that interfere with the interpretation of results. Consequently, verifying that a relationship is causal is often easier said than done.

Implying Causality

Hill’s criteria were developed for medicine. Medical research may start with anecdotal observations and progress to statistical observations of occurrence. Add demographics and patterns of occurrence may become apparent. Then the patterns are assessed to look for coherent, plausible explanations and analogues. Some medical hypotheses can be tested and analyzed statistically; pharmaceutical effectiveness is an example. Psychological and agricultural relationships can often be tested. Other relationships can’t be manipulated and so must be analyzed based on observations; epidemiological studies are examples. Without being able to rely on the Experiment and Consistency criteria, causality can only be argued using the weaker Plausibility, Coherence, Analogy, and Specificity criteria. This is also true of natural phenomena, like landslides and earthquakes. Some conditions are unique, or the underlying knowledge base is insufficient to explain the phenomenon convincingly, so even the Plausibility, Coherence, Analogy, and Specificity criteria aren’t useful. Economic and political relationships often fall into this category.

So, if you hear someone claim that a relationship is causal, consider how Hill’s criteria might apply before you believe the assertion.


Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at amazon.com,  barnesandnoble.com, or other online booksellers.


2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 83,000 times in 2014. If it were an exhibit at the Louvre Museum, it would take about 4 days for that many people to see it.

