You’ve probably heard the admonition:
Correlation Does Not Imply Causation.
Everyone agrees that correlation is not the same as causation. However, those two words — correlation and causation — have generated quite a bit of discussion.
Why Causality Matters
No one gets perturbed if you say two conditions or events are correlated, but even suggest that causation is possible and you’ll get the clichéd admonition, perhaps with even harsher criticism. It’s not easy to prove causality, though, so there must be a reason for putting in the effort. For example, if you can figure out what causes a condition or event, you can:
- Promote the relationship to reap benefits, such as between agricultural methods and crop production or pharmaceuticals and recovery from illnesses.
- Prevent the cause to avoid harmful consequences, such as airline crashes and manufacturing defects.
- Prepare for unavoidable harmful consequences, such as natural disasters, like floods.
- Prosecute the perpetrator of the cause, as in law, or lay blame, as in politics.
- Pontificate about what might happen in the future if the same relationship occurs, such as in economics.
- Probe for knowledge based on nothing more than curiosity, such as how cats purr.
So how can you tell if correlation does in fact imply causation?
Criteria for Causality
Sometimes it’s next to impossible to convince skeptics of a causal relationship. Sometimes it’s even tough to convince your supporters. Developing criteria for causality has been a topic of concern in medicine for centuries. Several sets of criteria have been proffered over those years, the most widely cited of which are the criteria described in 1965 by Austin Bradford Hill, a British medical statistician. Hill’s criteria for causation specify the minimal conditions necessary to accept the likelihood of a causal relationship between two measures as:
- Strength: A relationship is more likely to be causal if the correlation coefficient is large and statistically significant.
- Consistency: A relationship is more likely to be causal if it can be replicated.
- Specificity: A relationship is more likely to be causal if there is no other likely explanation.
- Temporality: A relationship is more likely to be causal if the effect always occurs after the cause.
- Gradient: A relationship is more likely to be causal if a greater exposure to the suspected cause leads to a greater effect.
- Plausibility: A relationship is more likely to be causal if there is a plausible mechanism between the cause and the effect.
- Coherence: A relationship is more likely to be causal if it is compatible with related facts and theories.
- Experiment: A relationship is more likely to be causal if it can be verified experimentally.
- Analogy: A relationship is more likely to be causal if there are proven relationships between similar causes and effects.
These criteria are sound principles for establishing whether some condition or event causes another condition or event. No individual criterion is foolproof, however. That’s why it’s important to meet as many of the criteria as possible. Still, sometimes causality is unprovable.
Three Steps to Decide if Correlation Implies Causation
Hill’s criteria can be thought of as aspects of the process of critical thinking, as considerations in the scientific method, or as a model for deciding if a relationship involves causation. The criteria don’t all have to be met to suggest causality, and some may not even be possible to meet in every case. The important point is to consider the criteria in a careful and unbiased process.
Step 1 — Check the Metrics
The admonition that correlation does not imply causation is used to remind everyone that a correlation coefficient may actually be characterizing a non-causal influence or association rather than a causal relationship. A large correlation coefficient does not necessarily indicate that a relationship is causal. On the other hand, saying that correlation is a necessary but not sufficient condition for causality, or in other words, causation cannot occur without correlation, is also not necessarily true. There are quite a few reasons for a lack of correlation.
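One of those reasons is easy to show with a small sketch. The data below are made up for illustration: y is completely determined by x (y is x squared), yet the linear correlation over the whole range is essentially zero because the rising and falling halves cancel out. The pearson helper is a hand-rolled version of the usual Pearson formula, not anything from the original discussion.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends entirely on x, but symmetrically around zero.
x = list(range(-10, 11))
y = [v ** 2 for v in x]

r_full = pearson(x, y)              # whole range: the two halves cancel
r_right = pearson(x[10:], y[10:])   # x >= 0 only: strongly positive

print(f"r over the full range: {r_full:.3f}")
print(f"r for x >= 0 only:     {r_right:.3f}")
```

A data plot would reveal the curve immediately, which is one reason to look at the plot and not just the coefficient.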
So, before you get too excited about some causal relationship, make sure the correlation is statistically legitimate. You can’t assess the relationship’s gradient (i.e., the sign of the correlation coefficient) and strength (i.e., the value of the correlation coefficient) if the correlation is erroneous. Make sure to:
- Use metrics (variables) that are appropriate for quantifying the relationship. For example, don’t use an index that is a ratio of the other metric in the relationship.
- Use an appropriate correlation coefficient based on the scales of the relationship metrics.
- Confirm that the samples are representative of the population being analyzed and that the relationship is linear (or you are using non-linear methods for analysis).
- Make sure that there are no outliers or excessive uncontrolled variance.
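The effect of a single outlier, and of the choice of coefficient, can be sketched with a few made-up numbers. The five "clean" points below have no linear trend; adding one extreme point manufactures a large Pearson coefficient. A rank-based coefficient (Spearman, computed here as Pearson on average ranks) is less affected. The data and helper names are illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data with no linear trend.
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 1, 3, 1, 2]

# The same data plus one extreme point.
x_out = x_clean + [20]
y_out = y_clean + [20]

r_clean = pearson(x_clean, y_clean)
r_out = pearson(x_out, y_out)
rho_out = spearman(x_out, y_out)

print(f"Pearson, clean data:   {r_clean:.2f}")
print(f"Pearson, with outlier: {r_out:.2f}")
print(f"Spearman, with outlier: {rho_out:.2f}")
```

With these numbers the outlier moves Pearson's r from essentially zero to about 0.97, while Spearman's coefficient comes out near 0.44. Neither value says anything about causation; the point is only that the coefficient has to be trustworthy before it is interpreted.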
The gradient of most causal relationships is positive. Inverse relationships will have a negative gradient. The strength of causal relationships could be almost anything; it depends on what you expect. If you don’t know what to expect, look at the square of the correlation coefficient, called the coefficient of determination, R-square, or R². R-square is an estimate of the proportion of variance shared by two variables. It is commonly used to interpret the strength of the relationship between variables. Be aware, though, that even causal relationships may show smaller than expected correlations.
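As a small worked example with made-up numbers, R-square can be computed two ways that agree for a simple linear relationship: by squaring the correlation coefficient, and as the share of the variance in y that a least-squares line accounts for. The function names are hypothetical helpers written for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_from_fit(xs, ys):
    """1 - (residual variation / total variation) for a least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson(x, y)
r_squared = r ** 2
r_squared_fit = r_squared_from_fit(x, y)

print(f"r = {r:.4f}, R-square = {r_squared:.2f}, from the fitted line = {r_squared_fit:.2f}")
```

Here r is about 0.77 and R-square is 0.6, so roughly 60% of the variance is shared. Whether that counts as strong depends on what you expected going in.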
Step 2 — Explain the Relationship
If you are comfortable with the gradient and strength of the correlation coefficient, the next step is to define the pattern of the relationship. The correlation may not be of any help in exploring the pattern of the relationship because data plots for different patterns can look similar. Nonetheless, there’s no sense expending more effort if the correlation is in any manner suspect.
First, check for temporality in the data. If the cause doesn’t always precede the effect then either the relationship is a feedback relationship or is not causal. If cause and effect are not measured simultaneously, temporality may be obscured.
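When both measures are recorded over time, one rough way to look at temporality is to correlate the suspected cause with the suspected effect at several time lags and see where the relationship is strongest. The sketch below uses a made-up series in which the effect is simply the cause delayed by two time steps; the data, the DELAY constant, and the lagged_correlation helper are all assumptions for illustration, and a peak at a positive lag is only consistent with temporality, not proof of causation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(cause, effect, lag):
    """Correlate cause[t] with effect[t + lag].

    A positive lag means the effect follows the cause; a negative lag
    means the supposed effect actually comes first.
    """
    n = len(cause)
    if lag >= 0:
        a, b = cause[:n - lag], effect[lag:]
    else:
        a, b = cause[-lag:], effect[:n + lag]
    return pearson(a, b)

# Hypothetical series: the effect repeats the cause two steps later.
cause = [0, 1, 0, 3, 1, 0, 2, 5, 1, 0, 4, 2, 0, 1, 3, 0]
DELAY = 2
effect = [0] * DELAY + cause[:-DELAY]

by_lag = {lag: lagged_correlation(cause, effect, lag) for lag in range(-3, 4)}
best_lag = max(by_lag, key=by_lag.get)

for lag in sorted(by_lag):
    print(f"lag {lag:+d}: r = {by_lag[lag]:.2f}")
print("strongest at lag", best_lag)
```

Notice that the correlation at lag zero is weak for these data, which is how measuring cause and effect at mismatched times can hide a real relationship.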
Next, try to determine what pattern of relationship is likely. This is not easy but it’s also not a permanent determination. If you are uncertain, start with either a direct or an inverse relationship, which can be determined from data plots. Then as you study the relationship further, you can assess whether the relationship may be based on feedback, common-source, mediation, stimulation, suppression, threshold, or multiple complexities.
Consider your relationship in terms of Hill’s criteria of Plausibility, Coherence, Analogy, and Specificity. Plausibility and Coherence are perhaps the easiest of the criteria to meet because it is all too easy to rationalize explanations for observed phenomena. They may also rely on related facts and theories that can change over time. Analogy is a bit more difficult to meet but not impossible for a fertile mind. However, analogous relationships may appear to be similar but in fact be attributable to very different underlying mechanisms. Narrow-minded people rely on Specificity in their arguments. Then again, relationships may have no other likely explanation because a phenomenon is not well understood.
Step 3 — Validate the Explanation
Perhaps the most important of Hill’s criteria are Experiment and Consistency. If you’re serious about proving there is a causal relationship between two conditions or events, you have to verify the relationship using an effective research design. Such an experiment usually requires a model of the relationship, a testable hypothesis based on the model, incorporation of variance control measures, collection of suitable metrics for the relationship, and an appropriate analysis. An appropriate analysis may be statistical (using multiple samples from a well-defined population and analyses like ANOVA to assess effects) or deterministic (using a representative example of a component of the relationship to demonstrate the effect). If the experiment verifies the relationship, especially if it can be consistently replicated by independent parties, there will be solid proof of causality and any spurious relationships will be disproved. The two problems are that this validation can involve considerable effort and that not every relationship can be verified experimentally.
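To give a feel for the statistical route, here is a minimal sketch of the calculation behind a one-way ANOVA, using made-up results from a hypothetical experiment with three treatment groups. It computes only the F statistic (variation between groups relative to variation within groups); turning F into a probability requires the F distribution, which a statistics package would normally supply.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical responses under three experimental conditions.
control = [4, 5, 6]
low_dose = [6, 7, 8]
high_dose = [9, 10, 11]

f_stat = one_way_anova_f([control, low_dose, high_dose])
print(f"F = {f_stat:.1f} with 2 and 6 degrees of freedom")
```

For these numbers F works out to 19, well above the tabulated 5% critical value of about 5.14 for 2 and 6 degrees of freedom. That only says the groups differ; it supports a causal reading to the extent the experiment controlled everything else.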
There are two types of research studies: experimental and observational. In an experimental study, researchers decide what conditions the subjects (the entities being experimented on) will be exposed to and then measure variables of interest. In an observational study, researchers observe subjects that possess the conditions being assessed and then measure variables of interest. Both types of study design have their challenges. Researchers may not be able to manipulate the conditions under study in an experiment because of cost, logistical, or ethical issues. Observational studies may be subject to confounding, in which other conditions interfere with the interpretation of results. Consequently, verifying that a relationship is causal is often easier said than done.
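Confounding is easier to appreciate with a small simulation. In the sketch below, a hidden common source z drives both x and y, and x has no effect on y at all. The two still end up correlated, and the correlation largely disappears once z is accounted for with a partial correlation. The simulated data, the seed, and the sample size are arbitrary assumptions; partial correlation is one standard way to adjust for a measured confounder, not the only one.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear influence of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 2000

# z is the common source; x and y each equal z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
r_xy_given_z = partial_correlation(r_xy, pearson(x, z), pearson(y, z))

print(f"r(x, y)             = {r_xy:.2f}")
print(f"r(x, y) adjusting z = {r_xy_given_z:.2f}")
```

In theory the raw correlation for this setup is 0.5 and the adjusted one is zero; a simulated sample will land close to those values. The catch in a real observational study is that you can only adjust for confounders you thought to measure.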
Hill’s criteria were developed for medicine. Medical research may start with anecdotal observations and progress to statistical observations of occurrence. Add demographics and patterns of occurrence may become apparent. Then the patterns are assessed to look for coherent, plausible explanations and analogues. Some medical hypotheses can be tested and analyzed statistically. Pharmaceutical effectiveness is an example. Psychological and agricultural relationships can often be tested. Other relationships can’t be manipulated, so they must be analyzed based on observations. Epidemiological studies are examples. Without being able to rely on the Experiment and Consistency criteria, causality can only be argued using the weaker Plausibility, Coherence, Analogy, and Specificity criteria. This is also true of natural phenomena, like landslides and earthquakes. Some conditions are unique, or the underlying knowledge base is insufficient to explain the phenomenon convincingly, so even the Plausibility, Coherence, Analogy, and Specificity criteria aren’t useful. Economic and political relationships often fall into this category.
So, if you hear someone claim that a relationship is causal, consider how Hill’s criteria might apply before you believe the assertion.
Read more about using statistics at the Stats with Cats blog.