It’s Hard to be a Data-Driven Organization

Why is it so Hard?

Do you work for a data-driven organization, or one that claims to be a data-driven organization, or one that wants to be a data-driven organization? You probably do, whether you work for a big retailer or a small service provider. Every organization wants to believe that it uses information to make decisions in an unbiased manner, although not every organization actually does that. It’s definitely not easy to become a real data-driven organization. At a minimum, an organization has to address five issues:

  • Funding. Being data-driven is a top-down decision because it must be supported by adequate funding. Without funding, all you can do is talk about how you’re data-driven. Talk is cheap; funding is commitment.
  • Data. Organizations should have standard processes that generate relevant business data of appropriate granularity and quality. There should be owners for each type of data who are responsible for the data quality, availability, and security. Small organizations can implement these concepts in less elaborate ways than large organizations. For example, one person may oversee all data operations in a small organization compared to a department of experts in a large organization. Even micro-sized organizations can have ready access to data. All it takes is an internet connection that allows searching for data and analyses others have posted.
  • IT Support. Generating, storing, accessing, analyzing, and reporting on data requires software and hardware resources, connectivity technologies, and communications capabilities. Again, one person can do everything or there can be a whole department of technicians supported by vendors and contractors. An organization just has to have enough consistently available support that it can rely on.
  • User Skillset. To be of any use, data has to be converted into information, and information into knowledge. One person can do everything, but it’s better if there is a team of data scientists because no individual is likely to be familiar with all the different types of data analysis that might be appropriate. In an ideal situation, all employees would have some knowledge of data analysis techniques, even if it’s just a required statistics course they took in college. It’s easier to run a data-driven organization if everyone understands the roles data and business analytics have in their daily work and the organization’s objectives.
  • Decision-making Culture. The most important aspect of successful data-driven organizations is the attitudes of the individuals making decisions. If they would prefer to rely exclusively on their intuition to run their organizations, the organization won’t be data-driven no matter how much funding, data, support, and employee skills there are.

Why Do Some Individuals Avoid Data?

It may seem counterintuitive that some people avoid using data for their decision-making. They will guess, speculate, make assumptions, and argue for hours about matters that could be resolved quickly and convincingly by using data. They’ll follow hunches to decide what they want to do and then claim success based on little more than a few cherry-picked anecdotes. If you suggest looking at data, you might be asked “what do we need data for?” They’ll caution you against “information overload” and “paralysis by analysis.” They might tell you “that’s not what the big boss wants.” They’ll find all sorts of excuses. In the end, you can lead your boss to data but you can’t make him think.

Why do these people avoid collecting and analyzing data to address problems, especially in the current age of pervasive technological connectivity? There are a few possibilities.

Fear

Some people actually have a fear of information, possibly related to a fear of numbers (arithmophobia), technology (technophobia), computers (logizomechanophobia or cyberphobia), ideas (ideophobia), truth (alethephobia or veritaphobia), novelty (kainolophobia or kainophobia), or change (metathesiophobia). More likely, they might fear that they are incompetent to make a decision, perhaps associated with the Peter Principle. They might say “Let’s do it the way we did it before,” or “let’s not rock the boat.”

Nature

Some people just aren’t comfortable with numbers. Artists, for example, tend to be more comfortable with creative spatial and visual thinking compared to engineers, who tend to be more comfortable with logical and quantitative thinking. Perhaps it’s a right-brain versus left-brain phenomenon, perhaps not. Think of how you make a major purchase. If you compare specifications and unit prices for each possible brand or model, going back and forth and back and forth, you’re what is called an analytical buyer. If you just buy the product in the red box because it has a picture of a cat on it that looks like one you own, you’re what is called an intuitive buyer. The same goes for decision-making. Some people trust their hunches more than they trust numbers.

Ignorance

Some people aren’t accustomed to solving problems with data. They don’t know how to collect and analyze data. They wouldn’t even know where to start. They might talk to a few co-workers for anecdotal information but wouldn’t know how to generate representative data. They don’t know that data may already exist. They don’t understand how readily available some information is on the Internet. Even then, they wouldn’t know how to use data to make decisions. They might defend themselves by saying available information is not actionable.

Control

Some people just want to control everything they can. They might already have a preferred decision and don’t want any information that might call their hunch into question. Or, they may not know what they want to do but they don’t want any information that might limit their options or prevent them from controlling the debate. They may be control freaks. They may be subject to biases attributable to illusory superiority like the Dunning–Kruger effect.

How Can Reluctant Decision-Makers be Encouraged to be Data-Driven?

If you’re in an organization that is making the journey to being data-driven, changing the culture of decision-making will be your most formidable obstacle. The easiest problem to fix is ignorance. Training, encouragement, coaching and mentoring, and peer support combine to enlighten. The fears and inherent natures of some decision-makers are harder to address. Again, encouragement and personal support can help bring about change. Control freaks are the most problematic. They are intransigent, as any of their exes will affirm. Don’t make them a focus of your efforts to change your decision-making culture. You’ll be disappointed.

Here are some actions you can take to support the adjustment.

If you work in upper management, the most important thing you can do is communicate your expectations and lead by example. Recognize that not every decision must be based on data. Sometimes data is just the starting point for a visionary leader’s intuition. Make funds available for actions that will support the initiative, like training in data analysis and decision-making. Require managers to at least bring data with them to the table when arguing their points. Challenge speculation. Help them through the process of incorporating information into their decision-making process by coaching and mentoring. Finally, recognize and reward staff members who take the lead in using data.

If you work in middle management, you’re probably the primary focus of the cultural change your company is trying to make. The most important thing you can do is accept the inevitability of the change and recognize you don’t have to do it all yourself. Communicate to your staff what things they can do to support the new decision-making strategy, like collecting and analyzing data. Approve funds for staff training and data collection/analysis activities. And again, recognize and reward staff members who take the lead in providing you with data.

If you work as a member of the staff, the most important thing you can do is collaborate with your co-workers in collecting and analyzing data. Help each other. Congratulate those who provide good examples of data collection, analysis, and reporting. And of course, take as much training as you can and use your initiative to interject data into activities you are working on.

Be Patient

Changing an organization’s culture from intuition-based decision-making to data-driven decision-making is a long evolutionary process. It won’t happen by the end of next quarter, or next fiscal year, or for that matter, maybe ever. You won’t necessarily even know when you’ve achieved the goal. But, if you start to see that decisions work out better and are more defensible than in the past, you’re probably there. That’ll make everyone in the organization happier.

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at amazon.com,  barnesandnoble.com, or other online booksellers.


There’s a reason analysis begins with anal. Always evaluate the validity of your assumptions, your data scrubbing, and your interpretations. If you don’t, someone else will.


Share Your Career with Students

I got a special request from my daughter in Hawaii that I hope you will read.

Aloha. I teach 5th grade special education in a resource room setting. My students are currently researching careers they are interested in as part of our expository writing unit. I’d love to have guest speakers come in and talk about jobs, but that’s tough to arrange, especially since there is so much confidentiality involved with the setting I’m in. Instead, I’d love to share letters written to them from people in different careers. My students are researching careers such as veterinarian, robotic engineer, biologist, Navy, Air Force, musician, fashion designer, and teacher, but I’d love a variety of careers to share with them.

If you are willing to type up a message to them, please include the following information:

  1. Introduce yourself and your career.
  2. Explain the type of education/training you went through. You could mention what obstacles you encountered and how you overcame them (cost of school, a difficult class, etc.).
  3. Explain how the use of reading, writing, and math factors into your job and/or daily life.
  4. Close with what you enjoy about your career and some words of wisdom (optional).

Send your message to me at mirandameow87@gmail.com. Include a picture you don’t mind me showing to my students when I read it (optional). THANKS! You will receive my eternal gratitude!

If you think you might want to share your career but are looking for ideas for starting, here’s what I wrote:

Aloha, my name is Charlie Kufs and I work as a Statistician for the United States government. My job is to take information, which we call data, and figure out how to use it to help the government run better. Statisticians also work for many other places like schools and companies. Most of the data statisticians work with are numbers that describe the things you buy in stores, the medicines you might take, the sports you play, and many more things. To be a statistician you have to love working with numbers.

To become a statistician, I had to complete elementary school, then four years of high school, and four years of college. I also studied two more years after college to learn more about math and statistics. As much as I loved learning about how to work with numbers, I also had to learn about reading and writing. Reading is very important to me because that’s how I learn new things. Even after going to school for almost twenty years, there are still many things to learn. I learn new things by reading books and articles on the Internet about statistics. Writing is just as important because I have to explain the work I’ve done to people who aren’t statisticians and don’t like numbers as much as I do. I’ve even written a book to help people work with statistics.

I really like working with numbers. Using math and statistics, I can solve very difficult problems at work and also have fun at home studying data about how I spend money, what foods I eat and how much I exercise, and my favorite sports teams. If you like math and working with numbers, you might like to be a statistician when you get older.



How to Tell if Correlation Implies Causation

You’ve probably heard the admonition:

Correlation Does Not Imply Causation.

Everyone agrees that correlation is not the same as causation. However, those two words — correlation and causation — have generated quite a bit of discussion.

Why Causality Matters

No one gets perturbed if you say two conditions or events are correlated, but even suggest that causation is possible and you’ll get the clichéd admonition, perhaps along with harsher criticism. It’s not easy to prove causality, though, so there must be a reason for putting in the effort. For example, if you can figure out what causes a condition or event, you can:

  • Promote the relationship to reap benefits, such as between agricultural methods and crop production or pharmaceuticals and recovery from illnesses.
  • Prevent the cause to avoid harmful consequences, such as airline crashes and manufacturing defects.
  • Prepare for unavoidable harmful consequences, such as natural disasters like floods.
  • Prosecute the perpetrator of the cause, as in law, or lay blame, as in politics.
  • Pontificate about what might happen in the future if the same relationship occurs, such as in economics.
  • Probe for knowledge based on nothing more than curiosity, such as how cats purr.

So how can you tell if correlation does in fact imply causation?

http://xkcd.com/552/

Criteria for Causality

Sometimes it’s next to impossible to convince skeptics of a causal relationship. Sometimes it’s even tough to convince your supporters. Developing criteria for causality has been a topic of concern in medicine for centuries. Several sets of criteria have been proffered over those years, the most widely cited of which are the criteria described in 1965 by Austin Bradford Hill, a British medical statistician. Hill’s criteria describe conditions that make a causal relationship between two measures more likely:

  1. Strength: A relationship is more likely to be causal if the correlation coefficient is large and statistically significant.
  2. Consistency: A relationship is more likely to be causal if it can be replicated.
  3. Specificity: A relationship is more likely to be causal if there is no other likely explanation.
  4. Temporality: A relationship is more likely to be causal if the effect always occurs after the cause.
  5. Gradient: A relationship is more likely to be causal if a greater exposure to the suspected cause leads to a greater effect.
  6. Plausibility: A relationship is more likely to be causal if there is a plausible mechanism between the cause and the effect.
  7. Coherence: A relationship is more likely to be causal if it is compatible with related facts and theories.
  8. Experiment: A relationship is more likely to be causal if it can be verified experimentally.
  9. Analogy: A relationship is more likely to be causal if there are proven relationships between similar causes and effects.

These criteria are sound principles for establishing whether some condition or event causes another condition or event. No individual criterion is foolproof, however. That’s why it’s important to meet as many of the criteria as is possible. Still, sometimes causality is unprovable.

Three Steps to Decide if Correlation Implies Causation

Hill’s criteria can be thought of as aspects of the process of critical thinking or considerations in the scientific method or a model for deciding if a relationship involves causation. The criteria don’t all have to be met to suggest causality and some may not even be possible to meet in every case. The important point is to consider the criteria in a careful and unbiased process.

Step 1 — Check the Metrics

The admonition that correlation does not imply causation is used to remind everyone that a correlation coefficient may actually be characterizing a non-causal influence or association rather than a causal relationship. A large correlation coefficient does not necessarily indicate that a relationship is causal. On the other hand, the converse claim, that correlation is a necessary but not sufficient condition for causality (in other words, that causation cannot occur without correlation), is also not necessarily true. There are quite a few reasons why a causal relationship might show little or no correlation.
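One of those reasons is easy to demonstrate. Pearson’s correlation coefficient measures only linear association, so a relationship that is perfectly dependable but not linear can produce a coefficient near zero. This minimal sketch uses made-up data in which a hypothetical effect y is completely determined by x (y equals x squared) over values of x that are symmetric around zero; the helper name pearson_r is just an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends entirely on x (y = x squared), but x is
# symmetric around zero, so the linear association cancels out.
xs = [k / 10 for k in range(-20, 21)]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"Pearson r = {r:.3f}")
```

The coefficient comes out essentially zero even though knowing x tells you y exactly. A plot of the data would reveal the U-shaped pattern immediately.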

So, before you get too excited about some causal relationship, make sure the correlation is statistically legitimate. You can’t assess the relationship’s gradient (i.e., the sign of the correlation coefficient) and strength (i.e., the magnitude of the correlation coefficient) if the correlation is erroneous. Make sure to:

  • Use metrics (variables) that are appropriate for quantifying the relationship. For example, don’t correlate an index or ratio with a metric that is part of that ratio.
  • Use an appropriate correlation coefficient based on the scales of the relationship metrics.
  • Confirm that the samples are representative of the population being analyzed and that the relationship is linear (or you are using non-linear methods for analysis).
  • Make sure that there are no outliers or excessive uncontrolled variance.
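The choice of coefficient and the outlier check can interact. As a rough sketch with hypothetical numbers, the code below computes both Pearson’s r and Spearman’s rank correlation (Pearson’s r applied to the ranks, a common choice for ordinal data or when outliers are suspected). The first nine points show almost no relationship; the tenth is a single extreme value. The function names are illustrative, not from any particular library.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: nine weakly related points plus one extreme point.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
ys = [5, 3, 8, 1, 9, 2, 7, 4, 6, 60]

print(f"first nine: Pearson {pearson_r(xs[:9], ys[:9]):.2f}, "
      f"Spearman {spearman_rho(xs[:9], ys[:9]):.2f}")
print(f"all ten:    Pearson {pearson_r(xs, ys):.2f}, "
      f"Spearman {spearman_rho(xs, ys):.2f}")
```

For these numbers both coefficients are 0.1 for the first nine points. Adding the extreme point pushes Pearson’s r to about 0.98, while Spearman’s coefficient rises only to about 0.35. One point shouldn’t be allowed to manufacture a relationship, so look at a plot before trusting either number.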

The gradient of most causal relationships is positive. Inverse relationships will have a negative gradient. The strength of causal relationships could be almost anything; it depends on what you expect. If you don’t know what to expect, look at the square of the correlation coefficient, called the coefficient of determination, R-square, or R². R-square is an estimate of the proportion of variance shared by two variables. It is commonly used to interpret the strength of the relationship between variables. Be aware, though, that even causal relationships may show smaller than expected correlations.
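The arithmetic behind R-square is simple, and it helps calibrate expectations. This minimal sketch (the function name describe_correlation is hypothetical) reports the direction implied by the sign of r and the proportion of shared variance implied by its square.

```python
def describe_correlation(r):
    """Return the direction implied by r's sign and R-square (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"
    r_squared = r ** 2  # coefficient of determination: shared variance
    return direction, r_squared

for r in (0.9, 0.5, -0.7):
    direction, r2 = describe_correlation(r)
    print(f"r = {r:+.1f}: {direction}, R-square = {r2:.2f} ({r2:.0%} shared)")
```

A correlation of 0.5, which might sound respectable, corresponds to an R-square of only 0.25, meaning the two variables share about a quarter of their variance. A correlation of -0.7 indicates an inverse relationship with an R-square of about 0.49.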

Step 2 — Explain the Relationship

If you are comfortable with the gradient and strength of the correlation coefficient, the next step is to define the pattern of the relationship. The correlation may not be of any help in exploring the pattern of the relationship because data plots for different patterns can look similar. Nonetheless, there’s no sense expending more effort if the correlation is in any manner suspect.
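Anscombe’s quartet, four small data sets published by the statistician F. J. Anscombe in 1973, is a classic illustration of why the correlation coefficient alone can’t reveal the pattern of a relationship. All four sets have nearly the same correlation, yet one is roughly linear, one is curved, one is linear except for a single outlier, and one is a vertical cluster plus a single influential point. This sketch just recomputes the coefficients with a plain-Python helper (pearson_r is an illustrative name).

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in quartet.items():
    print(f"Set {name}: r = {pearson_r(xs, ys):.3f}")
```

Each set gives a coefficient of about 0.82, so only plots of the data distinguish the four patterns.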

First, check for temporality in the data. If the cause doesn’t always precede the effect, then the relationship either involves feedback or is not causal. If cause and effect are not measured simultaneously, temporality may be obscured.
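One rough way to look for temporality in paired time series is to correlate the suspected cause with the suspected effect at several time lags and see which lag gives the strongest association. This is a minimal sketch with a hypothetical series constructed so that the effect echoes the cause two time steps later; the function names are illustrative. A strong lagged correlation only shows that the timing is consistent with causation, not that causation exists. To check the reverse ordering (or a feedback loop), swap the two arguments.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(cause, effect, lag):
    """Correlate cause[t] with effect[t + lag] for a non-negative lag."""
    n = len(cause) - lag
    return pearson_r(cause[:n], effect[lag:lag + n])

# Hypothetical series: the effect echoes the cause two time steps later.
cause = [3, 7, 2, 9, 4, 8, 1, 6, 5, 10, 2, 7]
effect = [0, 0] + [2 * c + 1 for c in cause[:-2]]

for lag in range(4):
    print(f"lag {lag}: r = {lagged_correlation(cause, effect, lag):+.2f}")
```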

Next, try to determine what pattern of relationship is likely. This is not easy but it’s also not a permanent determination. If you are uncertain, start with either a direct or an inverse relationship, which can be determined from data plots. Then as you study the relationship further, you can assess whether the relationship may be based on feedback, common-source, mediation, stimulation, suppression, threshold, or multiple complexities.

Consider your relationship in terms of Hill’s criteria of Plausibility, Coherence, Analogy, and Specificity. Plausibility and Coherence are perhaps the easiest of the criteria to meet because it is all too easy to rationalize explanations for observed phenomena. They may also rely on related facts and theories that can change over time. Analogy is a bit more difficult to meet but not impossible for a fertile mind. However, analogous relationships may appear to be similar but in fact be attributable to very different underlying mechanisms. Narrow-minded people rely on Specificity in their arguments. Then again, a relationship may appear to have no other likely explanation simply because a phenomenon is not well understood.

Step 3 — Validate the Explanation

Perhaps the most important of Hill’s criteria are Experiment and Consistency. If you’re serious about proving there is a causal relationship between two conditions or events, you have to verify the relationship using an effective research design. Such an experiment usually requires a model of the relationship, a testable hypothesis based on the model, incorporation of variance control measures, collection of suitable metrics for the relationship, and an appropriate analysis. An appropriate analysis may be statistical (using multiple samples from a well-defined population and analyses like ANOVA to assess effects) or deterministic (using a representative example of a component of the relationship to demonstrate the effect). If the experiment verifies the relationship, especially if it can be consistently replicated by independent parties, there will be solid proof of causality and any spurious relationships will be disproved. The two problems are that this validation can involve considerable effort and that not every relationship can be verified experimentally.
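For the statistical route, a one-way analysis of variance compares the variation between treatment groups with the variation within them. This sketch computes the F statistic by hand for hypothetical crop yields under three fertilizer treatments; in practice you would also get a p-value from the F distribution, for example with scipy.stats.f_oneway. The data and the function name are made up for illustration.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: scatter of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical yields for plots receiving each treatment.
control = [20, 22, 19, 21]
low_dose = [23, 25, 24, 22]
high_dose = [27, 26, 28, 29]

f_stat, df_b, df_w = one_way_anova_f([control, low_dose, high_dose])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

For these numbers F is 29.6 with 2 and 9 degrees of freedom, far above the 5% critical value of about 4.26, so the treatment means differ more than chance alone would readily explain. If the treatments had been randomly assigned to plots, that difference could be attributed to the treatments with much more confidence than an observed correlation allows.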

There are two types of research studies: experimental and observational. In an experimental study, researchers decide what conditions the subjects (the entities being experimented on) will be exposed to and then measure variables of interest. In an observational study, researchers observe subjects that possess the conditions being assessed and then measure variables of interest. Both types of study designs have their challenges. Researchers may not be able to manipulate the conditions under study in an experiment because of cost, logistical, or ethical issues. Observational studies may be subject to confounding, conditions that interfere with the interpretation of results. Consequently, verifying that a relationship is causal is often easier said than done.
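Confounding is easy to simulate. In this hypothetical sketch, a common cause z drives both x and y, while x has no effect on y at all. The raw correlation between x and y is substantial, but the partial correlation of x and y controlling for z, computed with the standard first-order partial correlation formula, is close to zero. Adjusting like this only works when the confounder has actually been measured and the relationships are roughly linear, which is why observational results call for caution. The function names and simulated values are assumptions for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """Correlation of x and y after removing the linear influence of z."""
    rxy, rxz, ryz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical observational data: z drives both x and y; x never affects y.
rng = random.Random(42)
zs = [rng.gauss(0, 1) for _ in range(5000)]
xs = [z + rng.gauss(0, 1) for z in zs]
ys = [z + rng.gauss(0, 1) for z in zs]

print(f"r(x, y)       = {pearson_r(xs, ys):.2f}")
print(f"r(x, y | z)   = {partial_r(xs, ys, zs):.2f}")
```

With this setup the correlation between x and y should come out near 0.5 and the partial correlation near zero; the exact values depend on the random seed.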

Implying Causality

Hill’s criteria were developed for medicine. Medical research may start with anecdotal observations and progress to statistical observations of occurrence. Add demographics, and patterns of occurrence may become apparent. Then the patterns are assessed to look for coherent, plausible explanations and analogues. Some medical hypotheses can be tested and analyzed statistically. Pharmaceutical effectiveness is an example. Psychological and agricultural relationships can often be tested. Other relationships can’t be manipulated, so they must be analyzed based on observations. Epidemiological studies are examples. Without being able to rely on the Experiment and Consistency criteria, causality can only be argued using the weaker Plausibility, Coherence, Analogy, and Specificity criteria. This is also true with natural phenomena, like landslides and earthquakes. Some conditions are unique or the underlying knowledge base is insufficient to explain the phenomenon convincingly, so even the Plausibility, Coherence, Analogy, and Specificity criteria aren’t useful. Economic and political relationships often fall into this category.

So, if you hear someone claim that a relationship is causal, consider how Hill’s criteria might apply before you believe the assertion.

http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/100000/40000/4000/200/144270/144270.strip.gif



2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 83,000 times in 2014. If it were an exhibit at the Louvre Museum, it would take about 4 days for that many people to see it.



Reading Stats with Cats


Todd P. Chang bonding with his cat while reading Stats with Cats together.


Types and Patterns of Data Relationships

Types of Data Relationships
