## Why You Don’t Always Get the Correlation You Expect

If you’ve ever taken a statistics class on correlation, you’ve probably come to expect that a large value for a correlation coefficient, either positive or negative, means that there is a noteworthy relationship between two phenomena. This is not always the case. Furthermore, a small correlation may not always mean there is no relationship between the phenomena.

## Correlation Size Doesn’t Always Matter

A small correlation coefficient does not necessarily imply a lack of a relationship any more than a large correlation coefficient implies a strong relationship. It depends on the type of relationship and the data used to characterize it. This is important because analysts often devote all their time to investigating large correlations while disregarding relationships that have small correlation coefficients, especially if they don’t have an expectation for what a good value might actually be.

## Statistical Reasons for Small Correlations

There are several statistical reasons for unexpected correlations:

• Nonlinear relationships — The most commonly used correlation coefficient, Pearson’s r, assumes that the relationship between two variables is linear. Nonlinear relationships result in smaller than expected correlation coefficients. A scatterplot of the variables can usually confirm this problem, which can often be corrected with a data transformation.
• Outliers — The strength of a correlation coefficient can be deflated or inflated by outliers. A scatterplot can usually confirm the presence of outliers, although deciding how to treat them may be more problematic.
• Excessive uncontrolled variance — Sometimes, data points that appear to be outliers may just be instances of excess variance. Excess variance is probably the most common cause of smaller than expected correlations. Usually, excess variance is the result of a lack of adequate control in data generation.
• Inappropriate sample — Data points that look like outliers or excess variance may be sham samples. Sham samples are not representative of the population being analyzed and so confound any calculated statistics. Samples may also combine subpopulations that follow different trends, hiding those trends and perhaps even resulting in Simpson’s paradox.
• Inefficient metrics — Variables used in the analysis may not be appropriate for investigating the phenomenon in question. As a consequence, the strength of a relationship will be smaller than expected.

That’s why any evaluation of a correlation should include looking at the coefficient’s sign and size, a scatterplot of the relationship, and a statistical test of significance. There’s a lot more information in data relationships than can be expressed by a single statistic.
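To make the point about nonlinear relationships concrete, here is a minimal sketch in plain Python. The data are invented for illustration, and `pearson_r` is a hand-written helper rather than a library function. It shows a perfect U-shaped relationship producing a coefficient of zero, and a perfect exponential relationship whose coefficient only reaches 1 after a log transformation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y is completely determined by x, but the pattern is U-shaped.
x_sym = list(range(-5, 6))
y_quad = [x ** 2 for x in x_sym]
r_quadratic = pearson_r(x_sym, y_quad)

# Made-up data: a perfect exponential relationship, before and after
# a log transformation of y.
x_pos = list(range(1, 11))
y_exp = [math.exp(x) for x in x_pos]
r_exp_raw = pearson_r(x_pos, y_exp)
r_exp_logged = pearson_r(x_pos, [math.log(y) for y in y_exp])

print(f"U-shaped relationship:       r = {r_quadratic:.3f}")
print(f"exponential, untransformed:  r = {r_exp_raw:.3f}")
print(f"exponential, log-transformed: r = {r_exp_logged:.3f}")
```

In both cases one variable is an exact function of the other, so the small coefficients reflect the shape of the relationship, not its strength. A scatterplot would reveal that immediately.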

If there are no statistical issues with a dataset, it’s important to also consider what types of relationships between the variables are possible.

## Relationship Reasons for Small Correlations

### Types of Relationships

When analysts see a large correlation coefficient, they begin speculating about possible reasons. They’ll naturally gravitate toward the initial hypothesis (or preconceived notion) that led them to investigate the data relationship in the first place. Because hypotheses are commonly about causation, they often begin with this least likely type of relationship. Besides causation, relationships can also reflect influence or association:

• Causes — A cause is a condition or event that directly triggers, initiates, makes happen, or brings into being another condition or event. A cause is a sine qua non; without a cause a consequent will not occur. Causes are directional. A cause must precede its consequent.
• Influences — An influence is a condition or event that changes the manifestation of an existing condition or event. Influences can be direct or mediated by a separate condition or event. Influences may exist at any time before or after the influenced condition or event. Influences may be unidirectional or bidirectional.
• Associations — Associations are two conditions or events that appear to change in a related way. Any two variables that change in a similar way will appear to be associated. Thus, associations can be spurious or real. Associations may exist at any time before or after the associated condition or event. Unlike causes and influences, associated variables have no effect on each other and may not exist in different populations or in the same population at different times or places.

Associations are commonplace. Most observed correlations are probably just associations. Influences and causes are less common but, unlike associations, they can be supported by the science or other principles on which the data are based. The strength of a correlation coefficient is not related to the type of relationship. Causes, influences, and associations can all have strong as well as weak correlations depending on the efficiency of the variables being correlated and the pattern of the relationship.

### Patterns of Relationships

Most discussions of correlation and causation focus on the simple direct relationship that one event or condition, designated as A, is related to a second event or condition, designated as B. For example, gravitational forces from the Moon and Sun cause ocean tides on the Earth. A causes B but B does not cause A. Another direct relationship is that age influences height and weight. Age doesn’t cause height and weight but we tend to grow larger as we age so A influences B.

Direct relationships are easy to understand and, if there are no statistical obfuscations, should exhibit a high degree of correlation. In practice, though, not every relationship is direct or simple. Here are eight other patterns:

• Feedback Relationship: A and B are linked in a loop; A causes or influences B which then causes or influences A and so on. For example, poor performance in school or at work (A) creates stress (B) which degrades performance further (A) leading to more stress (B) and so on.
• Common Relationship: A third event or condition, C, causes or influences both A and B. For example, hot weather (C) causes people to wear shorts (A) and drink cool beverages (B). Wearing shorts doesn’t cause or influence beverage consumption, although the two are associated by their common cause. Another example is the influence obesity has on susceptibility to a variety of health maladies.
• Mediated Relationship: A causes or influences C and C causes or influences B so that it appears that A causes B. For example, rainy weather (A) often induces people to go to their local shopping mall for something to do (C). While there, they shop, eat lunch, and go to the movies and other entertainment venues thus providing the mall with increased revenues (B). In contrast, snowstorms (A) often induce people to stay at home (C) thus decreasing mall revenues (B). Bad weather doesn’t cause or influence mall revenues directly but does influence whether people visit the mall.
• Stimulated Relationship: A causes or influences B but only in the presence of C. There are many examples of this pattern, such as metabolic and chemical reactions involving enzymes or catalysts. Likewise, some drugs (A) cause side effects (B) only in certain at-risk populations (C).
• Suppressed Relationship: A causes or influences B but not in the presence of C. For example, pathogens (A) cause infections (B) but not in the presence of antibiotics (C).
• Inverse Relationship: The absence of A causes or influences B. For example, vitamin deficiencies (A) cause or influence a wide variety of symptoms (B).
• Threshold Relationship: A causes or influences B only when A is above a certain level. For example, rain (A) causes flooding (B) only when the volume or intensity is very high.
• Complex Relationship: Many A factors or events contribute to the cause or influence of B. Numerous environmental processes fit this pattern. For example, a variety of atmospheric and astronomical factors (A) contribute to influencing climate change (B).
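The common relationship pattern in the list above can be sketched with a small simulation. Everything here is an assumption made for illustration: the data are randomly generated, the effect sizes are arbitrary, and removing the influence of C by correlating straight-line residuals (a partial correlation) is just one reasonable way to check for a common cause.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

random.seed(42)  # fixed seed so the sketch is repeatable
n = 2000
c = [random.gauss(0, 1) for _ in range(n)]   # hot weather (C)
a = [ci + random.gauss(0, 1) for ci in c]    # wearing shorts (A), driven by C
b = [ci + random.gauss(0, 1) for ci in c]    # cool beverages (B), driven by C

r_ab = pearson_r(a, b)
# Correlation between A and B once the straight-line effect of C is removed.
r_ab_given_c = pearson_r(residuals(a, c), residuals(b, c))

print(f"A vs. B:                 r = {r_ab:.2f}")
print(f"A vs. B, C accounted for: r = {r_ab_given_c:.2f}")
```

A and B never appear in each other’s formulas, yet they are moderately correlated. Once C is accounted for, the correlation nearly vanishes, which is what you would expect from an association created by a common cause.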

### Spurious Relationships

There are also a variety of spurious relationships in which A appears to cause or influence B but does not. Often the reason is that the relationship is based on anecdotal evidence that is not valid more generally. Sometimes spurious relationships may be some other kind of relationship that isn’t understood. Here are five other reasons why spurious relationships are so common.

• Misunderstood relationships: The science behind a relationship may not be understood correctly. For example, doctors used to think that spicy foods and stress caused ulcers. Now, there is greater recognition of the role of bacterial infection. Likewise, hormones have been found to be the leading cause of acne rather than diet (i.e., consumption of chocolate and fried foods).
• Misinterpreted statistics: There are many examples of statistical relationships being interpreted incorrectly. For example, the sizes of homeless populations appear to influence crime. Then again, so do the numbers of museums and the availability of public transportation. All of these factors are associated with urban areas.
• Misinterpreted observations: Incorrect reasons are attached to real observations. Many old wives’ tales are based on credible observations. For example, the notion that hair and nails continue to grow after death is an incorrect explanation for a legitimate observation: they appear longer because the surrounding skin dries and retracts.
• Urban legends: Some urban legends have a basis in truth and some are pure fabrications, but they all involve spurious relationships. For example, in South Korea, it is believed that sleeping with a fan in a closed room will result in death.
• Biased assertions: Some spurious relationships are not based on any evidence but instead are claimed in an attempt to persuade others of their validity. For example, the claim that masturbation makes you have hairy palms is not only ludicrous but also easily refutable. Likewise, almost any advertisement in support of a candidate in an election contains some sort of bias, such as cherry picking.

## Correlation and Causation — Not Always What You Expect

Calculated correlation coefficients are innocent bystanders in debates over causation, influence, and association. With all the statistical and relational nuances that can affect their interpretation, it’s a wonder that they are so often used alone as determinants of causality and influence. As with all statistics, correlation coefficients need to be interpreted in the context provided by other types of information. Certainly, correlation does not imply causation but, according to Edward Tufte and others, sometimes it’s a good hint.

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at amazon.com,  barnesandnoble.com, or other online booksellers.

## O.U..T…L….I……E……..R………………..S

Occasionally, a dataset may contain a value that is far greater (or less) than the other values, or that doesn’t display the same characteristics as they do. This anomalous value is termed an influential observation. If the influential observation is not representative of the population being sampled, it is called an outlier.

## Causes

Influential observations and outliers occur for a variety of reasons. Some are straightforward data generation or reporting errors. Lab results that are off by a factor of ten are often identified as outliers. Outliers occur in business data for a variety of reasons. Reporting deadlines may be missed, weather or construction may prevent customers from shopping, and there may be one-time corrections for past errors.

Sometimes, there are deterministic influences that skew some measurements. For example, aberrant measurements may be caused by instrument error or miscalibration. Some outliers aren’t errors but instead are the result of inherent variability or a natural cause. So, if you run into outliers, try to figure out why they exist. They may mean nothing so that you can delete them from the analysis, or they may be critical to your interpretation of a dataset. You’ll probably find that, most of the time, the causes of outliers will be unknown.

## Identification

Influential observations and outliers are generally not difficult to detect. Sorting and listing the data will often reveal questionable values, though the best way to identify a potential outlier is by graphing the data. Histograms, box plots, probability plots, time-series plots, or scatter plots of the data will usually reveal any aberrant values.

Graphs are particularly effective in identifying three patterns of outliers:

• Cross-Trend Outliers. Cross-trend outliers lie a substantial distance away from the rest of the data in positions that do not fall on the trend of the data. As a consequence, they can substantially reduce R², inflate variances, and change regression model coefficients. They are usually easy to identify in graphs; however, their cause is usually difficult to ascertain.
• In-Trend Outliers. In-trend outliers lie a substantial distance away from the rest of the data in positions that do fall on the trend of the data. Like cross-trend outliers, they are usually easy to identify in graphs. They can substantially inflate R² but not change regression equations, which leads some analysts to include the outlier despite evidence of questionable validity. Their cause is often easy to ascertain because of the unique conditions the outlier represents.
• Fringe Outliers. Fringe outliers lie a relatively small distance away from the rest of the data in positions that parallel the trend of the data. They are not always easy to identify in graphs. They can deflate R² and change regression equations. Their cause is usually difficult to ascertain but may be the result of some bias in the data collection.

Three Patterns of Outliers.
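The effect of the first two patterns on R² can be sketched with a small made-up dataset. The base data, the added points, and the `r_squared` helper are all assumptions chosen for illustration; the helper computes the squared Pearson correlation, which equals R² for a straight-line regression with one predictor.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation (R-squared of a simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Made-up base data: y follows x with a fixed +/-1.5 wobble.
x_base = list(range(1, 11))
y_base = [x + (1.5 if x % 2 else -1.5) for x in x_base]

# In-trend outlier: far from the rest of the data, but on the trend line.
x_in, y_in = x_base + [30], y_base + [30]

# Cross-trend outlier: far from the rest of the data and off the trend line.
x_cross, y_cross = x_base + [2], y_base + [30]

r2_base = r_squared(x_base, y_base)
r2_in_trend = r_squared(x_in, y_in)
r2_cross_trend = r_squared(x_cross, y_cross)

print(f"base data:               R2 = {r2_base:.3f}")
print(f"with in-trend outlier:    R2 = {r2_in_trend:.3f}")
print(f"with cross-trend outlier: R2 = {r2_cross_trend:.3f}")
```

With these particular numbers, a single in-trend point raises R² well above the base value and a single cross-trend point drops it to nearly zero. Real datasets will be less extreme, but the direction of the effect is the same.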

There are many statistical tests for identifying outliers. Outlier tests follow one of several strategies. Deviation/spread tests are like simple t-tests. They are calculated as the difference between the outlier value and the mean (or other measure of central tendency), divided by the standard deviation (or other measure of data dispersion). Excess/spread tests, also called Dixon-type tests, are calculated as the difference between the outlier and the next closest value (or other observation in the dataset), divided by the dataset range (or other dispersion statistic). Some statisticians prefer this type of approach because it is not necessary to have good estimates of the mean and variance. Other outlier tests examine sums-of-squares, skewness, and location relative to the center of the dataset.
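As a rough sketch of the two main strategies, the following computes a deviation/spread statistic and an excess/spread ratio for the largest value in a small made-up dataset. The data are hypothetical, and the code stops at the test statistics. Deciding whether a value is significant requires comparing the statistic against critical values published for the specific test you choose, which are not reproduced here.

```python
import statistics

# Made-up measurements; 30 is the suspect value.
values = [10, 12, 11, 13, 12, 11, 30]
suspect = max(values)

# Deviation/spread: distance from the mean in units of the sample
# standard deviation (both computed with the suspect value included).
deviation_stat = (suspect - statistics.mean(values)) / statistics.stdev(values)

# Excess/spread (Dixon-type): gap to the next closest value divided
# by the range of the whole dataset.
ordered = sorted(values)
gap = ordered[-1] - ordered[-2]
data_range = ordered[-1] - ordered[0]
excess_stat = gap / data_range

print(f"deviation/spread statistic: {deviation_stat:.2f}")
print(f"excess/spread statistic:    {excess_stat:.2f}")
```

Note that the excess/spread ratio needs only the sorted values, which is why it does not depend on good estimates of the mean and variance.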

The truth of the matter is that outlier tests are often superfluous. If you can see it in a graph, the test will usually confirm what you see. Tests are often convenient for convincing reviewers that what you think is an outlier really is one. If you can’t see it in a graph but an outlier test is significant, it may be an outlier … or not. The real issue, in most cases, is what you do if you find a value you think is an outlier.

## Treatment

There are five options for treating outliers:

• Inclusion — Inclusion involves keeping the outlier in the dataset. This approach would make sense to use if you’re looking to assess the effects of the anomalies. Sometimes you’re forced to take this approach because an unenlightened reviewer thinks you are trying to “pull something.” In cases like this, it might be beneficial to run your analyses both with and without the outlier so that everyone can understand its effect.
• Correction — Correction involves changing the outlier to the correct value. This doesn’t happen often. You might find an outlier to be an error but you can’t correct it because you don’t know what the true value should be. In that case, deletion is probably a better option. If you’re lucky, though, you might find an outlier to be an error and be able to correct it.
• Replacement — Replacement involves changing the outlier to a contingency value. This approach is like the replacement options for missing data. Using the mean or median in place of an outlier will bias the dataset, but not nearly as much as the outlier. This is often the best approach to use for complex statistical calculations.
• Accommodation — Accommodation involves keeping the outlier in the dataset but using “robust” statistical procedures that are less sensitive to outliers. Nonparametric statistics are often used for this purpose.
• Deletion — Deletion is simply removing the outlier from the dataset. This approach would make sense if you’re looking to assess general trends. Once again, it might be beneficial to run your analyses both with and without the outlier.

The option you select should depend on whether you believe the aberrant observation is representative of the population you are investigating. Your objective and the type of analysis you plan to do will also be considerations in this decision.
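Here is a minimal sketch of how four of these options change a simple summary statistic, using a made-up dataset in which 30 is the suspect value. Correction is left out because it requires knowing the true value. Replacing the outlier with the median of the remaining values is one possible contingency value, chosen here only for illustration.

```python
import statistics

# Made-up measurements; 30 is the suspect value.
values = [10, 12, 11, 13, 12, 11, 30]
outlier = 30
others = [v for v in values if v != outlier]

# Inclusion: keep the outlier and use the ordinary mean.
mean_included = statistics.mean(values)

# Deletion: drop the outlier, then take the mean.
mean_deleted = statistics.mean(others)

# Replacement: substitute a contingency value (here, the median of the rest).
replacement = statistics.median(others)
replaced = [replacement if v == outlier else v for v in values]
mean_replaced = statistics.mean(replaced)

# Accommodation: keep the outlier but use a statistic that resists it.
median_accommodated = statistics.median(values)

print(f"inclusion (mean):       {mean_included:.2f}")
print(f"deletion (mean):        {mean_deleted:.2f}")
print(f"replacement (mean):     {mean_replaced:.2f}")
print(f"accommodation (median): {median_accommodated:.2f}")
```

Only inclusion lets the suspect value pull the estimate of the center upward; the other three options land close to each other. Which result is the right one still depends on whether the value is representative of the population.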

## What Should You Do

If a statistical graphic or an outlier test suggests that a data value may be an influential observation or an outlier, follow these steps:

1. Examine a variety of graphical depictions of all the data points including box plots, probability plots, bivariate plots, time-series plots, and contour maps to assess possible reasons for the aberrant observation.
2. Review notes and metadata concerning the sample or measurement to determine if any irregularities in the sampling or data collection processes may be responsible for the discordant value.
3. Review documentation related to data quality for the sample or measurement to determine if any irregularities in the collection, packaging, transport, and analysis or measurement and recording processes may be responsible for the discordant value.
4. If any information indicates that the sample is probably not representative of the population being sampled, consider the sample or measurement to be an outlier and replace or delete it from further analysis. If possible, collect a new sample or measurement.
5. If any information indicates that the sample should be representative of the population, review results for other measurements from the same source to determine if other results support the legitimacy of the suspected outlier. Also, review results for the same variable that may have been generated during previous sampling efforts.
6. If prior results for the variable or results for other variables are consistent with results for the suspect sample or measurement, retain the value and evaluate it as an influential observation.
7. If prior results for the variable or results for other variables are not consistent with results for the suspect sample or measurement, consider the value to be an outlier and replace or delete it from further analysis. If possible, collect a new sample or measurement.

This procedure works best if both data analysts and reviewers can somehow be involved in the examination process. Be sure to document all findings and decisions during this process.
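The decision logic in steps 4 through 7 can be written down as a small function, which may help when documenting findings. This is only a sketch: the function name and its two yes/no arguments are hypothetical summaries of the manual reviews in steps 1 through 5, which the code does not perform.

```python
def classify_suspect_value(evidence_not_representative, consistent_with_other_results):
    """Suggest a label and an action for a suspect value.

    evidence_not_representative: True if the reviews in steps 1-3 indicate the
        sample is probably not representative of the population (step 4).
    consistent_with_other_results: True if prior results or results for other
        variables support the suspect value (steps 5-6).
    """
    if evidence_not_representative:
        # Step 4: treat as an outlier.
        return ("outlier", "replace or delete; collect a new measurement if possible")
    if consistent_with_other_results:
        # Step 6: keep the value.
        return ("influential observation", "retain and evaluate")
    # Step 7: not supported by other results.
    return ("outlier", "replace or delete; collect a new measurement if possible")

label, action = classify_suspect_value(
    evidence_not_representative=False,
    consistent_with_other_results=True,
)
print(f"{label}: {action}")
```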

If you decide to retain the outlier, consider using a nonparametric alternative to the procedure you planned to conduct. If for some reason this is not feasible, consider analyzing the dataset twice, once with the outlier and once without the outlier. Caveat your conclusions on the basis of the outlier and recommend collecting additional samples or measurements to assess its validity. Consultants always recommend additional work anyway, so this should come as no surprise to either clients or reviewers.
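As a sketch of both suggestions, the following compares Pearson’s correlation with and without a suspect value, and then with a rank-based (Spearman) alternative that keeps the value. The data are made up, and the `ranks` helper assumes there are no tied values, which keeps the example short but would need extending for real data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; this simple version assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_r(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Made-up data: y rises steadily with x, except for one extreme last value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 4, 5, 6, 7, 8, 100]

pearson_with = pearson_r(x, y)
pearson_without = pearson_r(x[:-1], y[:-1])
spearman_with = spearman_r(x, y)

print(f"Pearson, outlier included:  {pearson_with:.3f}")
print(f"Pearson, outlier removed:   {pearson_without:.3f}")
print(f"Spearman, outlier included: {spearman_with:.3f}")
```

Reporting the first two numbers side by side shows reviewers exactly how much the suspect value matters. The rank-based result shows what a nonparametric procedure concludes without removing anything.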

If you are assessing data trends, you will probably want to delete or replace any outliers. Even a single outlier can mask significant trends. Be aware, however, that this action could bias predicted values and the prediction error if the cause of the outlier is natural.

Given the choice to replace or delete an outlier, consider the number of samples you have and the importance of the variable the outlier is a measure of. Remember, if you delete the outlier you will end up having to delete either the sample or the variable to conduct your statistical analysis. If the variable is important and you don’t have many samples, consider replacing the outlier.

There is also a psychological component to consider when replacing or deleting outliers. Scientists and engineers are taught that it is unethical to delete or change data that might not fit with their expectations. Outliers challenge that notion. Statisticians and reviewers become highly suspicious of each other when the need to judge an outlier arises. Consequently, it is sensible to have a procedure for evaluating outliers in place that everyone agrees to before the need arises. Even so, somebody will criticize you no matter what you do. It’s the way things work.

The idea for this title came from: Beckman, R.J. and Cook, R.D. (1983). Outlier. . ..s, Technometrics, 25, 119-163. If you don’t get the joke, don’t worry about it.

## Stats with Cats Blog: 2013 in review

The WordPress.com stats helper monkeys prepared a 2013 annual report for the Stats with Cats Blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 58,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 21 sold-out performances for that many people to see it.

Welcome Poofygrey Nod and Black Magic to the Stats with Cats family.

## How to Write Data Analysis Reports. Lesson 6—Get Acceptance.

Data analysis reports have to go through one more hurdle after they are completely written. They have to be approved for acceptance by a gatekeeper. The approval for acceptance may involve allowing report distribution, starting the publishing process, issuing payment for your services, or just acknowledging that your work is done. The gatekeeper may be your client, your supervisor, your publisher, or for blog writers, you. To get that approval, formal reports usually have to be reviewed by reviewers. Reviewers are usually individuals the gatekeeper chooses based on their technical background or role in the gatekeeper’s organization. Sometimes, reviewers are individuals the gatekeeper is forced to listen to, like regulatory reviewers. In academic publishing, you may not even know who the peer reviewers are.

Logically, the acceptance review shouldn’t take too long compared to the time you took to analyze the data and write the report. After all, the reviewers only have to read it. In practice, though, reviews take far longer than report preparation. The report you wrote in a month may take six months to be reviewed. Don’t panic. It’s just the way things seem to happen.

The number of comments you get from the reviewers is inconsequential. Great reports can get dozens of highly critical comments. Again, don’t panic. The only review you should be concerned about is the one that provides no comments. That usually signals a lack of interest by the reviewers and the gatekeeper.

When the review is complete, be sure to get the comments in writing. If you don’t, some comments may be forgotten or misunderstood. If there is more than one reviewer, compile all the comments together. This is essential because sometimes reviewers provide conflicting comments. The gatekeeper may compile the comments for you if he or she wants to control the process. Arrange the comments in the order of the report sections they address. Be sure to identify the source of each comment. If a single comment has many parts, break the comment apart so you can respond to each part individually.

Then comes the challenging part—you must respond to each comment separately. Create a new document listing all the compiled comments. For each comment in this document, either describe what you’ll do in response or explain why you won’t make any changes. Start with the easy comments, such as those involving grammar and spelling. As you describe your response to a comment in the document, make the associated change in the report. Proceed through increasingly more difficult comments until you are done. For very complex comments, try to parse the ideas and respond to each separately. If a particular comment is very difficult to address, you may have to conduct additional analyses or information research. Cite information sources if appropriate.

When you’re done, reread both the response document and the changes in the report. Be sure all the changes were made in the report and that they are consistent with the rest of the report. Also, make sure the tone of your response is even; be stoic.

Don’t get upset by reviewers pointing out flaws in your report. That’s what they’re supposed to do. Having been on both sides of the writer/reviewer divide, I can tell you that creating a report takes a hundred times more knowledge, creativity, effort, and time than reviewing a report. Providing constructive criticism on a report requires a hundred times more experience, situational awareness, and interpersonal sensitivity than creating a report. Good writing combined with constructive reviewing makes a data analysis report the best it can be.

You can read the entire six-part blog, albeit with fewer cats, at How to Write Data Analysis Reports in Six Easy Lessons.

## How to Write Data Analysis Reports. Lesson 5—Get It Done.

Perhaps the hardest part of writing a data analysis report is just getting it completed. It takes discipline and persistence to stay on track. Even so, it’s easy to get distracted. Sometimes the problem is that the story of the analysis hasn’t been thought all the way through. Sometimes there are gaps in the analysis that necessitate stopping to complete more calculations. Sometimes there are too many interruptions and distractions to maintain focus. Sometimes, the process of writing becomes boring and requires a great effort to continue.

Writer’s block is an impediment experienced by all writers. Writer’s block might be attributable to not knowing what to write next, trying to write text that is perfect, or fear of failure. Any of these reasons may be applicable to the report writer. Here are ten ways to fight off writer’s block.

1. Stick with a routine. Keep writing even if you are dissatisfied with what you’ve written. You can, and should, edit your draft after you’re done. Try to identify your productivity tipping point. For some people, accomplishing a specific goal by a certain time of day helps ensure the rest of their day is productive. For example, my productivity tipping point is beginning to write by 8 AM. If I do, I’ll be writing productively all day.

2. Visualize. If you’ve never used visualization techniques before, now is a good time to develop the skill. The idea is to close your eyes, get relaxed, and think about what you want to do or see. Start by visualizing what the next few sentences you have to write might look and sound like. Eventually, you’ll be able to visualize what paragraphs, sections, and even the entire final product will look like.

3. Eschew perfection. If it’s not perfect the first time you write it, leave it alone. Let it age while you write the rest of the report. You can reevaluate and rewrite it later when you know more about the rest of the report.

4. Write in parallel. Some parts of reports, like introductions and summaries, and descriptions of variables and other details, are almost formulaic. Write all the similar parts at the same time. Set up a second file in your word processing software to serve as a staging area for the repeated parts. Then, copy and paste the standardized parts to your report and edit the text as appropriate.

5. Grow the outline. Instead of trying to write the report section by section, try using the outline as a template rather than a map. Add key phrases, instructions, notes, sentences, and even paragraphs to the template-outline. You can skip around the template-outline as you come up with ideas for what to write. Eventually, you can consolidate these ideas into paragraphs and then sections. Continue to expand the template-outline until it ultimately becomes the complete report.

6. Tiptoe through the tables. Create all or most of your graphics (i.e., tables and figures) before starting to write. Lay the graphics out in your word processing software and write the text that would go with each graphic. Then, go back and fill in the gaps between graphics. Continue joining the pieces until the report is complete.

7. Chunk it up. Don’t try to write the entire report by yourself. Break it up into pieces and get help.

8. Set deadlines. Sometimes it helps to be able to work towards an interim goal. Set deadlines for sections or other tasks you have to accomplish. Make them challenging but achievable.

9. Give it a rest. Absence makes the mind grow sharper. Consider taking some time off from report writing, but make sure you use the time productively. Schedule that colonoscopy you’ve been putting off. Clean the garage and paint the house. Visit your in-laws. Don’t just play video games or watch Netflix.

10. Do something different. If your routine isn’t working, try doing something different. If you can’t get anywhere because you’re pressing, work on something else or take some time off. If you can’t get anywhere because you’re slacking, try researching. If you can’t get anywhere because you’re stuck on writing, pull together graphics or the appendices. If you can’t get anywhere because you’re procrastinating, ask yourself why.

## How to Write Data Analysis Reports. Lesson 4—Get Their Attention.

If you’re writing a report about statistics, you have to expect that many readers will lose interest after a while, if they even had it to begin with. So, in writing the report, think about how you might engage your audience. Here are five ideas.

• Find Common Ground.  Every relationship begins with having something in common. Fighting a common foe or solving a common problem can form the strongest and longest lasting of bonds. So the first thing you should try to establish in your report is that common ground. This isn’t so difficult if you are working on an analysis at the behest of a client. The client is already immersed in the data and has invested in you to help solve the problem. Establishing common ground is not so easy if you are proffering an uninvited message. Some people, perhaps subconsciously, don’t really want the message you are offering, especially when you’re analyzing data in their area of expertise. Try to establish common ground in other areas. Perhaps your analysis touches on a similar or analogous issue the reader might have. Maybe the analysis procedure could be used on a different problem the reader might have.
• Clear the Decks. Get rid of everything that doesn’t add to the progression of the report. That doesn’t necessarily mean you have to omit the content. You can relegate it to an appendix, which is pretty much the same thing. Unless required to be in the body of the report, things like the data, data collection surveys and forms, and scrubbing and analysis procedures should all be put in an appendix.
• Set the Tone. Your writing style can either add to or detract from the readability of your report. A formal tone, with strict adherence to grammar rules, complex sentence structures, use of third-person point-of-view and passive voice, and plentiful jargon, is appropriate for most data analysis reports. Formal tones are good for describing details, specifications, and step-by-step instructions. However, formal tones can be more difficult to understand, especially for individuals not accustomed to reading technical reports. An informal tone, with simple grammar and vocabulary, colloquialisms, contractions, analogies, and humor, works well for blogs. Informal tones are good for discussing ideas and concepts, and for inspiring readers or communicating a vision. They are more engaging and tend to be easier for most individuals to understand. If you’re being paid to write the report, a formal tone is usually more appropriate. This is problematical, of course, because formal writing is usually harder to read and maintain an interest in.

• Make it Better. Just when you think you’re done writing, you’re not. That’s the time when you have to do even more to make the report better. First, take some time off if you can. Then, read it through again, making improvements along the way. Read it aloud if you need to. You can even record yourself reading it and then play the recording back so you engage both your vision and your hearing. Consider getting a second opinion, especially if you can’t distance yourself from the report by setting it aside for a few days. A second opinion may come from a data analysis peer, but don’t ignore nontechnical editors. A good editor can help with spelling, grammar, punctuation, word choice, style and tone, formatting, references, and accessibility. It’s usually worth the effort. This is the time to go for perfection.



## How to Write Data Analysis Reports. Lesson 3—Know Your Route.

You’ve been taught since high school to start with an outline. Nothing has changed with that. However, there are many possible outlines you can follow depending on your audience and what they expect. The first thing you have to decide is what the packaged report will look like.

Will your report be an executive brief (not to be confused with a legal brief), a letter report, a summary report, a comprehensive report, an Internet article or blog, a professional journal article, or a white paper, to name a few? Each has its own types of audience, content, and writing style. Here’s a summary of the differences.

Writing a report is like taking a trip. The message is the asset you want to deliver to the ultimate destination, the audience. The package is the vehicle that holds the message. Now you need a map for how to reach your destination. That’s the outline.

Just as there are several possible routes you could take with a map, there are several possible outline strategies you could use to write your report. Here are six.

• The Whatever-Feels-Right Approach. This is what inexperienced report writers do when they have no guidelines. They do what they might have done in college or just make it up as they go along. This might work out just fine or be as confusing as The Maury Show on Father’s Day. Considering that the report involves statistics, you can guess which it would be.
• The Historical Approach. This is another approach that inexperienced report writers use. They do what was done the last time a similar report was produced. This also might work out fine. Then again, the last report may have been a failure, ineffective in communicating its message.
• The “Standard” Approach. Sometimes companies or organizations have standard guidelines for all their reports, even requiring the completion of a formal review process before the report is released. Many academic and professional journals use such a prescriptive approach. The results may or may not be good, but at least they look like all the other reports.
• The Military Approach. You tell ‘em what you’re going to tell ‘em, you tell ‘em, and then you tell ‘em what you told ‘em. The military approach may be redundant and boring, but some professions live by it. It works well if you have a critical message that can get lost in details.
• The Follow-the-Data Approach. If you have a very structured data analysis, it can be advantageous to report on each piece of data in sequence. Surveys often fall into this category. This approach makes it easy to write the report because sections can be segregated and doled out to other people to write before being reassembled in the original order. The disadvantage is that there usually is no overall synthesis of the results. Readers are left on their own to figure out what it all means.
• The Tell-a-Story Approach. This approach assumes that reading a statistical report shouldn’t be as monotonous as mowing the lawn. Instead, you should pique the reader’s curiosity by exposing the findings like a murder mystery, piece by piece, so that everything fits together when you announce the conclusion. This is almost the opposite of the follow-the-data approach. In the tell-a-story approach, the report starts with the simplest data analyses and builds, section by section, to the great climax—the message of the analysis. Analyses that are not relevant to the message are omitted. There are usually arcs, in which a previously introduced analytical result is reiterated in subsequent sections to show how it supports the story line. Graphics are critical in this approach; outlines are more like storyboards. There may be the equivalent of one page of graphics for every page of text. Telling a story usually takes longer to write than the other approaches but the results are more memorable if your audience has the patience to read everything (i.e., don’t try to tell a story to a Bypasser.)

So, be sure that you have an appropriate outline, but don’t let it constrain you. Having a map doesn’t mean you can’t change your route along the way; you just need to get to the destination. In building the outline, try to balance sections so the reader has periodic resting points. Within each section, though, make the lengths of subsections correspond to their importance.

