## Limits of Confusion

Cat whiskers are like confidence intervals. They let the cat know how big its spread is.

A confidence interval is the numerical interval around the mean of a sample from a population that has a certain confidence of including the mean of the entire population. “Say what?” OK, let’s take it one point at a time.

Say you collect 30 water samples from a lake. Oh wait. That use of the word sample will be confusing to some people. A sample is a portion of a population, but the word can refer to an individual piece of a population or a collection of pieces of the population (https://statswithcats.wordpress.com/2010/07/03/it’s-all-greek/). It’s like the word fish—one fish, two fish, school of fish, and so on.

Anyway, say you collect 30 aliquots (i.e., samples) of water from the lake and analyze the aliquots for iron content. Then, you sum the 30 iron concentrations and divide by 30 to get the mean iron concentration of your collection of aliquots (i.e., sample). But you don’t really care about the mean iron concentration of your collection of 30 aliquots. What you want to know is the average iron concentration of all the water in the lake. No problem. You can use the mean iron concentration of the 30 aliquots as an approximation of the mean iron concentration of the lake (population).

Now, that would be fine for most people except for neurotic individuals who don’t understand the Central Limit Theorem. These persons have a couple of options. They can go back to the lake and collect 30 more aliquots of water (this is sometimes referred to as a working vacation if the collection of fish samples is also involved), then recalculate the mean, and see what they get. They can do the same thing again, and again, and again (referred to as a vacation if the consumption of beer and potato chips is involved, https://statswithcats.wordpress.com/2010/07/26/samples-and-potato-chips/) until they have enough means to say how variable the lake’s mean iron concentration might be. (Note: If the neurotic individuals can get someone else to pay for everything, they are called consultants. If the neurotic individuals can get everyone else to pay for everything, they are called politicians.)
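Here is a rough Python sketch of that repeated-trip approach. Since nobody is paying for the beer, the lake is simulated: the normal population with a mean of 50 and a standard deviation of 10 is an assumption made up for illustration, as are the function name and the 1,000 trips.

```python
import random
import statistics

def repeated_sample_means(population_mean, population_sd, n_aliquots, n_trips, seed=0):
    """Simulate n_trips trips to a hypothetical lake and return the mean of each trip."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trips):
        # Assumption: iron concentrations in the lake are normally distributed.
        aliquots = [rng.gauss(population_mean, population_sd) for _ in range(n_aliquots)]
        means.append(statistics.mean(aliquots))
    return means

trip_means = repeated_sample_means(50.0, 10.0, n_aliquots=30, n_trips=1000)
spread_of_means = statistics.stdev(trip_means)  # how variable the mean is from trip to trip
print(round(statistics.mean(trip_means), 2), round(spread_of_means, 2))
```

The spread of the trip means should land near the population standard deviation divided by the square root of 30, which is the quantity the confidence interval formula leans on.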

For people who can’t afford to collect more samples of samples, there’s an alternative approach called resampling. It’s the computer equivalent of a cushy government contract for data collection. In a resampling approach, you would collect the 30 aliquots of lake water, analyze them for iron content, and calculate the mean of your sample. Then you would have specialized software randomly select a certain number of the original 30 samples (the process is called bootstrapping or jackknifing depending on how it’s done; feel free to google away) to create a new dataset, from which you could calculate a new mean iron concentration. Then do that again, and again, and again until you have enough means to say how variable the mean iron concentration is.
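A minimal bootstrap sketch in Python might look like the following. The 30 iron concentrations are made-up numbers generated for illustration, the function name is hypothetical, and the simple percentile interval at the end is just one of several ways to summarize the resampled means.

```python
import random
import statistics

def bootstrap_means(values, n_resamples=2000, seed=0):
    """Resample the data with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(values)
    return [statistics.mean(rng.choices(values, k=n)) for _ in range(n_resamples)]

# Made-up data standing in for 30 measured iron concentrations.
data_rng = random.Random(42)
iron = [data_rng.gauss(50.0, 10.0) for _ in range(30)]

boot = sorted(bootstrap_means(iron))
lower = boot[int(0.025 * len(boot))]      # 2.5th percentile of the resampled means
upper = boot[int(0.975 * len(boot)) - 1]  # 97.5th percentile of the resampled means
print(round(lower, 2), round(statistics.mean(iron), 2), round(upper, 2))
```

Each resample draws 30 values with replacement from the original 30, so some aliquots show up more than once and others not at all.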

A third alternative, which involves no fishing, less computer time, and as much beer as you need, is to calculate a confidence interval. First, calculate the mean and standard deviation of the 30 iron concentrations. Then calculate a confidence interval around the sample mean using the formula

Sample Mean ± t-value × (Sample Standard Deviation ÷ √(Number of Samples))

In the lake example, the mean, standard deviation and number of samples would be calculated from the iron concentrations determined in the aliquots of lake water. The t-value would be calculated using software or selected from a table of values of the t-distribution on the basis of:

• Degrees-of-freedom. The number of samples minus one. In this case, 30 water aliquots minus 1 equals 29.

• Alpha. One minus the confidence level you want (i.e., the confidence that the interval you calculate will include the population mean), divided by the number of limits you will calculate. In this case, that number is two because you want upper and lower limits, so 99% confidence gives an alpha of (1 – 0.99) / 2 = 0.005.

The boundaries of a confidence interval are called the upper confidence limit and the lower confidence limit.
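If you would rather let software find the t-value than squint at a table, the sketch below shows one way it might be done using only Python's standard library. It integrates the t-distribution numerically and then searches for the value that leaves alpha in the upper tail. The function names (t_pdf, t_cdf, t_critical) are hypothetical, and a statistics package would normally do this for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees-of-freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Cumulative probability up to x, via Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided t-value: alpha = (1 - confidence) / 2 is left in each tail."""
    alpha = (1 - confidence) / 2
    target = 1 - alpha
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection search for the matching quantile
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.99, 29), 3))
```

Note how the degrees-of-freedom and alpha from the two bullets above are the only inputs the t-value needs.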

For example, if:

• Mean iron concentration were 50
• Standard deviation of iron concentration were 10
• t-value for 29 degrees-of-freedom (based on 30 iron concentrations) and alpha of .005 (based on 99% confidence for a two-sided limit) were 2.756

the 99% lower confidence limit would be 44.97 (i.e., 50 – (2.756 * (10/√30))) and the 99% upper confidence limit would be 55.03 (i.e., 50 + (2.756 * (10/√30))).

You would have about 99% confidence that this interval would include the mean iron concentration of the lake.
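The arithmetic can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the t-value of 2.756 is the standard t-table entry for 29 degrees-of-freedom with .005 in each tail.

```python
import math

def confidence_limits(sample_mean, sample_sd, n, t_value):
    """Return (lower, upper) limits: mean ± t * sd / sqrt(n)."""
    half_width = t_value * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# t-table value for 29 degrees-of-freedom and a one-tail area of .005
lower, upper = confidence_limits(50.0, 10.0, 30, t_value=2.756)
print(f"{lower:.2f} to {upper:.2f}")  # 44.97 to 55.03
```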

But what if you think that is too wide a range for the lake’s mean iron concentration? What can you do? You could go back to the lake and collect another 30 samples and try again. Better yet, you could go back to the lake and take 120 or even more samples (https://statswithcats.wordpress.com/2010/07/11/30-samples-standard-suggestion-or-superstition/), but that’s a lot of expensive work vacation.
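To see why more samples help, look at the standard deviation divided by the square root of the number of samples. The sketch below assumes, purely for illustration, that the standard deviation stays at 10 when you collect more aliquots; the function name is hypothetical.

```python
import math

def standard_error(sample_sd, n):
    """Standard deviation divided by the square root of the number of samples."""
    return sample_sd / math.sqrt(n)

se_30 = standard_error(10.0, 30)
se_120 = standard_error(10.0, 120)
print(round(se_30, 3), round(se_120, 3))  # 1.826 0.913
```

Quadrupling the number of samples only halves this part of the formula, which is why narrowing an interval by fieldwork alone gets expensive fast.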

Look back at the formula for the confidence limits. The limits are calculated from the mean, the standard deviation, the number of samples, and the t-value. If you’re not going back to the lake, you can’t change the mean, the standard deviation, or the number of samples. That leaves the t-value. The t-value would be based on the degrees-of-freedom and the confidence. The degrees-of-freedom are determined from the number of samples, so that’s still no help. BUT, the choice of the confidence is yours.

Consider this. If you choose the confidence level to be:

99% (t-value of 2.756), the confidence limits would be 44.97 to 55.03
95% (t-value of 2.045), the confidence limits would be 46.27 to 53.73
90% (t-value of 1.699), the confidence limits would be 46.90 to 53.10

Or for that matter,

50% (t-value of 0.683), the confidence limits would be 48.75 to 51.25

although it wouldn’t be very useful if your interval only had a 50% chance of including the real mean iron concentration of the lake.
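A short Python sketch makes the trade-off easy to play with. The t-values are standard two-sided t-table entries for 29 degrees-of-freedom, and the dictionary and function names are hypothetical.

```python
import math

# Two-sided t-table values for 29 degrees-of-freedom, keyed by confidence level.
T_TABLE_DF29 = {0.99: 2.756, 0.95: 2.045, 0.90: 1.699, 0.50: 0.683}

def interval_for(confidence, mean=50.0, sd=10.0, n=30):
    """Return (lower, upper) confidence limits for the lake example."""
    half_width = T_TABLE_DF29[confidence] * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

intervals = {level: interval_for(level) for level in T_TABLE_DF29}
for level, (lower, upper) in intervals.items():
    print(f"{level:.0%}: {lower:.2f} to {upper:.2f}")
```

Nothing about the data changes from one line of output to the next; only the confidence you are asking for does.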

Consider the analogy of a nearsighted man playing a ring-toss game at a carnival. The location of the peg he will toss his ring at is like the mean of a population of possible measurements. The diameter of the peg is like the inherent variability of the population of measurements. The fuzziness with which he sees the peg because of his nearsightedness is like the additional variation associated with sampling, measurement, and environmental variability (https://statswithcats.wordpress.com/2010/08/01/there%E2%80%99s-something-about-variance/). The size of the ring he tosses is like the size of the confidence interval. If he wanted to be very confident that he could toss a ring over the peg, he would use a large ring to give him that confidence (i.e., the higher the confidence, the larger the confidence interval).

The man cannot change the location and diameter of the peg (i.e., the population values are fixed). However, he would have a greater chance of success if he could see better (i.e., extraneous variation in the samples is controlled, https://statswithcats.wordpress.com/2010/09/05/the-heart-and-soul-of-variance-control/; https://statswithcats.wordpress.com/2010/09/19/it%E2%80%99s-all-in-the-technique/) or if he could use a very large ring (i.e., a relatively wide confidence interval). If the ring (the confidence interval) becomes too large, though, the game becomes meaningless. Thus, there must be some limits on how large the ring should be.

Obsidian in a 90% confidence drawer.

By convention, most statistical inferences, including confidence intervals, use a 95% confidence level. Sometimes either a 90% level (resulting in a smaller confidence interval) or a 99% level (resulting in a larger interval) is used. A 90% level would be more appropriate when the consequences of not including the true population value in the interval are relatively minor. Confirmatory inferences, on the other hand, often use a 99% confidence level. When in doubt, use 95%.

Some people dislike putting confidence limits around means they calculate. Limits show how imprecise data, and statistics calculated from them, actually are. But if you are going to make an informed decision, you have to know not just the evidence, but the reliability of the evidence as well. Maybe that’s why lawyers hate to have statisticians sitting in the jury pool.

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.

Charlie Kufs has been crunching numbers for over thirty years. He retired in 2019 and is currently working on Stats with Kittens, the prequel to Stats with Cats.

### 8 Responses to Limits of Confusion

1. excellent
thanks a lot

2. Thank you.

3. Charlie Young says:

In the example cited, don’t the upper and lower confidence limits of the mean based on the Student’s t statistic require that the data are drawn from a normally distributed population?

• Alan says:

I agree with Charlie. Isn’t that a prerequisite?

4. Well … yes and no.

The central limit theorem says that the distribution of means from samples of a population will be normally distributed no matter what the frequency distribution of the individuals in the population the samples were originally drawn from. So if the iron concentrations in the lake are lognormally distributed, estimates of the mean from samples of the lake will still be normally distributed. That’s the CLT.

BUT …

As with almost every damn thing in statistics, the CLT applies to large samples; it doesn’t work well with small sample sizes. But if the distribution of the iron concentrations in the lake were normally distributed, then sample size doesn’t matter so much.

So what’s a sufficiently large sample size? Who knows, maybe 30 or 100 or 500. It’s all a matter of resolution. As sample size increases, values for the t-distribution approach values for the normal distribution. Yes, t-values will give you a wider interval but they also give you some protection for nonnormality of the sampling distribution.

The prerequisite, then, is a large number of samples or a normally-distributed population. Pick your Poisson poison. If your situation doesn’t fit one of those, it doesn’t mean the interval is a total fail, it will probably just be a bit too wide (if you use t-values).

5. Pingback: Polls Apart | Stats With Cats Blog

6. Gabby Kohlb says:

Hi StatswithCats,
I like the analogy with the water samples out of the lake, and that you just can’t take all the water out of the lake to be sure you have the right iron concentration.
This is just what I needed as an example for my non-statistician colleagues why the Confidence Intervals are such a help.
The usual explanation that a CI tells you where your “true” mean will fall was of no help…
Thank you so much!!!
Gabby