If you do much data analysis it won’t be long before you work with data measured over a range of times. When you do see time-series data, you’ll find that time scales and time units have some very quirky properties.
Time after Time
You might think that time is measured on a ratio scale given its ever finer divisions (i.e., hours, minutes, seconds). Yet it doesn’t make sense to refer to a ratio of two times any more than to a ratio of two location coordinates. The starting point is also arbitrary. So time clearly isn’t measured on a ratio scale, but it can be measured on interval or ordinal scales. Time units are also used for durations; durations, however, can be measured on a ratio scale. They can be used in ratios and they have a starting point of zero.
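The distinction between times and durations shows up directly in how date libraries behave. The following is a minimal sketch using Python's standard datetime module; the specific timestamps are made up for illustration.

```python
from datetime import datetime

# Three made-up points in time on the same day.
start = datetime(2024, 1, 1, 8, 0)
mid = datetime(2024, 1, 1, 10, 0)
end = datetime(2024, 1, 1, 14, 0)

# Subtracting two times gives a duration (a timedelta object).
first_leg = mid - start    # 2 hours
second_leg = end - mid     # 4 hours

# Durations have a true zero, so a ratio of two durations is meaningful.
ratio = second_leg / first_leg

# Times have an arbitrary origin, so Python refuses to divide them.
try:
    end / start
    times_divide = True
except TypeError:
    times_divide = False

print(ratio, times_divide)
```

Here the second leg is twice as long as the first, which is a sensible statement. Asking whether 2 PM is some multiple of 8 AM is not, and the library raises an error rather than answer.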
Time measurements can be linear or cyclic. Year is linear, and can be measured on either an interval scale or an ordinal scale. For example, the year 1953 can be expressed as an integer (ordinal scale) or a decimal (interval scale). Furthermore, all values of linear time are unique. The year 1953 happened once and will never recur. Linear time is like a river. You start at some point and go with the flow. You can’t get back to your starting point, but it still exists somewhere in time.
Some time scales repeat. If day one is a Monday, then so is day eight. Likewise, month one is the same as month thirteen. So time can also be treated as being measured on a repeating ordinal scale. Durations don’t repeat; one day isn’t the same as eight days.
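Cyclic time is essentially modular arithmetic. The sketch below, using only Python's standard library, checks the weekday and month examples above; the starting date is an arbitrary choice that happens to be a Monday.

```python
from datetime import date, timedelta

# Day one: an arbitrary Monday (1 January 2024).
day_one = date(2024, 1, 1)
day_eight = day_one + timedelta(days=7)

# weekday() returns 0 for Monday through 6 for Sunday.
same_weekday = day_one.weekday() == day_eight.weekday()

def month_on_cycle(month_number):
    """Map a running month count (1, 2, 3, ...) onto the 1-12 cycle."""
    return (month_number - 1) % 12 + 1

# Durations are linear, not cyclic: one day is not eight days.
durations_equal = timedelta(days=1) == timedelta(days=8)

print(same_weekday, month_on_cycle(13), durations_equal)
```

The dates themselves are still distinct values on the linear scale; only their position within the cycle repeats.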
Does Anybody Really Know What Time It Is?
Most measurement scales are based on factors of ten. With time, though, there are 60 seconds per minute, 60 minutes per hour, and 24 hours per day. Blame the Babylonians for starting this craziness and every civilization for the next 4,000 years for being content with the status quo. In contrast, calendars have evolved from the Hellenic calendar (~850 BC), the Roman calendar (~750 BC), the Julian calendar (46 BC), to the Gregorian calendar (1582).
Everybody knows about seconds, minutes, hours, days, months, years, and even decades, centuries, and millennia, but there are many other units used for time. A jiffy is either one tick of a computer’s system clock (about 0.01 second) or the time required for light to travel one centimeter (about 33.3564 picoseconds). A New York second is the time between when a traffic signal turns from red to green and when the driver behind you honks his horn, about a second and a half. An inna minute is the time between when you ask a teenager to do something and the time he or she complies, usually about ten to thirty minutes. A warhol is being famous for fifteen minutes; a kilowarhol is being famous for approximately ten days. A moment is a medieval unit of time equal to about a minute and a half. A fortnight is two weeks. A platonic year is an astronomical unit traditionally described as the time required for the planets to return to alignment; it is usually equated with one full cycle of the precession of the equinoxes (about 26,000 calendar years).
There have been several systems in which time units were based on factors of ten, most notably in China (before the 17th century) and in France (during the 18th century). Decimal time divided a day (i.e., one rotation of the earth) into 10 metric hours, each hour into 100 metric minutes, and each minute into 100 metric seconds. A metric second is sometimes termed a blink. A blink is 0.864 of a standard second, which is about twice the time it takes for you to blink your eye (from www.neatorama.com/2009/01/30/fun-and-unusual-units-of-measurements/).
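The 0.864 figure follows from dividing the 86,400 standard seconds in a day by the 100,000 metric seconds in a day. A small sketch of the arithmetic follows; the function name to_decimal_time is a hypothetical helper, not part of any historical system or library.

```python
STANDARD_SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
DECIMAL_SECONDS_PER_DAY = 10 * 100 * 100   # 100,000

# Length of one metric second (a "blink") in standard seconds.
blink = STANDARD_SECONDS_PER_DAY / DECIMAL_SECONDS_PER_DAY

def to_decimal_time(hours, minutes, seconds):
    """Convert a standard clock time to (metric hours, minutes, seconds)."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    # Rescale the elapsed fraction of the day to metric seconds.
    metric_seconds = round(elapsed * DECIMAL_SECONDS_PER_DAY
                           / STANDARD_SECONDS_PER_DAY)
    metric_hours, remainder = divmod(metric_seconds, 100 * 100)
    metric_minutes, metric_secs = divmod(remainder, 100)
    return metric_hours, metric_minutes, metric_secs

print(blink)
print(to_decimal_time(12, 0, 0))   # noon
print(to_decimal_time(18, 0, 0))   # 6 PM
```

Noon lands at exactly 5 metric hours, and 6 PM at 7 metric hours and 50 metric minutes, because each is simply a fraction of the day expressed in base ten.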
Then there’s geologic time, which is subdivided into eons, eras, periods, epochs, and ages. The divisions are based on the rocks that were formed at the time and the fossils that occur within them. Consequently, the divisions aren’t all the same length and they don’t all contain the same number of subdivisions. For example, the Paleozoic era is roughly one and a half times as long as the Mesozoic era, and more than four times as long as the Cenozoic era (which admittedly is still in progress). Likewise, some periods are four times longer than others. Moreover, the lengths of the divisions can change as more is learned about the history of the Earth. The units of the scale are also different in different parts of the world. Geologic time is an ordinal scale devised because measurements on the interval scale on which it is based (i.e., years) lack accuracy and precision.
Astronomical time is confusing, relatively, and it’s different if you’re on board the Enterprise or the Galactica. So the point is this—measuring time is complicated, not to mention time-consuming. But there’s even more to it than that.
Time of the Season
Selecting an appropriate time scale is especially important because the scale can dictate the resolution and the types of analyses that can be done. Select an interval that is too small and your database may become unmanageably large. Select an interval that is too large and you may not have enough resolution to investigate the time unit you are interested in. A good rule of thumb is to select an interval that is at least one time unit smaller than your unit of interest. For example, if you are interested in yearly trends, collect measurements every month. If you only collect measurements yearly, you won’t be able to assess the variability that occurs within a year. If you collect measurements more often than daily, you may have to roll up the data to make them manageable.
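Rolling up means grouping fine-grained measurements into coarser time units and summarizing each group. Here is a minimal sketch using only Python's standard library; the readings are invented, the helper roll_up is hypothetical, and the mean is just one possible summary (a total or a maximum may suit other data better).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Invented twice-daily readings as (timestamp, value) pairs.
readings = [
    (datetime(2024, 3, 1, 0, 0), 10.0),
    (datetime(2024, 3, 1, 12, 0), 14.0),
    (datetime(2024, 3, 2, 0, 0), 11.0),
    (datetime(2024, 3, 2, 12, 0), 15.0),
    (datetime(2024, 4, 1, 0, 0), 20.0),
]

def roll_up(readings, key):
    """Group readings by key(timestamp) and average each group."""
    groups = defaultdict(list)
    for stamp, value in readings:
        groups[key(stamp)].append(value)
    return {k: mean(v) for k, v in sorted(groups.items())}

# Roll up to daily values, then to monthly values.
daily = roll_up(readings, key=lambda t: t.date())
monthly = roll_up(readings, key=lambda t: (t.year, t.month))

print(daily)
print(monthly)
```

A roll-up only works in one direction. The monthly averages can be computed from the twice-daily readings, but the within-day variability cannot be recovered from the monthly averages, which is why the finer collection interval is the safer choice.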
Take Your Time
Time formats can be difficult to deal with. Most data analysis software offers a dozen or more display formats for dates and times. Behind the displayed format, though, the software stores a number, which is the distance of the time from an arbitrary starting point, in an arbitrary unit of time (often days). Convert a date-time format to a number format, and you’ll see what I mean. The formatting allows you to recognize values as times while the numbers allow the software to calculate statistics. This quirk of time formatting also presents a potential for disaster if you use more than one piece of software and the packages use different starting points or time units. Always check that the formatted dates are the same between applications.
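To see how the same date turns into different numbers, the sketch below counts days from two commonly used starting points. It assumes Excel's default 1900 date system, where day counts for dates from March 1900 onward line up with an origin of 30 December 1899, and the Unix convention of counting from 1 January 1970. The helper days_since is hypothetical, and other packages may use other origins or units.

```python
from datetime import date, timedelta

def days_since(epoch, day):
    """Number of whole days from an arbitrary starting point to a date."""
    return (day - epoch).days

# Assumed starting points (see the note above).
EXCEL_1900_EPOCH = date(1899, 12, 30)
UNIX_EPOCH = date(1970, 1, 1)

day = date(2024, 1, 1)
excel_serial = days_since(EXCEL_1900_EPOCH, day)
unix_days = days_since(UNIX_EPOCH, day)

# Reading one system's number with the other system's starting point
# silently produces a different, wrong date.
misread = UNIX_EPOCH + timedelta(days=excel_serial)

print(excel_serial, unix_days, misread)
```

The two numbers differ by a constant offset, so nothing looks broken until the values are formatted as dates again. Comparing a few formatted dates after every transfer is a cheap way to catch the mismatch.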
Time Will Tell
Time-series data are probably the most difficult type of data to analyze. Measurements involving time are usually autocorrelated, so using conventional statistical procedures can produce biased results. Besides their scale of measurement, there are several other aspects of temporal variables that add to the confusion.
- Ch-Ch-Ch-Ch-Changes—Time-series data can exhibit a variety of patterns, including step changes, linear and nonlinear trends, and cyclic fluctuations. The effects may be superimposed on each other within a given time period or spread over many different time periods. For example, a change in the discharge of a river may be attributable to abrupt and ephemeral causes such as failure of a dam or a sudden downpour (shocks), abrupt and long-term causes such as natural changes in a drainage way or a man-made diversion (step changes), long-term causes such as drought or changes in water consumption (trends), repetitive changes such as seasonal cycles related to rainfall or irrigation (cyclic fluctuations) as well as random variations. Confounded effects are often impossible to separate, especially if the data record is short or the sampled intervals are irregular or too large.
- One Day at a Time—Time-series measurements may not all be collected at a single instant in time. Some measurements are composites over time. For example, a flow measurement (e.g., stream, air) may be an instantaneous discharge or a total discharge over a selected time period. A sample may be collected at one time or be a composite of several samples collected at discrete time intervals and combined into a single sample container. The period over which each measurement is averaged is called the support. Obviously, you can’t evaluate a given time interval if your support is the same as or larger than the interval.
- For the Times They Are a Changing—There is a dilemma involving time-series that are measured over many years. It goes like this. As knowledge and technology improve, so does the chance that improvements in sampling and analysis procedures will reduce the overall variability of more recent measurements. That leads to violations of one of the fundamental assumptions of parametric statistical procedures, equality of variances (also called homoscedasticity). Sometimes, you just can’t win.
- In the Year 2525 … —With most types of analysis, both statistical and deterministic, data analysts collect data over the entire range of the area of interest. If you want to analyze a chemical reaction at 100 degrees, you might analyze the reaction at temperatures between 80 degrees and 120 degrees. You wouldn’t, however, test the reaction at 40 to 80 degrees and extrapolate to what might happen at 100 degrees. In fact, scientists are taught never to extrapolate outside the range of their data. With time-series data, though, you have to extrapolate because you almost always want to know what will happen in the future. If you wait to see what actually happens, then it’s no longer interesting because it’s the past. And in the ultimate of ironies, you often can extrapolate time-series data because they are … autocorrelated. So the same property that makes time-series data difficult to analyze is what allows them to be extrapolated to future times, a process called forecasting. Mother Nature has a wicked sense of humor.
- Time Keeps on Slipping into the Future—With other types of data, even autocorrelated spatial data, you can verify predictions whenever the need arises. With predictions for a time-series (that is, forecasts), you have to wait until the time in question arrives. Then you have just one chance. You can’t go back if something goes wrong and you miss collecting the verification data. Hence, you can’t control verification.
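Autocorrelation, the property behind both the difficulty and the forecasting described above, can be measured directly. The sketch below computes a lag-1 autocorrelation coefficient and uses it for a very simple forecast in which each future value keeps a fraction of the last value's deviation from the mean. This is one illustrative technique chosen for brevity, not a recommended forecasting method; the function names and data are made up, and the approach ignores trends, cycles, and changing variance.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Correlation between each value and the value that follows it."""
    m = mean(series)
    deviations = [x - m for x in series]
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator

def simple_forecast(series, steps=1):
    """Carry the last deviation from the mean forward, shrunk by r1."""
    m = mean(series)
    r1 = lag1_autocorrelation(series)
    forecasts = []
    last = series[-1]
    for _ in range(steps):
        last = m + r1 * (last - m)
        forecasts.append(last)
    return forecasts

# Made-up series: one where neighbors resemble each other,
# one where each value flips sign from the one before.
steady_rise = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

print(lag1_autocorrelation(steady_rise))
print(lag1_autocorrelation(alternating))
print(simple_forecast(steady_rise, steps=3))
```

A coefficient near zero would mean that the last value says little about the next one, and the forecast would collapse to the overall mean. Note that this forecast always drifts back toward the mean, so it under-predicts a series with a genuine trend, which is a small reminder of how confounded patterns complicate time-series work.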
So those are a few points about how time is measured and analyzed. There’s much more to it than that, but I’ll save those thoughts for another time.