A decade or so ago, I constantly feared hardware and software problems, and I was frequently their victim. That was the logical consequence of a craftsman routinely pushing his tools far beyond the limits of their capabilities. But software is far better now, and hardware is cheap enough to allow extraordinary redundancy. It isn’t often that a problem goes away so completely with so little fanfare. But that doesn’t mean there are no longer any technical problems that can cause major difficulties in a data analysis project. Here are three of the most common.
The first problem, inadequate data, seems to occur on every project in which the client is responsible for providing previously collected data. Data delivery might be late, incomplete, or in the wrong format. More times than I can count, clients have given me spreadsheets they used as a data table in a report, complete with footnotes, blank rows and columns, and all kinds of extraneous formatting, all the while thinking that the table was ready for statistical analysis. These things happen and, in fact, should be anticipated and built into the project budget.
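When a spreadsheet arrives in report form, a few mechanical fixes can be automated before the real scrubbing begins. The following is a minimal Python sketch, not a general solution. It assumes the table has been exported to CSV text, that the first non-blank row holds the column headers, and that footnotes are rows with text only in the first column. The function name `tidy_report_table` and that footnote rule are illustrative assumptions. Everything else about the data still needs a proper scrub.

```python
import csv
import io


def tidy_report_table(text):
    """Strip report formatting from a CSV export of a spreadsheet table.

    Heuristic sketch (assumptions): the first non-blank row is the header,
    and footnote rows have text in the first column only.
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text))]
    # Drop rows in which every cell is empty (blank spacer rows).
    rows = [r for r in rows if any(r)]
    if not rows:
        return []
    # Pad ragged rows so every row has the same number of columns.
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]
    # Drop columns that are empty in every row (blank spacer columns).
    keep = [j for j in range(width) if any(r[j] for r in rows)]
    rows = [[r[j] for j in keep] for r in rows]
    header, body = rows[0], rows[1:]
    # Assumed footnote rule: text in the first column only. Skipped for
    # single-column tables, where it would discard real data.
    if len(header) > 1:
        body = [r for r in body if any(r[1:])]
    return [header] + body


if __name__ == "__main__":
    report = (
        "Site,,pH,Temp\n"
        ",,,\n"
        "A,,7.1,12\n"
        "B,,6.8,14\n"
        "* Measured in July,,,\n"
    )
    for row in tidy_report_table(report):
        print(row)
```

The footnote rule is deliberately crude. A real report may put notes in other columns or embed markers like asterisks inside data cells, so the output still has to be checked by eye.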
The real problem is when the client provides data sets after you’ve started your analysis. Statistical analysis projects are pretty much once-and-done endeavors. They might be repeated yearly or whenever something else prompts them, and they may use some of the same data and the same analysts, but each analysis is expected to involve at least some new data, new results, and, most importantly, a new budget and schedule. This point is lost on many clients.
Updated data sets usually just include new data the client generated since they gave you the original data set. If the new data can simply be merged into your working data set and you only have to scrub the new records (http://statswithcats.wordpress.com/2010/10/17/the-data-scrub-3/), that’s less a problem than an annoyance. Even so, you usually have to at least look at the original data in light of the new data. It’s a real problem, though, when the updated data set adds or drops subjects because the client changed their mind about which ones to use, or involves a modified database query, or provides recalibrated measurements, or, worst of all, corrects a few random errors they noticed. Too many times I’ve asked about possible errors in a database only to get a corrected, updated, totally new data file. It’s back to square one for Sisyphus the statistician. Whose fault is that?
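Before deciding how much of the analysis an updated delivery undoes, it helps to find out exactly what changed. Here is a minimal Python sketch, assuming each delivery is a list of records (Python dicts) that share a unique subject identifier. The `id` key and the names `compare_deliveries` and `DeliveryDiff` are hypothetical. The sketch reports subjects added, subjects dropped, and values that changed, which covers new or excluded subjects, recalibrated measurements, and quiet corrections alike.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryDiff:
    added: list = field(default_factory=list)    # IDs only in the new delivery
    removed: list = field(default_factory=list)  # IDs only in the old delivery
    changed: dict = field(default_factory=dict)  # ID -> {column: (old, new)}


def _index(rows, key):
    """Map each record's ID to the record, refusing duplicate IDs."""
    index = {}
    for row in rows:
        k = row[key]
        if k in index:
            raise ValueError(f"duplicate {key}: {k!r}")
        index[k] = row
    return index


def compare_deliveries(old_rows, new_rows, key="id"):
    """Compare two deliveries of the same table, matched on `key` (assumed)."""
    old = _index(old_rows, key)
    new = _index(new_rows, key)
    diff = DeliveryDiff(
        added=sorted(new.keys() - old.keys()),
        removed=sorted(old.keys() - new.keys()),
    )
    for k in sorted(old.keys() & new.keys()):
        columns = (old[k].keys() | new[k].keys()) - {key}
        # Exact comparison: any difference, however small, is reported.
        deltas = {c: (old[k].get(c), new[k].get(c))
                  for c in sorted(columns)
                  if old[k].get(c) != new[k].get(c)}
        if deltas:
            diff.changed[k] = deltas
    return diff


if __name__ == "__main__":
    first = [{"id": "S1", "ph": 7.1}, {"id": "S2", "ph": 6.8},
             {"id": "S3", "ph": 7.4}]
    second = [{"id": "S1", "ph": 7.1}, {"id": "S2", "ph": 6.9},
              {"id": "S4", "ph": 7.0}]
    d = compare_deliveries(first, second)
    print(d.added)    # ['S4']
    print(d.removed)  # ['S3']
    print(d.changed)  # {'S2': {'ph': (6.8, 6.9)}}
```

Values are compared exactly, so a recalibration that nudges every measurement will flag every record. That is itself worth knowing before you promise the client a quick turnaround.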
Updating incorrect data is a two-edged sword. It will improve your analysis, sometimes substantially, but you lose any analysis work you’ve already done. If a client corrects one error, how can you be sure there won’t be more corrections? I had one client who agreed to deliver a complete, error-free data set that I would then analyze within sixteen weeks. They delivered a table, which I reformatted and scrubbed, identifying errors along the way. They sent an updated table with corrected data. I reformatted and scrubbed the new data set, only to be told they had new data they wanted included. So we had a meeting to redefine what data would be included in the analysis, which they promised to deliver the following week. Two weeks later, more data arrived, but not all the data we had agreed on. More data were delivered a few weeks later. This continued for weeks until the final pieces were delivered just three days before the original project deadline. You can probably guess what happened. The client was outraged that I hadn’t completed the sixteen-week analysis in the three days I’d had the complete data set.
Perhaps the worst problem with inadequate data arises when data errors aren’t noticed until after you’ve pretty much completed the analysis. Your dilemma is telling your client, in a nice way, that they screwed up and there are consequences. If the analysis is small and you want to keep the client, grit your teeth, get the correct data, redo the analysis, and take the loss. But if the analysis is more complex and you’ve passed the point of easy return, you have to explain to the client that they have two options: let you finish the analysis with the data you have, or pay for you to redo the analysis. Correcting only a couple of data values might not change any decision based on the results, but it will change all the numbers presented in the report. So, if the client plans to release the results to adversarial reviewers, they need to understand their alternatives.
The second problem involves the results themselves. Most of the time, your analysis will confirm what you and your client already suspect. No problem. Occasionally, you’ll reach some unexpected finding. Most clients don’t even mind this; they feel they got something new for their money. But two other kinds of findings are problematic: complex results and inconclusive results.
Exceedingly complex findings are difficult to communicate, especially to a non-technical audience. There’s only so much you can show in pie charts and bad graphs, even if you use cutesy icons of money bags and people. If the client doesn’t understand your findings, and especially the value of your findings, your work will never see the light of day. Likewise, if they don’t believe your results, those results will never be acted upon (http://statswithcats.wordpress.com/2011/01/16/ockham%E2%80%99s-spatula/). Even more troubling are inconclusive results. It’s difficult to explain to a client that you finished the work and spent all the money but didn’t reach any conclusive findings. Imagine how you might feel if your mechanic told you he couldn’t find or fix the problem with your car, then charged you $500.
Unavailability of Key Staff
This happens on all projects, not just statistical ones. Sometimes people get sick, or resign to take new jobs. Sometimes management reassigns your staff during lulls in the work, never to return them to your project. There’s not much you can do to prevent these dilemmas. You just have to react quickly when the problem arises.
Faster, Better, Cheaper. Pick two. Get one.
Consultants always want to do a better job than their competitors, complete the job sooner, and charge less for their work. It never works out that way, though. Some consultants always do superior work, but they may take longer to achieve their vision of perfection. Some consultants pride themselves on being the lowest-cost option, but their work is often mediocre. Other consultants specialize in quick response, no matter what it takes.
It’s like college. Most students have to do “academic triage”: pick the courses they will excel in and coast through the rest. Nobody is good at everything, but that’s what clients want and expect. Besides, you probably said in your proposal that you were faster, better, and cheaper. Now it’s time to deliver.
So is it best to be faster? Should you try to be better? Is being cheaper what clients want most? Consider this analogy. Say you hire a painter to paint the outside of your house. You tell him what you want done and agree to a price and a schedule. Then something goes wrong. Maybe you have to leave town, or the painter can’t get the paint you want, or it rains for two weeks straight. Suddenly the whole agreement is in upheaval. Now fast-forward a few years. Do you remember that the job took a month longer because of the rain, or cost more because the paint had to be special-ordered? Maybe, but chances are you don’t think about it nearly as often as you notice the chipping, bubbling paint caused by the poor application.
In general, memories of poor quality last far longer than memories of missed schedules or overrun budgets. Quality, however, is a matter of opinion, whereas it’s easy to tell when budgets and schedules are missed. So you have to try to balance all three. But if you find that you can’t be faster, better, and cheaper, you’ll have to do “management triage.” If there’s no money left in the budget, you may have to put in some unpaid time, even if it results in a delay. If you have an immovable deadline, get help, even if you have to eat some costs. If you have no budget or schedule flexibility, stop where you are and package the deliverable with recommendations for the work you wanted to do but couldn’t finish. Maybe you’ll get lucky.
Picking between faster, better, and cheaper is both a technical and a business decision that is never pleasant. If you decide not to pick quality, beware of the long-term consequences. Whatever you decide to do, don’t wait to inform the client. Clients hate surprises. Confirmed bad news delivered late in a project is much worse than potential bad news delivered early in the project.
Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.