Model building is like climbing a mountain. It’s what you spend so much time planning for. It’s what everybody wants to talk about. It’s what gives you that euphoric feeling of accomplishment when you’re finished. But just as mountain climbers have to descend, model builders have to deploy. You have to put your model in a form that will be palatable to users.

I had a client, a very skilled engineer, who wanted a model to predict how many workers he would need to hire during the year. His company produced three lines of products, most of which were customized for individual customers. A few years earlier, he had gone to great effort and expense to develop a model to predict his manpower needs. He collected data on how many of each type of product he had produced over the past five years, and from that data he had his managers estimate how long it took to make each product and to complete the most common customizations. Then he had his sales force estimate the number of orders they expected the following year. He reasoned that multiplying the time it took to produce each product by the number of expected orders, and adding up the results, would give him the number of man-hours he would need. It was a classic bottom-up modeling approach.
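To make the arithmetic of that bottom-up approach concrete, here is a minimal Python sketch. The product names, per-unit hours, order counts, and the figure of 2,000 working hours per worker-year are all hypothetical, chosen only for illustration; they are not the client’s data.

```python
# Hypothetical per-unit labor hours (the managers' estimates) and
# hypothetical order forecasts (the sales force's estimates).
# None of these figures come from the client.
hours_per_unit = {"product_a": 120.0, "product_b": 45.0, "product_c": 300.0}
expected_orders = {"product_a": 40, "product_b": 150, "product_c": 10}

# Assumption: roughly 2,000 working hours per worker per year.
HOURS_PER_WORKER_YEAR = 2000.0


def bottom_up_hours(hours_per_unit, expected_orders):
    """Sum of (hours per unit x expected orders) across all product lines."""
    return sum(hours_per_unit[p] * expected_orders[p] for p in hours_per_unit)


total_hours = bottom_up_hours(hours_per_unit, expected_orders)
workers_needed = total_hours / HOURS_PER_WORKER_YEAR
print(f"{total_hours:,.0f} man-hours, about {workers_needed:.1f} workers")
```

Every input in this sketch is a human estimate, so the output can be no better than the guesses that go into it.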

The model had a problem, though. It didn’t work. Even after tinkering with the manufacturing times and correcting for employee leave, administrative functions, and inefficiency, the model still wasn’t very accurate. Moreover, it took his administrative assistant several weeks each year to collect the projected sales data to input into the model. Some of the salespeople estimated more sales than they actually expected, to impress the boss. Others estimated fewer sales so that they would have a better chance of meeting whatever goal might be given to them. A few never gave the administrative assistant any forecasts at all, so she just used numbers from the previous year.

Using a statistical modeling approach, I found that his historical staffing was highly correlated with just one factor, the number of units of one of his products produced in the prior year. It made sense to me. His historical staffing levels were appropriate because he had hired staff as he needed them, albeit somewhat after his backlog had reached a crisis. His business had also been growing at a fairly steady rate. So long as conditions in his market did not change, predicting future staffing needs was straightforward. He didn’t need to rely on projections from his psychologically fragile sales force.

But my model proved to be quite unsettling to many. The manager of the product line that was used as the basis of the model claimed the model proved his division merited a greater share of corporate resources, and bigger bonuses for him and his staff. Managers of the two product lines that were not included in the model claimed the model was too simplistic because it ignored their contributions.

At that point, the client had a complex model that he liked but that didn’t work, and a simple model that worked but that nobody liked. He probably would have continued to use the complex model if it hadn’t taken so much work to gather the input data. Valid or not, the simple model had no credibility with his managers. He could calculate a forecast with it but was reluctant to favor the model over his managers’ intuitions. So, given his two flawed alternatives, the client decided to move manpower forecasting to the back burner until the next crisis brought it to a boil again.

I wish I could say that this was an isolated case, but it’s more the rule than the exception, especially with technically oriented clients, who are most comfortable working from the details up to the prediction.

I once developed a model for a client to predict the relative risks associated with the real estate they managed. The managers wanted a quick-and-dirty way to set priorities for conducting more thorough risk evaluations of the properties. I based my model on information that would be readily available to the client. They could evaluate a property for a few hundred dollars and decide in a day or two whether further evaluation was needed immediately or could be deferred. When the model-development project was done, the model was turned over to the operations group for implementation. The first thing the operations manager did was invite “experts” he worked with to refine the model. Very quickly, the refinements became expansions. The model went from quick and dirty to comprehensive and protracted. On average, it took the operations group six months and $50,000 to evaluate each property. Yet the priorities set by the refined model were virtually identical to those set by the quick-and-dirty model.

Was one of these models good and the other bad? Not exactly; there’s an important distinction to be made. Statisticians, and for that matter scientists, engineers, and many other professionals, are taught that, all else being equal, simple is best. It’s Ockham’s razor. A simple model that predicts the same answers as a more complicated model should be considered better. It’s more efficient. But sometimes you, as the statistician, have to be more flexible.

The operations manager wasn’t comfortable with a simple model. He needed to be confident in the results, which, for him, required adding every theoretical possibility his experts could think of. He didn’t want to ignore any sources of risk, even rare or unlikely ones. That made for a very inefficient model, but a model that users don’t trust, and therefore don’t use, isn’t the tool they need.

These cases illustrate that there’s more to modeling than the technical details. There are also artistic and psychological aspects to be mastered. Textbooks describe statistical methods for finding the best model components, but not necessarily the ones that will work for the model’s users. Sometimes you have to be flexible. Think of Ockham’s razor as more of a spatula than a cleaver.

Like sausages, models need to look good on the outside, especially if there are things on the inside that might make most users choke. You have to package the model. First, it can’t look so intimidating that users break out in a sweat when they see it. Leave the equations to the technical reviewers; hide them from naive users. USDA inspectors have to look inside sausages, but you don’t. Second, put the model in a form that can be used easily. That inch-thick report may be great documentation, but it’ll garner more dust than users. If your users are familiar with Excel, program the model as a spreadsheet. If you know a programming language, put the model in a standalone application. A model is only as successful as the use to which it is put.

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order **Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis** at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.
