Whether you know it or not, you deal with models every day. Your weather forecast comes from a meteorological model, usually several. Mannequins are used to display how fashions may look on you. Blueprints are drawn models of objects or structures to be built. Maps are models of the earth’s terrain. Examples are everywhere.
Models are representations of things, usually an ideal, a standard, or something desired. They can be true representations, approximate (or at least as good as practicable), or simplified, even cartoonish compared to what they represent. They can be about the same size, bigger, or most typically, smaller, whatever makes them easiest to manipulate. They can represent:
- Physical objects that can be seen and touched
- Processes that can be watched
- Behaviors that can be observed
- Conditions that can be monitored
- Opinions that can be surveyed.
The models themselves do not have to be physical objects. They can be written, drawn, or consist of mathematical equations or computer programming. In fact, using equations and computer code can be much more flexible and less expensive than building a physical model.
Classification of Models
There are many ways that models are classified, so this catalog isn’t unique. The models may be described with different terms or broken out to greater levels of detail. Furthermore, you can also create hybrid models. Examples include mash-ups of analytical and stochastic components used to analyze phenomena such as climate change and subatomic particle physics. Nevertheless, the catalog should give you some ideas for where you might start to develop your own model.
Your first exposure to a model was probably a physical model like a baby pacifier or a plush animal, and later, a doll or a toy car. From then, you’ve seen many more – from ant farms to anatomical models in school. You probably even built your own models with Legos, plastic model kits, or even a Halloween costume. They are all representations of something else.
Physical models aren’t used often for advanced applications because they are difficult and expensive to build and to calibrate so that they behave realistically. Flight simulators, hydrographic models of river systems, and reef aquariums are well-known examples of advanced physical models.
Models can also be expressed in words and pictures. These are used in virtually all fields to convey mental images of some mechanism, process, or other phenomenon that was or will be created. Blueprints, flow diagrams, geologic fence diagrams, and anatomical diagrams are all conceptual models. So are the textual descriptions that go with them. In fact, you should always start with a simple text model before you embark on building a complex physical or mathematical model.
Mathematical and Computer Models
Theoretical models are based on scientific laws and mathematical derivations. Both theoretical models and deterministic empirical models provide solutions that presume that there is no uncertainty. These solutions are termed exact (which does not necessarily imply correct). There is a single solution for given inputs.
Analytical models are mathematical equations derived from scientific laws that produce exact solutions that apply everywhere. For example, F (force) = m (mass) times a (acceleration) and E (energy) = m (mass) times c² (speed of light squared) are analytical models. Most concepts in classical physics can probably be modeled analytically.
Numerical models are mathematical equations that have a time parameter. Numerical models are solved repeatedly, usually on a grid, to obtain solutions over time. This is sometimes called a Dynamic Model (as opposed to a Static Model) because it describes time-varying relationships.
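Here is a minimal sketch of what "solved repeatedly to obtain solutions over time" can look like. It steps a simple decay equation, dy/dt = -k·y, forward with the explicit Euler method and compares the result with the analytical solution. The equation, the function name `euler_decay`, and the parameter values are assumptions chosen only for illustration.

```python
import math

def euler_decay(y0, k, dt, steps):
    """Step dy/dt = -k*y forward in time with the explicit Euler method."""
    values = [y0]
    y = y0
    for _ in range(steps):
        y = y + dt * (-k * y)  # advance the solution by one time step
        values.append(y)
    return values

# Illustrative values only: start at 1.0 and simulate 2 time units.
trajectory = euler_decay(y0=1.0, k=0.5, dt=0.01, steps=200)
exact = math.exp(-0.5 * 2.0)  # analytical solution at t = 2

print("numerical:", trajectory[-1])
print("analytical:", exact)
```

Shrinking `dt` brings the numerical answer closer to the analytical one, at the cost of more steps.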
Empirical models can be deterministic, probabilistic, stochastic, or sometimes, a hybrid of the three. They are developed for specific situations from measured data. Empirical models differ from theoretical models in that the model is not necessarily fixed for all instances of its use. There may be multiple reasonable empirical models that can apply to a given situation.
Deterministic empirical models presume that a mathematical relationship exists between two or more measurable phenomena (as do theoretical models) that will allow the phenomena to be modeled without uncertainty (or at least, not much uncertainty, so that it can be ignored) under a given set of conditions. The difference is that the relationship isn’t unique or proven. There are usually assumptions. Biological growth and groundwater flow models are examples of deterministic empirical models.
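As a rough illustration of a biological growth model, here is a sketch of logistic growth, one common growth curve. Using the logistic form is an assumption, and so are the parameter values below; in practice they would be estimated from measured data. Given the same inputs, the function always returns the same answer, which is what makes it deterministic.

```python
import math

def logistic_growth(t, n0, r, k):
    """Population at time t for logistic growth (closed-form solution).

    n0 = starting population, r = growth rate, k = carrying capacity.
    """
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))

# Hypothetical parameter values, chosen only for illustration.
N0, RATE, CAPACITY = 10.0, 0.4, 1000.0

for day in (0, 5, 10, 20, 40):
    print(day, round(logistic_growth(day, N0, RATE, CAPACITY), 1))
```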
Probability models are based on a set of events or conditions all occurring at once. In probability, it is called an intersection of events. Probability models are multiplicative because that is how intersection probabilities are combined. The most famous example of a probability model is the Drake equation, a summary of the factors affecting the likelihood that we might detect radio communication from intelligent extraterrestrial life.
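To see the multiplicative structure, here is a sketch of the Drake equation in its usual seven-factor form, N = R* × fp × ne × fl × fi × fc × L. The function and argument names are made up for this example, and the input values are placeholders for a thought experiment, not estimates.

```python
import math

def drake_equation(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Multiply the seven Drake factors to get N, the number of
    detectable civilizations."""
    return math.prod([r_star, f_p, n_e, f_l, f_i, f_c, lifetime])

# Placeholder inputs only; swap in your own guesses.
n_civilizations = drake_equation(
    r_star=1.0,      # rate of star formation (stars per year)
    f_p=0.5,         # fraction of stars with planets
    n_e=2.0,         # habitable planets per star that has planets
    f_l=1.0,         # fraction of those where life appears
    f_i=0.1,         # fraction of those where intelligence appears
    f_c=0.1,         # fraction of those that emit detectable signals
    lifetime=1000.0  # years such a civilization keeps transmitting
)
print(n_civilizations)
```

Because the terms are multiplied, a small value for any one factor pulls the whole result down.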
Stochastic empirical models presume that changes in a phenomenon have a random component. The random component allows stochastic empirical models to provide solutions that incorporate uncertainty into the analysis. Stochastic models include lottery picks, weather, and many problems in the behavioral, economic, and business disciplines that are analyzed with statistical models.
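A small simulation shows what a random component does to a model. The sketch below assumes a process that moves by a fixed drift plus normally distributed noise at each step; the process, the names, and the numbers are all illustrative assumptions. Running it many times gives a spread of outcomes rather than a single answer.

```python
import random
from statistics import mean, stdev

def simulate_endpoint(steps, drift, sigma, rng):
    """Return the final value of a process that changes by a fixed
    drift plus a random shock at every step."""
    value = 0.0
    for _ in range(steps):
        value += drift + rng.gauss(0.0, sigma)
    return value

rng = random.Random(42)  # fixed seed so the run is repeatable
endpoints = [simulate_endpoint(50, 0.1, 1.0, rng) for _ in range(2000)]

print("average outcome:", round(mean(endpoints), 2))
print("spread (std dev):", round(stdev(endpoints), 2))
```

The average reflects the deterministic drift, while the standard deviation is the uncertainty contributed by the random component.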
In statistical comparison models, the dependent variable is a grouping-scale variable (one measured on a nominal scale). The independent variable can be either grouping, continuous, or both. Simple hypothesis tests include:
- χ² tests that analyze cell frequencies on one or more grouping variables, and
- t-tests and z-tests that analyze independent variable means in two or fewer groups of a grouping variable.
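The two test statistics can be sketched with nothing but the standard library. The code below computes a χ² statistic from a table of cell frequencies and a two-sample t statistic, using Welch's version, which is an assumption on my part since several variants exist. The data are made up, and turning the statistics into p-values would need a statistics package.

```python
import math
from statistics import mean, variance

def chi_square_statistic(table):
    """Chi-square statistic for a table of observed cell frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def welch_t_statistic(a, b):
    """Difference in group means divided by its standard error."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Made-up example data.
chi2 = chi_square_statistic([[10, 20], [20, 10]])
t = welch_t_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0, 7.0])
print(chi2, t)
```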
Analysis of Variance (ANOVA) models compare independent variable means for two or more groups of a dependent grouping variable. Analysis of Covariance (ANCOVA) models compare independent variable means for two or more groups of a dependent grouping variable while controlling for one or more continuous variables. Multivariate ANOVA and ANCOVA compare two or more dependent variables using multiple independent variables. There are many more types of ANOVA model designs.
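For the simplest case, a one-way ANOVA, the F statistic compares the variation between group means with the variation within groups. This is a bare-bones sketch with made-up measurements; the function name is hypothetical, and a real analysis would also report degrees of freedom and a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up measurements for three groups.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f_stat)
```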
Classification and identification models also analyze groups.
Clustering models identify groups of similar cases based on continuous-scale variables. There need be no prior knowledge or expectation about the nature of the groups. There are several types of cluster analysis, including hierarchical clustering, K-Means clustering, two-step clustering, and block clustering. Often, the clusters or segments are used as inputs to subsequent analyses. Clustering models are also known as segmentation models.
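Here is a stripped-down sketch of K-Means clustering on a single variable. Real applications use several variables and a library implementation; the function name, the starting centers, and the data below are assumptions for illustration. The algorithm alternates between assigning each case to its nearest center and moving each center to the mean of its cases.

```python
def k_means_1d(values, centers, max_iterations=100):
    """Cluster one-dimensional values around the given starting centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iterations):
        # Assign every value to the index of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values
        ]
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else old_center)
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return labels, centers

# Made-up data with two obvious groups.
labels, centers = k_means_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], centers=[0.0, 6.0])
print(labels, centers)
```

Notice that nothing in the input says which case belongs to which group; the groups come out of the data.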
Clustering models do not have a nominal-scale dependent variable, but most classification models do. Discriminant analysis models have a nominal-scale dependent variable and one or more continuous-scale independent variables. They are usually used to explain why the groups are different, based on the independent variables, so they often follow a cluster analysis. Logistic regression is analogous to linear regression but is based on a non-linear model and a binary or ordinal dependent variable instead of a continuous-scale variable. Often, models for calculating probabilities use a binary (0 or 1) dependent variable with logistic regression.
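To make the logistic regression idea concrete, here is a sketch that fits a one-predictor model to a binary (0 or 1) dependent variable by plain gradient descent. Statistical packages use faster fitting methods, so treat the fitting loop, the names, and the made-up data as assumptions for illustration.

```python
import math

def sigmoid(z):
    """Convert a linear score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Estimate intercept b0 and slope b1 by gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1

# Made-up data: hours studied and whether the exam was passed (1) or not (0).
hours = [0, 1, 2, 3, 4, 5, 6, 7]
passed = [0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(hours, passed)

for h in (0, 3, 7):
    print(h, round(sigmoid(b0 + b1 * h), 2))
```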
There are many analyses that produce decision trees, which look a bit like organization charts. C&R Trees (Classification and Regression Trees) split a categorical dependent variable into its groups based on continuous or categorical-scale independent variables. All splits are binary. CHAID (Chi-square Automatic Interaction Detector) generates decision trees that can have more than two branches at a split. A Random Forest consists of a collection of simple tree predictors.
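The core step in growing a binary tree is finding the single best split. The sketch below searches one continuous predictor for the threshold that leaves the purest groups on each side. Using Gini impurity as the measure is an assumption (it is one common choice), and the names and data are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_binary_split(xs, labels):
    """Return (threshold, weighted impurity) of the best two-way split."""
    pairs = sorted(zip(xs, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Made-up data: one predictor and a two-group dependent variable.
threshold, impurity = best_binary_split(
    [1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]
)
print(threshold, impurity)
```

A full tree repeats this search inside each resulting group until the groups are pure enough or too small to split.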
Explanation models aim to explain associations within or between sets of variables. With explanation models, you select enough variables to address all the theoretical aspects of the phenomenon, even to the point of having some redundancy. As you build the model, you discover which variables are extraneous and can be eliminated.
Factor Analysis (FA) and Principal Components Analysis (PCA) are used to explore associations in a set of variables where there is no distinction between dependent and independent variables. The two types of statistical analysis:
- Create new metrics, called factors or components, which explain almost the same amount of variation as the original variables.
- Create fewer factors/components than the original variables so further analysis is simplified.
- Require that the new factors/components be interpreted in terms of the original variables, but they often make more conceptual sense so subsequent analyses are more intuitive.
- Produce factors/components that are statistically independent (uncorrelated) so they can be used in regression models to determine how important each is in explaining a dependent variable.
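The points above can be checked on a tiny case. This sketch runs PCA on just two variables, where the components can be worked out in closed form from the covariance matrix, so no linear algebra library is needed. The two-variable restriction, the names, and the data are assumptions for illustration.

```python
import math
from statistics import mean

def covariance(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def pca_2d(x, y):
    """Return the two component variances and the component scores."""
    a, b, c = covariance(x, x), covariance(x, y), covariance(y, y)
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    variances = (half_trace + radius, half_trace - radius)
    # Direction of the first component; the second is perpendicular to it.
    angle = 0.5 * math.atan2(2 * b, a - c)
    v1 = (math.cos(angle), math.sin(angle))
    v2 = (-math.sin(angle), math.cos(angle))
    mx, my = mean(x), mean(y)
    scores1 = [(xi - mx) * v1[0] + (yi - my) * v1[1] for xi, yi in zip(x, y)]
    scores2 = [(xi - mx) * v2[0] + (yi - my) * v2[1] for xi, yi in zip(x, y)]
    return variances, scores1, scores2

# Made-up, strongly correlated data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
variances, scores1, scores2 = pca_2d(x, y)

print("share explained by component 1:", variances[0] / sum(variances))
print("covariance between components:", covariance(scores1, scores2))
```

With correlated inputs like these, the first component carries nearly all the variation, and the two sets of scores are uncorrelated.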
Canonical Correlation Analysis (CCA) is like PCA, except that there are two sets of variables. Pairs of components, one from each group, are created that explain independent aspects of the dataset.
Regression analysis is also used to build explanation models. In particular, regression using principal components as independent variables is popular because the components are uncorrelated and not subject to multicollinearity.
Some models are created to predict new values of a dependent variable or forecast future values of a time-dependent variable. To be useful, a prediction model must use prediction variables that cost less to generate than the prediction is worth. So the predictor variables and their scales must be relatively inexpensive and easy to create or obtain. In prediction models, accuracy tends to come easy while precision is elusive. Prediction models usually keep only the variables that work best in making a prediction, and they may not necessarily make a lot of conceptual sense.
Regression is the most commonly used technique for creating prediction models. Transformations are used frequently. If a model includes one or more lagged values of the dependent variable among its predictors, it is called an autoregressive model.
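An autoregressive model is easy to sketch: regress each value on the one before it. The example below fits a first-order model by ordinary least squares. The series is synthetic, generated from a known rule with no noise so you can see the fit recover it, and the function name is hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = intercept + slope * y[t-1]."""
    lagged = series[:-1]   # predictor: the previous value
    current = series[1:]   # dependent variable: the current value
    n = len(current)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic series built from y[t] = 2 + 0.5 * y[t-1], starting at 10.
series = [10.0]
for _ in range(20):
    series.append(2.0 + 0.5 * series[-1])

intercept, slope = fit_ar1(series)
forecast = intercept + slope * series[-1]  # one-step-ahead prediction
print(intercept, slope, forecast)
```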
Neural networks are a predictive modeling technique inspired by the way biological nervous systems process information. The technique involves interconnected nodes or layers that apply predictor variables in different ways, linear and nonlinear, to all or some of the dependent variable values. Unlike most modeling techniques, a fitted neural network can’t easily be articulated as an equation you can interpret, so neural networks are not useful for explanation purposes.
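To show how nodes combine linear and nonlinear steps, here is a toy network with two hidden nodes. The weights are set by hand for illustration rather than learned from data, which is an important simplification. Each hidden node takes a weighted sum of the inputs (linear) and then clips negative values to zero (nonlinear); the output node takes a weighted sum of the hidden nodes.

```python
def relu(z):
    """Nonlinear step: pass positive values through, clip negatives to zero."""
    return max(0.0, z)

def tiny_network(x1, x2):
    """Two inputs, two hidden nodes, one output, with hand-picked weights."""
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    return 1.0 * h1 - 2.0 * h2

for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, tiny_network(*inputs))
```

This network returns 1 when exactly one input is 1 and returns 0 otherwise, a pattern that no single straight-line equation in x1 and x2 can reproduce. Even at this size, the weights by themselves don't tell you much about why.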
Picking the Right Model
There are many ways to model a phenomenon. Experience helps you to judge which model might be most appropriate for the situation. If you need some guidance, follow these steps.
- Step 1 – Start at the top of the Catalog of Models figure. Decide whether you want to create a physical, mathematical, or conceptual model. Whichever you choose, start by creating a brief conceptual model so you have a mental picture of what your ultimate goal is and can plan for how to get there.
If your goal is a physical or full-blown conceptual model, do the research you’ll need to identify appropriate materials and formats. But this blog is about mathematical models, so let’s start there.
- Step 2 – If you want to select a type of mathematical model, start on the second line of the Catalog of Models figure and decide whether your phenomenon fits best with a theoretical or an empirical approach.
If there are scientific or mathematical laws that apply to your phenomenon, you’ll probably want to start with some type of theoretical model. If there is a component of time, particularly changes over time periods, you’ll probably want to try developing a numerical model. Otherwise, if a single solution is appropriate, try an analytical model.
- Step 3 – If your phenomenon is more likely to require data collection and analysis to model, you’ll need an empirical model. An empirical model can be probabilistic, deterministic, or stochastic. Probability models are great tools for thought experiments. There are no wrong answers, only incomplete ones. Deterministic models are more of a challenge. There needs to be some foundation of science (natural, physical, environmental, behavioral, or other discipline), engineering, business rules, or other guidelines for what should go into the model. More often than not, deterministic models are overly complicated because there is no way to distinguish between components that are major factors versus those that are relatively inconsequential to the overall results. Both Probability and Deterministic models are often developed through panels of experts using some form of Delphi process.
- Step 4 – If you need to develop a stochastic (statistical) model, go here to pick the right tool for the job.
- Step 5 – Consider adding hybrid elements. Don’t feel constrained to only one type of component in building your model. For instance, maybe your statistical model would benefit from having deterministic, probability, or other types of terms in it. Calibrate your deterministic model using regression or another statistical method. Be creative.
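As one example of the hybrid idea in Step 5, here is a sketch of calibrating a deterministic model with regression. It assumes an exponential growth model, N(t) = N0·e^(rt), and estimates N0 and r by fitting a straight line to the logarithm of the counts. The model choice, the names, and the data are illustrative assumptions; the counts are synthetic, generated from known parameters so you can see them recovered.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic "measurements" from N(t) = 100 * exp(0.3 * t).
times = [0, 1, 2, 3, 4, 5]
counts = [100.0 * math.exp(0.3 * t) for t in times]

# The log transformation turns the growth curve into a straight line.
intercept, slope = fit_line(times, [math.log(c) for c in counts])
n0_estimate = math.exp(intercept)
rate_estimate = slope
print(n0_estimate, rate_estimate)
```

With real, noisy measurements the estimates won't match exactly, and the leftover scatter gives you a stochastic component to add to the deterministic model.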