Understanding Data Modelling
11 May 2020
There has been a lot of discussion recently about data modelling, which is driving some of the decisions that governments are making in this pandemic. We talk to Dr James Morris, researcher and Lecturer in Engineering Mathematics, who tells us more about modelling in general.
What is modelling and what is it being used for?
In a nutshell, modelling is a way of simplifying and quantifying an otherwise complex situation of interest. Whether it is used for optimized financing of renewable energy (my background) or for guiding public health decisions in pandemics like the one facing us today, the core concepts are the same.
Once a model captures the broad functional features, it must be tuned with real-world data; this process is often iterative, with new functional representations emerging to accommodate data that otherwise does not fit into an existing functional framework.
Statistical models face constant scrutiny because of their critical reliance on quality data, their methods of solution (e.g. for the differential equations that describe viral spread and recovery), and their sensitivity to error. For this reason, models are never relied upon in isolation, but are used in conjunction with expert opinion and historical context.
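Dr Morris does not name a specific model here, but a classic textbook example of the kind of differential equations he mentions is the SIR (Susceptible-Infected-Recovered) model. The sketch below is purely illustrative: the transmission and recovery rates are placeholder values, not estimates for any real outbreak.

```python
# A minimal sketch of a differential-equation model of spread and recovery:
# the classic SIR model. Parameter values are illustrative placeholders only.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    s, i, r = y
    ds = -beta * s * i             # new infections reduce the susceptible pool
    di = beta * s * i - gamma * i  # infections grow, then decay as people recover
    dr = gamma * i                 # recoveries accumulate
    return [ds, di, dr]

beta, gamma = 0.3, 0.1             # assumed transmission and recovery rates
y0 = [0.99, 0.01, 0.0]             # fractions of the population: S, I, R
sol = solve_ivp(sir, (0, 160), y0, args=(beta, gamma), dense_output=True)

t = np.linspace(0, 160, 161)
s, i, r = sol.sol(t)
print(f"Peak infected fraction: {i.max():.3f} on day {t[i.argmax()]:.0f}")
```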
How would you explain modelling to someone from a different industry?
Imagine you’re in a submarine with limited visibility in murky water and you want to see further. You accept that this comes at a cost. Building a periscope seems like a good idea in the absence of any other source of information about what lies beyond your field of view.
You need to trust in the laws of optics to design the lenses and in mechanics to build the hoisting mechanism, and despite your best efforts the new images will often be cloudy and indecipherable. Might these fuzzy images even lead you further astray? That is always an inherent risk. But that’s where we remember never to use a single tool in isolation.
The human mind avails itself of all resources to paint the full picture; all is grist that comes to such a mill, each source of information casting more light on all the other sources.
Viewed in this way, modelling is part of a greater-than-zero-sum game with indispensable roles to play, especially in consolidating and testing thinking in times of crisis, in weighing the costs of various interventions, and in maximizing human welfare when multiple factors are at play.
There have been a number of controversial models cited in the press in recent weeks. Why do you think scientists cannot agree on one outcome / model?
Because models designed to inform public policy must consider many factors simultaneously, they will be particularly complex. For example, even the foremost model in viral spread would still need to be integrated with a model for overall impact, thus requiring a detailed understanding of demographic parameters and hospital capacities, as well as critical supply chains (e.g., for medical equipment) as a function of time.
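As a hedged illustration of that integration step, the sketch below feeds a hypothetical infection curve into a simple hospital-capacity check. Every figure (population size, hospitalisation rate, bed count) is an assumed placeholder, chosen only to show the kind of question such an integrated model would answer.

```python
# Translating an infection trajectory into hospital demand against a fixed
# capacity. All numbers are hypothetical placeholders for illustration.
import numpy as np

population = 1_000_000           # assumed population served by the hospital system
hospitalisation_rate = 0.05      # assumed fraction of active infections needing a bed
bed_capacity = 2_000             # assumed number of available beds

# Illustrative infection curve: a smooth wave peaking at 8% of the population near day 60
days = np.arange(0, 160)
infected_fraction = 0.08 * np.exp(-((days - 60) / 20.0) ** 2)

beds_needed = infected_fraction * population * hospitalisation_rate
days_over_capacity = int(np.sum(beds_needed > bed_capacity))

print(f"Peak bed demand: {beds_needed.max():,.0f} (capacity {bed_capacity:,})")
print(f"Days over capacity: {days_over_capacity}")
```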
Given the rarity of this event there will be very limited precedent, and so limited consensus on the efficacy of any detailed integrated modelling adapted to the present day. As with all human pursuits, such a modelling endeavour will not be without bias, and this will unsurprisingly lead to a range of views on severity, and on the level (and social cost) of various interventions.
What is true for the models is doubly true for the data. As we’ve already seen, a number of countries have differing views on the manner, timing, and completeness of their external reporting. Even fundamental modelling inputs, such as virus lethality as a function of age or underlying conditions, are still in dispute. In the absence of accurate inputs, modellers are left with risk analysis and the inevitable, and often poorly understood, level of risk tolerance that must be assumed.
Is modelling in general a reliable way to predict an outcome?
Arguably the world coped well enough without modelling, especially in the time before rapid computation and many of the mathematical techniques we take for granted today.
The question is, can we do better? A model is nothing more than a quantitatively cohesive picture of the world, a repository of ideas and finder of consequences. Such concepts will always hold insight value – just as they already did before computers and just as they do now, in the real-time discussions that precede any programming or modelling per se.
We will always discuss pros and cons, balance evidence, debate cause and effect, and make our best efforts to forecast events – modelling simply formalizes this process and compels us to ask structured questions under a set of well-documented assumptions and rules. In this way, modellers leave breadcrumbs to find their way out of tight spots, explore scenarios and parallel states of the world, determine which inputs drive sensitive outcomes, and, as a consequence, inform better pathways for data acquisition.
Almost all models make assumptions about deterministic drivers of a system; in other words, they inherently assume there is an intrinsic order to things, a chain of cause and effect. At its core, this is the dynamic engine that emerges from the differential equations, the repository of all the mechanistic assumptions made about the system. If the system is inherently random, or if it is highly sensitive to initial conditions, deterministic forecasts will be severely limited. In such systems, flapping butterfly wings can lead to hurricanes, and chaos effects dominate over deterministic ones.
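To make that sensitivity concrete, here is a standard textbook illustration (not a model from the interview): the logistic map in its chaotic regime. Two starting points that differ by one part in a billion end up on completely different trajectories within a few dozen steps.

```python
# Sensitivity to initial conditions in the logistic map x_{n+1} = r * x * (1 - x)
# with r = 4 (chaotic regime). A one-part-in-a-billion difference in the starting
# point grows until the two trajectories bear no resemblance to each other.
r = 4.0
x_a, x_b = 0.400000000, 0.400000001

for step in range(1, 51):
    x_a = r * x_a * (1 - x_a)
    x_b = r * x_b * (1 - x_b)
    if step % 10 == 0:
        print(f"step {step:2d}: trajectories differ by {abs(x_a - x_b):.6f}")
```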
What could those modelling the pandemic learn from the way electrical engineers approach modelling?
There are indeed some core attributes to the way electrical engineers approach complex systems, many of which may provide unanticipated insights in the context of modelling pandemics. For example, an early course encountered by all of our engineers at UCL is Control Theory, taking them into the mathematical bowels of what constitutes stable and unstable systems and the extent to which human design can contribute to that stability. This kind of thinking can be seen as a meta-paradigm of sorts insofar as it goes beyond modelling the system per se, and embraces instead the system and its controllable features as a unified whole.
A common question in the mathematics of stability is this: if you have a system in an equilibrium state at some point in time, what is the behaviour of the system for states very near to the equilibrium – will they also persist for long periods of time? Electrical engineers avail themselves of a rich stock of time-tested tools and mathematical techniques for dealing with such systems – so questions like “are we in a persistent state of low infection?”, for example, could stand to gain real insights from the control-oriented approach of an electrical engineer.
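In the spirit of that first control-theory course, here is a hedged sketch of the stability question: linearise a system near an equilibrium and inspect the eigenvalues of its Jacobian. The matrix below is an illustrative example, not a model of infection dynamics.

```python
# Linear stability check near an equilibrium: if all eigenvalues of the Jacobian
# have negative real parts, small perturbations decay back towards the
# equilibrium; any positive real part means nearby states drift away.
import numpy as np

# Jacobian of some system evaluated at an equilibrium point (hypothetical values)
J = np.array([[-0.5,  1.0],
              [-1.0, -0.5]])

eigenvalues = np.linalg.eigvals(J)
stable = bool(np.all(eigenvalues.real < 0))

print("Eigenvalues:", eigenvalues)
print("Equilibrium is", "stable" if stable else "unstable",
      "- nearby states", "return to it" if stable else "drift away")
```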
With a background in atmospheric physics and quantitative finance, James has spent most of his career investing in and consulting for the renewable energy industry. He holds a PhD in chemical physics from Boston College in the US and an MBA in finance from the University of Oxford.