This article by the BBC’s Rachel Schraer explores the modelling of the progression of the Coronavirus Covid-19. In the article we see some graphs showing epidemic growth rates, in particular one showing how the infection rate depends on how many people one individual infects in a given period.
This chart led me to look into more sophisticated modelling tools than just the spreadsheet I already mentioned in my previous article on Coronavirus modelling; this is a very specialist area, and I’m working hard to model it more fully.
My spreadsheet model was a simple exponential (repeated doubling) model; it allows you to enter a couple of your own parameters (the number of days out, and the doubling period for cases in days) to see the outcome; see it at:
It tabulates the number of cases reached a given number of days after the 100th case (up to 30 days, though you can enter your own forecast number of days and doubling period), for an assumed number of days it takes cases to double. It’s just a simple application of repeated doubling (exponential growth), and is only an analysis of output rate numbers, not a full model. It shows the potential growth under various doubling assumptions. The same kind of doubling analysis appears, for example, in the following Johns Hopkins chart (this time for deaths, but it’s a similar model for cases), which shows the UK and Italy prognoses lying between the lines for doubling every two days and every three days, counted from the index day at 10 deaths:
Any predicted outcomes are VERY dependent on that doubling rate assumption, as my spreadsheet showed. In terms of cases, 30 days after the 100th case, doubling every 2 days would lead to about 3.3 million cases, but doubling every 3 days leads to about 100 thousand cases. This is an example of the non-linearity of the modelling: lengthening the case doubling period by 50% (from 2 to 3 days) reduces the predicted number of cases after 30 days roughly 30-fold (2^15 / 2^10 = 32).
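The arithmetic behind the spreadsheet can be sketched in a few lines of Python. This is a minimal illustration of cases = 100 × 2^(days / doubling period); the function name `cases_after` is my own and is not taken from the spreadsheet.

```python
def cases_after(days, doubling_period, start_cases=100):
    """Cases reached `days` after `start_cases`, if cases double every `doubling_period` days."""
    return start_cases * 2 ** (days / doubling_period)

for doubling in (2, 3):
    # 30 days at 2-day doubling is 15 doublings; at 3-day doubling it is 10.
    print(f"doubling every {doubling} days: {cases_after(30, doubling):,.0f} cases after 30 days")
# prints 3,276,800 for 2-day doubling and 102,400 for 3-day doubling

print(cases_after(30, 2) / cases_after(30, 3))  # 32.0, i.e. 2**5
```

The ratio depends only on the difference in the number of doublings (15 versus 10), which is why a modest change in the doubling period has such a large effect.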
To reproduce the infection rate growth numbers in the BBC article above, relating the resultant number of cases after 30 days (say) to the average number of people an individual infects (the so-called R0 number, the Basic Reproduction Number) requires a deeper modelling technique. For an R0 explanation, see https://en.m.wikipedia.org/wiki/Basic_reproduction_number
Seeing the BBC infection rate chart and its implications, I wanted to understand precisely how the number of people an individual is assumed to infect (on average) is related to the “doubling” rate assumptions we can make in the spreadsheet analysis.
I’ve been looking at SIR modelling (Susceptible-Infected-Recovered modelling) in a simple form, to get the idea of how it works. There are quite a few references to the topic, going back a long way. A very useful paper I have been consulting is from Stanford University in 2007 (https://web.stanford.edu/~jhj1/teachingdocs/Jones-on-R0.pdf), and some of the basis for the shape of that basic modelling goes back to Kermack and McKendrick in 1927.
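For reference, here is one common way of writing the basic SIR equations, as a sketch. I use α for the infection rate and β for the recovery rate, to match the trial model described below (many texts use β and γ instead), and I assume the frequency-dependent form of transmission, in which the infection term is divided by the total population N = S + I + R:

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\alpha S I}{N} \\
\frac{dI}{dt} &= \frac{\alpha S I}{N} - \beta I \\
\frac{dR}{dt} &= \beta I
\end{aligned}
```

The S·I product is what makes the system non-linear. Early on, when S ≈ N, the infected count grows roughly as I(t) ≈ I(0) e^{(α − β)t}, and in this form R0 = α/β, so the epidemic grows only if R0 > 1.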
Usefully, I have found some Python code in the Gillespie reference below that codifies a basic model. The basic equations are somewhat simple first-order differential equations, but they are non-linear and therefore difficult to solve analytically, so rather than solving them directly, the code simulates the underlying random infection and recovery events using the Gillespie algorithm. This derives from work done in 1976, and is basically a Monte Carlo (probabilistic), iterative, time-stepping method well suited to computers (of the type I used to play with, for a quite different purpose, for the MoD in the 1970s).
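The sketch below is not the code from the Gillespie reference, but a minimal, standard-library-only version of the same idea (the Gillespie “direct method”) for an SIR epidemic. Here `alpha` is the infection rate and `beta` the recovery rate; I assume the frequency-dependent infection rate alpha·S·I/N, and the function name `gillespie_sir` and the parameter values in the usage line are hypothetical.

```python
import random

def gillespie_sir(n, i0, alpha, beta, t_max=100.0, seed=None):
    """Simulate one stochastic SIR epidemic with the Gillespie direct method.

    Events: infection (S -> I) at rate alpha*S*I/n, recovery (I -> R) at rate beta*I.
    Returns the event times and the (S, I, R) counts after each event.
    """
    rng = random.Random(seed)
    s, i, r = n - i0, i0, 0
    t = 0.0
    times, history = [t], [(s, i, r)]
    while i > 0 and t < t_max:
        infection_rate = alpha * s * i / n
        recovery_rate = beta * i
        total_rate = infection_rate + recovery_rate
        # Waiting time to the next event is exponential with mean 1/total_rate.
        t += rng.expovariate(total_rate)
        # Pick which event happens, with probability proportional to its rate.
        if rng.random() * total_rate < infection_rate:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        times.append(t)
        history.append((s, i, r))
    return times, history

# Hypothetical parameters: population 350, one index case.
times, history = gillespie_sir(n=350, i0=1, alpha=2.0, beta=0.5, seed=1)
print(f"run ended at t={times[-1]:.2f}; peak infected = {max(i for _, i, _ in history)}")
```

Because each run is random, a single index case can also fizzle out by chance before any outbreak takes hold, which is one reason stochastic runs are usually repeated many times.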
My trial model (to become familiar with the way that the model behaves) is based on the Python code. With a small total population (N) of 350, and generic parameters for the infection rate (α) and recovery rate (β), I found that there is slow growth in cases for a (relatively) long time, then a sharp increase (at about t=0.1 time units in the chart below) leading to a peak at about t=0.3, after which recoveries outpace new infections; the population returns to health at about t=10. The very slow initial growth (from ONE index case) is why I show the x-axis with a log scale.
This very slow growth from ONE case is, I guess, why most charts begin with the first 100 cases (or, in the case of deaths, 10) so that the chart saves horizontal axis space by suppressing the long lead-in period.
My next task is to put some real numbers into a model like this, to work it through for a LARGE population and, for comparison, to run it from time zero at 100 cases (which might avoid the long lead time in this current generic model).
I expect to find that I could then use a linear x-axis time scale, but that I would have to present the chart with a log y-scale for cases, as the model would need to represent the exponential growth we have seen for Coronavirus.
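As a rough sketch of that next step (not yet done with real numbers), the deterministic SIR equations can be stepped forward from 100 initial cases in a large population. The population of one million and the rates α = 0.5 and β = 0.2 per day are hypothetical placeholders, not UK or Covid-19 values, and the simple forward-Euler stepping is my own choice for clarity (a library ODE solver such as SciPy’s `solve_ivp` would be more accurate).

```python
import math

def sir_euler(n, i0, alpha, beta, days, dt=0.01):
    """Step the deterministic SIR equations forward with forward Euler.

    Uses dS/dt = -alpha*S*I/n, dI/dt = alpha*S*I/n - beta*I, dR/dt = beta*I.
    Returns (S, I, R) at the start of each day, for days 0..days.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    daily = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = alpha * s * i / n * dt
            new_recoveries = beta * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        daily.append((s, i, r))
    return daily

# Hypothetical values: one million people, 100 initial cases, rates per day.
daily = sir_euler(n=1_000_000, i0=100, alpha=0.5, beta=0.2, days=120)
for day in (0, 10, 20, 30):
    print(day, round(math.log10(daily[day][1]), 2))
```

Early on, log10 of the infected count rises by roughly the same amount each day (about (α − β)/ln 10 ≈ 0.13 per day with these values), which is the straight line one would expect on a log y-scale with a linear time axis, until the depletion of susceptibles bends the curve over.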
More sophisticated models also include birth and death adjustments (a demographic model), but as the timescale of the Covid-19 epidemic being assessed is (hopefully!) much shorter than the demographic cycle, this is ignored to start with.
Another compartment that might be included for some important infections, where there is a significant latent period (during which individuals have been infected but are not yet infectious themselves), is the “Exposed” compartment. During this period the individual is in compartment E (for Exposed), before entering compartment I (for Infected), turning the SIR model into an SEIR model.
Another version of the model includes the exposed (latent) period but applies to infections that leave no immunity after recovery, so that recovered individuals become susceptible again and move back into the S(t) (Susceptible as a function of time) compartment of the model; this is called the SEIS model. For a description of these models, and more, see https://en.m.wikipedia.org/wiki/Compartmental_models_in_epidemiology
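Extending the same SIR sketch (α for infection, β for recovery, frequency-dependent transmission), the SEIR and SEIS variants can be written as below. The rate σ at which exposed individuals become infectious (1/σ being the mean latent period) is my own added symbol, not taken from the sources above.

```latex
\begin{aligned}
\text{SEIR:}\quad \frac{dS}{dt} &= -\frac{\alpha S I}{N}, \quad
\frac{dE}{dt} = \frac{\alpha S I}{N} - \sigma E, \quad
\frac{dI}{dt} = \sigma E - \beta I, \quad
\frac{dR}{dt} = \beta I \\
\text{SEIS:}\quad \frac{dS}{dt} &= -\frac{\alpha S I}{N} + \beta I, \quad
\frac{dE}{dt} = \frac{\alpha S I}{N} - \sigma E, \quad
\frac{dI}{dt} = \sigma E - \beta I
\end{aligned}
```

In the SEIS form there is no R compartment: the βI recovery flow goes straight back into S, so N = S + E + I stays constant.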
So we see that this is a far more complicated issue than it appears at first sight. It is why, I think, Sir Patrick Vallance, the Chief Scientific Adviser, today began to talk about the R0 figure (a dimensionless ratio): the average number of people that one individual might infect.
My feeling is that we are far from a value of R0 that would bring the end of the epidemic into sight: if we in the UK are tracking a doubling of cases every 3 days (as we have been), this might correspond to an R0 nearer 2.5 than anywhere near ONE. As he mentioned, if R0 drops below 1, the epidemic would eventually die out; above 1, it continues to grow. As I said, I think we are far from an R0<1 situation.
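The link between a doubling time and R0 can be sketched for the simple SIR equations: early growth is exponential at rate r = ln 2 / T_d, and since r = α − β and R0 = α/β, it follows that R0 = 1 + r/β = 1 + r·D, where D = 1/β is the mean infectious period. D is an assumption that has to be supplied; the values below are purely illustrative, not Covid-19 estimates, and a model with an Exposed (latent) compartment would give a somewhat higher R0 for the same doubling time and infectious period.

```python
import math

def r0_from_doubling(doubling_days, infectious_days):
    """R0 implied by a case-doubling time in the simple SIR model.

    Assumes early exponential growth at rate r = ln(2) / doubling_days and an
    exponentially distributed infectious period with mean infectious_days (= 1/beta),
    so that R0 = alpha / beta = 1 + r * infectious_days.
    """
    growth_rate = math.log(2) / doubling_days
    return 1 + growth_rate * infectious_days

# Illustrative infectious periods only: these are assumptions, not measured values.
for d in (5, 6.5, 7):
    print(f"3-day doubling, {d}-day infectious period: R0 ≈ {r0_from_doubling(3, d):.2f}")
# prints R0 ≈ 2.16, 2.50 and 2.62 respectively
```

On this simple assumption, an R0 of about 2.5 with 3-day doubling corresponds to a mean infectious period of roughly 6.5 days; a shorter doubling time pushes R0 further above 1 for any given infectious period.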
The amount by which R0 exceeds 1 might not seem to have such a great effect on the number of cases we are seeing at these early stages of an epidemic, but as the days wear on the effects become VERY (i.e. exponentially) noticeable, and this is why the charts often have logarithmic y-axis scales: otherwise the data couldn’t easily be displayed.
In a linear y-axis chart, we run out of y-axis space quite quickly for exponential functions; to see all the data at the later time values, we have to compress the chart vertically so much that it is then hard to see the earlier, lower numbers. We see this in the chart below, which has a lot of “growing” to do. Note the dotted line, which is the predicted line for a doubling of cases every 3 days (which we in the UK have been tracking):
It has therefore become more usual to present the data differently, with a log scale for the y-axis, where, for example, the sample dotted “doubling” lines are straight lines, not steeply growing exponential curves (in the chart below, two dotted guidelines are shown for deaths, one for 2 day doubling, and one for 3 days); the shorter the doubling period, the steeper the straight line on such a chart:
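Why the doubling guidelines become straight lines can be checked numerically: log10(100 × 2^(t/T)) = 2 + t·log10(2)/T, so each day adds the same amount, log10(2)/T, on a log10 y-axis. The helper names below are my own, for illustration only.

```python
import math

def log10_slope_per_day(doubling_days):
    """Rise per day of a 'doubling every T days' guideline on a log10 y-axis."""
    return math.log10(2) / doubling_days

def guideline(doubling_days, days, start=100):
    """Case counts on the guideline for days 0..days, starting from `start`."""
    return [start * 2 ** (d / doubling_days) for d in range(days + 1)]

for t_d in (2, 3):
    logs = [math.log10(c) for c in guideline(t_d, 30)]
    steps = [b - a for a, b in zip(logs, logs[1:])]
    # Every daily step is the same size, so the guideline plots as a straight line.
    print(f"{t_d}-day doubling: daily log10 step {min(steps):.4f} to {max(steps):.4f}, "
          f"formula {log10_slope_per_day(t_d):.4f}; x10 every {1 / log10_slope_per_day(t_d):.1f} days")
```

A 2-day doubling line rises by a factor of ten roughly every 6.6 days, and a 3-day line roughly every 10 days, which is why the shorter doubling period gives the steeper straight line.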
In the 30th March Vallance presentation on TV, the growth curves on the last couple of log charts shown (cases and deaths, respectively) had a SLOPE that was DEcreasing slowly, whereas the raw numbers themselves were still INcreasing rapidly (exponentially); for a mathematician (or a data visualiser), though, this is a valid way to present such data.
The visual effect of choosing such a log scale for the y-axis would be explained in more detail in an academic lecture theatre (as I have tried to do here); I think it is useful to point this out, and doing so would be a helpful clarification in the Government updates.
A final point, made in both the 29th and 30th March daily TV presentations, is that actions taken today will not have a tangible effect until a few days (maybe a week to two weeks) later; the outputs lag the inputs because of the lead times involved in infection rates, and in the effect of counter-measures on their reduction. What we see tomorrow doesn’t relate to today’s actions, it depends more on actions taken a week or more ago.
From recent charts shown in Government updates, it does seem that what was done a week ago (self-isolation, social distancing and fewer opportunities for people to meet other than in their household units) is beginning to have a visible effect on travel patterns, but any changes in the infection charts, if there are any at all, are rather small so far.