In the recent daily UK Government presentations, the R0 Reproductive Number has been mentioned a few times, and with good reason. Its value is as a commonly accepted measure of the propensity of an infectious disease outbreak to become an epidemic.
It turns out to be a relatively simple number to define, although working back from current data to calculate it is awkward if you don’t have the data. That’s my next task, from public data.
If R0 is below 1, then the epidemic will reduce and disappear. If it is greater than 1, then the epidemic grows.
The UK Government and its health advisers have made a few statements about it, and I covered these in an earlier post.
This is a more technical post, just to present a derivation of R0, and its consequences. I have used a few research papers to help with this, but the focus, brevity(!) and mistakes are mine (although I have corrected some in the sources).
This is an extract from the Government daily Coronavirus briefing on 2nd April 2020, led by Matt Hancock, with Professor Stephen Powis, Medical Director of the NHS in England.
In this clip, Prof. Powis states that he thinks there is “early academic evidence” that the R0 “Reproduction Number”, (what he calls the transmission rate) reflecting the average number of people infected by one person, “may have fallen below 1”. This is an important target, because when R0 for a given epidemic is below 1, the epidemic will die out.
My issue here is that I would like to see the evidence.
In Matt Hancock’s statement, he said that “the best scientific analysis is that” the “doubling” period for the rate of infection of the virus (another important statistic, used by Johns Hopkins University in many of their charts for the epidemic) was between three and four days.
In the early stages of a rapidly growing epidemic, either of three or four days lead to quite different outcomes (for power series (or exponential) growth) after, say, 30 days (since day zero, defined as 100 cases), for the total number of people infected – 3 days leads to 100,000 cases, and 4 days to 18,000.
See my previously posted spreadsheet to run your own numbers for this:
My opinion is that it’s an optimistic range, but in any case it is too wide to be meaningful without more detail. I suspect 2.5 days (which would lead to 400,000 cases) might be a better number, but without seeing the scientific data – the “evidence” in either case – who knows?
But do you see how sensitive outcomes are to quite small changes in estimates to the input data at this early stage in the epidemic?
Both of these video statements sound to me to be too optimistic, and smack of concern to reassure the public rather than undiluted science.
My mention of “exponential” above reminds me to repeat my usual issue with presentations of log scale graphs without explanation.
The visual appearance of this chart from the 2nd April update, to the average member of the public, doesn’t really highlight visually that the growth in numbers is exponential – it looks more as if the rates are flattening. It requires scrutiny of the y-axis to see that the top gridline in the chart is representing a number (10,000) 200 times higher than the baseline, 50. It’s at least careless not to mention that, other than “Logarithmic scale” (not saying whether x- or y-axis) in a tiny font at the bottom of the chart. A similar chart was shown on 3rd April too.
I’m guessing that the chart above was drawn straight from either the Worldometer, or Johns Hopkins University site without due consideration of its meaning for non-practitioners.
They do know the difference, of course; this DHSC chart, with linear axes, was also presented, and it does show the characteristic exponential growth of the epidemic, in this case for the deaths from the outbreak.
In the news today (coincident with my current work on this post) we can see how important the measures taken through social distancing, self-isolation and (partial) lockdown are to reducing the rate of infections. So far, the SHTM researchers say “…the R0 could be cut…” and “likely lead to a substantial impact and decline…” – i.e. we are on track theoretically, but we are yet to see the full benefits. It is still my view, expressed in my last post, that in the UK, we are still at an R0 figure of 2.5-3.5, with a doubling period of 2-3 days (2 and 3 are VERY different figures in this exponential context, as we see from my earlier posts).
While it’s a small study, it’s interesting to note the intention to reduce the R0 Reproduction Number (that I talked about in my last post), equivalent to the average number of people a person infects in the “susceptible” part of the population (those not yet infected, or recovered). The SIR model I have been working on reflects this concept in generic terms, as described below.
The model is, in effect, a Markov Chain solution to the infection activity, where each state of the SIR (Susceptible-Infected-Recovered compartmentalised) population at time t+δt depends only on the previous state at time t, solved through Monte-Carlo sampling for the statistical behaviour (ie not analytically), using an iterative procedure called the Gillespie algorithm, described last time, to solve the differential equations of the SIR model.
Thanks to some great work by Tomasz Knapik (Sutton family friend) I can show show a couple of new charts relating to variations in the data for the Gillespie algorithm for solving the SIR equations I talked about in my previous post.
You’ll recall this chart from last time, where for a population of 350, with infection rate α, and recovery rate β set, the progress of this simple SIR model shows the peak in infections from 1 index case, with a corresponding reduction in susceptible individuals, and an increase in recoveries until the model terminates with all individuals recovered, and no susceptible individuals left.
I drew this with a log scale (base 10) for time, because in this model, with the default input data, not much happens for quite a while from time t=0, until the infections take off rapidly at t=0.1 (on the generic time scale) and reach a peak at t=0.3. The model is fully played out at t=10. The peak for infections, out of a population of 350, is 325, for this input data:
Key parameters for the basic SIR model are:
The total population, N; the rate of infection α; and the rate of cure β.
Base default data for the simulation is for 1 index infected case at t=0, and the “Spatial Parameter” V, set at 100 in the generic units used by the model. I’ll talk more about this parameter in a later post.
A larger V reduces the infection growth through “distancing” the infected cases from the susceptible compartment of the population, and a smaller V increases the pace of infection by reducing the distance in general terms.
The equations for the simple SIR model being analysed, for a limited total population N, where:
N = nS + nI + nR , the sum of the compartmentalised Susceptible, Infected and Recovered numbers at any time t;
With linear scales for both the x-axis (time) and the y-axis (cases) we can see, in the following two charts, that a change in the spatial parameter (V) in the differential equations above does precisely what the Government is trying to achieve with its measures regarding self-isolation, social distancing and reduction in opportunities for people to meet other than in their household units.
Firstly, in the the modified Python Gillespie model (the Knapik, Sutton & Sutton model!) for the case illustrated above, but with linear axes:
Second, I change the spatial parameter, V, to 1000, for illustrative purposes:
We see above that the spatial parameter, representing increasing “social distancing” in this SIR application, does what is intended by the social distancing behaviour encouraged by most Governments – it pushes the peak of the infections from t=0.3 further out, towards t=2, and also flattens the peak, down to about 140 cases, from 325 before.
My task now is to relate the following key parameters of this situation:
First of all, the Reproduction Number R0 (the number of people one person might infect).
Secondly, the generation interval (how long it takes that infection to take effect (a vital number, and the subject of some difference in the literature between medical practitioners on the one hand, and demographic / biological practitioners on the other). An epidemiological formula, and a demographic/ecological/evolutionary biology formula can both be used to assess the generation interval, from R0, but can lead to very different results, to summarise a point made in paper on exactly this topic (ref. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1766383/).
Third, the doubling period – what is the right logarithmic multiplier in the Johns Hopkins University charts, given R0?
And lastly, the time taken, relatively, to see the R0 come down to a value near 1, when the epidemic stabilises and would tend to die out if R0<1 can be achieved.
No account is taken by my SIR model of recovery and re-infection; the SEIS model mentioned last time, or a variation of it, would be needed for that.
No-one is clear, furthermore, as to the seasonality of this Coronavirus, Covid-19, with regard to winter and summer, ambient humidity and temperatures, or the likelihood of mutation.
This article by the BBC’s Rachel Schraer explores the modelling for the progression of the Coronavirus Covid-19. In the article we see some graphs showing epidemic growth rates, and in particular this one showing infection rate dependency on how many one individual infects in a given period.
This chart led me to look into more sophisticated modelling tools than just the spreadsheet I already mentioned in my previous article on Coronavirus modelling; this is a very specialist area, and I’m working hard to model it more fully.
My spreadsheet model was a simple power law model; it allows you to enter a couple of your own parameters (the number of days out, and the doubling period for cases in days) to see the outcome; see it at:
It lists, as a table, case outcomes after a given number of days (up to 30 – but you can enter your own forecast number of days, and doubling period) since 100 cases, given how many days it is assumed it takes for cases to double. It’s just a simple application of a power law, and is only an analysis of output rate numbers, not a full model. It explains potential growth, on various doubling assumptions. It appears, for example, in the following Johns Hopkins chart (this time for deaths, but it’s a similar model for cases), which presents the UK and Italy prognoses lying between doubling every two days and three days since the index day at 10 deaths:
Any predicted outcomes are VERY dependent on that doubling rate assumption, as my spreadsheet showed – in terms of cases, after 30 days since 100 cases, a doubling every 2 days would lead to about 3 million cases, but a doubling every 3 days leads to 100 thousand cases. This is an example of the non-linearity of the modelling – a 50% improvement in the case doubling period leads to a 30-fold improvement in prediction for case numbers after 100 days.
To reproduce the infection rate growth numbers in the BBC article above, relating the resultant number of cases after 30 days (say) to the average number of people an individual infects (the so-called R0 number, the Basic Reproduction Number) requires a deeper modelling technique. For an R0 explanation, see https://en.m.wikipedia.org/wiki/Basic_reproduction_number
I was interested, seeing the BBC infection rate chart, and its implications, to understand how precisely the number of people an individual is assumed to infect (on average) is related to the “doubling” rate assumptions we can make in the spreadsheet analysis.
I’ve been looking at SIR modelling – Susceptibility-Infected-Recovered modelling – in a simple form, to get the idea of how it works. There are quite a few references to the topic, going back a long way. A very useful paper I have been consulting is from Stanford University in 2007 (https://web.stanford.edu/~jhj1/teachingdocs/Jones-on-R0.pdf), and some of the basis for the shape of that basic modelling goes back to Kermack and McKendrick in 1927).
Usefully I have found some Python code in the Gillespie reference below that codifies a basic model, using a solution technique to the basic equations (which although somewhat simple first-order differential equations, are non-linear and therefore difficult to solve analytically) employing this Gillespie algorithm, which derives from work done in 1976, and is basically a Monte-Carlo probabilistic iterative time-stepping model well suited to computers (of the type I used to play with (for a quite different purpose) for the MoD in the 1970s).
My trial model (to become familiar with the way that the model behaves) is based on the Python code, and I found that with the small total population (N) of 350, with generic parameters for infection rate (α) and recovery rate (β), there is slow growth in cases for a long time (relatively) and then a sharp increase (at about t=0.1 time units in the chart below) leading to a peak at about t=0.3, when recovery starts to happen; the population returns to health at about t=10. The very slow initial growth (from ONE index case) is why I show the x-axis with a log scale.
This very slow growth from ONE case is, I guess, why most charts begin with the first 100 cases (or, in the case of deaths, 10) so that the chart saves horizontal axis space by suppressing the long lead-in period.
My next task is to put some real numbers into a model like this, and to work it though for a LARGE population, and for comparison, to run it from time zero at 100 cases (which might avoid the long lead time in this current generic model).
I expect to find that I could then use a linear x-axis time scale, but that I would have to present the chart with a log y-scale for cases, as the model would need to represent the exponential growth we have seen for Coronavirus.
More sophisticated models also include birth-death adjustments (a demographic model) in the work, but as the life-cycle being assessed for the Covid-19 virus is much shorter (hopefully!) than the demographic cycle, this is ignored to start with.
Another parameter that might be included for some important infections, where there is a significant incubation period during which individuals have been infected but are not yet infectious themselves, is the “Exposed” parameter. During this period the individual is in a compartment E (for Exposed), prior to entering compartment I (for Infected), turning the SIR model into a SEIR model.
Another version of the model might take into consideration the exposed or latent period of the disease, and where an infection does not leave any immunity after recovery, so that individuals that have recovered return to being susceptible again, moving them back into the S(t) (Susceptible as a function of time) compartment of the model; this model is therefore called the SEIS model. For a description of these models, and more, see https://en.m.wikipedia.org/wiki/Compartmental_models_in_epidemiology
So we see that this is a far more complicated issue than at first sight. It is why, I think, Sir Patrick Vallance, the Chief Scientific Adviser, today began to talk about the R0 figure (a dimensionless number (a ratio)) relating to the average number of people that one individual might infect.
My feeling was that we are far from a value for R0 that would lead to the end of the epidemic being in sight, since, if we in the UK are tracking a doubling of cases every 3 days (as we have been), then this might be nearer to an R0 of 2.5, rather than anywhere near ONE. If R0 drops below 1, then the epidemic would eventually die out, which he mentioned. Above 1 and it continues to grow. As I said, I think we are far from an R0<1 situation.
The amount by which R0 exceeds the value 1 might not seem to have such a great effect on the numbers of cases we are seeing, at these early stages of an epidemic, as a but as the days wear on, the effects are VERY (i.e. exponentially) noticeable, and this is why the charts often have y-axis scales that are logarithmic, because otherwise they couldn’t easily be displayed.
In a linear y-axis chart, we run out of y-axis space quite quickly for exponential functions; to see all the data, at the later time values, we have to compress the chart vertically so much that it is then hard to see the earlier, lower numbers; we see this in such a chart below, that has a lot of “growing” to do. Note the dotted line which is the predicted line for doubling of cases every 3 days (which we in the UK have been tracking):
It has therefore become more usual to present the data differently, with a log scale for the y-axis, where, for example, the sample dotted “doubling” lines are straight lines, not steeply growing exponential curves (in the chart below, two dotted guidelines are shown for deaths, one for 2 day doubling, and one for 3 days); the shorter the doubling period, the steeper the straight line on such a chart:
In the 30th March Vallance presentation on TV, the growth curves on the last couple of log charts shown (cases and deaths, respectively) had a SLOPE that was DEcreasing slowly, not INcreasing (exponentially) rapidly (as the raw numbers actually are) although for a mathematician (or a data visualiser) this is a valid way to present such data.
The visual effect of choosing a such a log scale for the y-axis would have been explained in more detail in an academic lecture theatre (as I have tried to do here), and I think it is useful to point this out, and would be a clarification in the Government updates.
A final point, made in both the 29th and 30th March daily TV presentations, is that actions taken today will not have a tangible effect until a few days (maybe a week to two weeks) later; the outputs lag the inputs because of the lead times involved in infection rates, and in the effect of counter-measures on their reduction. What we see tomorrow doesn’t relate to today’s actions, it depends more on actions taken a week or more ago.
From recent charts, shown in Government updates, it does seem that what was done a week ago (self-isolation, social distancing and reduction in opportunities for people to meet other than in their household units) is beginning to have a visible effect on travel patterns, but any moves in the infection charts, if at all, are rather small so far.
If we had experienced (through the social distancing and other precautionary measures) and continue to experience a doubling period of 5 days (not on the chart but a possible input to my spreadsheet), it would lead to 25,000 cases after 40 days.
If we had managed to experience 7 days for doubling of cases (as Japan and Singapore seem to have done), then we would have seen 5000 cases at 40 days (but that’s where we are already, so too late for that outcome).
So the outcomes are VERY sensitively dependent on the doubling period, which in turn is VERY dependent on the average number of people each carrier infects.
I haven’t modelled that part yet, but, again, assumptions apart, the doubling period would be an outcome of that number, together with how long cases last (before death or recovery) and whether re-infection is possible, likely or frequent. It all gets a bit more difficult to be predictive, rather than mathematically expressing known data.
On a more positive note, there is a report today of the statistical work of Michael Levitt (a proper data scientist!), who predicted on February 21st, with uncanny accuracy, the March 23rd situation in China (improvements compared with the then gloomy other forecasts). See the article attached.
A few people might have see the Johns Hopkins University Medical School chart on Covid-19 infection rates in different countries. This particular chart (they have produced many different outputs, some of them interactive world incidence models – see https://coronavirus.jhu.edu/map.html for more) usefully compares some various national growth rates with straight lines representing different periods over which the number of cases might double – 1 day, 2 days, 3 days and 7 days. It’s a kind of log chart to base 2.
I’ve been beginning to simulate the outcomes for 2 input data items:
your chosen number of days (x) since the outbreak (defined at 100 cases on day zero to give a base of calculation);
your chosen rate of growth of cases, expressed by an assumed number of days for doubling the cases number (z), and then;
the output, the number of cases (y) on day x.
This spreadsheet allows you , in the last columns, to enter x and z in order to see the outcome, y.
Of course this is only an output model, it knows nothing about the veracity of assumptions – but the numbers (y) get VERY large for small doubling periods (z).
Here’s the kind of stuff that the Covid-19 modellers will be doing. https://www.nature.com/articles/srep46076.pdf I have downloaded GleamViz, http://www.gleamviz.org/simulator/client/, and it is quite complicated to set up (I used to have a little Windows app called Wildfire that just needed a few numbers to get a pictorial progression of life/recovery/death from the disease, depending on infectivity, time taken to kill, etc). GLEAMviz15 is a proper tool* that needs a lot of base data to be defined. I’m thinking about it! PS – Some of the GleamViz team seem to be based in Turin. *see my comment to this post.
Such modelling reminds me of event-based simulation models I used to develop for the MoD back in the 70s, using purpose-designed programming languages such as Simscript (Fortran-like) and Simula (Algol-like), all based around what today might be called object oriented programs (OOP), where small modules of code represent micro-events that occur, having inputs that trigger them from previous (or influencing) events, and outputs that trigger subsequent (or influenced) events, all inputs and outputs coded with a probability distribution of occurrence. In the MoD case this was about – erm, what am I allowed to say! – reliability and failure rates in aircraft, refuelling tankers and cruise missiles (even then). I recall that in my opening program statement, I named it “PlayGod”. I did ask for a copy of it years later (it’s all in very large dusty decks of Hollerith cards somewhere (I did overlap with paper tape, but I was a forward looking person!)) but they refused. Obviously far too valuable to the nation (even if I wasn’t, judging by my salary).
With regard to the publication of the modelling tools, I’ll leave aside the data part of it…there is a lot, and much of it will be fuzzy, and I’m sure is very different for every country’s population, depending where they are with the disease, and what measures they have taken over a period and whether, for example, they had “super-spreaders” early on, as some countries did.
Back in the early seventies, a European macro-economic project led to the publication of a book, The Limits to Growth. The context is described in https://en.m.wikipedia.org/wiki/The_Limits_to_Growth Not long after, our Government released the macro-economic model software they were using to explore these aspects of the performance and the future of our economy in a worldwide scenario. I worked at London University’s commercial unit for a while, and we mounted this freely available Government model, and offered it to all-comers.
The Economist publication was one of our clients for this, and their chief economics journalist, Peter Riddell, was someone we met several times in connection with it. He was very interested in modelling different approaches (to matters such as exchange rates, money availability (monetary and fiscal policy) and their impact on economic development in a constrained environment) and report on them, as a comparison with the Government’s own modelling, policies and strategies.
This was all at a time when the Economy was perceived to be at risk (as it is now from this pandemic), and inflation and exchange rates, monetary and fiscal policy (for example) were very much seen as determinants of our welfare as a nation, and for all other nations, including the emerging European Community, who originally sponsored this work.
We seem to be in a very similar situation with regard to the modelling of this Covid-19 pandemic, and I don’t see why the software being used (at least by the UK Government’s analysts (probably academics, I think Sir Patrick Vallance said in his answers to the Select Committee this morning)) could not be released so that we might get a better understanding of the links assumed between actions and outcomes, impact of assumptions made, and the impact of actual and potential measures (and their feedback loops, positive or negative) taken in response to the outbreak. Apparently the Government IS going to release its modelling, and I would certainly be interested to see it – its parameters, assumptions, its logic and its variety of outcomes and dependencies on assumptions. The possible outcomes are probably EXACTLY why the Government hasn’t released it yet.