Even having explored Prof. Alex Visscher’s published MatLab code for a week or two now, with UK data, even I am surprised at how well the model is matching up to published UK data so far (on April 18th 2020).
I’ll just present the charts produced my my version of the model for now. One chart is with linear scales, and the other with a log y-axis to base 10 (each main gridline is 10x the previous one):
We can see that the upcoming acid test will be as the case numbers (and associated death rates) begin to fall, thanks to the social distancing, working from home and other measures people are being asked to take (successfully, in general, at the moment).
The new consideration (new in public, at least) is the question of whether there will be a vaccine, how well it might work, for how long immunity might last, and whether there will be re-infection, and at what rate. If there is no pharmaceutical resolution, or only a partial one, these single phase infection models will need to develop, and, as we saw in my last post at The potential cyclical future pandemic, a cyclical repetition of a lockdown/epidemic loop might look the most likely outcome for a while, until the vaccine point is resolved.
Meanwhile, this single phase model flattens out at a little over 39,000 deaths. I hope for the sake of everyone that the model turns out to be pessimistic.
As an alumnus, I had the opportunity today (with 2000 other people in over 60 countries) to attend a Cambridge University webinar on the current pandemic crisis. It was moderated by Chris Smith, a consultant medical virologist and lecturer at Cambridge (and a presenter on https://www.thenakedscientists.com/ where more about this event will be found).
The experts presenting were Dr Nick Matheson, a Principal Investigator in Therapeutic Immunology, and Honorary Consultant at Addenbrookes Hospital, and Prof. Ken Smith, Professor in Medicine at Cambridge, and Director of Studies for Clinical Medicine at Pembroke College.
The video of the 45 minute session is available, and I will share it here in due course (it’s only on a closed group at the moment, but will be on the Cambridge YouTube channel https://www.youtube.com/channel/UCfcLirZA16uMdL1sMsfNxng next week) but of most interest to me, since I was interested in the modelling of the pandemic outbreak, was the following slide put up by Nick Matheson, depicting research by Neil Ferguson’s group at Imperial College, and also by the research group at Harvard University School of Public Health. Here it is:
It seems to me that as the Neil Ferguson group is involved with this (and the Harvard slide seems to corroborate something similar) it is likely that the UK Government is receiving the same material.
I am generally in support of not forecasting or detailing any imminent reduction in lockdown measures, and I can see from this why the Government (led by the science) might be pretty reluctant too.
You will know from my previous posts on this topic why the R0 number is so important (the Reproductive Number, representing the number of people one person might infect at a particular time). Sir Patrick Vallance (CSA) has mentioned several times that there is evidence that is R0 is now below 1 (on the 16th April he actually said it was between .5 and 1) for the general population (although not so in care homes and hospitals, probably).
He and Prof. Chris Whitty (CMO) (an epidemiologist) have also been saying (and so have Government, of course, on their advice) that they don’t want to reduce restrictions yet, in order to avoid a second peak of cases.
But from the Imperial College and Harvard research (mathematical modelling, presumably, on the basis of “non-pharmaceutical” likely actions that might be taken from time to time to control the outbreak) it would seem that we are in for multiple peaks (although lower than the first we have experienced) going well into 2021, and into 2022 in the case of the Harvard chart.
I was one of a very large number of people putting in a written question during the webinar, and mine was regarding the assumptions here (in the modelling) about the likelihood (or otherwise) of a vaccine on the one hand, and repeat infections on the other.
It might be that any vaccine might NOT confer long lasting immunity. It might even be that, like Dengue Fever, there will be NO vaccine (this was stated verbally in the Q&A by Prof. Ken Smith).
All very sobering. And I haven’t been drinking. Here is the full slide set:
In my research on appropriate (available and feasible (for me)) modelling tools for epidemics, I discovered this paper by Prof. Alex de Visscher (to be called Alex hereafter). He has been incredibly helpful.
Thanks to Alex including the MatLab code (a maths package I have used before) in the paper, and also a detailed description of how the model works, it qualifies as both available and feasible. There is also a GNU (public and free) version of MatLab (called Octave)) which is mostly equivalent in terms of functionality for these purposes, which makes it practical to use the mathematical tools.
In my communications with Alex, he has previously helped me with working out the relationship between the R0 Reproductive Number (now being quoted every time on Government presentations) and the doubling period for the case number of disease outbreak. I published my work on this at:
since mathematical expression are too cumbersome here on WordPress.
I would still say the Government is being a little coy about these numbers, saying that R0 is below 1 (a crucial dividing point between epidemic increase at R0>1 and epidemic slow down and dying out for R0<1). They now say (April 14th) it’s less than one outside hospitals and care homes, but not inside these places, as if (as Alex says) Covid-19 is not lethal, except for those people who die from it.
At least the Government have now stopped using graphs with the misleading appearance of y-axis log charts.
These following charts (unlike the original Government ones) make it pretty clear what kind of chart is being shown (log or linear) and also illustrate how different an impression might be given to a member of the public looking on a TV screen a) if the labelling isn’t there or obscurely referenced, and b) how the log charts to some extent mask the perception of steep growth in the numbers up until late March.
This chart from 14th April show the Government doing a little better over recent days in presenting a chart that visually represents the situation more clearly. Maybe the lower growth makes them more comfortable showing the data on linear axes.
Alex de Visscher model description
I’ll try to summarixe Alex’s approach in the model I have begun to use to work with the UK numbers (Alex had focused on other countries (US and Canada nearer to home, plus China, Italy, Spain, Iran with more developed outbreak numbers since the epidemic took hold).
I have previously described the SIR (Susceptible, Infected and Recovered) 3-compartment model on this blog at https://www.briansutton.uk/?p=1296, but in this new model we have as many as 7 compartments: Uninfected (U), Infected (I), Sick (S), Seriously Sick (SS), Deceased (D), Better but not recovered (B) and Recovered (R). They are represented by this diagram:
Firstly, an overall infection rate r1 above can be calculated as a weighted sum of these four sub-rates;
r1 = (k11I+ k12S+ k13SS+k14B)U/P
where the infection rates k11, k12, k13, k14, are set separately, towards I, S, SS and B respectively, from healthy people, with the rates proportional to the fraction of uninfected people, compared to the total population, U/P.
All other transitions between compartments are assumed as first order derivatives with their own rate constants, with time t measured in days:
dU/dt = -r1; the rate of movement from Uninfected to Infected; and similarly, with in-/out-puts added and subtracted for other compartments:
Transition rates between compartments
r1 = (k11I+ k12S+ k13SS+k14B)U/P
dU/dt = -r1
r2=k2I for I to S transitions
with dI/dt = r1-r2
r3=k3S for S to SS transitions
with dS/dt = r2-r3-r5
r4=k4SS for SS to D transitions
with dD/dt = r4
r5=k5S for S to B transitions
with dB/dt = r5+r6-r7
r6=k6SS for SS to B transitions
with dSS/dt = r3-r4-r6
r7=k7B for B to R transitions
with dR/dt = r7
Table 1: The possible transitions and their rates between compartments
Some simplifying assumptions are made: k12 = k11/2, k13 = k11/3 and k14 = k11/4, in other words scaling the infection rates to I from the other compartments S, SS and B, respectively, to the principal rate from U to I. (Subsequently k14 = 0*k11/4 was chosen to compensate for long contagious times once the SS median had been changed from to 10 days from 3.5 days. See below).
So we can recognise that the bones of the previous SIR model are here, but there is compartment refinement to allow an individual’s status to be more precisely defined, along with the related transitions.
The remaining parameters that can be set in the model are:
k3/k5 is based on the assumption that 10% of the infected (I) become seriously sick (SS), whereas 90% get better without developing serious symptoms; less than the 80:20 proportion observed normally, because many with mild symptoms go unreported.
The k4/k6 ratio is set assuming a 15% non-survival rate for hospitalised cases.
k5 and k7 were set in the original model assuming a median rate of 3.5 days in each of the S and B stages before moving to the Recovered R stage (ie a median 7 days to recover from mild symptoms). k7 has now been set to a median of 10 days (implying a 13.5 day median for recovery from mild symptoms), and to compensate for the resulting long contagious times, k14 = 0*k11/4 has been chosen as stated above.
k6 is based on an SS stage patient remaining in that stage for 10 days before transitioning to B. Because the other pathway is to the Deceased D stage, the median is actually less, 8.5 days. This would be consistent with a mean duration of 12.26 days for getting better (B), which has been observed clinically.
Implementation of the model and further parameter setting
I have been operating the model with some UK data using Octave, but Alex was using the equivalent (and original licensed software) MatLab. He used the Ordinary Differential Equation Solver module in that software, ODE45, which I have used before for solving the astronomical Restricted Three Body Problem, so I know it quite well.
Just as all of the various Governmental, Johns Hopkins University and Worldometer published graphs and data are presented with a non-zero case baseline to start the simulations, as the lead-in time is quite long, so does this model. These initial parameters for the UK model so far are chosen as:
Thus the doubling time I called TD there is given by the equation:
TD loge2 = loge (C30 /C29)
where Cd is the reported number of cases on a given day d. Known cases are assumed to be 5% of infected cases I, 1/3 of sick cases S, 90% of seriously sick SS, 12% of recovering R and 90% of deceased cases D (not all are known but we assume, for example, that reporting is more accurate for SS and D). The calculated doubling time in the model is not sensitive to the choice of these fractions.
The critical factor R0, the Reproduction Number, can be calculated by a separate program module, for one infected patient, where the number of newly infected starting with that case can be calculated over time.
Solving the equations
The model, with the many interdependent parameter choices that can be made, solves the Differential Equations described earlier, using the initial conditions mentioned in Table 3. Alex’s paper describes runs of the model for different assumptions around interventions (and non-interventions), and he has presented some outcomes of National models (for example for China) on his YouTube channel at:
and Alex’s introduction to the model on that channel, for both China, and the world outside China which describes the operation of the model, and the post processing fitting to real data charts can be seen at
Alex’s early April update on Italy is at:
I’m at an early stage of working on a UK model, using Alex’s model and techniques, and the most recent UK data is presented at the UK Government site at:
using the link to the total data that has been presented to date on that site. Unfortunately, each day I look, the presentation format changes (there used to be a spreadsheet which was the most convenient) and now the cumulative case count for the UK has increased suddenly by 3000 on top of today’s count. Death numbers are still in step. It isn’t hard to see why journalists might be frustrated with reporting by the UK Government.
I have, as a first cut, run the numbers through the model to get an initial feel as to outcomes. The charts produced are not yet annotated, but I can see the shape of the predictions, and I will just include one here before I go on to refine the inputs, the assumptions, the model parameters and the presentation.
In this chart, days are along the x-axis, and the y-axis in a log scale to base 10 (equal intervals are a multiple of 10 of the previous y-gridline) of numbers of individuals.
The red curve represents active cases, the purple curve the number of deaths D, and the yellow curve the number of seriously sick patients, SS. This chart is the simpler of the two that the model produces, but serves to illustrate what is going on.
I have not yet made any attempt to match the curves against actual UK outcomes for the first 50 days (say) as Alex has done for the other countries he has worked on.
The input choices for this version for the UK are:
interv_time = 52; % intervention time: Day 51 = March 23 – 1 = March 22
interv_success = 0.85; % fractional reduction of k11 during intervention
k11 = 0.39; % infection rate (per day)
k5 = log(2)/3.5;
k6 = log(2)/10;
k7 = log(2)/10;
fSS = 0.1; % fraction seriously sick
fCR = 0.04; % fraction critical
fmort = 0.015; % mortality rate
correction = 0.0012;
P = 67.8e6; % total population
The correction number is set to adjust for the correct starting number of cases. On Alex’s advice I set k11 = 0.39 which seems to provide a good initial match to death data.
In terms of numerical outputs, this is just a model, and hasn’t even been tuned, let alone fine tuned. The charts above reflect the output numbers in the model, which were:
doubling_time = -7511748644 – not relevant here for a declining epidemic; final_deaths = 39,319 reached after a little over 100 days; total_infections = 2,621,289 peaking at about 60 days
and I’m still making sense of these and working out some possible adjustments. Firstly I want to match with real data up to to date, so there is some spreadsheet and plotting work to do.
Now the model has been run, there can be a combinatorial number of changes that could be made to get the model to fit the first 50 days’ data.
The work has only just started, but I feel that I have a worthwhile tool to begin to parallel what the Government advisers might be doing to prepare their advice to Government. Watch this space!
In the recent daily UK Government presentations, the R0 Reproductive Number has been mentioned a few times, and with good reason. Its value is as a commonly accepted measure of the propensity of an infectious disease outbreak to become an epidemic.
It turns out to be a relatively simple number to define, although working back from current data to calculate it is awkward if you don’t have the data. That’s my next task, from public data.
If R0 is below 1, then the epidemic will reduce and disappear. If it is greater than 1, then the epidemic grows.
The UK Government and its health advisers have made a few statements about it, and I covered these in an earlier post.
This is a more technical post, just to present a derivation of R0, and its consequences. I have used a few research papers to help with this, but the focus, brevity(!) and mistakes are mine (although I have corrected some in the sources).
This is an extract from the Government daily Coronavirus briefing on 2nd April 2020, led by Matt Hancock, with Professor Stephen Powis, Medical Director of the NHS in England.
In this clip, Prof. Powis states that he thinks there is “early academic evidence” that the R0 “Reproduction Number”, (what he calls the transmission rate) reflecting the average number of people infected by one person, “may have fallen below 1”. This is an important target, because when R0 for a given epidemic is below 1, the epidemic will die out.
My issue here is that I would like to see the evidence.
In Matt Hancock’s statement, he said that “the best scientific analysis is that” the “doubling” period for the rate of infection of the virus (another important statistic, used by Johns Hopkins University in many of their charts for the epidemic) was between three and four days.
In the early stages of a rapidly growing epidemic, either of three or four days lead to quite different outcomes (for power series (or exponential) growth) after, say, 30 days (since day zero, defined as 100 cases), for the total number of people infected – 3 days leads to 100,000 cases, and 4 days to 18,000.
See my previously posted spreadsheet to run your own numbers for this:
My opinion is that it’s an optimistic range, but in any case it is too wide to be meaningful without more detail. I suspect 2.5 days (which would lead to 400,000 cases) might be a better number, but without seeing the scientific data – the “evidence” in either case – who knows?
But do you see how sensitive outcomes are to quite small changes in estimates to the input data at this early stage in the epidemic?
Both of these video statements sound to me to be too optimistic, and smack of concern to reassure the public rather than undiluted science.
My mention of “exponential” above reminds me to repeat my usual issue with presentations of log scale graphs without explanation.
The visual appearance of this chart from the 2nd April update, to the average member of the public, doesn’t really highlight visually that the growth in numbers is exponential – it looks more as if the rates are flattening. It requires scrutiny of the y-axis to see that the top gridline in the chart is representing a number (10,000) 200 times higher than the baseline, 50. It’s at least careless not to mention that, other than “Logarithmic scale” (not saying whether x- or y-axis) in a tiny font at the bottom of the chart. A similar chart was shown on 3rd April too.
I’m guessing that the chart above was drawn straight from either the Worldometer, or Johns Hopkins University site without due consideration of its meaning for non-practitioners.
They do know the difference, of course; this DHSC chart, with linear axes, was also presented, and it does show the characteristic exponential growth of the epidemic, in this case for the deaths from the outbreak.
In the news today (coincident with my current work on this post) we can see how important the measures taken through social distancing, self-isolation and (partial) lockdown are to reducing the rate of infections. So far, the SHTM researchers say “…the R0 could be cut…” and “likely lead to a substantial impact and decline…” – i.e. we are on track theoretically, but we are yet to see the full benefits. It is still my view, expressed in my last post, that in the UK, we are still at an R0 figure of 2.5-3.5, with a doubling period of 2-3 days (2 and 3 are VERY different figures in this exponential context, as we see from my earlier posts).
While it’s a small study, it’s interesting to note the intention to reduce the R0 Reproduction Number (that I talked about in my last post), equivalent to the average number of people a person infects in the “susceptible” part of the population (those not yet infected, or recovered). The SIR model I have been working on reflects this concept in generic terms, as described below.
The model is, in effect, a Markov Chain solution to the infection activity, where each state of the SIR (Susceptible-Infected-Recovered compartmentalised) population at time t+δt depends only on the previous state at time t, solved through Monte-Carlo sampling for the statistical behaviour (ie not analytically), using an iterative procedure called the Gillespie algorithm, described last time, to solve the differential equations of the SIR model.
Thanks to some great work by Tomasz Knapik (Sutton family friend) I can show show a couple of new charts relating to variations in the data for the Gillespie algorithm for solving the SIR equations I talked about in my previous post.
You’ll recall this chart from last time, where for a population of 350, with infection rate α, and recovery rate β set, the progress of this simple SIR model shows the peak in infections from 1 index case, with a corresponding reduction in susceptible individuals, and an increase in recoveries until the model terminates with all individuals recovered, and no susceptible individuals left.
I drew this with a log scale (base 10) for time, because in this model, with the default input data, not much happens for quite a while from time t=0, until the infections take off rapidly at t=0.1 (on the generic time scale) and reach a peak at t=0.3. The model is fully played out at t=10. The peak for infections, out of a population of 350, is 325, for this input data:
Key parameters for the basic SIR model are:
The total population, N; the rate of infection α; and the rate of cure β.
Base default data for the simulation is for 1 index infected case at t=0, and the “Spatial Parameter” V, set at 100 in the generic units used by the model. I’ll talk more about this parameter in a later post.
A larger V reduces the infection growth through “distancing” the infected cases from the susceptible compartment of the population, and a smaller V increases the pace of infection by reducing the distance in general terms.
The equations for the simple SIR model being analysed, for a limited total population N, where:
N = nS + nI + nR , the sum of the compartmentalised Susceptible, Infected and Recovered numbers at any time t;
With linear scales for both the x-axis (time) and the y-axis (cases) we can see, in the following two charts, that a change in the spatial parameter (V) in the differential equations above does precisely what the Government is trying to achieve with its measures regarding self-isolation, social distancing and reduction in opportunities for people to meet other than in their household units.
Firstly, in the the modified Python Gillespie model (the Knapik, Sutton & Sutton model!) for the case illustrated above, but with linear axes:
Second, I change the spatial parameter, V, to 1000, for illustrative purposes:
We see above that the spatial parameter, representing increasing “social distancing” in this SIR application, does what is intended by the social distancing behaviour encouraged by most Governments – it pushes the peak of the infections from t=0.3 further out, towards t=2, and also flattens the peak, down to about 140 cases, from 325 before.
My task now is to relate the following key parameters of this situation:
First of all, the Reproduction Number R0 (the number of people one person might infect).
Secondly, the generation interval (how long it takes that infection to take effect (a vital number, and the subject of some difference in the literature between medical practitioners on the one hand, and demographic / biological practitioners on the other). An epidemiological formula, and a demographic/ecological/evolutionary biology formula can both be used to assess the generation interval, from R0, but can lead to very different results, to summarise a point made in paper on exactly this topic (ref. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1766383/).
Third, the doubling period – what is the right logarithmic multiplier in the Johns Hopkins University charts, given R0?
And lastly, the time taken, relatively, to see the R0 come down to a value near 1, when the epidemic stabilises and would tend to die out if R0<1 can be achieved.
No account is taken by my SIR model of recovery and re-infection; the SEIS model mentioned last time, or a variation of it, would be needed for that.
No-one is clear, furthermore, as to the seasonality of this Coronavirus, Covid-19, with regard to winter and summer, ambient humidity and temperatures, or the likelihood of mutation.
This article by the BBC’s Rachel Schraer explores the modelling for the progression of the Coronavirus Covid-19. In the article we see some graphs showing epidemic growth rates, and in particular this one showing infection rate dependency on how many one individual infects in a given period.
This chart led me to look into more sophisticated modelling tools than just the spreadsheet I already mentioned in my previous article on Coronavirus modelling; this is a very specialist area, and I’m working hard to model it more fully.
My spreadsheet model was a simple power law model; it allows you to enter a couple of your own parameters (the number of days out, and the doubling period for cases in days) to see the outcome; see it at:
It lists, as a table, case outcomes after a given number of days (up to 30 – but you can enter your own forecast number of days, and doubling period) since 100 cases, given how many days it is assumed it takes for cases to double. It’s just a simple application of a power law, and is only an analysis of output rate numbers, not a full model. It explains potential growth, on various doubling assumptions. It appears, for example, in the following Johns Hopkins chart (this time for deaths, but it’s a similar model for cases), which presents the UK and Italy prognoses lying between doubling every two days and three days since the index day at 10 deaths:
Any predicted outcomes are VERY dependent on that doubling rate assumption, as my spreadsheet showed – in terms of cases, after 30 days since 100 cases, a doubling every 2 days would lead to about 3 million cases, but a doubling every 3 days leads to 100 thousand cases. This is an example of the non-linearity of the modelling – a 50% improvement in the case doubling period leads to a 30-fold improvement in prediction for case numbers after 100 days.
To reproduce the infection rate growth numbers in the BBC article above, relating the resultant number of cases after 30 days (say) to the average number of people an individual infects (the so-called R0 number, the Basic Reproduction Number) requires a deeper modelling technique. For an R0 explanation, see https://en.m.wikipedia.org/wiki/Basic_reproduction_number
I was interested, seeing the BBC infection rate chart, and its implications, to understand how precisely the number of people an individual is assumed to infect (on average) is related to the “doubling” rate assumptions we can make in the spreadsheet analysis.
I’ve been looking at SIR modelling – Susceptibility-Infected-Recovered modelling – in a simple form, to get the idea of how it works. There are quite a few references to the topic, going back a long way. A very useful paper I have been consulting is from Stanford University in 2007 (https://web.stanford.edu/~jhj1/teachingdocs/Jones-on-R0.pdf), and some of the basis for the shape of that basic modelling goes back to Kermack and McKendrick in 1927).
Usefully I have found some Python code in the Gillespie reference below that codifies a basic model, using a solution technique to the basic equations (which although somewhat simple first-order differential equations, are non-linear and therefore difficult to solve analytically) employing this Gillespie algorithm, which derives from work done in 1976, and is basically a Monte-Carlo probabilistic iterative time-stepping model well suited to computers (of the type I used to play with (for a quite different purpose) for the MoD in the 1970s).
My trial model (to become familiar with the way that the model behaves) is based on the Python code, and I found that with the small total population (N) of 350, with generic parameters for infection rate (α) and recovery rate (β), there is slow growth in cases for a long time (relatively) and then a sharp increase (at about t=0.1 time units in the chart below) leading to a peak at about t=0.3, when recovery starts to happen; the population returns to health at about t=10. The very slow initial growth (from ONE index case) is why I show the x-axis with a log scale.
This very slow growth from ONE case is, I guess, why most charts begin with the first 100 cases (or, in the case of deaths, 10) so that the chart saves horizontal axis space by suppressing the long lead-in period.
My next task is to put some real numbers into a model like this, and to work it though for a LARGE population, and for comparison, to run it from time zero at 100 cases (which might avoid the long lead time in this current generic model).
I expect to find that I could then use a linear x-axis time scale, but that I would have to present the chart with a log y-scale for cases, as the model would need to represent the exponential growth we have seen for Coronavirus.
More sophisticated models also include birth-death adjustments (a demographic model) in the work, but as the life-cycle being assessed for the Covid-19 virus is much shorter (hopefully!) than the demographic cycle, this is ignored to start with.
Another parameter that might be included for some important infections, where there is a significant incubation period during which individuals have been infected but are not yet infectious themselves, is the “Exposed” parameter. During this period the individual is in a compartment E (for Exposed), prior to entering compartment I (for Infected), turning the SIR model into a SEIR model.
Another version of the model might take into consideration the exposed or latent period of the disease, and where an infection does not leave any immunity after recovery, so that individuals that have recovered return to being susceptible again, moving them back into the S(t) (Susceptible as a function of time) compartment of the model; this model is therefore called the SEIS model. For a description of these models, and more, see https://en.m.wikipedia.org/wiki/Compartmental_models_in_epidemiology
So we see that this is a far more complicated issue than at first sight. It is why, I think, Sir Patrick Vallance, the Chief Scientific Adviser, today began to talk about the R0 figure (a dimensionless number (a ratio)) relating to the average number of people that one individual might infect.
My feeling was that we are far from a value for R0 that would lead to the end of the epidemic being in sight, since, if we in the UK are tracking a doubling of cases every 3 days (as we have been), then this might be nearer to an R0 of 2.5, rather than anywhere near ONE. If R0 drops below 1, then the epidemic would eventually die out, which he mentioned. Above 1 and it continues to grow. As I said, I think we are far from an R0<1 situation.
The amount by which R0 exceeds the value 1 might not seem to have such a great effect on the numbers of cases we are seeing, at these early stages of an epidemic, as a but as the days wear on, the effects are VERY (i.e. exponentially) noticeable, and this is why the charts often have y-axis scales that are logarithmic, because otherwise they couldn’t easily be displayed.
In a linear y-axis chart, we run out of y-axis space quite quickly for exponential functions; to see all the data, at the later time values, we have to compress the chart vertically so much that it is then hard to see the earlier, lower numbers; we see this in such a chart below, that has a lot of “growing” to do. Note the dotted line which is the predicted line for doubling of cases every 3 days (which we in the UK have been tracking):
It has therefore become more usual to present the data differently, with a log scale for the y-axis, where, for example, the sample dotted “doubling” lines are straight lines, not steeply growing exponential curves (in the chart below, two dotted guidelines are shown for deaths, one for 2 day doubling, and one for 3 days); the shorter the doubling period, the steeper the straight line on such a chart:
In the 30th March Vallance presentation on TV, the growth curves on the last couple of log charts shown (cases and deaths, respectively) had a SLOPE that was DEcreasing slowly, not INcreasing (exponentially) rapidly (as the raw numbers actually are) although for a mathematician (or a data visualiser) this is a valid way to present such data.
The visual effect of choosing a such a log scale for the y-axis would have been explained in more detail in an academic lecture theatre (as I have tried to do here), and I think it is useful to point this out, and would be a clarification in the Government updates.
A final point, made in both the 29th and 30th March daily TV presentations, is that actions taken today will not have a tangible effect until a few days (maybe a week to two weeks) later; the outputs lag the inputs because of the lead times involved in infection rates, and in the effect of counter-measures on their reduction. What we see tomorrow doesn’t relate to today’s actions, it depends more on actions taken a week or more ago.
From recent charts, shown in Government updates, it does seem that what was done a week ago (self-isolation, social distancing and reduction in opportunities for people to meet other than in their household units) is beginning to have a visible effect on travel patterns, but any moves in the infection charts, if at all, are rather small so far.
If we had experienced (through the social distancing and other precautionary measures) and continue to experience a doubling period of 5 days (not on the chart but a possible input to my spreadsheet), it would lead to 25,000 cases after 40 days.
If we had managed to experience 7 days for doubling of cases (as Japan and Singapore seem to have done), then we would have seen 5000 cases at 40 days (but that’s where we are already, so too late for that outcome).
So the outcomes are VERY sensitively dependent on the doubling period, which in turn is VERY dependent on the average number of people each carrier infects.
I haven’t modelled that part yet, but, again, assumptions apart, the doubling period would be an outcome of that number, together with how long cases last (before death or recovery) and whether re-infection is possible, likely or frequent. It all gets a bit more difficult to be predictive, rather than mathematically expressing known data.
On a more positive note, there is a report today of the statistical work of Michael Levitt (a proper data scientist!), who predicted on February 21st, with uncanny accuracy, the March 23rd situation in China (improvements compared with the then gloomy other forecasts). See the article attached.
A few people might have see the Johns Hopkins University Medical School chart on Covid-19 infection rates in different countries. This particular chart (they have produced many different outputs, some of them interactive world incidence models – see https://coronavirus.jhu.edu/map.html for more) usefully compares some various national growth rates with straight lines representing different periods over which the number of cases might double – 1 day, 2 days, 3 days and 7 days. It’s a kind of log chart to base 2.
I’ve been beginning to simulate the outcomes for 2 input data items:
your chosen number of days (x) since the outbreak (defined at 100 cases on day zero to give a base of calculation);
your chosen rate of growth of cases, expressed by an assumed number of days for doubling the cases number (z), and then;
the output, the number of cases (y) on day x.
This spreadsheet allows you , in the last columns, to enter x and z in order to see the outcome, y.
Of course this is only an output model, it knows nothing about the veracity of assumptions – but the numbers (y) get VERY large for small doubling periods (z).
Here’s the kind of stuff that the Covid-19 modellers will be doing. https://www.nature.com/articles/srep46076.pdf I have downloaded GleamViz, http://www.gleamviz.org/simulator/client/, and it is quite complicated to set up (I used to have a little Windows app called Wildfire that just needed a few numbers to get a pictorial progression of life/recovery/death from the disease, depending on infectivity, time taken to kill, etc). GLEAMviz15 is a proper tool* that needs a lot of base data to be defined. I’m thinking about it! PS – Some of the GleamViz team seem to be based in Turin. *see my comment to this post.
Such modelling reminds me of event-based simulation models I used to develop for the MoD back in the 70s, using purpose-designed programming languages such as Simscript (Fortran-like) and Simula (Algol-like), all based around what today might be called object oriented programs (OOP), where small modules of code represent micro-events that occur, having inputs that trigger them from previous (or influencing) events, and outputs that trigger subsequent (or influenced) events, all inputs and outputs coded with a probability distribution of occurrence. In the MoD case this was about – erm, what am I allowed to say! – reliability and failure rates in aircraft, refuelling tankers and cruise missiles (even then). I recall that in my opening program statement, I named it “PlayGod”. I did ask for a copy of it years later (it’s all in very large dusty decks of Hollerith cards somewhere (I did overlap with paper tape, but I was a forward looking person!)) but they refused. Obviously far too valuable to the nation (even if I wasn’t, judging by my salary).
With regard to the publication of the modelling tools, I’ll leave aside the data part of it…there is a lot, and much of it will be fuzzy, and I’m sure is very different for every country’s population, depending where they are with the disease, and what measures they have taken over a period and whether, for example, they had “super-spreaders” early on, as some countries did.
Back in the early seventies, a European macro-economic project led to the publication of a book, The Limits to Growth. The context is described in https://en.m.wikipedia.org/wiki/The_Limits_to_Growth Not long after, our Government released the macro-economic model software they were using to explore these aspects of the performance and the future of our economy in a worldwide scenario. I worked at London University’s commercial unit for a while, and we mounted this freely available Government model, and offered it to all-comers.
The Economist publication was one of our clients for this, and their chief economics journalist, Peter Riddell, was someone we met several times in connection with it. He was very interested in modelling different approaches (to matters such as exchange rates, money availability (monetary and fiscal policy) and their impact on economic development in a constrained environment) and report on them, as a comparison with the Government’s own modelling, policies and strategies.
This was all at a time when the Economy was perceived to be at risk (as it is now from this pandemic), and inflation and exchange rates, monetary and fiscal policy (for example) were very much seen as determinants of our welfare as a nation, and for all other nations, including the emerging European Community, who originally sponsored this work.
We seem to be in a very similar situation with regard to the modelling of this Covid-19 pandemic, and I don’t see why the software being used (at least by the UK Government’s analysts (probably academics, I think Sir Patrick Vallance said in his answers to the Select Committee this morning)) could not be released so that we might get a better understanding of the links assumed between actions and outcomes, impact of assumptions made, and the impact of actual and potential measures (and their feedback loops, positive or negative) taken in response to the outbreak. Apparently the Government IS going to release its modelling, and I would certainly be interested to see it – its parameters, assumptions, its logic and its variety of outcomes and dependencies on assumptions. The possible outcomes are probably EXACTLY why the Government hasn’t released it yet.