Select one of the keywords on the left…

Epidemic modelingStochastic SIR models

Reading time: ~40 min

One of the main shortcomings of the Galton-Watson model is that it can exhibit indefinite growth. That's because it's effectively drawing from an infinite population of susceptible persons. One way we can make the model more realistic is to start with the full population and mark individuals as "removed" from the susceptible population once they have been infected.

In this section, we'll modify the Galton-Watson model in a way that incorporates recovered individuals. We'll draw from the material presented in this talk by Tom Britton at Stockholm University.

Let's specify the dynamics of our new model, which we'll call the discrete SIR model (where SIR stands for "susceptible-infectious-removed"):

  • At time 0, the first individual is infected and every other individual is susceptible.
  • For each time t > 0, each infected individual transmits to each susceptible individual with some fixed, small probability p (with infection event independent of all others). If an individual is infected at time t-1, they are recovered at time t.

This model is still missing some desirable features (select all that apply):

Transmission events are not really independent
One person can't transmit to several others
You can transmit to someone even if they aren't geographically close
Geographic proximity is an important consideration

Suppose that at a given time t-1, there are j infected individuals and k susceptible individuals. Then the probability that a particular individual susceptible at time t-1 becomes infected at time t is .

Solution. The probability that an individual avoids infection is the probability that each infection opportunity fails to transmit. This probability is (1-p)^j, since there are j infected individuals, and each one transmits with probability p. Therefore, the probability of infection for the individual is 1-(1-p)^j.

In terms of n and p, the expected number of infectious persons at the second time step (the one right after there's exactly one infectious person) is .

Solution. If we define I_k to be 1 if person k is infected on the second time step and 0 otherwise, then the random variable we're looking to find the expected value of is I_2 + I_3 + \ldots + I_n. Furthermore, each of these random variables has an expectation of p(1) + (1-p)(0) = p. By linearity of expectation, the expected number of infectious persons is \mathbf{E}[I_2] + \cdots + \mathbf{E}[I_n] = p + \cdots + p = p(n-1).

Bearing in mind the result of the previous exercise (as well as the approximation p(n-1) \approx pn), we'll define p to be \lambda / n. That way, we can control the expected (initial) number of transmissions from each infectious individual by adjusting the value of \lambda.

Use the code block below to simulate several runs of our discrete SIR model.

(a) Why does the number of infectious individuals necessarily converge to 0 as the time index goes to \infty?

(b) Does everyone end up having been infected?

using Plots
import Base.Iterators: countfrom
function SIR_simulation(population_size, infection_probability)
    statuses = fill("susceptible", population_size, 1)
    statuses[1, 1] = "infectious"
    transmissions = []
    for t in countfrom(2)
        n_infectious = sum(statuses[:, t-1] .== "infectious")
        if n_infectious == 0
        statuses = [statuses fill("susceptible", population_size)]
        for k in 1:population_size
            if statuses[k, t-1] == "recovered" || statuses[k, t-1] == "infectious"
                statuses[k, t] = "recovered"
        for j in 1:population_size
            if statuses[j, t-1] == "infectious"
                for k in 1:population_size
                    if statuses[k, t-1] == "susceptible"
                        if rand() < infection_probability
                            push!(transmissions, [(j, t-1),(k, t)])
                            statuses[k, t] = "infectious"
    statuses, transmissions
population_size = 20
infection_probability = 2 / population_size
statuses, transmissions = SIR_simulation(population_size, infection_probability)

To make it easier to see the whole array, we map the status values to colors and plot the entries as points:

function simplot(statuses, transmissions)
    colorize(status) = Dict("infectious" => :Red, 
                        "recovered" => :LightGray,
                        "susceptible" =>  :CornflowerBlue)[status]
    colors = [colorize(statuses[i]) for i in CartesianIndices(statuses)][:]
    p = scatter([(i[2], i[1]) for i in CartesianIndices(statuses)][:],
            color = colors, legend = nothing, 
            xlabel = "time", ylabel = "person")
    for transmission in transmissions
        plot!(p, reverse.(transmission), color = :DarkGray, arrow = :arrow)
simplot(statuses, transmissions)

Figure 2.1. Each row represent's an individual's status over time, with blue representing susceptibility, red representing infectiousness, and gray representing immunity.


(a) The number of infectious individuals eventually reaches the value 0 and stays there forever, since each time step with a positive number of infectious individuals either results in no new infections, or it reduces the finite number of susceptible individuals by at least 1. When the number of susceptible individuals reaches zero, no new infections can occur.

(b) No. There are almost always some individuals which make it to eradication without ever being infectious.

Reproduction Number

The expected number of transmissions from each infectious individual over time. In particular, conditioned on there being S_t susceptible individuals at time t, an infectious individual at time t transmits to people on average.

At some point, the proportion of the population which is susceptible is low enough that the expected number of transmissions is below 1, and then the infection dies out quickly. Thus some folks end up never being infected. Let's numerically explore the proportion of the population which ends up having been infected:

The code block below computes the eventual number of recovered individuals over many simulations of an SIR model with 1000 individuals. Adjust the numerator in the line which defines infection_probability to see how the behavior of the resulting histogram depends on this value.

using Plots
n = population_size = 1000
p = infection_probability = 1.0/population_size
n_recovered(n, p) = sum(SIR_simulation(n, p)[1][:, end] .== "recovered")
histogram([n_recovered(n, p) for _ in 1:5_000], xlims = (0, 1000), 
          nbins = 100, label = "", xlabel = "final number recovered",
          ylabel = "number of runs")

Solution. When the value is around 1 or lower, the infection tends to reach only a few individuals. When the value is larger than 1, the infection sometimes reaches very few individuals and sometimes reaches approximately some particular fraction of the population. For example, when the value is 1.5, the final number of recovered individuals is often around 60% of the population. Furthermore, this proportion is extremely likely to be either close to 60% or close to 0%.

Figure 2.2. The proportion of the population that the infection spreads to is almost always around either 60% or 0%.

The value \lambda = np is called the basic reproductive number. This is the value known in the literature (and in the news!) as R_0 (pronounced "R-naught"), and it represents the expected number of susceptible individuals each infectious person transmits to on a given early time step, when the number of susceptible individuals is nearly the whole population. More generally, if we want to talk about a time t is not necessarily early on, then the expected number of susceptible individuals each infectious person transmits to is called R_t.

In the context of this model, we calculated the value of R_t to be pS_t, where S_t is the number of susceptible individuals at time t.

Herd immunity

Let's reflect on how herd immunity works in the context of our discrete SIR model, using the language of Galton-Watson processes. If the value of R_0 is larger than 1, then the disease spreads exponentially early on. As a higher fraction of the population becomes infected (and thus immune), the value of R_t decreases proportionally. At some point, R_t reaches a value less than 1, and then the spread begins to decay rapidly and die out.

We can observe this phenomenon numerically by plotting R_t as a function of time alongside the results of a simulation:

using LaTeXStrings
population_size = 40
R₀ = 2
infection_probability = R₀ / population_size
statuses, transmissions = SIR_simulation(population_size, infection_probability)
plt = simplot(statuses, transmissions)
rt = plot(R₀ * sum(statuses .== "susceptible", dims = 1)[:]/population_size, 
    fillrange = 0, color = :blue, fillopacity = 0.2, legend = false,
    xlabel = "time", ylabel = L"R_t")
hline!(rt, [1.0])
plot(plt, rt, layout = (2, 1))

Figure 2.3. We observe growth in the number of infectious persons until R_t drops below 1, and then after that we observe decay.

Finding the infected proportion of the population at the point where R_t drops below 1 is a matter of solving the equation R_t = pS_t = 1 for S_t. We find that S_t = 1/p = n/R_0. For example, if R_0 = 2, then of the population has been infected at the point when R_t hits 1.

However, more people become infected after R_t goes below 1, because it takes a few further generations for the number of active cases to go down. Remarkably, we can do an elegant back-of-the-envelope calculation to work out the expectation of the proportion \tau of individuals who eventually become infectious (in the context of our discrete SIR model, and conditioned on the event that that proportion is not close to zero).

Loosely speaking, the value \tau that we're looking for here is the middle of the hump on the right in the histogram in Figure 2.3 (which shows number of simulations versus number of eventually recovered individuals in those simulations).

The key idea is to work out—in two different ways—the probability that a given individual manages to avoid infection from all infected individuals.

From the point of view of a given never-infectious individual, each eventually recovered individual attempts transmission to them . If we assume that the distribution of the number of eventually infectious individuals is concentrated around n \tau, then we conclude that the probability that a susceptible individual dodges infection is approximately (1-p)^{n\tau}.

Also, since the population is homogeneous (that is, each individual has the same properties in the model's dynamics), the probability that a given individual dodges infection is the probability that it lies in the 1-\tau proportion of individuals which never get infected.

Therefore, the value of \tau should satisfy the equation 1 - \tau = (1-p)^{n\tau}. Since p = R_0 / n, and since (1-x/k)^k \approx \mathrm{e}^{-x} for large k, obtain the equation 1 - \tau = \mathrm{e}^{-R_0 \tau}.

We can plot \tau as a function of R_0 by solving this equation numerically. For comparison, we also show the proportion infected at the point where R_t goes less than 1.

using Roots, Plots, LaTeXStrings
     R₀ -> find_zero(τ -> 1 - τ - exp(-R₀*τ), 0.5), 
     label = "", xlabel = L"$R_0$", ylabel = L"$\tau$", 
     ylims = (0,1))
plot!(1:0.01:4, R₀ -> 1 - 1/R₀, label = "")

Figure 2.4. If R_0 is less than or equal to 1, then there is no outbreak. If R_0 > 1, the approximate proportion of the population that will eventually be infected is an increasing function of R_0.

The vertical gap between the two curves plotted in Figure 2.4 represents...

the proportion of individuals who eventually become infected
the proportion of individuals who become infected after the point when the infection starts decreasing in prevalence

Mitigation measures

The value of R_0 can be lowered through preventative measures. Roughly speaking, R_0 is a product of three factors:

  • Transmission probability for each contact with another person
  • Number of contacts per day
  • Number of days in infectious period

For example, handwashing reduces , while testing can help reduce . Social distancing reduces .

Early estimates of the R_0 value of COVID-19 (under transmission conditions as they existed before mitigation measures were in place) have been somewhere between 2 and 3. Based on this R_0 value and the equation we derived, approximately what proportion of the population would eventually become infectious?

Solution. Based on the graph, we can say the proportion would be between 80% and 90%. Values estimated by experts vary in the 20%-70% range, so clearly the differences between the SIR model and the more advanced models being used by epidemiologists do have an impact on the proportion of the population the infection would eventually reach (under the assumption that no mitigation measures are employed).

Bruno Bruno