OpenEpi is a free, web-based, open source, operating system-independent series of programs for use in epidemiology, biostatistics, public health, and medicine, providing a number of epidemiologic and statistical tools for summary data. OpenEpi was developed in JavaScript and HTML, and can be run in modern web browsers.
The program can be run from the OpenEpi website or downloaded and run
without a web connection. The source code and documentation are
downloadable and freely available for use by other investigators.
OpenEpi has been reviewed, both by media organizations and in research
journals.
The OpenEpi developers have had extensive experience in the development and testing of Epi Info, a program developed by the Centers for Disease Control and Prevention (CDC) and widely used around the world for data entry and analysis. OpenEpi was developed to perform analyses found in the DOS version of Epi Info
modules StatCalc and EpiTable, to improve upon the types of analyses
provided by these modules, and to provide a number of tools and
calculations not currently available in Epi Info. It is the first step
toward an entirely web-based set of epidemiologic software tools.
OpenEpi can be thought of as an important companion to Epi Info and to other programs such as SAS, PSPP, SPSS, Stata, SYSTAT, Minitab, Epidata, and R. Another functionally similar Windows-based program is Winpepi. Both OpenEpi and Epi Info
were developed with the goal of providing tools for low and moderate
resource areas of the world. The initial development of OpenEpi was
supported by a grant from the Bill and Melinda Gates Foundation to Emory University.
The types of calculations currently performed by OpenEpi include:
For epidemiologists and other health researchers, OpenEpi performs a
number of calculations based on tables not found in most epidemiologic
and statistical packages. For example, for a single 2x2 table, in
addition to the results presented in other programs, OpenEpi provides
estimates for:
Etiologic or prevented fraction in the population and in exposed with confidence intervals, based on risk, odds, or rate data
Four different confidence limit methods for the odds ratio.
Similar to Epi Info, in a stratified analysis, both crude and adjusted estimates are provided so that the assessment of confounding can be made. With rate data, OpenEpi provides adjusted rate ratios and rate differences, and tests for interaction. Finally, with count data, OpenEpi also performs a test for trend, for both crude data and stratified data.
In addition to being used to analyze data by health researchers,
OpenEpi has been used as a training tool for teaching epidemiology to
students at: Emory University, University of Massachusetts, University
of Michigan, University of Minnesota, Morehouse College, Columbia
University, University of Wisconsin, San Jose State University,
University of Medicine and Dentistry of New Jersey, University of
Washington, and elsewhere. This includes campus-based and distance
learning courses. Because OpenEpi is easy to use, requires no
programming experience, and can be run on the internet, students can use
the program and focus on the interpretation of results. Users can run
the program in English, French, Spanish, Portuguese or Italian.
Comments and suggestions for improvements are welcomed and the
developers respond to user queries. The developers encourage others to
develop modules that could be added to OpenEpi and provide a developer’s
tool at the website. Planned future development includes improvements
to existing modules, development of new modules, translation into other
languages, and the ability to cut and paste data and/or read data
files.
Mathematical models can project how infectious diseases progress to show the likely outcome of an epidemic and help inform public health interventions. Models use basic assumptions or collected statistics along with mathematics to find parameters for various infectious diseases and use those parameters to calculate the effects of different interventions, like mass vaccination
programmes. The modelling can help decide which interventions to avoid
and which to trial, or can predict future growth patterns.
History
The
modeling of infectious diseases is a tool that has been used to study
the mechanisms by which diseases spread, to predict the future course of
an outbreak and to evaluate strategies to control an epidemic.
The first scientist who systematically tried to quantify causes of death was John Graunt in his book Natural and Political Observations made upon the Bills of Mortality,
in 1662. The bills he studied were listings of numbers and causes of
deaths published weekly. Graunt's analysis of causes of death is
considered the beginning of the "theory of competing risks" which
according to Daley and Gani is "a theory that is now well established among modern epidemiologists".
The earliest account of mathematical modelling of spread of disease was carried out in 1760 by Daniel Bernoulli. Trained as a physician, Bernoulli created a mathematical model to defend the practice of inoculating against smallpox. The calculations from this model showed that universal inoculation against smallpox would increase the life expectancy from 26 years 7 months to 29 years 9 months. Daniel Bernoulli's work preceded the modern understanding of germ theory.
The 1920s saw the emergence of compartmental models. The Kermack–McKendrick epidemic model (1927) and the Reed–Frost epidemic model (1928) both describe the relationship between susceptible, infected and immune
individuals in a population. The Kermack–McKendrick epidemic model was
successful in predicting the behavior of outbreaks very similar to that
observed in many recorded epidemics.
Assumptions
Models
are only as good as the assumptions on which they are based. If a model
makes predictions that are out of line with observed results and the
mathematics is correct, the initial assumptions must change to make the
model useful.
Rectangular and stationary age distribution, i.e., everybody in the population lives to age L and then dies, and for each age (up to L)
there is the same number of people in the population. This is often
well-justified for developed countries where there is a low infant
mortality and much of the population lives to the life expectancy.
Homogeneous mixing of the population, i.e., individuals of the population under scrutiny assort and make contact at random and do not mix mostly in a smaller subgroup. This assumption is rarely justified because social structure
is widespread. For example, most people in London only make contact
with other Londoners. Further, within London there are smaller
subgroups, such as the Turkish community or teenagers (just to give two
examples), who mix with each other more than with people outside their group.
However, homogeneous mixing is a standard assumption to make the
mathematics tractable.
Types of epidemic models
Stochastic
"Stochastic"
means being or having a random variable. A stochastic model is a tool
for estimating probability distributions of potential outcomes by
allowing for random variation in one or more inputs over time.
Stochastic models depend on the chance variations in risk of exposure,
disease and other illness dynamics.
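As a minimal illustration of a stochastic model, the sketch below simulates the Reed–Frost model as a chain-binomial process: in each generation, every susceptible is infected independently with probability 1 − (1 − p)^I, where I is the current number of infectives. Repeating the simulation many times gives an empirical distribution of the final outbreak size. The function names, the per-contact probability p and the population sizes are assumptions chosen for illustration only.

```python
import random


def reed_frost(s0, i0, p, rng):
    """Simulate one Reed-Frost chain-binomial epidemic.

    s0, i0 -- initial numbers of susceptibles and infectives
    p      -- probability that one infective infects one susceptible
              during a single generation
    Returns the list of (S, I) pairs, one per generation.
    """
    s, i = s0, i0
    path = [(s, i)]
    while i > 0:
        # A susceptible escapes infection only if it escapes all i infectives.
        prob_infected = 1.0 - (1.0 - p) ** i
        new_cases = sum(1 for _ in range(s) if rng.random() < prob_infected)
        s -= new_cases
        i = new_cases  # the previous generation is removed
        path.append((s, i))
    return path


def final_sizes(runs, s0, i0, p, seed=0):
    """Return the number of susceptibles ever infected in each of `runs` epidemics."""
    rng = random.Random(seed)
    return [s0 - reed_frost(s0, i0, p, rng)[-1][0] for _ in range(runs)]


# Hypothetical scenario: 50 susceptibles, 1 initial case, p = 0.03.
sizes = final_sizes(runs=1000, s0=50, i0=1, p=0.03)
print("mean final size:", sum(sizes) / len(sizes))
print("share of runs with no secondary cases:", sizes.count(0) / len(sizes))
```

Because the inputs are random, two runs with identical parameters can end very differently, which is why the result is summarised as a distribution rather than a single trajectory.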
Deterministic
When
dealing with large populations, as in the case of tuberculosis,
deterministic or compartmental mathematical models are often used. In a
deterministic model, individuals in the population are assigned to
different subgroups or compartments, each representing a specific stage
of the epidemic. Letters such as M, S, E, I, and R are often used to
represent different stages.
The transition rates from one class to another are mathematically
expressed as derivatives, hence the model is formulated using
differential equations. While building such models, it must be assumed
that the population size in a compartment is differentiable with respect
to time and that the epidemic process is deterministic. In other words,
the changes in population of a compartment can be calculated using only
the history that was used to develop the model.
Reproduction number
The basic reproduction number (denoted by R0)
is a measure of how transferable a disease is. It is the average number
of people that a single infectious person will infect over the course
of their infection. This quantity determines whether the infection will
spread exponentially, die out, or remain constant: if R0 > 1, then each person on average infects more than one other person so the disease will spread; if R0 < 1, then each person infects fewer than one person on average so the disease will die out; and if R0 = 1, then each person will infect on average exactly one other person, so the disease will become endemic: it will move throughout the population but not increase or decrease.
The basic reproduction number can be computed as a ratio of known
rates over time: if an infectious individual contacts β other people
per unit time, if all of those people are assumed to contract the
disease, and if the disease has a mean infectious period of 1/γ, then
the basic reproduction number is just R0 = β/γ.
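As a small sketch of this ratio, the following Python snippet computes R0 from a contact rate and a recovery rate and reports which of the three regimes applies. The function names and the numerical values are hypothetical and do not describe any particular disease.

```python
def basic_reproduction_number(beta, gamma):
    """Return R0 = beta / gamma.

    beta  -- infectious contacts per infectious person per unit time
    gamma -- recovery rate, so that 1 / gamma is the mean infectious period
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return beta / gamma


def classify(r0, tol=1e-9):
    """Describe the expected behaviour of the infection for a given R0."""
    if abs(r0 - 1.0) <= tol:
        return "endemic"
    return "spreads" if r0 > 1.0 else "dies out"


# Hypothetical values: 0.5 contacts per day, mean infectious period of 4 days.
r0 = basic_reproduction_number(beta=0.5, gamma=0.25)
print(r0, classify(r0))
```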
Some diseases have multiple possible latency periods, in which case
the reproduction number for the disease overall is the sum of the
reproduction number for each transition time into the disease. For
example, Blower et al.
model two forms of tuberculosis infection: in the fast case, the
symptoms show up immediately after exposure; in the slow case, the
symptoms develop years after the initial exposure (endogenous
reactivation). The overall reproduction number is the sum of the two
forms of contraction: $R_0 = R_0^{\text{FAST}} + R_0^{\text{SLOW}}$.
Endemic steady state
An infectious disease is said to be endemic
when it can be sustained in a population without the need for external
inputs. This means that, on average, each infected person is infecting exactly one other person (any more and the number of people infected will grow exponentially and there will be an epidemic, any less and the disease will die out). In mathematical terms, that is:
$$R_0 S = 1.$$
The basic reproduction number (R0)
of the disease, assuming everyone is susceptible, multiplied by the
proportion of the population that is actually susceptible (S)
must be one (since those who are not susceptible do not feature in our
calculations as they cannot contract the disease). Notice that this
relation means that for a disease to be in the endemic steady state,
the higher the basic reproduction number, the lower the proportion of
the population susceptible must be, and vice versa. This expression is
limited by the fact that S is a proportion and cannot exceed one: for example, R0 = 0.5 would require S = 2, which is impossible, so a disease with R0 < 1 has no endemic steady state.
Assume the rectangular stationary age distribution and let
the ages of infection have the same distribution for each birth year.
Let the average age of infection be A, for instance when individuals younger than A are susceptible and those older than A
are immune (or infectious). Then it can be shown by an easy argument
that the proportion of the population that is susceptible is given by:
$$S = \frac{A}{L}.$$
We reiterate that L is the age at which in this model every
individual is assumed to die. But the mathematical definition of the
endemic steady state can be rearranged to give:
$$S = \frac{1}{R_0}.$$
Equating the two expressions for S:
$$\frac{1}{R_0} = \frac{A}{L} \quad\Rightarrow\quad R_0 = \frac{L}{A}.$$
This provides a simple way to estimate the basic reproduction number of a disease from A and L. For a population with an exponential age distribution, the corresponding relation is $R_0 = 1 + L/A$, so the basic reproduction number can be obtained from A and L in either type of population distribution.
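The rectangular-age relations S = A/L and R0 = L/A can be checked numerically with a short sketch such as the one below. The function names and the example values of L and A are hypothetical.

```python
def susceptible_fraction(mean_age_at_infection, life_expectancy):
    """S = A / L under a rectangular, stationary age distribution."""
    if not 0 < mean_age_at_infection <= life_expectancy:
        raise ValueError("require 0 < A <= L")
    return mean_age_at_infection / life_expectancy


def r0_rectangular(mean_age_at_infection, life_expectancy):
    """R0 = L / A, obtained by combining S = A / L with R0 * S = 1."""
    return 1.0 / susceptible_fraction(mean_age_at_infection, life_expectancy)


# Hypothetical values: life expectancy of 75 years, mean age at infection of 5 years.
A, L = 5.0, 75.0
S = susceptible_fraction(A, L)
R0 = r0_rectangular(A, L)
print(f"S = {S:.4f}, R0 = {R0:.2f}, R0 * S = {R0 * S:.2f}")
```

A lower average age of infection therefore indicates a more transmissible disease, since A appears in the denominator.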
Modelling epidemics
The SIR model is one of the more basic models used for modelling epidemics. There are many modifications to the model.
The SIR model
[Figure: diagram of the SIR model, showing the initial values and the rates for infection and for recovery.]
[Figure: animation of the SIR model with an initial rate for infection and a constant rate for recovery. If there is neither medicine nor vaccination available, it is only possible to reduce the infection rate (often referred to as "flattening the curve") by appropriate measures (e.g. "social distancing"). The animation shows the impact of reducing the infection rate by 76%.]
In 1927, W. O. Kermack and A. G. McKendrick created a model in which
they considered a fixed population with only three compartments:
susceptible, S(t); infected, I(t); and recovered, R(t). The compartments used for this model consist of three classes:
S(t) is used to represent the individuals not yet infected with the disease
at time t, or those susceptible to the disease of the population.
I(t) denotes the individuals of the population who have been infected with
the disease and are capable of spreading the disease to those in the
susceptible category.
R(t) is the compartment used for the individuals of the population who have
been infected and then removed from the disease, either due to
immunization or due to death. Those in this category are not able to be
infected again or to transmit the infection to others.
The flow of this model may be considered as follows:
S → I → R
Using a fixed population, N = S(t) + I(t) + R(t), means that the value N
remains constant within the simulation. The model is started with values of S(t=0),
I(t=0) and R(t=0). These are the number of people in the susceptible,
infected and removed categories at time equals zero. Subsequently, the
flow model updates the three variables for every time point with set
values for β and γ.
The simulation first updates the infected from the susceptible and then
the removed category is updated from the infected category for the next
time point (t=1). This describes the flow of persons between the three
categories. During an epidemic the susceptible category is not shifted
with this model; β changes over the course of the epidemic and so does γ. These variables determine the length of the epidemic and would have to be updated with each cycle.
The model is described by the following equations:
$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I.$$
Several assumptions were made in the formulation of these equations:
First, an individual in the population must be considered as having an
equal probability as every other individual of contracting the disease
with a rate of a and an equal number of people b that an individual makes contact with per unit time. Then, let β be the product of a and b. This is the transmission probability times the contact rate. Moreover, an infected individual makes contact with b persons per unit time, whereas only a fraction, S/N, of them are susceptible. Thus, every infective can infect abS/N = βS/N susceptible persons per unit time, and therefore the total number of susceptibles infected by infectives per unit time is βSI/N.
For the second and third equations, consider the population leaving the
susceptible class as equal to the number entering the infected class.
However, a number equal to the fraction γ (which represents the mean recovery/death rate, with 1/γ
the mean infective period) of infectives are leaving this class per
unit time to enter the removed class. These processes which occur
simultaneously are referred to as the Law of Mass Action, a widely
accepted idea that the rate of contact between two groups in a
population is proportional to the size of each of the groups concerned.
Finally, it is assumed that the rate of infection and recovery is much
faster than the time scale of births and deaths and therefore, these
factors are ignored in this model.
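The flow between the three compartments can be sketched in a few lines of Python. The example below assumes the standard frequency-dependent SIR equations, dS/dt = −βSI/N, dI/dt = βSI/N − γI and dR/dt = γI, and integrates them with the forward Euler method. The function name, the step size and all parameter values are assumptions made for illustration, and β and γ are held constant for simplicity.

```python
def simulate_sir(s0, i0, removed0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with the forward Euler method.

    Returns a list of (t, S, I, R) tuples, one per time step.
    """
    n = s0 + i0 + removed0  # fixed population size N
    s, i, r = float(s0), float(i0), float(removed0)
    steps = int(round(days / dt))
    history = [(0.0, s, i, r)]
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt  # flow S -> I
        new_removals = gamma * i * dt           # flow I -> R
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((k * dt, s, i, r))
    return history


# Hypothetical scenario: 1000 people, 3 initially infected.
history = simulate_sir(s0=997, i0=3, removed0=0, beta=0.4, gamma=0.04, days=200)
peak_t, _, peak_i, _ = max(history, key=lambda row: row[2])
print(f"peak of about {peak_i:.0f} infected at t = {peak_t:.1f}")
print(f"never infected at the end: about {history[-1][1]:.0f}")
```

Every person who leaves one compartment enters the next, so S + I + R stays equal to N at each step. A smaller dt gives a closer approximation to the continuous model.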
Steady-state solutions
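The expected duration of susceptibility will be $E[\min(T_L, T_S)]$, where $T_L$ reflects the time alive (life expectancy) and $T_S$ reflects the time in the susceptible state before becoming infected. Assuming a constant per-capita death rate $\mu$, balanced by births, and a constant rate of infection $\lambda$, this can be simplified to:
$$E[\min(T_L, T_S)] = \int_0^\infty e^{-(\mu+\lambda)x}\,dx = \frac{1}{\mu+\lambda},$$
such that the number of susceptible persons is the number entering the susceptible compartment, $\mu N$, times the duration of susceptibility:
$$S = \frac{\mu N}{\mu+\lambda}.$$
Analogously, the steady-state number of infected persons is the
number entering the infected state from the susceptible state (number
susceptible times rate of infection $\lambda$) times the duration of infectiousness, $\frac{1}{\mu+\gamma}$:
$$I = \frac{\mu N}{\mu+\lambda}\,\lambda\,\frac{1}{\mu+\gamma}.$$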
Other compartmental models
There
are many modifications of the SIR model, including those that include
births and deaths, where upon recovery there is no immunity (SIS model),
where immunity lasts only for a short period of time (SIRS), where
there is a latent period of the disease where the person is not
infectious (SEIS and SEIR), and where infants can be born with immunity (MSIR).
Infectious disease dynamics
Mathematical models need to integrate the increasing volume of data being generated on host-pathogen interactions. Many theoretical studies of the population dynamics, structure and evolution of infectious diseases of plants and animals, including humans, are concerned with this problem.
If the proportion of the population that is immune exceeds the herd immunity
level for the disease, then the disease can no longer persist in the
population. Thus, if this level can be exceeded by vaccination, the
disease can be eliminated. An example of this being successfully
achieved worldwide is the global smallpox eradication, with the last wild case in 1977. The WHO is carrying out a similar vaccination campaign to eradicate polio.
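The herd immunity level will be denoted q. Recall that, for a stable state:
$$R_0 S = 1.$$
In turn,
$$R_0 = \frac{1}{S} = \frac{E(T_L)}{E[\min(T_L, T_S)]},$$
where $T_L$ is the time alive and $T_S$ is the time in the susceptible state before becoming infected, which is approximately:
$$\frac{1/\mu}{1/(\mu+\lambda)} = \frac{\mu+\lambda}{\mu},$$
with $\mu$ the death rate and $\lambda$ the rate of infection.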
S will be (1 − q), since q is the proportion of the population that is immune and q + S must equal one (since in this simplified model, everyone is either susceptible or immune). Then:
$$R_0 (1 - q) = 1, \qquad 1 - q = \frac{1}{R_0}, \qquad q = 1 - \frac{1}{R_0}.$$
Remember that this is the threshold level. If the proportion of immune individuals exceeds this level due to a mass vaccination programme, the disease will die out.
We have just calculated the critical immunisation threshold (denoted qc).
It is the minimum proportion of the population that must be immunised
at birth (or close to birth) in order for the infection to die out in
the population.
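As a brief sketch, the threshold q = 1 − 1/R0 can be evaluated for a few values of R0. The function name and the R0 values are hypothetical and serve only to show how quickly the required coverage rises with R0.

```python
def critical_immunisation_threshold(r0):
    """Return q_c = 1 - 1/R0, the minimum immune proportion for elimination."""
    if r0 <= 0:
        raise ValueError("R0 must be positive")
    if r0 <= 1.0:
        return 0.0  # the infection cannot sustain itself even with nobody immune
    return 1.0 - 1.0 / r0


# Hypothetical values of R0, chosen only for illustration.
for r0 in (1.5, 4.0, 15.0):
    q_c = critical_immunisation_threshold(r0)
    print(f"R0 = {r0:>4}: at least {q_c:.1%} of the population must be immune")
```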
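The fraction of the population that is never infected can be written in terms of p, the final proportion of the population that is ever infected, and the rate of infection $\lambda(t)$ at time t:
$$\lim_{t\to\infty} S(t) = e^{-\int_0^\infty \lambda(t)\,dt} = 1 - p.$$
Hence,
$$p = 1 - e^{-\int_0^\infty \lambda(t)\,dt} = 1 - e^{-R_0 p}.$$
Solving for $R_0$, we obtain:
$$R_0 = \frac{-\ln(1-p)}{p}.$$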
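The link between R0 and the final proportion infected can be explored numerically. The sketch below assumes the standard final-size relation p = 1 − exp(−R0 p), equivalently R0 = −ln(1 − p)/p, where p is the proportion of an initially susceptible population that is eventually infected. The function names, the fixed-point solver and the example value of R0 are illustrative choices.

```python
import math


def r0_from_final_size(p):
    """R0 = -ln(1 - p) / p for a final infected proportion 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(1.0 - p) / p


def final_size_from_r0(r0, tol=1e-12, max_iter=10_000):
    """Solve p = 1 - exp(-R0 * p) for its non-zero root by fixed-point iteration."""
    if r0 <= 1.0:
        return 0.0  # only the trivial root p = 0 exists
    p = 1.0  # starting from the upper bound, the iterates decrease to the root
    for _ in range(max_iter):
        p_next = 1.0 - math.exp(-r0 * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


# Hypothetical value R0 = 2: roughly four in five people are eventually infected.
p = final_size_from_r0(2.0)
print(f"final proportion infected: {p:.4f}")
print(f"R0 recovered from p: {r0_from_final_size(p):.4f}")
```

The first function applies directly when an outbreak has ended and the proportion infected has been observed. The second runs the relation in the other direction.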
When mass vaccination cannot exceed the herd immunity
If the vaccine used is insufficiently effective or the required coverage cannot be reached (for example due to popular resistance), the programme may fail to exceed qc. Such a programme can, however, disturb the balance of the infection without eliminating it, often causing unforeseen problems.
Suppose that a proportion of the population q (where q < qc) is immunised at birth against an infection with R0 > 1. The vaccination programme changes R0 to Rq where
$$R_q = R_0 (1 - q).$$
This change occurs simply because there are now fewer susceptibles in the population who can be infected. Rq is simply R0 minus those that would normally be infected but that cannot be now since they are immune.
As a consequence of this lower basic reproduction number, the average age of infection A will also change to some new value Aq in those who have been left unvaccinated.
Recall the relation that linked R0, A and L. Assuming that life expectancy has not changed, now:
$$R_q = \frac{L}{A_q}, \qquad A_q = \frac{L}{R_q} = \frac{L}{R_0 (1 - q)}.$$
But R0 = L/A so:
$$A_q = \frac{L}{(L/A)(1 - q)} = \frac{A}{1 - q}.$$
Thus the vaccination programme will raise the average age of
infection, another mathematical justification for a result that might
have been intuitively obvious. Unvaccinated individuals now experience a
reduced force of infection due to the presence of the vaccinated group.
However, it is important to consider this effect when vaccinating
against diseases that are more severe in older people. A vaccination
programme against such a disease that does not exceed qc
may cause more deaths and complications than there were before the
programme was brought into force as individuals will be catching the
disease later in life. These unforeseen outcomes of a vaccination
programme are called perverse effects.
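The shift in the age of infection can be made concrete with a small sketch. It assumes the relations Rq = R0(1 − q) and Aq = A/(1 − q) from the rectangular-age model above. The function names and the example values are hypothetical.

```python
def reproduction_number_after_vaccination(r0, q):
    """R_q = R0 * (1 - q) when a proportion q is immunised at birth."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    return r0 * (1.0 - q)


def mean_age_at_infection_after_vaccination(a, q):
    """A_q = A / (1 - q): the mean age at infection among the unvaccinated."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    return a / (1.0 - q)


# Hypothetical disease: R0 = 4 (so q_c = 0.75), mean age at infection 10 years,
# with a programme that reaches only 50% coverage.
r0, a, q = 4.0, 10.0, 0.5
r_q = reproduction_number_after_vaccination(r0, q)
a_q = mean_age_at_infection_after_vaccination(a, q)
print(f"R_q = {r_q:.1f} (still above 1, so the infection persists)")
print(f"mean age at infection rises from {a:.0f} to {a_q:.0f} years")
```

In this example coverage falls short of the threshold, so transmission continues while the typical case becomes older, which is the situation in which perverse effects can arise.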
When mass vaccination exceeds the herd immunity
If
a vaccination programme causes the proportion of immune individuals in a
population to exceed the critical threshold for a significant length of
time, transmission of the infectious disease in that population will
stop. This is known as elimination of the infection and is different
from eradication.
Elimination
Interruption of endemic transmission of an infectious disease, which
occurs if each infected individual infects fewer than one other on average, is
achieved by maintaining vaccination coverage to keep the proportion of
immune individuals above the critical immunisation threshold.
Eradication
Reduction of infective organisms in the wild worldwide to zero. So far, this has only been achieved for smallpox and rinderpest. To get to eradication, elimination in all world regions must be achieved.