diff --git a/episodes/delays-access.Rmd b/episodes/delays-access.Rmd index 6b804add..e5601668 100644 --- a/episodes/delays-access.Rmd +++ b/episodes/delays-access.Rmd @@ -46,7 +46,7 @@ Look at the [glossary](../learners/reference.md) for the definitions of all the ::::::::::::::::::::::::: -However, early in an epidemic, efforts to understand the epidemic and implications for control can be delayed by the lack of an easy way to access key parameters for the disease of interest ([Nash et al., 2023](https://mrc-ide.github.io/epireview/)). Projects like `{epiparameter}` and `{epireview}` are building online catalogues following literature synthesis protocols that can help inform analysis and parametrise models by providing a library of previously estimated epidemiological parameters from past outbreaks. +However, early in an epidemic, efforts to understand the epidemic and implications for control can be delayed by the lack of an easy way to access key parameters for the disease of interest ([Nash et al., 2023](https://mrc-ide.github.io/epireview/)). Projects like `{epiparameter}` and [epireview](https://mrc-ide.github.io/epireview/) are building online catalogues following literature synthesis protocols that can help inform analysis and parametrise models by providing a library of previously estimated epidemiological parameters from past outbreaks. :::::::::::::::::: instructor @@ -58,7 +58,7 @@ To illustrate how to use the `{epiparameter}` R package in your analysis pipelin -Let's start loading the `{epiparameter}` package. We'll use the pipe `%>%` to connect some of its functions, some `{tibble}` and `{dplyr}` functions, so let's also call to the `{tidyverse}` package: +Let's start by loading the `{epiparameter}` package. We'll use the pipe `%>%` to connect some of its functions, some `{tibble}` and `{dplyr}` functions, so let's also load the `{tidyverse}` package: ```{r,warning=FALSE,message=FALSE} library(epiparameter) @@ -80,7 +80,7 @@ This helps us remember package functions and avoid namespace conflicts. ## The problem -If we want to estimate the transmissibility of an infection, it's common to use a package such as `{EpiEstim}` or `{EpiNow2}`. The `{EpiEstim}` package allows real-time estimation of the reproduction number using case data over time, reflecting how transmission changes based on when symptoms appear. For estimating transmission based on when people were actually infected (rather than symptom onset), the `{EpiNow2}` package extends this idea by combining it with a model that accounts for delays in observed data. Both packages require some epidemiological information as an input. For example, in `{EpiNow2}` we use `EpiNow2::Gamma()` to specify a [generation time](../learners/reference.md#generationtime) as a probability distribution adding its `mean`, standard deviation (`sd`), and maximum value (`max`). +If we want to estimate the transmissibility of an infection, it's common to use a package such as `{EpiEstim}` or `{EpiNow2}`. The `{EpiEstim}` package allows real-time estimation of the reproduction number using case data over time, reflecting how transmission changes based on when symptoms appear. For estimating transmission based on when people were actually infected (rather than symptom onset), the `{EpiNow2}` package extends this idea by combining it with a model that accounts for delays in observed data. Both packages require some epidemiological information as an input. For example, in `{EpiNow2}` we use `EpiNow2::Gamma()` to specify a [generation time](../learners/reference.md#generationtime) as a Gamma probability distribution adding its `mean`, standard deviation (`sd`), and maximum value (`max`). To specify a `generation_time` that follows a _Gamma_ distribution with mean $\mu = 4$, standard deviation $\sigma = 2$, and a maximum value of 20, we write: @@ -95,7 +95,7 @@ generation_time <- It is a common practice for analysts to manually search the available literature and copy and paste the **summary statistics** or the **distribution parameters** from scientific publications. A challenge that is often faced is that the reporting of different statistical distributions is not consistent across the literature (e.g. a paper may only report the mean, rather than the full underlying distribution). `{epiparameter}`'s objective is to facilitate the access to reliable estimates of distribution parameters for a range of infectious diseases, so that they can easily be implemented in outbreak analytic pipelines. -In this episode, we will *access* the summary statistics of generation time for COVID-19 from the library of epidemiological parameters provided by `{epiparameter}`. These metrics can be used to estimate the transmissibility of this disease using `{EpiNow2}` in subsequent episodes. +In this episode, we will *access* the summary statistics of a generation time for COVID-19 from the library of epidemiological parameters provided by `{epiparameter}`. These metrics can be used to estimate the transmissibility of this disease using `{EpiNow2}` in subsequent episodes. Let's start by looking at how many entries are currently available in the **epidemiological distributions database** in `{epiparameter}` using `epiparameter_db()` for the epidemiological distribution `epi_name` called generation time with the string `"generation"`: @@ -111,7 +111,7 @@ In the library of epidemiological parameters, we may not have a `"generation"` t ### Systematic review data for priority pathogens -The [`{epireview}` R package](https://mrc-ide.github.io/epireview/) has parameters on Ebola, Marburg and Lassa from recent systematic reviews, with more priority pathogens planned for future releases. Have a look at [this vignette](https://epiverse-trace.github.io/epiparameter/articles/data_from_epireview.html) for more information on how to use these parameters with `{epiparameter}`. +The [`{epireview}` R package](https://mrc-ide.github.io/epireview/) has parameters on Ebola, Marburg, Lassa, SARS, Zika and Nipah from recent systematic reviews, with more priority pathogens planned for future releases. Have a look at [this vignette](https://epiverse-trace.github.io/epiparameter/articles/data_from_epireview.html) for more information on how to use these parameters with `{epiparameter}`. ::::::::::::::::::::::::: @@ -208,7 +208,7 @@ The objective of the assessment above is to assess the interpretation of a large In this section, we will use `{epiparameter}` to obtain the serial interval for COVID-19, as an alternative to the generation time. -First, let's see how many parameters we have in the epidemiological distributions database (`epiparameter_db()`) with the `disease` named `covid`-19. Run this code: +First, let's see how many parameters we have in the epidemiological distributions database (`epiparameter_db()`) with the `disease` named `covid`. Run this code: ```{r,eval=FALSE} epiparameter::epiparameter_db( @@ -225,13 +225,13 @@ epiparameter::epiparameter_db( ) ``` -With this query combination, we get more than one delay distribution (because the database has multiple entries). This output is an `` class object. +With this query combination, we get more than one delay distribution (because the database has multiple entries). When several entries match, the output is a list of `` objects. ::::::::::::::::: callout ### CASE-INSENSITIVE -`epiparameter_db` is [case-insensitive](https://dillionmegida.com/p/case-sensitivity-vs-case-insensitivity/#case-insensitivity). This means that you can use strings with letters in upper or lower case indistinctly. Strings like `"serial"`, `"serial interval"` or `"serial_interval"` are also valid. +`epiparameter_db` is [case-insensitive](https://dillionmegida.com/p/case-sensitivity-vs-case-insensitivity/#case-insensitivity). This means that you can use strings with letters in upper or lower case interchangeably. Strings like `"serial"`, `"serial interval"` or `"serial_interval"` are also valid. ::::::::::::::::::::::::: @@ -276,7 +276,7 @@ distribution %>% is_parameterised() ``` -### Parameterised entries have an Inference method +### Parameterised entries have an inference method As detailed in `?is_parameterised`, a parameterised distribution is the entry that has a probability distribution associated with it provided by an `inference_method` as shown in `metadata`: @@ -299,9 +299,9 @@ Take 2 minutes to explore the `{epiparameter}` library. Find: -- How many delay distributions are for that disease? +- How many delay distributions are there for that disease? -- How many types of probability distribution (e.g., gamma, log normal) are for a given delay in that disease? +- How many types of probability distribution (e.g., gamma, log normal) are there for a given delay in that disease? Ask: @@ -334,15 +334,15 @@ The combo of `epiparameter_db()` plus `parameter_tbl()` gets a data frame of all We choose to explore Ebola's delay distributions: ```{r} -# we expect 16 delay distributions for Ebola +# we expect 17 delay distributions for Ebola epiparameter::epiparameter_db( disease = "ebola" ) ``` -Now, from the output of `epiparameter::epiparameter_db()`, What is an [offspring distribution](../learners/reference.md#offspringdist)? +Now, from the output of `epiparameter::epiparameter_db()`, what is an [offspring distribution](../learners/reference.md#offspringdist)? -We choose to find Ebola's incubation periods. This output lists all the papers and parameters found. Run this locally if needed: +We choose to search for incubation period estimates for Ebola. This output lists all the papers and parameters found. Run this locally if needed: ```{r, eval=FALSE} epiparameter::epiparameter_db( @@ -374,7 +374,7 @@ How does `{epiparameter}` do the collection and review of peer-reviewed literatu ## Select a single distribution -The `epiparameter::epiparameter_db()` function works as a filtering or subset function. We can use the `author` argument to keep `Hiroshi Nishiura` parameters, or the `subset` argument to keep parameters from studies with a sample size higher than 10: +The `epiparameter::epiparameter_db()` function works as a filtering or subset function. We can use the `author` argument to keep parameters reported in papers where the first author is `Nishiura`, or the `subset` argument to keep parameters from studies with a sample size higher than 10: ```{r, eval = FALSE} epiparameter::epiparameter_db( @@ -445,7 +445,7 @@ This only seems to work in code chunks and R console! --> -You can use `plot()` to `` objects to visualise: +You can use `plot()` on `` objects to visualise: - the *Probability Density Function (PDF)* and - the *Cumulative Distribution Function (CDF)*. @@ -465,7 +465,7 @@ plot(covid_serialint, xlim = c(1, 60)) ## Extract the summary statistics -We can get the `mean` and standard deviation (`sd`) from this `` diving into the `summary_stats` object: +We can get the `mean` and standard deviation (`sd`) from this `` object by diving into the `summary_stats` element: ```{r} # get the mean @@ -483,7 +483,7 @@ generation_time <- ) ``` -In the next episode we'll learn how to use `{EpiNow2}` to correctly specify distributions, estimate transmissibility. Then, how to use **distribution functions** to get a maximum value (`max`) for `EpiNow2::LogNormal()` and use `{epiparameter}` in your analysis. +In the next episode we'll learn how to use `{EpiNow2}` to correctly specify distributions and estimate transmissibility. We'll also learn how to use **distribution functions** to get a maximum value (`max`) for `EpiNow2::LogNormal()` and use `{epiparameter}` in your analysis. :::::::::::::::::::::::::::::: callout @@ -500,7 +500,7 @@ covid_serialint_parameters This gets a vector of class `` ready to use as input for any other package! -Consider that {EpiNow2} functions also accept distribution parameters as inputs. Run `?EpiNow2::LogNormal` to read the [Probability distributions](https://epiforecasts.io/EpiNow2/reference/Distributions.html) reference manual. +Consider that `{EpiNow2}` functions also accept distribution parameters as inputs. Run `?EpiNow2::LogNormal` to read the [Probability distributions](https://epiforecasts.io/EpiNow2/reference/Distributions.html) reference manual. :::::::::::::::::::::::::::::: @@ -525,10 +525,10 @@ Answer: Use the `$` operator plus the tab or keyboard button to explore them as an expandable list: ```r -covid_serialint$ +ebola_serial$ ``` -Use the `str()` to display the structure of the `` R object. +Use `str()` to display the structure of the `` R object. :::::::::::::::::: @@ -575,7 +575,7 @@ An interesting element is the `method_assess` nested entry, which refers to the covid_serialint$method_assess ``` -We will explore these concepts following episodes! +We will explore these concepts in the following episodes! :::::::::::::::::::::::::::::: diff --git a/episodes/delays-functions.Rmd b/episodes/delays-functions.Rmd index 0dd4f043..356c134c 100644 --- a/episodes/delays-functions.Rmd +++ b/episodes/delays-functions.Rmd @@ -14,7 +14,7 @@ editor_options: ::::::::::::::::::::::::::::::::::::: objectives -- Use distribution functions to continuous and discrete distributions stored as `` objects. +- Use distribution functions with continuous and discrete distributions stored as `` objects. - Convert a continuous to a discrete distribution with `{epiparameter}`. - Connect `{epiparameter}` outputs with `{EpiNow2}` inputs. @@ -59,7 +59,7 @@ library(EpiNow2) library(tidyverse) ``` -To recap, we learned that `{epiparameter}` help us to *choose* one specific set of epidemiological parameters from the literature, instead of copy/pasting them *by hand*: +To recap, we learned that `{epiparameter}` helps us to *choose* one specific set of epidemiological parameters from the literature, instead of copy/pasting them *by hand*: ```{r,message=FALSE} covid_serialint <- @@ -144,7 +144,7 @@ epiparameter::generate(covid_serialint, times = 10) ::::::::: instructor -Access to the reference documentation (Help files) for these functions is accessible with the three double-colon notation: `epiparameter:::` +You can access the reference documentation (help files) for these functions with the triple-colon notation: `epiparameter:::` - `?epiparameter:::density.epiparameter()` - `?epiparameter:::cdf.epiparameter()` @@ -183,13 +183,13 @@ plot(covid_serialint) ```{r,eval=TRUE} # calculate probability of finding backward cases -# with contact traicing window of 2 days +# with contact tracing window of 2 days window_2 <- epiparameter::cdf(covid_serialint, q = 2) window_2 # calculate probability of finding backward cases -# with contact traicing window of 6 days +# with contact tracing window of 6 days window_6 <- epiparameter::cdf(covid_serialint, q = 6) window_6 @@ -214,7 +214,7 @@ Given the COVID-19 serial interval: If we exchange the question between days and cumulative probability to: -- When considering secondary cases, how many days following the symptom onset of primary cases can we expect 55% of symptom onset to occur? +- When considering secondary cases, within how many days following the symptom onset of primary cases can we expect 55% of secondary symptom onset to occur? ```{r,eval=FALSE} quantile(covid_serialint, p = 0.55) @@ -222,7 +222,7 @@ quantile(covid_serialint, p = 0.55) An interpretation could be: -- The 55% percent of the symptom onset of secondary cases will happen after 4.2 days after the symptom onset of primary cases. +- 55% of secondary cases will have their symptom onset within around 4.3 days of symptom onset in the primary case. :::::::::::::::::::::::::: @@ -260,7 +260,7 @@ While for a **continuous** distribution, we plot the *Probability Density Functi plot(covid_serialint_discrete) ``` -To finally get a `max` value, let's access the quantile value of the 99th percentile or `0.99` probability of the distribution with the `prob_dist$q` notation, similarly to how we access the `summary_stats` values. +To finally get a `max` value, let's access the quantile value of the 99th percentile or `0.99` cumulative probability of the distribution using the `quantile()` function, as we did above for the continuous distribution. ```{r} covid_serialint_discrete_max <- @@ -303,18 +303,18 @@ covid_incubation <- covid_incubation -# to get an integer as a response, discretize the distribution +# to get an integer as a response, discretise the distribution covid_incubation_discrete <- epiparameter::discretise(covid_incubation) covid_incubation_discrete -# calculate the quantile or value at the percertile 99th from the distribution +# calculate the quantile or value at the 99th percentile from the distribution quantile(covid_incubation_discrete, p = 0.99) ``` 99% of those who develop COVID-19 symptoms will do so within `r quantile(covid_incubation_discrete, p = 0.99)` days of infection. -Now, Is this result consistent with the duration of quarantine recommended in practice during the COVID-19 pandemic? +Now, is this result consistent with the duration of quarantine recommended in practice during the COVID-19 pandemic? :::::::::::::::::::::::::: @@ -405,11 +405,11 @@ The `{epiparameter}` will give you access to the parameter estimated from histor Follow these steps: - Get access to the `serial interval` distribution of `Ebola` using `{epiparameter}`. -- Access to one entry from the study with the highest sample size. +- Access one entry from the study with the highest sample size. - Print the output and read the description. -- Identify which probability distribution (e.g., Gamma, Lognormal, etc.) was used by authors to model the delay. +- Identify which probability distribution (e.g., Gamma, Lognormal, etc.) was used by the authors to model the delay. - Extract the distribution parameters for the corresponding probability distribution. -- Calculate an exact maximum value for the delay distribution (e.g., the 99th percentile). +- Calculate an exact maximum value for the delay distribution (e.g., the 99th percentile). - Plug-in the distribution parameters from the `` to the corresponding probability distribution function in `{EpiNow2}` (e.g., `EpiNow2::Gamma()`, `EpiNow2::LogNormal()`, etc.). @@ -468,7 +468,7 @@ ebola_serialint_max <- ebola_serialint %>% ebola_serialint_max -# 4. adapt {epiparameter} to {EpiNow2} distribution inferfase +# 4. adapt {epiparameter} to the {EpiNow2} distribution interface ebola_generationtime <- EpiNow2::Gamma( shape = ebola_serialint_params["shape"], scale = ebola_serialint_params["scale"], @@ -484,12 +484,12 @@ plot(ebola_serialint) ``` ```{r} -# plot the `EpiNow2` class object +# plot the `dist_spec` class object from {EpiNow2} plot(ebola_generationtime) ``` -Plotting distributions from the `{EpiNow2}` interface always gives a discretized output. -From the legend: `PMF` stants for Probability Mass Function and `CMF` stants for Cummulative Mass Function. +Plotting distributions from the `{EpiNow2}` interface always gives a discretised output. +From the legend: `PMF` stands for Probability Mass Function and `CMF` stands for Cumulative Mass Function. :::::::::::::::::::: @@ -520,7 +520,7 @@ The `delays` argument and the `delay_opts()` helper function are analogous to th ```r epinow_estimates <- EpiNow2::epinow( # cases - reported_cases = example_confirmed[1:60], + data = example_confirmed[1:60], # delays generation_time = EpiNow2::generation_time_opts(covid_serial_interval), delays = EpiNow2::delay_opts(covid_incubation_time) @@ -740,7 +740,7 @@ incubation_period_ebola <- EpiNow2::Gamma( mean = ebola_incubation$summary_stats$mean, sd = ebola_incubation$summary_stats$sd, - max = quantile(ebola_serial_discrete, p = 0.99) + max = quantile(ebola_incubation_discrete, p = 0.99) ) # epinow ------------------------------------------------------------------ @@ -826,7 +826,7 @@ influenza_incubation <- single_epiparameter = TRUE ) -# Discretize incubation period +# Discretise incubation period influenza_incubation_discrete <- epiparameter::discretise(influenza_incubation) diff --git a/episodes/quantify-transmissibility.Rmd b/episodes/quantify-transmissibility.Rmd index cfca6376..64f5b2ab 100644 --- a/episodes/quantify-transmissibility.Rmd +++ b/episodes/quantify-transmissibility.Rmd @@ -6,7 +6,7 @@ exercises: 0 :::::::::::::::::::::::::::::::::::::: questions -- How can I estimate the time-varying reproduction number ($Rt$) and growth rate from a time series of case data? +- How can I estimate the time-varying reproduction number ($R_t$) and growth rate from a time series of case data? - How can I quantify geographical heterogeneity from these transmission metrics? :::::::::::::::::::::::::::::::::::::::::::::::: @@ -27,7 +27,7 @@ Learners should familiarise themselves with following concepts before working th **Epidemic theory**: Effective reproduction number. -**Data science**: Data transformation and visualization. You can review the episode on [Aggregate and visualize](https://epiverse-trace.github.io/tutorials-early/describe-cases.html) incidence data. +**Data science**: Data transformation and visualisation. You can review the episode on [Aggregate and visualise](https://epiverse-trace.github.io/tutorials-early/describe-cases.html) incidence data. ::::::::::::::::::::::::::::::::: @@ -51,7 +51,7 @@ To estimate these key metrics using case data we must account for delays between In the next tutorials we will focus on how to use the functions in `{EpiNow2}` to estimate transmission metrics of case data. We will not cover the theoretical background of the models or inference framework, for details on these concepts see the [vignette](https://epiforecasts.io/EpiNow2/dev/articles/estimate_infections.html). -In this tutorial we are going to learn how to use the `{EpiNow2}` package to estimate the time-varying reproduction number. We'll get input data from `{incidence2}`. We'll use the `{tidyr}` and `{dplyr}` packages to arrange some of its outputs, `{ggplot2}` to visualize case distribution, and the pipe `%>%` to connect some of their functions, so let's also call to the `{tidyverse}` package: +In this tutorial we are going to learn how to use the `{EpiNow2}` package to estimate the time-varying reproduction number. We'll get input data from `{incidence2}`. We'll use the `{tidyr}` and `{dplyr}` packages to arrange some of its outputs, `{ggplot2}` to visualise case distribution, and the pipe `%>%` to connect some of their functions, so let's also load the `{tidyverse}` package: ```r library(EpiNow2) @@ -114,7 +114,7 @@ dplyr::as_tibble(incidence2::covidregionaldataUK) To use the data, we must format the data to have two columns: -+ `date`: the date (as a date object see `?is.Date()`), ++ `date`: the date (as a date object, see `?is.Date()`), + `confirm`: number of disease reports (confirm) on that date. Let's use `{tidyr}` and `{incidence2}` for this: @@ -136,8 +136,8 @@ cases_sliced <- incidence2::covidregionaldataUK %>% dplyr::slice_head(n = 90) ``` -With `incidence2::incidence()` we aggregate cases in different time *intervals* (i.e., days, weeks or months) or per *group* categories. Also we can have complete dates for all the range of dates per group category using `complete_dates = TRUE` -Explore later the [`incidence2::incidence()` reference manual](https://www.reconverse.org/incidence2/reference/incidence.html) +With `incidence2::incidence()` we aggregate cases in different time *intervals* (i.e., days, weeks or months) or per *group* categories. Also we can have complete dates for all the range of dates per group category using `complete_dates = TRUE`. +Explore later the [`incidence2::incidence()` reference manual](https://www.reconverse.org/incidence2/manual.html#sec:man-incidence). ::::::::::::::::::::::::: spoiler @@ -178,10 +178,9 @@ cases We assume there are delays from the time of infection until the time a case is reported. We specify these delays as distributions to account for the uncertainty in individual level differences. The delay may involve multiple types of processes. A typical delay from time of infection to case reporting may consist of: -> **time from infection to symptom onset** (the [incubation period](../learners/reference.md#incubation)) + **time from symptom onset to case notification** (the reporting time) -. +> **time from infection to symptom onset** (the [incubation period](../learners/reference.md#incubation)) + **time from symptom onset to case notification** (the reporting time). -The delay distribution for each of these processes can either estimated from data or obtained from the literature. We can express uncertainty about what the correct parameters of the distributions by assuming the distributions have **fixed** parameters or whether they have **variable** parameters. To understand the difference between **fixed** and **variable** distributions, let's consider the incubation period. +The delay distribution for each of these processes can either be estimated from data or obtained from the literature. We can express uncertainty about the correct parameters of the distributions by assuming the distributions have either **fixed** parameters or **variable** parameters. To understand the difference between **fixed** and **variable** distributions, let's consider the incubation period. ::::::::::::::::::::::::::::::::::::: callout @@ -205,7 +204,7 @@ The number of delays and type of delay are a flexible input that depend on the d #### Incubation period distribution -The distribution of incubation period for many diseases can usually be obtained from the literature. The package `{epiparameter}` contains a library of epidemiological parameters for different diseases obtained from the literature. +The incubation period distribution for many diseases can usually be obtained from the literature. The package `{epiparameter}` contains a library of epidemiological parameters for different diseases obtained from the literature. We specify a (fixed) gamma distribution with mean $\mu = 4$ and standard deviation $\sigma = 2$ (shape = $4$, scale = $1$) using the function `Gamma()` as follows: @@ -243,11 +242,11 @@ For all types of delay, we will need to use distributions for positive values on #### Including distribution uncertainty -To specify a **variable** distribution, we include uncertainty around the mean $\mu$ and standard deviation $\sigma$ of our gamma distribution. If our incubation period distribution has a mean $\mu$ and standard deviation $\sigma$, then we assume the mean ($\mu$) follows a Normal distribution with standard deviation $\sigma_{\mu}$: +To specify a **variable** distribution, we include uncertainty around the mean $\mu$ and standard deviation $\sigma$ of our gamma distribution. If our incubation period distribution has a mean $\mu$ and standard deviation $\sigma$, then we can assume the mean ($\mu$) follows a Normal distribution with standard deviation $\sigma_{\mu}$: $$\mbox{Normal}(\mu,\sigma_{\mu}^2)$$ -and a standard deviation ($\sigma$) follows a Normal distribution with standard deviation $\sigma_{\sigma}$: +and the standard deviation ($\sigma$) follows a Normal distribution with standard deviation $\sigma_{\sigma}$: $$\mbox{Normal}(\sigma,\sigma_{\sigma}^2).$$ @@ -275,7 +274,7 @@ After the incubation period, there will be an additional delay of time from symp When specifying a distribution, it is useful to visualise the probability density to see the peak and spread of the distribution, in this case we will use a *log normal* distribution. -If we want to assume that the mean reporting delay is 2 days (with a uncertainty of 0.5 days) and a standard deviation of 1 day (with uncertainty of 0.5 days), we can specify a variable distribution using `LogNormal()` as before: +If we want to assume that the mean reporting delay is 2 days (with an uncertainty of 0.5 days) and a standard deviation of 1 day (with uncertainty of 0.5 days), we can specify a variable distribution using `LogNormal()` as before: ```{r,warning=FALSE,message=FALSE} reporting_delay_variable <- EpiNow2::LogNormal( @@ -287,7 +286,7 @@ reporting_delay_variable <- EpiNow2::LogNormal( :::::::::::::::::::::: spoiler -**Visualize a log Normal distribution using {epiparameter}** +**Visualise a log normal distribution using {epiparameter}** Using `epiparameter::epiparameter()` we can create a custom distribution. The fixed log normal distribution will look like: @@ -319,7 +318,7 @@ epiparameter::epiparameter( If data is available on the time between symptom onset and reporting, we can use the function `EpiNow2::estimate_delay()` to estimate a log normal distribution from a vector of delays. -The code below illustrates how to use `EpiNow2::estimate_delay()` with synthetic Ebola data. We calculate the reporting delay for each case the time difference **from** date of onset **to** date of hospitalization (date when case was reported). +The code below illustrates how to use `EpiNow2::estimate_delay()` with synthetic Ebola data. We calculate the reporting delay for each case as the time difference **from** date of onset **to** date of hospitalisation (date when case was reported). ```{r, eval = TRUE, message = FALSE, warning = FALSE} library(tidyverse) @@ -360,10 +359,10 @@ generation_time_variable <- EpiNow2::LogNormal( ## Finding estimates -The function `epinow()` is a wrapper for the function `estimate_infections()` used to estimate cases by date of infection. The generation time distribution and delay distributions must be passed using the functions ` generation_time_opts()` and `delay_opts()` respectively. +The function `epinow()` is a wrapper for the function `estimate_infections()` used to estimate cases by date of infection. The generation time distribution and delay distributions must be passed using the functions `generation_time_opts()` and `delay_opts()` respectively. There are numerous other inputs that can be passed to `epinow()`, see `?EpiNow2::epinow()` for more detail. -One optional input is to specify a *log normal* prior for the effective reproduction number $R_t$ at the start of the outbreak. We specify a mean of 2 days and standard deviation of 2 days as arguments of `prior` within `rt_opts()`: +One optional input is to specify a *log normal* prior for the effective reproduction number $R_t$ at the start of the outbreak. We specify a mean of 2 and standard deviation of 2 as arguments of `prior` within `rt_opts()`: ```{r, eval = TRUE} # define Rt prior distribution @@ -388,7 +387,7 @@ To find the maximum number of available cores on your machine, use `parallel::de ::::::::::::::::::::::::: checklist -**Note:** In the code below `_fixed` distributions are used instead of `_variable` (delay distributions with uncertainty). This is to speed up computation time. It is generally recommended to use variable distributions that account for additional uncertainty. +**Note:** In the code below `*_fixed` distributions are used instead of `*_variable` (delay distributions with uncertainty). This is to speed up computation time. It is generally recommended to use variable distributions that account for additional uncertainty. ```{r, echo = TRUE} # fixed alternatives @@ -447,7 +446,7 @@ For the purpose of this tutorial, we can optionally use `EpiNow2::stan_opts()` t # you can add the `stan` argument EpiNow2::epinow( ..., - stan = EpiNow2::stan_opts(samples = 1000, chains = 3) + stan = EpiNow2::stan_opts(samples = 1000, chains = 2) ) ``` @@ -463,7 +462,7 @@ We can extract and visualise estimates of the effective reproduction number thro plot(estimates, type = "R") ``` -The uncertainty in the estimates increases through time. This is because estimates are informed by data in the past - within the delay periods. This difference in uncertainty is categorised into **Estimate** (green) utilises all data and **Estimate based on partial data** (orange) estimates that are based on less data (because infections that happened at the time are more likely to not have been observed yet) and therefore have increasingly wider intervals towards the date of the last data point. Finally, the **Forecast** (purple) is a projection ahead of time. +The uncertainty in the estimates increases through time. This is because estimates are informed by data in the past - within the delay periods. The plot distinguishes three categories: **Estimate** (green) uses all available data; **Estimate based on partial data** (orange) is based on less data (because infections around that time are more likely not to have been observed yet) and therefore has increasingly wider intervals towards the date of the last data point; and **Forecast** (purple) is a projection ahead of time. We can also visualise the growth rate estimate through time: ```{r} diff --git a/episodes/severity-static.Rmd b/episodes/severity-static.Rmd index 0ce2edad..7eaefd45 100644 --- a/episodes/severity-static.Rmd +++ b/episodes/severity-static.Rmd @@ -40,7 +40,7 @@ This episode requires you to be familiar with: :::::::::: spoiler -Install packages if their are not already installed +Install packages if they are not already installed ```{r, eval = FALSE} # if {pak} is not installed, run: install.packages("pak") @@ -67,7 +67,7 @@ We can assess the pandemic potential of an epidemic with two critical measuremen ([Fraser et al., 2009](https://www.science.org/doi/full/10.1126/science.1176062), [CDC, 2016](https://www.cdc.gov/flu/pandemic-resources/national-strategy/severity-assessment-framework-508.html)). -![HHS Pandemic Planning Scenarios based on the Pandemic Severity Assessment Framework. This uses a combined measure of clinical severity and transmissibility to characterise influenza pandemic scenarios. **HHS**: United States Department of Health and Human Services ([CDC, 2016](https://www.cdc.gov/flu/pandemic-resources/national-strategy/severity-assessment-framework-508.html)).](fig/cfr-hhs-scenarios-psaf.png){alt='The horizontal axis is the scaled measure of clinical severity, ranging from 1 to 7, where 1 is low, 4 is moderate, and 7 is very severe. The vertical axis is the scaled measure of transmissibility, ranging from 1 to 5, where 1 is low, 3 is moderate, and 5 is highly transmissible. On the graph, HHS pandemic planning scenarios are labeled across four quadrants (A, B, C and D). From left to right, the scenarios are “seasonal range,” “moderate pandemic,” “severe pandemic” and “very severe pandemic.” As clinical severity increases along the horizontal axis, or as transmissibility increases along the vertical axis, the severity of the pandemic planning scenario also increases.'} +![HHS Pandemic Planning Scenarios based on the Pandemic Severity Assessment Framework. This uses a combined measure of clinical severity and transmissibility to characterise influenza pandemic scenarios. **HHS**: United States Department of Health and Human Services ([CDC, 2016](https://www.cdc.gov/flu/pandemic-resources/national-strategy/severity-assessment-framework-508.html)).](fig/cfr-hhs-scenarios-psaf.png){alt='The horizontal axis is the scaled measure of clinical severity, ranging from 1 to 7, where 1 is low, 4 is moderate, and 7 is very severe. The vertical axis is the scaled measure of transmissibility, ranging from 1 to 5, where 1 is low, 3 is moderate, and 5 is highly transmissible. On the graph, HHS pandemic planning scenarios are labelled across four quadrants (A, B, C and D). From left to right, the scenarios are “seasonal range,” “moderate pandemic,” “severe pandemic” and “very severe pandemic.” As clinical severity increases along the horizontal axis, or as transmissibility increases along the vertical axis, the severity of the pandemic planning scenario also increases.'} One epidemiological approach to estimating the clinical severity is quantifying the Case Fatality Risk (CFR). CFR is the conditional probability of death given confirmed diagnosis, calculated as the ratio of the cumulative number of deaths from an infectious disease to the number of confirmed diagnosed cases. However, calculating this directly during the course of an epidemic tends to result in a naive or biased CFR given the time [delay](../learners/reference.md#delaydist) from onset to death, varying substantially as the epidemic progresses and stabilising at the later stages of the outbreak ([Ghani et al., 2005](https://academic.oup.com/aje/article/162/5/479/82647?login=false#620743)). @@ -89,7 +89,7 @@ which could be intrinsic to the infectious agent (e.g., a new, more severe strai In this tutorial we are going to learn how to use the `{cfr}` package to calculate and adjust a CFR estimation using [delay distributions](../learners/reference.md#delaydist) from `{epiparameter}` or elsewhere, based on the methods developed by [Nishiura et al., 2009](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0006852), also, how we can reuse `{cfr}` functions for more severity measurements. -We'll use the pipe `%>%` operator to connect functions, so let's also call to the `{tidyverse}` package: +We'll use the pipe `%>%` operator to connect functions, so let's also load the `{tidyverse}` package: ```{r,message=FALSE,warning=FALSE} library(cfr) @@ -129,7 +129,7 @@ Answer to these questions: What data sources can we use to estimate the clinical severity of a disease outbreak? [Verity et al., 2020](https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30243-7/fulltext) summarises the spectrum of COVID-19 cases: -![Spectrum of COVID-19 cases. The CFR aims to estimate the proportion of deaths among confirmed cases in an epidemic. +![Spectrum of COVID-19 cases. The CFR aims to estimate the proportion of deaths among confirmed cases in an epidemic ([Verity et al., 2020](https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30243-7/fulltext#gr1))](fig/cfr-spectrum-cases-covid19.jpg) - At the top of the pyramid, those who met the WHO case criteria for **severe** or critical cases would likely have been identified in the hospital setting, presenting with atypical viral pneumonia. These cases would have been identified in mainland China and among those categorised internationally as local transmission. @@ -150,7 +150,7 @@ To calculate the naive CFR, the `{cfr}` package requires an input data frame wit - `cases` - `deaths` -Let's explore the `ebola1976` dataset, included in {cfr}, which comes from the first Ebola outbreak in what was then called Zaire (now the Democratic Republic of the Congo) in 1976, as analysed by Camacho et al. (2014). +Let's explore the `ebola1976` dataset, included in `{cfr}`, which comes from the first Ebola outbreak in what was then called Zaire (now the Democratic Republic of the Congo) in 1976, as analysed by [Camacho et al. (2014)](https://doi.org/10.1016/j.epidem.2014.09.003). ```{r} # Load the Ebola 1976 data provided with the {cfr} package @@ -222,14 +222,14 @@ Estimate the naive CFR. Inspect the format of the data input. - Does it contain daily data? -- Does the column names are as required by `cfr_static()`? -- How would you rename column names from a data frame? +- Do the column names match those required by `cfr_static()`? +- How would you rename the columns in a data frame? :::::::::::::::::::: :::::::::::::::::::: solution -We read the data input using `readr::read_csv()`. This function recognize that the column `date` is a `` class vector. +We read the data input using `readr::read_csv()`. This function recognises that the column `date` is a `` class vector. ```{r, eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE} # read data from the tutorial repository R project @@ -252,7 +252,7 @@ sarscov2_input We can use `dplyr::rename()` to adapt the external data to fit the data input for `cfr_static()`. ```{r} -# Rename before Estimate naive CFR +# Rename columns before estimating the naive CFR sarscov2_input %>% dplyr::rename( cases = cases_jpn, @@ -303,7 +303,7 @@ In this tutorial episode, we are going to focus on solutions to deal with this s Improving an _early_ epidemiological assessment of a delay-adjusted CFR is crucial for determining virulence, shaping the level and choices of public health intervention, and providing advice to the general public. -In 2009, during the swine-flu virus, Influenza A (H1N1), Mexico had an early biased estimation of the CFR. Initial reports from the government of Mexico suggested a virulent infection, whereas, in other countries, the same virus was perceived as mild ([TIME, 2009](https://content.time.com/time/health/article/0,8599,1894534,00.html)). +In 2009, during the swine-flu outbreak caused by Influenza A (H1N1), Mexico had an early biased estimation of the CFR. Initial reports from the government of Mexico suggested a virulent infection, whereas, in other countries, the same virus was perceived as mild ([TIME, 2009](https://content.time.com/time/health/article/0,8599,1894534,00.html)). In the USA and Canada, no deaths were attributed to the virus in the first ten days following the World Health Organization's declaration of a public health emergency. Even under similar circumstances at the early stage of the global pandemic, public health officials, policymakers and the general public want to know the virulence of an emerging infectious agent. @@ -323,7 +323,7 @@ We can showcase this last bias using the [concept described in this `{cfr}` vign [Nishiura et al., 2009](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0006852) developed a method that considers the **time delay** from the onset of symptoms to death. -Real-time outbreaks may have a number of deaths that are insufficient to determine the time distribution between onset and death. Therefore, we can estimate the _distribution delay_ from historical outbreaks or reuse the ones accessible via R packages like `{epiparameter}` or `{epireview}`, which collect them from published scientific literature. For a step-by-step guide, read the tutorial episode on how to [access epidemiological delays](../episodes/delays-access.md). +Real-time outbreaks may have a number of deaths that are insufficient to determine the time distribution between onset and death. Therefore, we can estimate the _distribution delay_ from historical outbreaks or reuse the ones accessible via R packages like `{epiparameter}` or [epireview](https://github.com/mrc-ide/epireview/), which collect them from published scientific literature. For a step-by-step guide, read the tutorial episode on how to [access epidemiological delays](../episodes/delays-access.md). Let's use `{epiparameter}`: @@ -363,7 +363,7 @@ out_low <- out_delay_adjusted %>% pull(severity_low) out_high <- out_delay_adjusted %>% pull(severity_high) ``` -The delay-adjusted CFR indicated that the overall disease severity _at the end of the outbreak_ or with the _latest data available at the moment_ is `r out_estimate` with a 95% confidence interval between `r out_low` and `r out_high`, slightly higher than the naive one. +The delay-adjusted CFR estimates the overall disease severity, given the _latest data available at the moment_, as `r out_estimate` with a 95% confidence interval between `r out_low` and `r out_high` — substantially higher than the naive estimate. This larger value reflects the correction for deaths that are still to be reported among the cases observed in the first 30 days. ::::::::::::::::::::::::::: discussion @@ -382,13 +382,13 @@ in the "More Resources" section of this tutorial's website. To correct the bias arising from cases whose outcomes are not yet known at the time of estimation, `{cfr}` accounts for the probability that a case’s outcome becomes known after a certain delay. -It does so by relating $D_t$ to the incidence function $c_t$ (i.e., the number of new confirmed cases on day t) and the conditional probability density function $f_s$ of the time from onset to death, given death. +It does so by relating $D_t$ to the incidence function $c_t$ (i.e., the number of new confirmed cases on day t) and the conditional probability density function $f_s$ of the time from onset to death, given death. $$ D_t = p_t \times \sum_{i = 0}^t\sum_{j = 0}^\infty c_i f_{j - i} $$ -Here, $D_t$ is the cumulative number of deaths up to time t, and $p_t$ is the realized proportion of confirmed cases that die from the infection (i.e., the unbiased case fatality risk, or CFR). +Here, $D_t$ is the cumulative number of deaths up to time t, and $p_t$ is the realised proportion of confirmed cases that die from the infection (i.e., the unbiased case fatality risk, or CFR). The term $\sum_{i = 0}^t\sum_{j = 0}^\infty c_i f_{j - i}$ represents the **total expected number of cases with known outcomes by time `t`**. It sums all incident cases $c_i$, each weighted by the probability density function $f_{j−i}$ that their outcomes become known after a delay of $j−i$ days. @@ -433,7 +433,7 @@ ebola_30days$cases[1] * density(onset_to_death_ebola, at = 0) ``` -Notice that the most recently observe cases start the delay distribution from `0` the others continue with the following day. +Notice that the most recently observed cases start the delay distribution from `0`, while the others continue with the following day. Since the input value in `at` varies by day for each case (`at = 0`, `at = 1`, `at = 2`, ...), the `density()` needs to be expressed as a function of x. `{cfr}` will then draw values accordingly, as shown below: @@ -463,7 +463,7 @@ When using an `` class object we can use this expression as a temp `function(x) density(, x)` -For distribution functions with parameters not available in `{epiparameter}`, we suggest you two alternatives: +For distribution functions with parameters not available in `{epiparameter}`, we suggest two alternatives: - Create an `` class object, to plug into other R packages of the outbreak analytics pipeline. Read the [reference documentation of `epiparameter::epiparameter()`](https://epiverse-trace.github.io/epiparameter/reference/epiparameter.html), or @@ -475,9 +475,9 @@ For distribution functions with parameters not available in `{epiparameter}`, we ### When to use discrete distributions? -For `cfr_static()` and all functions in the `cfr_*()` family, the most appropriate choice to use are **discrete** distributions. This is because `{cfr}` operates on count data in discrete time: daily case and death counts. +For `cfr_static()` and all functions in the `cfr_*()` family, the most appropriate distributions to use are **discrete** ones. This is because `{cfr}` operates on count data in discrete time: daily case and death counts. -We can assume that evaluating the Probability Distribution Function (PDF) of a *continuous* distribution is equivalent to the Probability Mass Function (PMF) of the equivalent *discrete* distribution. +We can assume that evaluating the Probability Density Function (PDF) of a *continuous* distribution is equivalent to the Probability Mass Function (PMF) of the equivalent *discrete* distribution. However, this assumption may not be appropriate for distributions with larger peaks. For instance, diseases with an onset-to-death distribution that is strongly peaked with a low variance. In such cases, the average disparity between the PDF and PMF is expected to be more pronounced compared to distributions with broader spreads. One way to deal with this is to discretise the continuous distribution using `epiparameter::discretise()` to an `` object. @@ -518,7 +518,7 @@ sarscov2_delay <- ) ``` -We read the data input using `readr::read_csv()`. This function recognize that the column `date` is a `` class vector. +We read the data input using `readr::read_csv()`. This function recognises that the column `date` is a `` class vector. ```{r, eval=TRUE,echo=FALSE,warning=FALSE,message=FALSE} # read data from the tutorial repository R project @@ -541,7 +541,7 @@ sarscov2_input We can use `dplyr::rename()` to adapt the external data to fit the data input for `cfr_static()`. ```{r} -# Rename before Estimate naive CFR +# Rename columns before estimating the delay-adjusted CFR sarscov2_input %>% dplyr::rename( cases = cases_jpn, @@ -590,7 +590,7 @@ rolling_cfr_adjusted <- cfr::cfr_rolling( ) ``` -With `utils::tail()`, we show that the latest CFR estimates. The naive and delay-adjusted estimates have overlapping ranges of 95% confidence intervals. +With `utils::tail()`, we show the latest CFR estimates. The naive and delay-adjusted estimates have overlapping ranges of 95% confidence intervals. ```{r,eval=FALSE,echo=TRUE} # Print the tail of the data frame @@ -625,7 +625,7 @@ dplyr::bind_rows( ) ``` -The horizontal line represents the delay-adjusted CFR estimated at the outbreak's end. The dotted line means the estimate has a 95% confidence interval (95% CI). +Each line shows the rolling CFR estimate over time — naive and delay-adjusted — with its 95% confidence interval (95% CI) shown as a shaded band of the same colour. **Notice** that this delay-adjusted calculation is particularly useful when an _epidemic curve of confirmed cases_ is the only data available (i.e. when individual data from onset to death are unavailable, especially during the early stage of the epidemic). When there are few deaths or none at all, an assumption has to be made for the *delay distribution* from onset to death, e.g. from literature based on previous outbreaks. [Nishiura et al., 2009](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0006852) depict this in the figures with data from the SARS outbreak in Hong Kong, 2003. @@ -736,7 +736,7 @@ Using `{cfr}`, we can change the inputs for the numerator (`cases`) and denomina :::::::::::::::::::::::::::: solution -### Infection and Hospitalization fatality risk +### Infection and Hospitalisation fatality risk If for a _Case_ fatality risk (CFR), we require: @@ -769,7 +769,7 @@ Similarly, the _Hospitalisation_ Fatality Risk (HFR) requires: - mCHR medically attended case-hospitalisation risk, - HFR hospitalisation-fatality risk. -![Schematic diagram of the baseline analyses. Red, blue, and green arrows denote the data flow from laboratory-confirmed cases of passive surveillance, clinically-diagnosed cases, and laboratory-confirmed cases of active screenings.](fig/cfr-s41467-020-19238-2-fig_b.png){alt='Data source of COVID-19 cases in Wuhan: D1) 32,583 laboratory-confirmed COVID-19 cases as of March 84, D2) 17,365 clinically-diagnosed COVID-19 cases during February 9–194, D3)daily number of laboratory-confirmed cases on March 9–April 243, D4) total number of COVID-19 deaths as of April 24 obtained from the Hubei Health Commission3, D5) 325 laboratory-confirmed cases and D6) 1290 deaths were added as of April 16 through a comprehensive and systematic verification by Wuhan Authorities3, and D7) 16,781 laboratory-confirmed cases identified through universal screening10,11. Pse: RT-PCR sensitivity12. Pmed.care: proportion of seeking medical assistance among patients suffering from acute respiratory infections13.'} +![Schematic diagram of the baseline analyses. Red, blue, and green arrows denote the data flow from laboratory-confirmed cases of passive surveillance, clinically-diagnosed cases, and laboratory-confirmed cases of active screenings.](fig/cfr-s41467-020-19238-2-fig_b.png){alt='Data source of COVID-19 cases in Wuhan: D1) 32,583 laboratory-confirmed COVID-19 cases as of March 8, D2) 17,365 clinically-diagnosed COVID-19 cases during February 9–19, D3)daily number of laboratory-confirmed cases on March 9–April 24, D4) total number of COVID-19 deaths as of April 24 obtained from the Hubei Health Commission, D5) 325 laboratory-confirmed cases and D6) 1290 deaths were added as of April 16 through a comprehensive and systematic verification by Wuhan Authorities, and D7) 16,781 laboratory-confirmed cases identified through universal screening. Pse: RT-PCR sensitivity. Pmed.care: proportion of seeking medical assistance among patients suffering from acute respiratory infections.'} :::::::::::::::::::::::::::: @@ -791,7 +791,7 @@ outbreaks::ebola_sierraleone_2014 %>% as_tibble() From the `{outbreaks}` package, load the MERS linelist of cases from the `mers_korea_2015` object. -Rearrange your this linelist to fit into the `{cfr}` package input. +Rearrange this linelist to fit into the `{cfr}` package input. Estimate the delay-adjusted CFR using the corresponding distribution delay. @@ -834,7 +834,7 @@ mers_korea_2015$linelist %>% # Use {incidence2} to count daily incidence mers_incidence <- mers_korea_2015$linelist %>% - # converto to incidence2 object + # convert to incidence2 object incidence(date_index = c("dt_onset", "dt_death")) %>% # complete dates from first to last incidence2::complete_dates() diff --git a/episodes/superspreading-estimate.Rmd b/episodes/superspreading-estimate.Rmd index 9c8f1e35..c1e50d51 100644 --- a/episodes/superspreading-estimate.Rmd +++ b/episodes/superspreading-estimate.Rmd @@ -59,13 +59,13 @@ go to the [main setup page](../learners/setup.md#software-setup). -From smallpox to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), some infected individuals spread infection to more people than others. Disease transmission is the result of a combination of biological and social factors, and these factors average out to some extent at the population level during a large epidemic. Hence researchers often use population averages to assess the potential for disease to spread. However, in the earlier or later phases of an outbreak, individual differences in infectiousness can be more important. In particular, they increase the chance of superspreading events (SSEs), which can ignite explosive epidemics and also influence the chances of controlling transmission ([Lloyd-Smith et al., 2005](https://wellcomeopenresearch.org/articles/5-83)). +From smallpox to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), some infected individuals spread infection to more people than others. Disease transmission is the result of a combination of biological and social factors, and these factors average out to some extent at the population level during a large epidemic. Hence researchers often use population averages to assess the potential for disease to spread. However, in the earlier or later phases of an outbreak, individual differences in infectiousness can be more important. In particular, they increase the chance of superspreading events (SSEs), which can ignite explosive epidemics and also influence the chances of controlling transmission ([Lloyd-Smith et al., 2005](https://www.nature.com/articles/nature04153)). ![**Chains of SARS-CoV-2 transmission in Hong Kong initiated by local or imported cases.** (**a**), Transmission network of a cluster of cases traced back to a collection of four bars across Hong Kong (n = 106). (**b**), Transmission network associated with a wedding without clear infector–infectee pairs but linked back to a preceding social gathering and local source (n = 22). (**c**), Transmission network associated with a temple cluster of undetermined source (n = 19). (**d**), All other clusters of SARS-CoV-2 infections where the source and transmission chain could be determined ([Adam et al., 2020](https://www.nature.com/articles/s41591-020-1092-0)).](fig/see-intro-superspreading.png) -The [basic reproduction number](../learners/reference.md#basic), $R_{0}$, measures the average number of cases caused by one infectious individual in a entirely susceptible population. Estimates of $R_{0}$ are useful for understanding the average dynamics of an epidemic at the population-level, but can obscure considerable individual variation in infectiousness. This was highlighted during the global emergence of SARS-CoV-2 by numerous ‘superspreading events’ in which certain infectious individuals generated unusually large numbers of secondary cases ([LeClerc et al, 2020](https://wellcomeopenresearch.org/articles/5-83)). +The [basic reproduction number](../learners/reference.md#basic), $R_{0}$, measures the average number of cases caused by one infectious individual in a entirely susceptible population. Estimates of $R_{0}$ are useful for understanding the average dynamics of an epidemic at the population-level, but can obscure considerable individual variation in infectiousness. This was highlighted during the global emergence of SARS-CoV-2 by numerous ‘superspreading events’ in which certain infectious individuals generated unusually large numbers of secondary cases ([Leclerc et al., 2020](https://wellcomeopenresearch.org/articles/5-83)). ![**Observed offspring distribution of SARS-CoV-2 transmission in Hong Kong.** N = 91 SARS-CoV-2 infectors, N = 153 terminal infectees and N = 46 sporadic local cases. Histogram bars indicate the proportion of onward transmission per amount of secondary cases. Line corresponds to a fitted negative binomial distribution ([Adam et al., 2020](https://www.nature.com/articles/s41591-020-1092-0)).](fig/see-intro-secondary-cases-fig-b.png){alt='R = 0.58 and k = 0.43.'} @@ -96,7 +96,7 @@ library(tidyverse) ### The double-colon -The double-colon `::` in R let you call a specific function from a package without loading the entire package into the current environment. +The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment. The package must be installed. For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package. @@ -104,7 +104,7 @@ This help us remember package functions and avoid namespace conflicts. ::::::::::::::::::: -## The individual reprodution number +## The individual reproduction number The individual reproduction number is defined as the number of secondary cases caused by a particular infected individual. @@ -338,9 +338,9 @@ For occurrences of associated discrete events we can use **Poisson** or negative In a Poisson distribution, mean is equal to variance. But when variance is higher than the mean, this is called **overdispersion**. In biological applications, overdispersion occurs and so a negative binomial may be worth considering as an alternative to Poisson distribution. -**Negative binomial** distribution is specially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. In such terms, the observations are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. +The **negative binomial** distribution is specially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. In such terms, the observations are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. -In epidemiology, [negative binomial](https://en.wikipedia.org/wiki/Negative_binomial_distribution) have being used to model disease transmission for infectious diseases where the likely number of onward infections may vary considerably from individual to individual and from setting to setting, capturing all variation in infectious histories of individuals, including properties of the biological (i.e. degree of viral shedding) and environmental circumstances (e.g. type and location of contact). +In epidemiology, the [negative binomial distribution](https://en.wikipedia.org/wiki/Negative_binomial_distribution) has been used to model disease transmission for infectious diseases where the likely number of onward infections may vary considerably from individual to individual and from setting to setting, capturing all variation in infectious histories of individuals, including properties of the biological (i.e. degree of viral shedding) and environmental circumstances (e.g. type and location of contact). ::::::::::::::::::::::::::::: @@ -488,7 +488,7 @@ ggplot() + ### Individual-level variation in transmission -The individual-level variation in transmission is defined by the relationship between the mean ($R_{0}$), dispersion ($k$), and the variance of a negative binomial distribution. +The individual-level variation in transmission is defined by the relationship between the mean ($R_{0}$) and dispersion ($k$), which together define the variance of a negative binomial distribution. The negative binomial model has $variance = R_{0}(1+\frac{R_{0}}{k})$, so smaller values of $k$ indicate greater variance and, consequently, greater **individual-level variation** in transmission. @@ -560,7 +560,7 @@ We can use the maximum likelihood estimates from `{fitdistrplus}` to compare dif ### The dispersion parameter across diseases -Research into sexually transmitted and vector-borne diseases has previously suggested a '20/80' rule, with 20% of individuals contributing at least 80% of the transmission potential ([Woolhouse et al](https://www.pnas.org/doi/10.1073/pnas.94.1.338)). +Research into sexually transmitted and vector-borne diseases has previously suggested a '20/80' rule, with 20% of individuals contributing at least 80% of the transmission potential ([Woolhouse et al., 1997](https://www.pnas.org/doi/10.1073/pnas.94.1.338)). On its own, the dispersion parameter $k$ is hard to interpret intuitively, and hence converting into proportional summary can enable easier comparison. When we consider a wider range of pathogens, we can see there is no hard and fast rule for the percentage that generates 80% of transmission, but variation does emerge as a common feature of infectious diseases @@ -688,11 +688,11 @@ During an outbreak, it is common to try and reduce transmission by identifying p In the presence of individual-level variation in transmission, i.e., with an overdispersed offspring distribution, if this primary case is identified, a larger fraction of the transmission chain can be detected by forward tracing each of the contacts of this primary case ([Endo et al., 2020](https://wellcomeopenresearch.org/articles/5-239/v3)). -![Schematic representation of contact tracing strategies. Black arrows indicate the directions of transmission, blue and Orange arrows, a successful or failed contact tracing, respectivelly. When there is evidence of individual-level variation in transmission, often resulting in superspreading, backward contact tracing from the index case (blue circle) increase the probability to find the primary case (green circle) or clusters with a larger fraction of cases, potentially increasing the number of quarentined cases (yellow circles). [Claire Blackmore, 2021](https://www.paho.org/sites/default/files/backward_contact_tracing_v3_0.pdf)](fig/contact-tracing-strategies.png) +![Schematic representation of contact tracing strategies. Black arrows indicate the directions of transmission, blue and Orange arrows, a successful or failed contact tracing, respectively. When there is evidence of individual-level variation in transmission, often resulting in superspreading, backward contact tracing from the index case (blue circle) increase the probability to find the primary case (green circle) or clusters with a larger fraction of cases, potentially increasing the number of quarentined cases (yellow circles). [Claire Blackmore, 2021](https://www.paho.org/sites/default/files/backward_contact_tracing_v3_0.pdf)](fig/contact-tracing-strategies.png) When there is evidence of individual-level variation (i.e. overdispersion), often resulting in so-called superspreading events, a large proportion of infections may be linked to a small proportion of original clusters. As a result, finding and targeting originating clusters in combination with reducing onwards infection may substantially enhance the effectiveness of tracing methods ([Endo et al., 2020](https://wellcomeopenresearch.org/articles/5-239/v3)). -Empirical evidence focused on evaluating the efficiency of backward tracing lead to 42% more cases identified than forward tracing supporting its implementation when rigorous suppression of transmission is justified ([Raymenants et al., 2022](https://www.nature.com/articles/s41467-022-32531-6)) +Empirical evidence focused on evaluating the efficiency of backward tracing led to 42% more cases identified than forward tracing supporting its implementation when rigorous suppression of transmission is justified ([Raymenants et al., 2022](https://www.nature.com/articles/s41467-022-32531-6)). ## Probability of cases in a given cluster @@ -735,7 +735,7 @@ cluster_probability_percent <- cluster_probability %>% Even though we have an $R<1$, a highly overdispersed offspring distribution ($k=0.02$) means that if we detect a new case, there is a `r cluster_probability_percent` probability they originated from a cluster of 25 infections or more. Hence, by following a backwards strategy, contact tracing efforts will increase the probability of successfully contain and quarantining this large number of earlier infected individuals, rather than simply focusing on the new case, who is likely to have infected nobody (because $k$ is very small). -We can also use this number to prevent gathering of certain sized to reduce the epidemic by preventing potential superspreading events. Interventions can target to reduce the reproduction number in order to reduce the probability of having clusters of secondary cases. +We can also use this number to prevent gatherings of a certain size to reduce the epidemic by preventing potential superspreading events. Interventions can target to reduce the reproduction number in order to reduce the probability of having clusters of secondary cases. ::::::::::::::::::::::::::::::::: challenge diff --git a/episodes/superspreading-simulate.Rmd b/episodes/superspreading-simulate.Rmd index 48b2693d..f74aab3b 100644 --- a/episodes/superspreading-simulate.Rmd +++ b/episodes/superspreading-simulate.Rmd @@ -162,7 +162,7 @@ library(tidyverse) ### The double-colon -The double-colon `::` in R let you call a specific function from a package without loading the entire package into the current environment. +The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment. The package must be installed. For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package. @@ -337,7 +337,7 @@ epichains::simulate_chains( - **simulation controls** (`n_chains` and `statistic`), - **offspring distribution** (`offspring_dist` and required distribution parameters), and -- generation time (`generation_time`). +- **generation time** (`generation_time`). In the lines above, we described how to specify the offspring distribution and generation time. The **simulation controls** include at least two arguments: @@ -362,7 +362,7 @@ We can use `simulate_chains()` to create multiple chains and increase the probab We need to one additional element: -- `set.seed()`, which is a random number generator function with a specified seed value, the `` number, to ensure consistent results across different runs of the code. +- `set.seed()` is a function used to initialise a pseudo-random number generator. By specifying a seed value (the ``), you ensure that the sequence of numbers produced by subsequent random functions, like `rnorm()` or `simulate_chains()`, is identical every time the code is executed. With this configuration, each **chain** will represent **one initial case**. These cases per chain are independent, isolated, and without interactions. This means that each chain will have their own pool of susceptibles, which you can configure by using the `pop` or `percent_immune` arguments. @@ -394,7 +394,7 @@ We can visually count how many chains reach to more than 100 infected cases, wit Use the last run of `epichains::simulate_chains()` for simulating multiple chains. Change the `statistic` from `"size"` to `"length"`. Run the `summary()` function. -- What chain feature this output count for? +- What chain feature does this output show? ::::::::: hint @@ -657,7 +657,7 @@ simulated_chains_map %>% To increase the probability of simulating uncontrolled outbreak projections given an overdispersed offspring distribution, let's simulate **1000 transmission chains** with 1 initial case each starting at day 0. -We will create a multiple simulation **without** iteration for this section: +We will run a simulation with multiple replicates, **without** iteration for this section: ```{r} set.seed(33)