---
title: "Getting Started with betydata"
vignette: >
  %\VignetteIndexEntry{Getting Started with betydata}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

::: {.callout-note}
## What you will learn

- What data is available in betydata and how the 16 tables relate to each other
- How to explore trait and yield observations using dplyr
- Key concepts: traits, yields, QA/QC flags, and Plant Functional Types
:::

## What is betydata?

The `betydata` package provides offline access to public data from [BETYdb](https://betydb.org), the Biofuel Ecophysiological Traits and Yields database. BETYdb is a centralized repository of plant trait measurements and crop yield data used in ecosystem modeling and agricultural research.

A **trait** is a measurable characteristic of a plant, such as Specific Leaf Area (SLA, m2/kg), maximum carboxylation rate (Vcmax, umol/m2/s), or leaf nitrogen content (%). A **yield** is crop production per unit area (typically Mg/ha). Together, traits and yields supply the data used to parameterize ecosystem models and to compare crop performance across species and sites.

## Loading the Package

```{r}
library(betydata)
library(dplyr)
```
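
With the package loaded, the split between traits and yields can be checked directly. This is a minimal sketch that assumes, as the bioenergy example in this vignette does, that yield rows in `traitsview` carry `trait == "Ayield"`; the `record_type` column is a helper introduced here for illustration only.

```{r}
library(betydata)
library(dplyr)

# Assumption: yield records are the rows with trait == "Ayield";
# every other row is treated as a trait observation.
record_types <- traitsview |>
  mutate(record_type = if_else(trait == "Ayield", "yield", "trait")) |>
  count(record_type)
record_types
```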

## Data Architecture {#sec-architecture}

The package contains **16 tables**, listed below. They fall into three tiers, summarized in the Data Model note that follows the table:

```{r}
#| label: tbl-tables
#| tbl-cap: "All tables available in betydata"
data(package = "betydata")$results[, c("Item", "Title")] |>
  as.data.frame() |>
  knitr::kable()
```
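
The listing above gives names and titles only. One way to see how large each table is, assuming every dataset in the package is a data frame, is to look each one up by name with base R's `getExportedValue()`, which resolves lazily loaded datasets the same way `betydata::name` does. The `table_sizes` name is illustrative.

```{r}
library(betydata)

# Names of all datasets shipped with the package
table_names <- data(package = "betydata")$results[, "Item"]

# Look up each dataset by name and record its dimensions
table_sizes <- data.frame(
  table = table_names,
  rows = vapply(table_names, function(nm) nrow(getExportedValue("betydata", nm)), integer(1)),
  columns = vapply(table_names, function(nm) ncol(getExportedValue("betydata", nm)), integer(1)),
  row.names = NULL
)

# Largest tables first
knitr::kable(table_sizes[order(-table_sizes$rows), ], row.names = FALSE)
```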

::: {.callout-tip}
## Data Model

The tables follow a relational structure:

- **`traitsview`** is the primary denormalized table (pre-joined for convenience)
- **Metadata tables** (`species`, `sites`, `variables`, `citations`, etc.) provide reference data
- **Relationship tables** (`pfts_species`, `pfts_priors`, etc.) are many-to-many junction tables

You can use `traitsview` for most analyses without joining anything. The metadata and relationship tables are available when you need additional detail or custom aggregations.
:::
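
As an illustration of how a junction table is used, each row of `pfts_species` links one PFT (`pft_id`) to one species (`specie_id`), so counting rows per `pft_id` gives the number of species assigned to each PFT. The sketch below then joins `pfts` to attach readable names; `species_per_pft` is an illustrative name.

```{r}
library(betydata)
library(dplyr)

# One row per PFT-species link; count links per PFT,
# then attach the PFT name from the pfts metadata table
species_per_pft <- pfts_species |>
  count(pft_id, name = "n_species") |>
  left_join(pfts |> select(id, name), by = c("pft_id" = "id")) |>
  arrange(desc(n_species))
head(species_per_pft)
```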

## The Primary Table: traitsview {#sec-traitsview}

The `traitsview` table is a denormalized view combining traits and yields with associated metadata. Key analytical columns are placed first for convenient interactive use:

```{r}
traitsview
```

### Key Columns {#sec-columns}

| Column           | Description                                | Example Values           |
|------------------|--------------------------------------------|--------------------------|
| `trait`          | Variable name                              | SLA, Vcmax, Ayield       |
| `mean`           | Observed value                             | 22.5, 38.1               |
| `units`          | Measurement units                          | m2/kg, umol/m2/s         |
| `scientificname` | Full species name                          | *Miscanthus x giganteus* |
| `genus`          | Genus                                      | Miscanthus, Panicum      |
| `sitename`       | Research site                              | Energy Farm, Urbana IL   |
| `author`         | Citation author                            | Heaton 2008              |
| `checked`        | QA/QC status (0 = unchecked, 1 = verified) | 0, 1                     |
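
To work with just the columns in the table above, select them and use `glimpse()` to see each column's type alongside a few values. This is a small convenience sketch; `key_cols` is an illustrative name.

```{r}
library(betydata)
library(dplyr)

# Keep only the key columns listed above
key_cols <- traitsview |>
  select(trait, mean, units, scientificname, genus, sitename, author, checked)

# One line per column: name, type, and the first few values
glimpse(key_cols)
```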

## Basic Exploration

```{r}
#| label: tbl-trait-counts
#| tbl-cap: "Top 15 most common traits in betydata"
traitsview |>
  count(trait, sort = TRUE) |>
  head(15) |>
  knitr::kable()
```
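
Before averaging any of these traits, it is worth checking whether each one is reported in a single unit, because values in different units cannot be pooled directly. The sketch below lists every trait/unit pair for the 15 most common traits; a trait that appears on more than one row needs a unit conversion first. `top_traits` and `trait_units` are illustrative names.

```{r}
library(betydata)
library(dplyr)

# Names of the 15 most frequently observed traits
top_traits <- traitsview |>
  count(trait, sort = TRUE) |>
  head(15) |>
  pull(trait)

# One row per trait/unit combination among those traits
trait_units <- traitsview |>
  filter(trait %in% top_traits) |>
  count(trait, units, sort = TRUE)
trait_units
```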

## Data Quality: The `checked` Column {#sec-checked}

::: {.callout-important}
## Quality Control

The `checked` column indicates data verification status:

- **`1`** = Verified by an independent reviewer
- **`0`** = Not yet reviewed (use with appropriate caution)
- **`-1`** = Flagged as incorrect (**excluded** from this package)

All data in this package is public (BETYdb `access_level = 4`).
:::

```{r}
table(traitsview$checked, useNA = "ifany")

verified <- traitsview |>
  filter(checked == 1)
nrow(verified)
```
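
Verification rates can differ between traits, so an overall count may hide gaps for the trait you care about. A minimal sketch of a per-trait check: because `checked == 1` is a logical vector, its mean is the share of verified records. The result keeps the 10 traits with the most records; `verification_by_trait` is an illustrative name.

```{r}
library(betydata)
library(dplyr)

verification_by_trait <- traitsview |>
  summarise(
    n = n(),
    # mean of a logical vector = proportion of TRUE values
    share_verified = mean(checked == 1, na.rm = TRUE),
    .by = trait
  ) |>
  arrange(desc(n)) |>
  head(10)
verification_by_trait
```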

## Support Tables {#sec-support}

### Species Taxonomy

The `species` table contains `r format(nrow(species), big.mark = ",")` entries with full taxonomic information:

```{r}
species |>
  select(id, scientificname, genus, commonname)
```
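
Many entries in `species` may have no observations at all. The sketch below counts, for each genus, how many species entries exist and how many appear in `traitsview`. It assumes that matching on `scientificname` is a reasonable link between the two tables (joining on a species id column, if one is available, would be more robust); `species_coverage` is an illustrative name.

```{r}
library(betydata)
library(dplyr)

# Species names that occur at least once in traitsview
observed_names <- unique(traitsview$scientificname)

# Per genus: total species entries vs. species with observations
species_coverage <- species |>
  mutate(has_observations = scientificname %in% observed_names) |>
  summarise(n_species = n(), n_observed = sum(has_observations), .by = genus) |>
  arrange(desc(n_observed)) |>
  head(10)
species_coverage
```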

### Variables (Trait Definitions)

The `variables` table documents units, descriptions, and valid ranges for each measured trait:

```{r}
variables |>
  filter(name %in% c("SLA", "Vcmax", "leaf_respiration_rate_m2", "Ayield")) |>
  select(name, units, description)
```
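
Because `traitsview$trait` holds the same names as `variables$name`, the two tables can be joined to attach definitions to any set of traits. A short sketch for the five most common traits, assuming trait names are unique in `variables`; `common_trait_defs` is an illustrative name.

```{r}
library(betydata)
library(dplyr)

# Five most common traits, annotated with their units and descriptions
common_trait_defs <- traitsview |>
  count(trait, sort = TRUE) |>
  head(5) |>
  left_join(variables |> select(name, units, description), by = c("trait" = "name"))
common_trait_defs
```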
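
### Sites

The `sites` table describes each research location. Its `mat` and `map` columns hold mean annual temperature and mean annual precipitation; the chunk below counts the sites that report both:

```{r}
sites_with_climate <- sites |>
  filter(!is.na(mat), !is.na(map))
nrow(sites_with_climate)
```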
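
To see the climate range these sites span, summarise both columns with `across()`, which applies several functions to each column and names the results `mat_min`, `mat_median`, and so on. This is a minimal sketch that recomputes `sites_with_climate` so it runs on its own; `climate_summary` is an illustrative name.

```{r}
library(betydata)
library(dplyr)

sites_with_climate <- sites |>
  filter(!is.na(mat), !is.na(map))

# Min, median, and max of each climate column across sites
climate_summary <- sites_with_climate |>
  summarise(across(c(mat, map), list(min = min, median = median, max = max)))
climate_summary
```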
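
## Example: Bioenergy Crop Yields {#sec-bioenergy}

Yield records appear in `traitsview` with `trait == "Ayield"`. The chunk below summarizes them for five genera grown as bioenergy crops:
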
```{r}
#| label: tbl-bioenergy
#| tbl-cap: "Yield summary for key bioenergy genera"
bioenergy_genera <- c("Miscanthus", "Panicum", "Populus", "Salix", "Saccharum")

yields <- traitsview |>
  filter(
    trait == "Ayield",
    genus %in% bioenergy_genera,
    !is.na(mean)
  ) |>
  select(genus, mean, units, sitename, author, citation_year, lat, lon)

yields |>
  summarise(
    n = n(),
    mean_yield = round(mean(mean, na.rm = TRUE), 1),
    sd_yield = round(sd(mean, na.rm = TRUE), 1),
    .by = genus
  ) |>
  knitr::kable(col.names = c("Genus", "N", "Mean Yield (Mg/ha)", "SD"))
```
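
The summary table labels every mean as Mg/ha, which holds only if all selected yields share that unit. Counting records and distinct sites per genus and unit is a cheap check before trusting the averages: more than one row for a genus means some values need converting first. The sketch recomputes `yields` so it runs on its own; `yield_units` is an illustrative name.

```{r}
library(betydata)
library(dplyr)

bioenergy_genera <- c("Miscanthus", "Panicum", "Populus", "Salix", "Saccharum")

# Same filter as the chunk above
yields <- traitsview |>
  filter(trait == "Ayield", genus %in% bioenergy_genera, !is.na(mean))

# One row per genus/unit combination, with record and site counts
yield_units <- yields |>
  summarise(n = n(), n_sites = n_distinct(sitename), .by = c(genus, units))
yield_units
```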

## Working with Plant Functional Types (PFTs) {#sec-pfts}

::: {.callout-note}
## What is a PFT?

A **Plant Functional Type** groups species with similar ecological characteristics for ecosystem modeling. Instead of parameterizing a model separately for each species, PFTs such as "temperate deciduous trees" or "C4 grasses" share a common parameter distribution. This makes modeling tractable at large scales and is essential when species-level data are sparse.
:::

To find the PFTs a species belongs to, look up its `id` in `species`, then follow the `pfts_species` junction table to `pfts`. The chunk below does this for every *Miscanthus* species:

```{r}
miscanthus_sp <- species |>
  filter(genus == "Miscanthus") |>
  pull(id)

pfts_species |>
  filter(specie_id %in% miscanthus_sp) |>
  left_join(pfts |> select(id, name), by = c("pft_id" = "id")) |>
  distinct(name)
```
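
The same junction table also works in the other direction. This sketch takes every PFT that contains a *Miscanthus* species and counts how many species each of those PFTs includes in total, attaching species and PFT names along the way. It recomputes `miscanthus_sp` so it runs on its own; `miscanthus_pft_ids` and `pft_members` are illustrative names.

```{r}
library(betydata)
library(dplyr)

miscanthus_sp <- species |>
  filter(genus == "Miscanthus") |>
  pull(id)

# PFTs that include at least one Miscanthus species
miscanthus_pft_ids <- pfts_species |>
  filter(specie_id %in% miscanthus_sp) |>
  distinct(pft_id) |>
  pull(pft_id)

# All species links for those PFTs, with species and PFT names
pft_members <- pfts_species |>
  filter(pft_id %in% miscanthus_pft_ids) |>
  left_join(species |> select(id, scientificname), by = c("specie_id" = "id")) |>
  left_join(pfts |> select(id, name), by = c("pft_id" = "id"))

# Number of member species per PFT
pft_members |> count(name, sort = TRUE)
```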

## Next Steps

| Vignette                       | Description                                   |
|--------------------------------|-----------------------------------------------|
| `vignette("common_analyses")`  | Common analysis patterns with dplyr           |
| `vignette("pfts-priors")`      | Working with PFTs and Bayesian priors         |
| `vignette("manuscript")`       | Reproduce analyses from LeBauer et al. (2018) |

## References

- LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database applied to second-generation bioenergy feedstock production. *GCB Bioenergy*. [doi:10.1111/gcbb.12420](https://doi.org/10.1111/gcbb.12420)
- LeBauer, D. S., et al. (2013). Facilitating feedbacks between field measurements and ecosystem models. *Ecological Monographs*, 83(2), 133--154.
- BETYdb documentation: <https://betydb.org>