Commit cdd8d18: add vignettes

1 parent 41362a5, 8 files changed, 3711 additions and 0 deletions

vignettes/manuscript.Rmd (210 additions, 0 deletions)
---
title: "Reproducing BETYdb Manuscript Analyses"
subtitle: "Offline Replication using betydata"
author: "Akash B V"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reproducing BETYdb Manuscript Analyses}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  warning = FALSE,
  message = FALSE
)
```

## Introduction

This vignette reproduces key analyses from the BETYdb manuscript (LeBauer et al., 2018) using the offline `betydata` package. The original analyses queried a live PostgreSQL database. Here we reproduce comparable results from the data bundled with the package, so no database connection is required.

**Citation:** LeBauer, D. S., et al. (2018). BETYdb: a yield, trait, and ecosystem service database applied to second-generation bioenergy feedstock production. *GCB Bioenergy*. https://doi.org/10.1111/gcbb.12420

## Setup

```{r load-packages}
library(betydata)
library(dplyr)
library(ggplot2)

# Set theme
theme_set(theme_bw(base_size = 10, base_family = "sans"))
```

## Figure 1: Data Summary by Genus

The manuscript reports the number of trait and yield records for the main bioenergy genera. The chunk below reproduces these counts from the packaged `traitsview` table and excludes records that failed QA/QC.

```{r genus-summary}
data(traitsview)

# Focal bioenergy genera (from the manuscript)
bioenergy_genera <- c("Miscanthus", "Panicum", "Populus", "Saccharum",
                      "Pinus", "Salix", "Robinia")

# Count trait and yield records per genus,
# excluding records that failed QA/QC (checked = -1)
genus_summary <- traitsview |>
  filter(genus %in% bioenergy_genera, checked >= 0) |>
  group_by(genus) |>
  summarise(
    n_traits = sum(result_type == "traits", na.rm = TRUE),
    n_yields = sum(result_type == "yields", na.rm = TRUE),
    total = n(),
    .groups = "drop"
  ) |>
  arrange(desc(total))

genus_summary
```

### Comparison Notes

The counts differ slightly from the published manuscript for three reasons:

1. betydata excludes records with `checked = -1` (records that failed QA/QC).
2. Snapshot date: the manuscript used 2017 data, whereas betydata is built from a more recent export of BETYdb. This vignette was rendered on `r format(Sys.Date(), "%Y-%m-%d")`.
3. Access level filtering: betydata includes only public data (`access_level < 4`).

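The QA/QC filter `checked >= 0` also drops records whose `checked` value is missing, because `dplyr::filter()` discards rows where the condition evaluates to `NA`.

The minimal base R sketch below shows this on a small hypothetical data frame, `toy`. Its records are invented and are not betydata contents. Base R `subset()` treats `NA` the same way. `table()` then produces per-genus trait and yield counts, analogous to `genus_summary`.

```r
# Hypothetical records mimicking three traitsview columns (invented values)
toy <- data.frame(
  genus       = c("Miscanthus", "Miscanthus", "Panicum", "Panicum", "Salix"),
  result_type = c("traits", "yields", "yields", "traits", "traits"),
  checked     = c(1, -1, 0, NA, 1)
)

# Base R analogue of filter(checked >= 0): subset() also drops rows where
# the condition is NA, so both the failed (-1) and the missing record go
kept <- subset(toy, checked >= 0)

# Per-genus counts of trait and yield records among the kept rows
counts <- table(kept$genus, kept$result_type)
counts
```

Only three records survive the filter: the Miscanthus trait record, the Panicum yield record and the Salix trait record.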
## Figure 2: Trait Records by Genus

```{r trait-counts-plot, fig.height = 6}
# Key traits analyzed in manuscript
focal_traits <- c("Ayield", "leafN", "LAI", "SLA", "Vcmax",
                  "leaf_respiration_rate_m2", "Jmax")

trait_counts <- traitsview |>
  filter(
    genus %in% bioenergy_genera,
    trait %in% focal_traits,
    checked >= 0
  ) |>
  count(genus, trait, name = "n")

ggplot(trait_counts, aes(x = genus, y = n, fill = trait)) +
  geom_col(position = "dodge") +
  scale_y_log10(breaks = c(1, 10, 100, 1000, 10000)) +
  coord_flip() +
  labs(
    x = NULL,
    y = "Number of Records (log scale)",
    fill = "Trait"
  ) +
  theme(
    legend.position = "right",
    panel.grid.minor = element_blank()
  )
```

## Figure 3: Trait Distributions

The manuscript displays histograms of trait values across genera. Here, records from genera outside the focal bioenergy set are pooled into a single "Other" group.

```{r trait-distributions, fig.height = 8}
# Select key traits for visualization
hist_traits <- c("Ayield", "SLA", "Vcmax", "LAI")

# Keep records from all genera so that non-focal genera can be pooled
# into "Other" (filtering on genus here would leave "Other" empty)
trait_data <- traitsview |>
  filter(
    trait %in% hist_traits,
    !is.na(mean),
    checked >= 0
  ) |>
  mutate(
    genus = if_else(genus %in% bioenergy_genera, genus, "Other"),
    genus = factor(genus, levels = c(bioenergy_genera, "Other"))
  )

ggplot(trait_data, aes(x = mean, fill = genus)) +
  geom_histogram(bins = 25, alpha = 0.7) +
  facet_wrap(~trait, scales = "free", ncol = 2) +
  labs(
    x = "Observed Value",
    y = "Count",
    fill = "Genus"
  ) +
  theme(
    legend.position = "bottom",
    strip.background = element_blank()
  )
```

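The pooling step can be checked on its own. The minimal base R sketch below assumes a hypothetical vector `genera` of genus names that includes a missing value. Base R `ifelse()` stands in for `dplyr::if_else()` here.

The sketch shows two things:

- Any genus outside `bioenergy_genera`, `NA` included, becomes "Other".
- Explicit factor levels keep the focal genera first and "Other" last.

```r
bioenergy_genera <- c("Miscanthus", "Panicum", "Populus", "Saccharum",
                      "Pinus", "Salix", "Robinia")

# Hypothetical genus values: one non-focal genus and one missing value
genera <- c("Miscanthus", "Zea", "Salix", NA, "Panicum")

# NA %in% bioenergy_genera is FALSE, so missing genera are pooled too
pooled <- ifelse(genera %in% bioenergy_genera, genera, "Other")

# Explicit levels: focal genera in manuscript order, "Other" last
pooled <- factor(pooled, levels = c(bioenergy_genera, "Other"))
table(pooled)
```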
## Table 1: Database Contents Summary

```{r contents-table}
contents <- traitsview |>
  filter(checked >= 0) |>
  group_by(genus) |>
  summarise(
    n_traits = sum(result_type == "traits", na.rm = TRUE),
    n_yields = sum(result_type == "yields", na.rm = TRUE),
    total = n(),
    .groups = "drop"
  ) |>
  filter(total >= 100) |> # Genera with substantial data
  arrange(desc(total))

# Top 15 genera
knitr::kable(
  head(contents, 15),
  col.names = c("Genus", "Traits", "Yields", "Total"),
  caption = "Data records by genus (top 15)"
)
```

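Table 1 selects genera in two steps: it keeps genera with at least 100 records, then shows the 15 largest. The base R sketch below reproduces that selection. Both the helper `top_genera()` and the totals it is applied to are hypothetical; the numbers are illustrative only and are not betydata counts.

```r
# Hypothetical helper: drop genera with fewer than `min_total` records,
# then return the `n` largest totals in decreasing order
top_genera <- function(totals, min_total = 100, n = 15) {
  kept <- totals[totals >= min_total]
  head(sort(kept, decreasing = TRUE), n)
}

# Illustrative record totals per genus (invented values)
totals <- c(Miscanthus = 5200, Panicum = 3100, Robinia = 40,
            Salix = 950, Populus = 2100)

top_genera(totals, min_total = 100, n = 3)
```

With these values, Robinia falls below the threshold. The three largest remaining genera are Miscanthus, Panicum and Populus.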
## Yield Meta-Analysis Subset

The manuscript includes a meta-analysis of Miscanthus and switchgrass (*Panicum virgatum*) yields. Here we extract the relevant subset. The filter below selects the whole genus *Panicum*, so yield records for any other *Panicum* species are included as well.

```{r yield-meta-analysis}
# Selection criteria following the manuscript:
# - Miscanthus and Panicum only
# - yield trait (Ayield)
# - with site coordinates

yield_ma <- traitsview |>
  filter(
    genus %in% c("Miscanthus", "Panicum"),
    trait == "Ayield",
    !is.na(lat),
    !is.na(lon),
    !is.na(mean),
    checked >= 0
  ) |>
  select(
    id, genus, scientificname, mean, units,
    n, stat, statname, lat, lon,
    author, citation_year, sitename, site_id
  )

# Summarise the subset by genus
yield_ma |>
  group_by(genus) |>
  summarise(
    n_records = n(),
    mean_yield = mean(mean),
    sd_yield = sd(mean),
    n_sites = n_distinct(site_id),
    .groups = "drop"
  )
```

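Two properties of this summary are easy to overlook:

- `sd()` returns `NA` for a genus represented by a single record.
- `n_sites` can be much smaller than `n_records` when one site contributes several observations.

The minimal base R sketch below shows both effects using hypothetical yield records, `toy_yield`, with invented values. `split()` and `lapply()` stand in for `group_by()` and `summarise()`.

```r
# Hypothetical yield records: Miscanthus has three records at two sites,
# Panicum a single record (invented values)
toy_yield <- data.frame(
  genus   = c("Miscanthus", "Miscanthus", "Miscanthus", "Panicum"),
  mean    = c(20, 24, 28, 12),
  site_id = c(1, 1, 2, 3)
)

# One summary row per genus, mirroring the summarise() call above
per_genus <- lapply(split(toy_yield, toy_yield$genus), function(d) {
  data.frame(
    genus      = d$genus[1],
    n_records  = nrow(d),
    mean_yield = mean(d$mean),
    sd_yield   = sd(d$mean),                # NA when a genus has one record
    n_sites    = length(unique(d$site_id))  # base R analogue of n_distinct()
  )
})
summary_tbl <- do.call(rbind, per_genus)
summary_tbl
```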
## Geographic Distribution

```{r geographic-map, fig.width = 8, fig.height = 5}
# Simple map of yield observation locations.
# borders() draws country outlines from the maps package, which must be installed.
# The borders are drawn first so that the points sit on top of them.
ggplot(yield_ma, aes(x = lon, y = lat, color = genus)) +
  borders("world", colour = "grey70", fill = NA) +
  geom_point(alpha = 0.6, size = 2) +
  # Zoom to North America and Europe; points outside this window are not shown
  coord_quickmap(xlim = c(-130, 50), ylim = c(20, 70)) +
  labs(
    x = "Longitude",
    y = "Latitude",
    color = "Genus",
    title = "Miscanthus and Switchgrass Yield Observations"
  ) +
  theme_minimal()
```

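`coord_quickmap()` only zooms the view. Records outside the longitude and latitude limits remain in `yield_ma` but are not visible; any site east of 50° E or south of 20° N, for instance, would be off the map.

The base R sketch below counts such points. It assumes a hypothetical helper, `outside_window()`, applied to approximate, illustrative coordinates for sites in Illinois, England, Japan and southern Brazil.

```r
# Hypothetical helper: TRUE for points outside the plotted lon/lat window
outside_window <- function(lon, lat,
                           xlim = c(-130, 50), ylim = c(20, 70)) {
  lon < xlim[1] | lon > xlim[2] | lat < ylim[1] | lat > ylim[2]
}

# Approximate, illustrative coordinates: Illinois, England, Japan, Brazil
lon <- c(-88.2, -0.4, 139.7, -47.9)
lat <- c(40.1, 51.8, 35.7, -22.8)

flags <- outside_window(lon, lat)
sum(flags)
```

With the vignette data loaded, `sum(outside_window(yield_ma$lon, yield_ma$lat))` would give the number of observations that the map does not display.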
## Session Info

```{r session-info}
sessionInfo()
```
