Commit 164df97

committed
add readme
1 parent 0a1b59b commit 164df97

1 file changed

Lines changed: 187 additions & 0 deletions

README.md

@@ -0,0 +1,187 @@
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![License: BSD-3-Clause](https://img.shields.io/badge/code%20license-BSD--3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![License: ODC-By-1.0](https://img.shields.io/badge/data%20license-ODC--By--1.0-green.svg)](https://opendatacommons.org/licenses/by/1-0/)
[![DOI](https://img.shields.io/badge/Paper-10.1111%2Fgcbb.12420-blue.svg)](https://doi.org/10.1111/gcbb.12420)

**betydata** provides offline access to public data from the [BETYdb: Biofuel Ecophysiological Traits and Yields Database](https://betydb.org). This R data package enables reproducible analyses of plant traits, crop yields, and ecosystem service data without requiring database connectivity.

---
## Overview

| | |
|---------------------------|------------------------------------------------------------------------|
| **Primary Dataset** | `traitsview` - 43,532 trait and yield observations |
| **Support Tables** | 15 reference tables (species, sites, variables, citations, pfts, etc.) |
| **Species Coverage** | ~9,000 plant species with emphasis on bioenergy crops |
| **Geographic Scope** | Global, with concentration in North America and Europe |
| **Temporal Range** | 1900 – present |
| **Top Genera** | *Miscanthus*, *Panicum*, *Populus*, *Salix*, *Saccharum* |
| **Data License** | [ODC-By-1.0](https://opendatacommons.org/licenses/by/1-0/) |
| **Frictionless Metadata** | [`inst/metadata/datapackage.json`](inst/metadata/datapackage.json) |

---
## Datasets

This package provides 16 datasets exported from BETYdb:

### Primary Dataset

| Dataset | Rows | Columns | Description |
|---------------|--------|---------|----------------------------------------------|
| `traitsview` | 43,532 | 36 | Denormalized view of plant traits and yields |

### Reference Tables

| Dataset | Description |
|---------------|---------------------------------------------------------------|
| `species` | Plant taxonomy (genus, species, common names) |
| `sites` | Research site locations with coordinates and climate data |
| `variables` | Trait/variable definitions, units, and valid ranges |
| `citations` | Literature references (author, year, title, DOI) |
| `cultivars` | Plant cultivar and variety information |
| `treatments` | Experimental treatment definitions |
| `managements` | Management events (planting, harvest, fertilization) |
| `methods` | Measurement method descriptions |
| `pfts` | Plant Functional Type definitions for ecological modeling |
| `priors` | Prior probability distributions for Bayesian analysis |
| `entities` | Entity identifiers for repeated measures |

### Relationship Tables

| Dataset | Description |
|----------------------------|----------------------------------|
| `pfts_species` | PFT <-> species mapping |
| `pfts_priors` | PFT <-> prior mapping |
| `cultivars_pfts` | Cultivar <-> PFT mapping |
| `managements_treatments` | Management <-> treatment mapping |

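
A relationship table is typically resolved with two joins: from one entity to the link table, then from the link table to the other entity. The sketch below illustrates this with small made-up data frames standing in for `pfts`, `species`, and `pfts_species`, so it runs without the package. The column names (`id`, `name`, `scientificname`, `pft_id`, `specie_id`) and all values are assumptions for illustration; check `names()` on the real tables before adapting it.

```r
# Toy stand-ins for the pfts, species, and pfts_species tables.
# Column names and values are assumptions, not the package's documented schema.
pfts_toy <- data.frame(id = c(1L, 2L), name = c("grass_c4", "hardwood"))
species_toy <- data.frame(
  id = c(10L, 11L, 12L),
  scientificname = c("Miscanthus giganteus", "Panicum virgatum", "Populus deltoides")
)
pfts_species_toy <- data.frame(
  pft_id    = c(1L, 1L, 2L),
  specie_id = c(10L, 11L, 12L)
)

# Join 1: attach each PFT to its rows in the link table.
links <- merge(pfts_toy, pfts_species_toy, by.x = "id", by.y = "pft_id")

# Join 2: attach the species record for each link.
pft_members <- merge(links, species_toy, by.x = "specie_id", by.y = "id")

# Keep only the human-readable columns.
pft_members <- pft_members[, c("name", "scientificname")]
print(pft_members)
```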
---

## Installation

### From GitHub (recommended)
```r
# install.packages("remotes")
remotes::install_github("PecanProject/betydata")
```

### From source
```bash
git clone https://github.com/PecanProject/betydata.git
R CMD INSTALL betydata
```

## Quick Start
```r
library(betydata)

# Load the primary dataset
data(traitsview)

# Explore structure
str(traitsview)
head(traitsview)

# Count observations by trait
library(dplyr)
traitsview |> count(trait, sort = TRUE)

# Count by genus (top bioenergy crops)
traitsview |> count(genus, sort = TRUE) |> head(10)
```

---
## Data Quality

### The `checked` Column

All trait and yield data include a quality control flag:

| Value | Meaning | Status |
|-------|-----------|-----------------------------------------------------------|
| `1` | Verified | Independently reviewed and confirmed |
| `0` | Unchecked | Not yet reviewed |
| `-1` | Flagged | Identified as incorrect (excluded from this package) |

**Note:** This package exports only records with `checked >= 0`. Flagged records (`checked = -1`) are excluded during data preparation. For research that requires the flagged records, access the BETYdb PostgreSQL database directly.

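
Because unchecked records (`checked = 0`) are included, an analysis that needs only verified data has to filter on the flag itself. The sketch below shows one way to do that, using a small made-up data frame in place of `traitsview` so that it runs without the package. Only the `checked` column and its values come from the table above; the `trait` and `mean` columns and every value in them are illustrative assumptions.

```r
# Toy stand-in for traitsview; all values are invented for illustration.
toy_traits <- data.frame(
  trait   = c("SLA", "SLA", "Vcmax", "Ayield"),
  mean    = c(15.2, 18.9, 62.0, 12.4),  # hypothetical value column
  checked = c(1L, 0L, 1L, 0L)           # 1 = verified, 0 = unchecked
)

# Keep only independently verified observations.
verified <- toy_traits[toy_traits$checked == 1L, ]

# Count how many records carry each quality flag.
flag_counts <- table(toy_traits$checked)

print(verified)
print(flag_counts)
```

With the package loaded, the same subset would be written against the real data, for example `traitsview[traitsview$checked == 1, ]`.
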
### Access Levels

All data in this package is publicly available (`access_level = 4`). Restricted data (`access_level` 1–3) requires database access with appropriate permissions.

---
## Key Traits and Yields

The `traitsview` dataset contains measurements of ecophysiological traits and crop yields:

### Common Traits

* **SLA** - Specific Leaf Area (m²/kg)
* **Vcmax** - Maximum carboxylation rate (µmol/m²/s)
* **leafN** - Leaf nitrogen content (%)
* **height** - Plant height (m)
* **LAI** - Leaf Area Index (m²/m²)

### Yield Variables

* **Ayield** - Above-ground yield (Mg/ha)
* **AGBiomass** - Above-ground biomass (Mg/ha)

Use the `variables` table for complete definitions and units:
```r
library(dplyr)  # provides filter() and select()

data(variables)
variables |>
  filter(name %in% c("SLA", "Vcmax", "Ayield")) |>
  select(name, description, units)
```

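
Once a trait name has been looked up, a per-genus summary is a filter followed by a grouped mean. The sketch below uses a made-up data frame in place of `traitsview` so that it runs without the package. The `trait` and `genus` columns appear in the Quick Start; the `mean` value column and all numbers are assumptions for illustration.

```r
# Toy stand-in for traitsview; all values are invented for illustration.
toy_traits <- data.frame(
  genus = c("Miscanthus", "Miscanthus", "Panicum", "Populus"),
  trait = c("SLA", "SLA", "SLA", "Vcmax"),
  mean  = c(16, 20, 14, 60)  # hypothetical value column
)

# Restrict to one trait, then average its values within each genus.
sla <- toy_traits[toy_traits$trait == "SLA", ]
sla_by_genus <- tapply(sla$mean, sla$genus, mean)

print(sla_by_genus)
```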
---

## Data Formats

### .rda (Default)

Lazy-loaded R data objects, optimized for R workflows:
```r
data(traitsview)
```

### Parquet (Alternative)

For use with Arrow/DuckDB or cross-platform workflows:
```r
library(arrow)
traitsview <- read_parquet(
  system.file("extdata/parquet/traitsview.parquet", package = "betydata")
)
```

### Frictionless Data Package

Machine-readable metadata following the Frictionless data standard:
```json
// inst/metadata/datapackage.json
{
  "name": "betydata",
  "title": "BETYdb Plant Traits and Yields Data Package",
  "licenses": [{"name": "ODC-By-1.0", ...}],
  "resources": [...]
}
```

---
## Vignettes

Detailed tutorials are available as package vignettes:

| Vignette | Description |
|----------------|----------------------------------------------------------|
| `orientation` | Overview of package structure and data relationships |
| `sql-analogs` | Migrate BETYdb SQL queries to R with dplyr |
| `pfts-priors` | Working with PFTs and prior distributions |
| `manuscript` | Reproduce analyses from LeBauer et al. (2018) |

```r
browseVignettes("betydata")
```
