Commit 76324af

committed
revising README and moving content around within traits tutorials
1 parent 8d9b3be commit 76324af

5 files changed

Lines changed: 163 additions & 74 deletions

File tree

README.md

Lines changed: 23 additions & 9 deletions
@@ -1,20 +1,34 @@
11
# Tutorials
22

3-
Learn to use TERRA REF data and software
3+
## An introduction to the use of TERRA REF data and software
44

5-
Note that many of these tutorials have complex dependencies. These can be launched from within the [TERRA REF Sensor Data Portal](https://terraref.ncsa.illinois.edu) (requires account / access).
5+
Many of these tutorials have complex dependencies.
66

7-
Also try:
7+
### Data Access
88

9-
* mybinder.org/repo/terraref/rstudio (for R tutorials under traits/)
10-
* mybinder.org/repo/terraref/rstudio-geospatial (for R tutorials under sensors/)
11-
* mybinder.org/repo/terraref/jupyter-plantcv (for PlantCV tutorial)
12-
* mybinder.org/repo/terraref/jupyter-netcdf (Python tutorials under sensors/)
9+
The first research-grade version of TERRA REF data products will be released in November 2018.
10+
Before that, we will make evaluation releases available: the alpha version was released in November 2016 and the beta version will be released in 2017.
1311

12+
We are making the data available for evaluation, with the goal of receiving feedback.
14+
Many tutorials access data that are available online, though most require authentication.
15+
16+
Some make use of very large files that are available.
17+
These can be launched from within the [TERRA REF Sensor Data Portal](https://terraref.ncsa.illinois.edu) (requires account / access).
18+
19+
###
20+
21+
22+
### Links
23+
24+
TODO: add links to quick-start documentation, README's, code for learning and applied examples
25+
26+
* Data portal: terraref.org/data
27+
* Docker Images on Docker Hub: hub.docker.com/terraref
28+
*
29+
### References
1430

15-
## Links
1631

17-
to quick-start documentation, README's, code for learning and applied examples
1832

1933
### Slides
2034

traits/00-BETYdb-getting-started.Rmd

Lines changed: 115 additions & 7 deletions
@@ -1,16 +1,124 @@
1+
---
2+
title: "Getting Started with BETYdb"
3+
author: "David LeBauer"
4+
date: "`r Sys.Date()`"
5+
output: html_document
6+
---
17

2-
## Sign up
8+
```{r setup}
9+
library(traits)
10+
knitr::opts_chunk$set(echo = FALSE, cache = TRUE)
11+
library(ggplot2)
12+
library(ggthemes)
13+
library(GGally)
14+
theme_set(theme_bw())
15+
library(dplyr)
16+
```
317

4-
## Download from web interface
518

19+
## TERRA Ref Trait Database
620

21+
The TERRA Ref program uses the BETYdb database and web application software to store plant and plot level trait data.
722

8-
## API Summary
23+
### BETYdb: database software and web application
924

10-
API key sent in the mail
25+
The BETYdb software is actively used and developed by the [TERRA Reference](terraref.org) program as well as by the [PEcAn project](pecanproject.org).
1126

12-
Save it to textfile (take from traits/02- file
27+
For more information about BETYdb, see the following:
28+
29+
* BETYdb documentation (available via the web application under 'Docs')
30+
* _Data Access_
31+
* _Data Entry Workflow:_ how to add data to the database
32+
* _BETYdb Technical Documentation_ is written for advanced users and website and database administrators who may also be interested in the [full database schema](betydb.org/schemas)
33+
* BETYdb: A Yield, Trait and Ecosystem Service Database Applied to Second Generation Bioenergy Feedstocks. ([LeBauer et al, 2017](dx.doi.org/10.1111/gcbb.12420))
34+
35+
The TERRA REF trait database (terraref.ncsa.illinois.edu/bety) uses the BETYdb data schema (structure) and web application.
36+
There are at least a half-dozen other databases using the BETYdb software that these exercises will work with, though the results will depend on the available data.
37+
The first, betydb.org is described in LeBauer et al, 2017.
38+
Others are listed in the 'distributed BETYdb' section of the technical documentation.
39+
40+
One database, terraref.ncsa.illinois.edu/terra-test, houses a simulated dataset that is used in [lesson 1: A simulated data set](../traits/01-simulated-sorghum.Rmd) and does not require an account to access the data.
41+
BETYdb is designed to keep only the primary data private. Metadata such as field management and experimental design are available if the URL is public.
42+
43+
## Getting an account for the TERRA trait database
44+
45+
* sign up for an account at terraref.ncsa.illinois.edu/bety
46+
* sign up for alpha user [link to form]
47+
* wait for database access to be granted
48+
* Your API key will be sent in the email; it can also be found - and regenerated - by navigating to 'data --> users' in the web interface
49+
50+
TODO add signup info from handout
51+
52+
## First steps: download data from web interface
53+
54+
TODO add steps to download csv from the web interface
55+
56+
Note that the web interface only provides a core set of data. More complex queries, such as those in the [Agronomic metadata](../traits/04-agronomic-metadata.Rmd)
57+
58+
## Advanced: Using URLs to construct Queries
59+
60+
The first step toward reproducible pipelines is to automate the process of searching the database and returning results. This is one of the key roles of an Application programming interface, or 'API'. You can learn to use the API in less than 20 minutes, starting now.
61+
62+
### What is an API?
63+
64+
An API is ...
65+
66+
### Using Your API key to Connect
67+
68+
An API key is like a password. It allows you to access data, and should be kept private.
69+
Therefore, we are not going to put it in code that we share. The one exception is the key 9999999999999999999999999999999999999999 that will allow you to access metadata tables (all tables except _traits_ and _yields_).
70+
71+
A common way of handling a private API key is to place it in a text file in your home directory.
72+
Don't put it in a project directory where it might be inadvertently shared.
73+
74+
Here is how to find and save your API key:
75+
76+
* click file --> new --> text file
77+
* copy the api key that was sent when you registered into the file
78+
* file --> save as '~/.betykey'
79+
80+
Equivalently, in R: `writeLines('9999999999999999999999999999999999999999', con = '~/.betykey')`, or at the command line: `echo 9999999999999999999999999999999999999999 > ~/.betykey`
81+
82+
For the purposes of the tutorial, you can assign it to the `betykey` variable in the console window.
83+
84+
### Constructing a URL query
85+
86+
First, let's construct a query by putting together a URL.
87+
88+
1. Start with the database URL: `terraref.ncsa.illinois.edu/bety`
89+
* this url brings you to the home page
90+
2. Add the path to the API, `/api/beta`
91+
* now we have terraref.ncsa.illinois.edu/bety/api/beta, which points to the API documentation
92+
3. Add the name of the table you want to query. Let's start with `variables`
93+
* terraref.ncsa.illinois.edu/bety/api/beta/variables
94+
4. Add query terms by appending a `?` and combining them with `&`, for example:
95+
* `key=9999999999999999999999999999999999999999`
96+
* `type=trait` where the variable type is 'trait'
97+
* `name=~height` where the variable name contains 'height'
98+
5. This is your complete query:
99+
* `terraref.ncsa.illinois.edu/bety/api/beta/variables?type=trait&name=~height&key=9999999999999999999999999999999999999999`
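
The five steps above can also be sketched in base R. This is only an illustration of how the pieces fit together: the variable names (`base_url`, `api_path`, `table_name`, `terms`) are chosen here for readability and are not part of any package, and the `https://` scheme is taken from the `terraref_betyurl` value used later in this tutorial.

```r
# Steps 1-3: database URL, API path, and table name.
base_url   <- "https://terraref.ncsa.illinois.edu/bety"
api_path   <- "/api/beta"
table_name <- "variables"

# Step 4: query terms as a named list. The key is the public
# metadata key given in the text, not a private key.
terms <- list(
  type = "trait",
  name = "~height",
  key  = "9999999999999999999999999999999999999999"
)

# Join each term as name=value, then combine the terms with '&'.
query_string <- paste(names(terms), unlist(terms), sep = "=", collapse = "&")

# Step 5: the complete query URL.
query_url <- paste0(base_url, api_path, "/", table_name, "?", query_string)
print(query_url)
```

Keeping the terms in a named list makes it easy to add or remove a filter without editing a long string by hand.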
100+
101+
**Your Turn**
102+
103+
> Write a URL that will query the database for sites with "Field Scanner" in the name field. Hint: combine two terms with a `+` as in `Field+Scanner`
104+
105+
What do you see? Do you think that this is all of the records? What happens if you add `&limit=none`?
106+
107+
### Using the R traits package to query the database
108+
109+
The rOpenSci traits package makes it easier to query the TERRA REF trait database, or any database that uses BETYdb software.
110+
111+
```{r traits}
112+
113+
terraref_betyurl <- "https://terraref.ncsa.illinois.edu/bety/"
114+
betykey <- readLines('~/.betykey', warn = FALSE)
115+
```
116+
117+
```{r}
118+
sorghum_all <- betydb_search(query = 'Sorghum',
119+
betyurl = terraref_betyurl,
120+
key = betykey)
121+
122+
```
13123

14-
### Basic construction of an API query
15124

16-
Move this from exiting documentation.

traits/01-simulated-sorghum.Rmd

Lines changed: 19 additions & 28 deletions
@@ -6,6 +6,7 @@ output: html_document
66
---
77

88
```{r setup}
9+
library(traits)
910
knitr::opts_chunk$set(echo = FALSE, cache = TRUE)
1011
library(ggplot2)
1112
library(ggthemes)
@@ -14,9 +15,14 @@ theme_set(theme_bw())
1415
library(dplyr)
1516
```
1617

17-
# Background: The design of a simulated dataset
18+
# Working with a simulated dataset?
19+
20+
To explore the potential of phenotyping data, we have simulated the type of data that might be observed by daily scans by drone or robot. These data are freely accessible, and thus useful for teaching and exploration that does not require access.
21+
22+
## Methods:
23+
24+
### The design of a simulated dataset
1825

19-
To explore the potential of phenotyping data, we have simulated the type of data that might be observed by daily scans by drone or robot.
2026

2127
We have simulated 500 genotypes across 12 sites and five years using a mechanistic model.
2228

@@ -26,19 +32,19 @@ All of these simulated datasets are released with an unrestrive [copyright](http
2632

2733
### A note on variable names
2834

29-
I have used the variable names currently used in BETYdb.org/variables, along with names inspired by the more standardized naming Climate Forecasting conventions. However, this is a very early pre-release, and we welcome comments on how such data should be formatted and accessed can be discussed on GitHub.
35+
The variable names are those currently used in BETYdb.org/variables, along with names inspired by the more standardized Climate Forecasting naming conventions. However, this is a very early pre-release, and we welcome comments on how such data should be formatted and accessed; these can be discussed on GitHub.
3036

31-
> Exercise: can you locate the relevant issues?
37+
> Exercise: can you locate the GitHub issues that discuss trait data standards and variable naming conventions?
3238
3339
[This is a slideshow](https://docs.google.com/presentation/d/10aN_5whs8y9SOC8Y9Rj1kbWCfi7YG3yFyNqVX4JUr2U/edit?usp=sharing) of interfaces from the broader community that could serve as a common interface.
3440

35-
# Design of Simulation Experiment
41+
### Design of Simulation Experiment
3642

3743
500 Sorghum lines grown at each of three sites, four blocks per site, along a N-S transect in Illinois over five years (2021-2025).
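
To make the size of this design concrete, the combinations described in the sentence above can be enumerated with `expand.grid`. This is a rough sketch that uses only the figures in that sentence; the genotype IDs and site labels are placeholders chosen for illustration and do not correspond to identifiers in the released dataset.

```r
# Enumerate every combination of line, site, block, and year.
design <- expand.grid(
  genotype = seq_len(500),                     # placeholder IDs 1..500
  site     = c("south", "central", "north"),   # hypothetical site labels
  block    = c("A", "B", "C", "D"),            # four blocks per site
  year     = 2021:2025,                        # five simulated years
  stringsAsFactors = FALSE
)

# One row per genotype x site x block x year combination.
nrow(design)
head(design)
```

Each row of `design` stands for one plot-year, which is the unit at which plot-level traits would be recorded.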
3844

39-
## Time Span (2021-2025)
45+
#### Time Span (2021-2025)
4046

41-
These are historic data, but the years have been changed to emphasize the point that these are not real data. The years have been chosen to select climate extremes. Two years were dry, two were wet, and one was average.
47+
While the climate data are derived from historic data, the years have been changed to reinforce the fact that these are not real data. The years have been chosen to select climate extremes. Two years were dry, two were wet, and one was average.
4248

4349
| year | drought index |
4450
|-----|-----|
@@ -71,11 +77,11 @@ Each site has four replicate fields: A, B, C, D. This simulated dataset assumes
7177
| | precipitation | precipitation_flux | mm/d |Daily precipitation |
7278

7379

74-
## Genotypes
80+
### Genotypes
7581

7682
Two-hundred and twenty-seven lines were grown at each site. Each line is identified by a unique integer in the range [9915:10141]
7783

78-
## Phenotypes
84+
### Phenotypes
7985

8086
The phenotypes associated with each genotype are in the file `phenotypes.csv`.
8187

@@ -117,36 +123,21 @@ This dataset includes what a sensor might observe, daily for five years during t
117123
| | Height | canopy_height | m | |
118124

119125

120-
# Quick start
121-
122-
123-
For simplicity, and because I neither have a model nor the data to simulate _Sorghum_, I have started with some phenotypes and simulations of plant growth based on a model that simulates the growth of biomass crops including Miscanthus, Switchgrass, Sugarcane, and coppice Willow.
124126

125-
## Some background / methods.
126-
127-
I start with simulation of Miscanthus, over Illinois as a proxy for Sorghum, since I have a model, [BioCro, Miguez et al, 2009](github.com/ebimodeling/biocro), that simulates Miscanthus. [The code used to run the model and add noise is on GitHub](https://github.com/ebimodeling/biocro_regional/edit/master/vignettes/regional_pecan_workflow.Rmd).
128-
Like Sorghum, Miscanthus uses C4 photosynthesis, and this is used to compute carbon uptake at hourly time steps in the simulation model.
129-
Unlike Sorghum, Miscanthus grows clonally and is propagated by Rhizome instead of by seed. Furthermore, Miscanthus is perennial: it re-grows each year from carbon stored in rhizomes.
127+
I start with simulation over Illinois using another grass with C4 photosynthesis, [BioCro, Miguez et al, 2009](github.com/ebimodeling/biocro). [The code used to run the model and add noise is on GitHub](https://github.com/ebimodeling/biocro_regional/edit/master/vignettes/regional_pecan_workflow.Rmd).
128+
The model computes carbon uptake at hourly time steps.
130129
The 'genotypes' are based on five-hundred quasi-random parameterizations of a biophysical crop model.
131130

132-
The virtual "Miscanthus" is encoded as a set of associated species and prior estimates of the phenotypes used to parameterize the simulation model. The concept of Plant Functional Type (PFT) is any group of one or more plants, originally functionally related species such as 'C4 crops' or 'hardwood trees'. We use PFT at the finest scale of genotype. In the actual research that generated these data, this was the clone _Miscanthus x giganteus_, and this is [PFT #123 in BETYdb](https://www.betydb.org/pfts/123).
133-
134-
## A Simulated Sorghum Breeding Population
135-
136-
In make-believe land we take a bunch of data generated above and start to rename things
131+
The virtual crop is encoded as a set of associated species and prior estimates of the phenotypes used to parameterize the simulation model.
137132

138-
* run_id -> genotype #represents quasi-random set of traits
139-
* lat, lon -> To estimate "E"
133+
* lat, lon -> To estimate "Environmental effects"
140134
* for within site E effects, use a few points within 1/4 degree lat/lon
141135
* for across site E effects, use southern, central, and northern Illinois.
142136

143137
### Accessing the TERRA Simulated Data Database
144138

145139

146140
```{r}
147-
148-
library(traits)
149-
150141
terraref_test_url <- "https://terraref.ncsa.illinois.edu/bety-test/"
151142
## note that this key accesses public data. In part 2 you will use your own key to access the actual data
152143

traits/02-danforth-phenotyping-facility.Rmd

Lines changed: 0 additions & 29 deletions
@@ -15,35 +15,6 @@ theme_set(theme_bw())
1515
library(traits)
1616
```
1717

18-
19-
## BETYdb
20-
21-
BETYdb stores trait data and agronomic metadata. An introduction to the API is in ../betydb.md
22-
23-
Also see the full documentation for accessing data from BETYdb.
24-
25-
26-
### Setting up an API key and establishing a connection
27-
28-
An API key is like a password. It allows you to access data, and should be kept private. Therefore, we are not going to put it in code that we share. One way to do this is to place it in a simple text file.
29-
30-
* click file --> new --> text file
31-
* copy the api key that was sent when you registered into the file
32-
* file --> save save as '/home/rstudio/.betykey'
33-
34-
For the purposes of the tutorial, you can assign it to the `mykey` variable in the console window.
35-
36-
```{r traits}
37-
38-
terraref_betyurl <- "https://terraref.ncsa.illinois.edu/bety/"
39-
mykey <- readLines('~/.betykey', warn = FALSE)
40-
41-
sorghum_all <- betydb_search(query = 'Sorghum',
42-
betyurl = terraref_betyurl,
43-
key = mykey)
44-
45-
```
46-
4718
### Query data from the Danforth Phenotyping Facility
4819

4920
First we will use the generic search to query the output from the Lemnatec indoor phenotyping system at the Danforth Center in St. Louis, MO.

traits/03-maricopa-field-scanner.Rmd

Lines changed: 6 additions & 1 deletion
@@ -1,4 +1,9 @@
1-
1+
---
2+
title: "Plot level data from the field scanner in Maricopa, AZ"
3+
author: "David LeBauer, Chris Black"
4+
date: "`r Sys.Date()`"
5+
output: md_document
6+
---
27
```{r setup, include=FALSE}
38
knitr::opts_chunk$set(echo = FALSE, cache = TRUE)
49
library(dplyr)
