Commit 7f0a448

updates - got simulated data tutorial to work; output to /docs
1 parent 13a49dc commit 7f0a448

4 files changed

Lines changed: 307 additions & 159 deletions

File tree

_bookdown.yml

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
11
book_filename: "terraref-tutorials"
2+
output_dir: "book-output"
23
language:
34
ui:
45
chapter_name: "Chapter "

traits/00-BETYdb-getting-started.Rmd

Lines changed: 110 additions & 39 deletions
@@ -11,23 +11,23 @@ The TERRA Ref program uses the BETYdb database and web application software to s
1111

1212
### BETYdb: database software and web application
1313

14+
The TERRA REF trait database (terraref.ncsa.illinois.edu/bety) uses the BETYdb data schema (structure) and web application.
1415
The BETYdb software is actively used and developed by the [TERRA Reference](terraref.org) program as well as by the [PEcAn project](pecanproject.org).
1516

1617
For more information about BETYdb, see the following:
1718

1819
* BETYdb documentation (available via the web application under 'Docs')
19-
* _Data Access_
20+
* _Data Access_: how to access data
2021
* _Data Entry Workflow:_ how to add data to the database
2122
* _BETYdb Technical Documentation_ is written for advanced users and website and database administrators who may also be interested in the [full database schema](betydb.org/schemas)
2223
* BETYdb: A Yield, Trait and Ecosystem Service Database Applied to Second Generation Bioenergy Feedstocks. ([LeBauer et al, 2017](dx.doi.org/10.1111/gcbb.12420))
2324

24-
The TERRA REF trait database (terraref.ncsa.illinois.edu/bety) uses the BETYdb data schema (structure) and web application.
2525
There are at least a half-dozen other databases using the BETYdb software that these exercises will work with, though the results will depend on the available data.
2626
The first, betydb.org is described in LeBauer et al, 2017.
2727
Others are listed in the 'distributed BETYdb' section of the technical documentation.
2828

29-
One database, terraref.ncsa.illinois.edu/terra-test, houses a simulated dataset that is used in [lesson 1: A simulated data set](../traits/01-simulated-sorghum.Rmd) and does not require an account to access the data.
30-
BETYdb is only designed to keep the primary data private. Metadata such as field management and experimental design are available if the url is public.
29+
When there is a public-facing website, BETYdb is only designed to keep its trait and yield data private.
30+
Metadata such as field management and experimental design are available if the url is public.
3131

3232
## Getting an account for the TERRA trait database
3333

@@ -40,22 +40,30 @@ TODO add signup info from handout
4040

4141
## First steps: download data from web interface
4242

43-
TODO add steps to download csv from the web interface
43+
* Point your browser to terraref.ncsa.illinois.edu/bety
44+
* log in
45+
* enter "NDVI" in the search box
46+
* on the next page you will see the results of this search
47+
* if you want all of the data, including data that has not gone through QA/QC, make sure to check the 'include unchecked records' option
48+
* in the upper right, you will see a button that will allow you to download the search results as a CSV file. Click it. Open the file in a text editor or spreadsheet program and review its contents.
4449

45-
Note that the web interface only provides a core set of data. More complex queries, such as those in the [Agronomic metadata](../traits/04-agronomic-metadata.Rmd)
50+
Note that the web interface only provides a core set of data and limited metadata. To access all of the data within BETYdb, it is necessary to search and merge multiple tables, as in the more complex queries in [Agronomic metadata](../traits/04-agronomic-metadata.Rmd).
4651

4752
## Advanced: Using URLs to construct Queries
4853

4954
The first step toward reproducible pipelines is to automate the process of searching the database and returning results. This is one of the key roles of an Application programming interface, or 'API'. You can learn to use the API in less than 20 minutes, starting now.
5055

5156
### What is an API?
5257

53-
An API is ...
58+
An API is an 'Application Programming Interface'. An API is a way that you and your software can connect to and access data.
59+
60+
All of our databases have web interfaces for humans to browse as well as APIs that are constructed as URLs.
61+
5462

5563
### Using Your API key to Connect
5664

5765
An API key is like a password. It allows you to access data, and should be kept private.
58-
Therefore, we are not going to put it in code that we share. The one exception is the key 9999999999999999999999999999999999999999 that will allow you to access metadata tables (all tables except _traits_ and _yields_).
66+
Therefore, we are not going to put it in code that we share. The one exception is the key 9999999999999999999999999999999999999999 that will allow you to access metadata tables (all tables except _traits_ and _yields_). It will also allow you to access all of the simulated data in the terraref.ncsa.illinois.edu/bety-test database.
5967

6068
A common way of handling a private API key is to place it in a text file in your home directory.
6169
Don't put it in a project directory where it might be inadvertently shared.
@@ -66,9 +74,17 @@ Here is how to find and save your API key:
6674
* copy the api key that was sent when you registered into the file
6775
* file --> save as '~/.betykey'
6876

69-
Equivalently in R `r writeLines('9999999999999999999999999999999999999999', con = '~/.betykey')` or at the command line `sh echo 9999999999999999999999999999999999999999 > ~/.betykey`
77+
For the public key, you can call this file `~/.betykey_public`.
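
Once the key is in a file, it can be read back into R without ever appearing in shared code. The following is a minimal sketch; `read_key` is a hypothetical helper written for this illustration (it is not part of any package), and the demonstration uses a temporary file as a stand-in for `~/.betykey_public`.

```r
# Hypothetical helper: read an API key from a file kept outside the project directory
read_key <- function(path = "~/.betykey") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("No key file found at ", path)
  }
  # warn = FALSE avoids a warning when the file has no trailing newline
  key <- trimws(readLines(path, n = 1, warn = FALSE))
  if (length(key) == 0 || !nzchar(key)) {
    stop("Key file ", path, " is empty")
  }
  key
}

# Demonstration with a temporary file standing in for ~/.betykey_public
demo_file <- tempfile()
writeLines("9999999999999999999999999999999999999999", con = demo_file)
demo_key <- read_key(demo_file)
```

Stopping with an error when the file is missing or empty makes a forgotten key easier to notice than a query that silently fails to authenticate.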
78+
79+
### Components of a URL query
80+
81+
82+
* base url: `terraref.ncsa.illinois.edu/bety`
83+
* path to the api: `/api/beta`
84+
* api endpoint: `/search`, `/traits`, or `/sites`. For BETYdb, these are the names of database tables.
85+
* Query parameters: `genus=Sorghum`
86+
* Authentication: `key=9999999999999999999999999999999999999999` is the public key for the TERRA REF traits database.
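
As a rough sketch, the components listed above can be pasted together in R. The variable names are illustrative, the `https://` prefix and the `/species` endpoint are taken from the example URLs used elsewhere in this tutorial, and the key is the public one.

```r
# Assemble a query URL from its components (variable names are illustrative)
base_url <- "https://terraref.ncsa.illinois.edu/bety"
api_path <- "/api/beta"
endpoint <- "/species"   # the table to query
params   <- c(genus = "Sorghum",
              key   = "9999999999999999999999999999999999999999")  # public key

# Join the "name=value" pairs with "&", then attach them to the path after "?"
query_string <- paste0(names(params), "=", params, collapse = "&")
query_url    <- paste0(base_url, api_path, endpoint, "?", query_string)

print(query_url)
```

Keeping the parameters in a named vector means a new filter can be added by adding one element, without editing the URL string by hand.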
7087

71-
For the purposes of the tutorial, you can assign it to the `betykey` variable in the console window.
7288

7389
### Constructing a URL query
7490

@@ -86,27 +102,68 @@ First, lets construct a query by putting together a URL.
86102
* `name=~height` where the variable name contains 'height'
87103
5. This is your complete query:
88104
* `terraref.ncsa.illinois.edu/bety/api/beta/variables?type=trait&name=~height&key=9999999999999999999999999999999999999999`
105+
* it will query all variables that are type trait and have 'height' in the name
106+
* Does it return the expected values?
89107

90-
**Your Turn**
108+
109+
#### Your Turn
110+
111+
> What will the URL https://terraref.ncsa.illinois.edu/bety/api/beta/species?genus=Sorghum&key=9999999999999999999999999999999999999999 return?
91112
92113
> write a URL that will query the database for sites with "Field Scanner" in the name field. Hint: combine two terms with a `+` as in `Field+Scanner`
93114
94115
What do you see? Do you think that this is all of the records? What happens if you add `&limit=none`?
95116

117+
### Our first Query
118+
119+
#### Shell
120+
121+
```sh
122+
# -O names the output file
123+
wget -O sorghum.json \
124+
"https://terraref.ncsa.illinois.edu/bety/api/beta/species?genus=Sorghum&key=9999999999999999999999999999999999999999"
125+
```
126+
127+
If you want to write the query without exposing the key in plain text, you can construct it thus:
128+
129+
```sh
130+
wget -O sorghum.json \
131+
"https://terraref.ncsa.illinois.edu/bety/api/beta/species?genus=Sorghum&key=`cat ~/.betykey_public`"
132+
```
133+
134+
> What does `cat ~/.betykey_public` do?
135+
136+
> How can you look at the files?
137+
138+
139+
#### R - using the jsonlite package
140+
141+
```{r text-api}
142+
sorghum.json <- readLines(
143+
paste0("https://terraref.ncsa.illinois.edu/bety/api/beta/species?genus=Sorghum&key=",
144+
readLines('~/.betykey')))
145+
146+
## print(sorghum.json)
147+
## not a particularly useful format
148+
## let's convert to a data frame
149+
sorghum <- jsonlite::fromJSON(sorghum.json)
150+
```
151+
96152
## Using the R traits package to query the database
97153

98154
The rOpenSci traits package makes it easier to query the TERRA REF trait database, or any database that uses BETYdb software.
99155

100-
First, make sure we have the latest version
156+
First, make sure we have the latest version from the terraref fork of the repository on GitHub. (You can install the package with the standard `install.packages('traits')`, but I can't promise everything will work.)
157+
158+
### Install the package
101159

102160
```{r install_traits, echo=FALSE}
103-
if(packageVersion("traits") == '0.2.0'){
104-
devtools::install_github('ropensci/traits')
105-
}
161+
devtools::install_github('terraref/traits')
106162
```
107163

164+
Now, we can load the packages that we will need to get started.
108165

109-
```{r setup}
166+
```{r 00-setup}
110167
library(traits)
111168
knitr::opts_chunk$set(echo = FALSE, cache = TRUE)
112169
library(ggplot2)
@@ -126,49 +183,63 @@ library(dplyr)
126183
writeLines('9999999999999999999999999999999999999999',
127184
con = '~/.betykey_public')
128185
```
129-
```{r traits}
130-
terraref_test_url <-
131-
public_betykey <-
132186

133-
```
187+
#### R - using the traits package
134188

135-
### Our first Query
189+
The R traits package is an API 'client'. It does two important things:
190+
1. It makes it easier to specify the query parameters without having to construct a URL
191+
2. It returns the results as a data frame, which is easier to use within R
192+
193+
Let's start with the query for information about Sorghum from the species table, as above:
136194

137-
```{r simulated-LAI}
138-
sorghum_lai <- betydb_query(table = 'search',
139-
trait = "LAI",
195+
```{r query-species}
196+
197+
sorghum_info <- betydb_query(table = 'species',
198+
genus = "Sorghum",
140199
api_version = 'beta',
141-
limit = 5000,
142-
betyurl = "https://terraref.ncsa.illinois.edu/bety-test/",
143-
key = readLines('~/.betykey_public', warn = FALSE))
200+
limit = 'none',
201+
betyurl = "https://terraref.ncsa.illinois.edu/bety/",
202+
key = readLines('~/.betykey', warn = FALSE))
144203
145204
```
146205

147-
Notice all of the arguments? We can change this by setting the default options
206+
#### R - setting options for the traits package
207+
208+
Notice all of the arguments that the `betydb_query` function requires? We can change this by setting the default connection options thus:
148209

149210

150211
```{r}
151-
options(betydb_key = readLines('~/.betykey_public', warn = FALSE),
152-
betydb_url = "https://terraref.ncsa.illinois.edu/bety-test/",
212+
options(betydb_key = readLines('~/.betykey', warn = FALSE),
213+
betydb_url = "https://terraref.ncsa.illinois.edu/bety/",
153214
betydb_api_version = 'beta')
154215
```
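
To see why setting options shortens later calls, here is a sketch of the general pattern of option-based defaults in base R. `demo_query_url` is a hypothetical function written for this illustration; it is not the traits package source, and it only builds a URL rather than sending a request.

```r
# Hypothetical function: arguments fall back to session options when not supplied.
# This only illustrates the pattern; it is not how the traits package is implemented.
demo_query_url <- function(table,
                           ...,
                           betyurl     = getOption("betydb_url"),
                           api_version = getOption("betydb_api_version", "beta"),
                           key         = getOption("betydb_key")) {
  params <- c(..., key = key)
  paste0(sub("/$", "", betyurl),          # drop a trailing slash, if any
         "/api/", api_version, "/", table, "?",
         paste0(names(params), "=", params, collapse = "&"))
}

public_key <- "9999999999999999999999999999999999999999"

# Set the defaults once, remembering the previous values
old_opts <- options(betydb_url         = "https://terraref.ncsa.illinois.edu/bety/",
                    betydb_api_version = "beta",
                    betydb_key         = public_key)

# The call now only needs the table and the filter
demo_url <- demo_query_url("species", genus = "Sorghum")
print(demo_url)

# Restore the previous options so the demonstration leaves the session unchanged
options(old_opts)
```

Because the defaults are looked up when the function is called, changing an option later changes every subsequent call that does not override it explicitly.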
155216

156217
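With these defaults in place, the `api_version`, `betyurl`, and `key` arguments can be left out of later queries. The next query still spells them out, and searches for plant height at sites with 'MAC' in the name: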
157218

158-
```{r eval=FALSE}
159-
sorghum_lai <- betydb_query(table = 'search',
160-
trait = "LAI",
161-
limit = 5000)
219+
```{r sv_area}
220+
sorghum_height <- betydb_query(table = 'search',
221+
trait = "plant_height",
222+
site = "~MAC",
223+
api_version = 'beta',
224+
limit = 'none',
225+
betyurl = "https://terraref.ncsa.illinois.edu/bety/",
226+
key = readLines('~/.betykey', warn = FALSE))
162227
```
163228

229+
### Time series of height
230+
231+
Now we can take a look at the data that we have just queried.
164232

165233
```{r}
166-
ggplot(data = sorghum_lai) +
167-
geom_smooth(aes(x = lubridate::yday(lubridate::ymd_hms(raw_date)), y = mean, color = as.factor(lubridate::year(lubridate::ymd_hms(raw_date)))), span = 0.5) +
234+
ggplot(data = sorghum_height,
235+
aes(x = lubridate::yday(lubridate::ymd_hms(raw_date)), y = mean, color = cultivar)) +
236+
geom_smooth(se = FALSE, size = 0.5) +
237+
geom_point(size = 0.5, position = position_jitter(width = 0.1)) +
168238
# scale_x_datetime(date_breaks = '6 months', date_labels = "%b %Y") +
169-
ylim(c(0,6)) +
170-
ylab("Day of Year") + xlab("Leaf Area Index") +
171-
labs(color='Year')
239+
# ylim(c(0,6)) +
240+
xlab("Day of Year") + ylab("Plant Height") +
241+
guides(color = guide_legend(title = 'Genotype')) +
242+
theme_bw()
172243
173244
```
174245
