revising README and moving content around within traits tutorials

dlebauer · dlebauer · commit 76324af21bd7 · 2017-03-28T09:26:46.000-05:00
diff --git a/README.md b/README.md
@@ -1,20 +1,34 @@
 # Tutorials
 
-Learn to use TERRA REF data and software
+## An introduction to the use of TERRA REF data and software
 
-Note that many of these tutorials have complex dependencies. These can be launched from within the [TERRA REF Sensor Data Portal](https://terraref.ncsa.illinois.edu) (requires account / access). 
+Many of these tutorials have complex dependencies. 
 
-Also try:
+### Data Access
 
-* mybinder.org/repo/terraref/rstudio (for R tutorials under traits/)
-* mybinder.org/repo/terraref/rstudio-geospatial (for R tutorials under sensors/)
-* mybinder.org/repo/terraref/jupyter-plantcv (for PlantCV tutorial)
-* mybinder.org/repo/terraref/jupyter-netcdf (Python tutorials under sensors/)
+The first research-grade version of TERRA REF data products will be released in November 2018. 
+Before that, we will make evaluation releases available: the alpha version was released in November 2016 and the beta version will be released in 2017.
 
+We are making the data available for evaluation with the goal of receiving feedback.
+
+Many tutorials access data that is available online, though most require authentication.
+
+Some tutorials make use of very large files. 
+These can be launched from within the [TERRA REF Sensor Data Portal](https://terraref.ncsa.illinois.edu) (requires account / access).
+
+### 
+
+
+### Links
+
+TODO: add links to quick-start documentation, READMEs, and code for learning and applied examples
+
+* Data portal: terraref.org/data
+* Docker Images on Docker Hub: hub.docker.com/terraref
+* 
+### References
 
-## Links
 
-to quick-start documentation, README's, code for learning and applied examples
 
 ### Slides
 
diff --git a/traits/00-BETYdb-getting-started.Rmd b/traits/00-BETYdb-getting-started.Rmd
@@ -1,16 +1,124 @@
+---
+title: "Getting Started with BETYdb"
+author: "David LeBauer"
+date: "`r Sys.Date()`"
+output: html_document
+---
 
-## Sign up
+```{r setup}
+library(traits)
+knitr::opts_chunk$set(echo = FALSE, cache = TRUE)
+library(ggplot2)
+library(ggthemes)
+library(GGally)
+theme_set(theme_bw())
+library(dplyr)
+```
 
-## Download from web interface
 
+## TERRA REF Trait Database
 
+The TERRA REF program uses the BETYdb database and web application software to store plant- and plot-level trait data. 
 
-## API Summary
+### BETYdb: database software and web application
 
-API key sent in the mail
+The BETYdb software is actively used and developed by the [TERRA Reference](terraref.org) program as well as by the [PEcAn project](pecanproject.org).
 
-Save it to textfile (take from traits/02- file
+For more information about BETYdb, see the following:
+
+* BETYdb documentation (available via the web application under 'Docs')
+  * _Data Access_
+  * _Data Entry Workflow:_ how to add data to the database
+  * _BETYdb Technical Documentation_ is written for advanced users and website and database administrators who may also be interested in the [full database schema](betydb.org/schemas)
+* BETYdb: A Yield, Trait and Ecosystem Service Database Applied to Second Generation Bioenergy Feedstocks. ([LeBauer et al, 2017](dx.doi.org/10.1111/gcbb.12420))
+
+The TERRA REF trait database (terraref.ncsa.illinois.edu/bety) uses the BETYdb data schema (structure) and web application.
+There are at least a half-dozen other databases using the BETYdb software that these exercises will work with, though the results will depend on the available data.
+The first, betydb.org, is described in LeBauer et al, 2017.
+Others are listed in the 'distributed BETYdb' section of the technical documentation.
+
+One database, terraref.ncsa.illinois.edu/terra-test, houses a simulated dataset that is used in [lesson 1: A simulated data set](../traits/01-simulated-sorghum.Rmd) and does not require an account to access the data.
+BETYdb is designed to keep only the primary data private. Metadata such as field management and experimental design are available if the URL is public.
+
+## Getting an account for the TERRA trait database
+
+* Sign up for an account at terraref.ncsa.illinois.edu/bety
+* Sign up as an alpha user [link to form]
+* Wait for database access to be granted
+* Your API key will be sent by email; it can also be found, and regenerated, by navigating to 'data --> users' in the web interface
+
+TODO add signup info from handout
+
+## First steps: download data from web interface
+
+TODO add steps to download csv from the web interface
+
+Note that the web interface provides only a core set of data. More complex queries, such as those in the [Agronomic metadata](../traits/04-agronomic-metadata.Rmd) tutorial, go beyond what the web interface provides.
+
+## Advanced: Using URLs to construct Queries
+
+The first step toward reproducible pipelines is to automate the process of searching the database and returning results. This is one of the key roles of an application programming interface, or 'API'. You can learn to use the API in less than 20 minutes, starting now. 
+
+### What is an API?
+
+An API is ...
+
+### Using Your API key to Connect
+
+An API key is like a password. It allows you to access data, and should be kept private. 
+Therefore, we are not going to put it in code that we share. The one exception is the key 9999999999999999999999999999999999999999 that will allow you to access metadata tables (all tables except _traits_ and _yields_). 
+
+A common way of handling a private API key is to place it in a text file in your home directory. 
+Don't put it in a project directory where it might be inadvertently shared.
+
+Here is how to find and save your API key:
+
+* click file --> new --> text file
+* copy the api key that was sent when you registered into the file
+* file --> save as '~/.betykey'
+
+Equivalently, in R: `writeLines('9999999999999999999999999999999999999999', con = '~/.betykey')`, or at the command line: `echo 9999999999999999999999999999999999999999 > ~/.betykey`
+
+For the purposes of the tutorial, you can assign it to the `betykey` variable in the console window.
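+
+As a minimal sketch of the key-file approach described above, the chunk below reads the key from `~/.betykey` when that file exists and otherwise falls back to the public metadata key. The names `keyfile` and `public_key` are illustrative assumptions, not part of the traits package.
+
+```{r, echo = TRUE}
+# Hypothetical helper variables, used only for illustration
+keyfile <- path.expand("~/.betykey")
+public_key <- "9999999999999999999999999999999999999999"
+
+# Read the first line of the key file if present; otherwise use the
+# public key, which gives access to metadata tables only
+betykey <- if (file.exists(keyfile)) {
+  readLines(keyfile, n = 1, warn = FALSE)
+} else {
+  public_key
+}
+```
+
+Keeping the key in a file outside the project directory means the shared code never contains it.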
+
+### Constructing a URL query
+
+First, let's construct a query by putting together a URL.
+
+1. Start with the database URL: `terraref.ncsa.illinois.edu/bety`
+  * this URL brings you to the home page
+2. Add the path to the API, `/api/beta`
+  * now we have terraref.ncsa.illinois.edu/bety/api/beta, which points to the API documentation
+3. Add the name of the table you want to query. Let's start with `variables`
+  * terraref.ncsa.illinois.edu/bety/api/beta/variables
+4. Add query terms by appending a `?` and joining the terms with `&`, for example:
+  * `key=9999999999999999999999999999999999999999`
+  * `type=trait` where the variable type is 'trait'
+  * `name=~height` where the variable name contains 'height'
+5. This is your complete query:
+  * `terraref.ncsa.illinois.edu/bety/api/beta/variables?type=trait&name=~height&key=9999999999999999999999999999999999999999`
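+
+The same steps can be sketched in R by pasting the pieces together. This is an illustration only: the variable names (`base_url`, `api_path`, `table_name`, `terms`) are assumptions made for this example, and the `https://` prefix is assumed because the URL in the steps above is written without a scheme. The chunk builds the URL and does not send a request.
+
+```{r, echo = TRUE}
+# Steps 1-3: database URL, API path, and table name
+base_url   <- "https://terraref.ncsa.illinois.edu/bety"
+api_path   <- "/api/beta"
+table_name <- "variables"
+
+# Step 4: query terms as a named character vector
+terms <- c(type = "trait",
+           name = "~height",
+           key  = "9999999999999999999999999999999999999999")
+
+# Join each name=value pair with '&', then append after '?'
+query_string <- paste(names(terms), terms, sep = "=", collapse = "&")
+query_url <- paste0(base_url, api_path, "/", table_name, "?", query_string)
+
+print(query_url)
+```
+
+Keeping the terms in a named vector makes it easy to add or remove a filter without editing the URL by hand.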
+  
+**Your Turn**
+
+> Write a URL that will query the database for sites with "Field Scanner" in the name field. Hint: combine two terms with a `+` as in `Field+Scanner`
+
+What do you see? Do you think that this is all of the records? What happens if you add `&limit=none`? 
+
+### Using the R traits package to query the database
+
+The rOpenSci traits package makes it easier to query the TERRA REF trait database, or any database that uses BETYdb software.
+
+```{r traits}
+
+terraref_betyurl <- "https://terraref.ncsa.illinois.edu/bety/"
+betykey <- readLines('~/.betykey', warn = FALSE)
+```
+
+```{r}
+sorghum_all <- betydb_search(query = 'Sorghum', 
+                             betyurl = terraref_betyurl, 
+                             key = betykey) 
+
+```
 
-### Basic construction of an API query
 
-Move this from exiting documentation.
diff --git a/traits/01-simulated-sorghum.Rmd b/traits/01-simulated-sorghum.Rmd
@@ -6,6 +6,7 @@ output: html_document
 ---
 
 ```{r setup}
+library(traits)
 knitr::opts_chunk$set(echo = FALSE, cache = TRUE)
 library(ggplot2)
 library(ggthemes)
@@ -14,9 +15,14 @@ theme_set(theme_bw())
 library(dplyr)
 ```
 
-# Background: The design of a simulated dataset
+# Working with a simulated dataset
+
+To explore the potential of phenotyping data, we have simulated the type of data that might be observed in daily scans by a drone or robot. These data are freely accessible, and thus useful for teaching and exploration without requiring an account.
+
+## Methods: 
+
+### The design of a simulated dataset
 
-To explore the potential of phenotyping data, we have simulated the type of data that might be observed by daily scans by drone or robot. 
 
 We have simulated 500 genotypes across 12 sites and five years using a mechanistic model. 
 
@@ -26,19 +32,19 @@ All of these simulated datasets are released with an unrestrive [copyright](http
 
 ### A note on variable names
 
-I have used the variable names currently used in BETYdb.org/variables, along with names inspired by the more standardized naming Climate Forecasting conventions. However, this is a very early pre-release, and we welcome comments on how such data should be formatted and accessed can be discussed on GitHub.
+The variable names are those currently used in BETYdb.org/variables, along with names inspired by the more standardized Climate and Forecast (CF) naming conventions. However, this is a very early pre-release, and we welcome comments on how such data should be formatted and accessed; these can be discussed on GitHub.
 
-> Exercise: can you locate the relevant issues?
+> Exercise: can you locate the GitHub issues that discuss trait data standards and variable naming conventions?
 
 [This is a slideshow](https://docs.google.com/presentation/d/10aN_5whs8y9SOC8Y9Rj1kbWCfi7YG3yFyNqVX4JUr2U/edit?usp=sharing) of interfaces from the broader community that could serve as a common interface.
 
-# Design of Simulation Experiment
+### Design of Simulation Experiment
 
 500 Sorghum lines grown at each of three sites, four blocks per site, along a N-S transect in Illinois over five years (2021-2025). 
 
-## Time Span (2021-2025) 
+#### Time Span (2021-2025) 
 
-These are historic data, but the years have been changed to emphasize the point that these are not real data. The years have been chosen to select climate extremes. Two years were dry, two were wet, and one was average.
+While the climate data are derived from historic data, the years have been changed to reinforce the fact that these are not real data. The years have been chosen to select climate extremes. Two years were dry, two were wet, and one was average.
 
 | year | drought index |
 |-----|-----|
@@ -71,11 +77,11 @@ Each site has four replicate fields: A, B, C, D. This simulated dataset assumes
 |            | precipitation                    | precipitation_flux | mm/d                        |Daily precipitation |
 
 
-## Genotypes
+### Genotypes
 
 Two-hundred and twenty-seven lines were grown at each site. Each line is identified by a unique integer in the range [9915:10141]
 
-## Phenotypes
+### Phenotypes
 
 The phenotypes associated with each genotype is in the file `phenotypes.csv`. 
 
@@ -117,36 +123,21 @@ This dataset includes what a sensor might observe, daily for five years during t
 |            | Height                               |   canopy_height | m                        | |
 
 
-# Quick start
-
-
-For simplicity, and because I neither have a model nor the data to simulate _Sorghum_, I have started with some phenotypes and simulations of plant growth based on a model that simulates the growth of biomass crops including Miscanthus, Switchgrass, Sugarcane, and coppice Willow.
 
-## Some background / methods.
-
-I start with simulation of Miscanthus, over Illinois as a proxy for Sorghum, since I have a model, [BioCro, Miguez et al, 2009](github.com/ebimodeling/biocro), that simulates Miscanthus. [The code used to run the model and add noise is on GitHub](https://github.com/ebimodeling/biocro_regional/edit/master/vignettes/regional_pecan_workflow.Rmd). 
-Like Sorghum, Miscanthus uses C4 photosynthesis, and this is used to compute carbon uptake at hourly time steps in the simulation model. 
-Unlike Sorghum, Miscanthus grows clonally and is propagated by Rhizome instead of by seed. Furthermore, Miscanthus is perennial: it re-grows each year from carbon stored in rhizomes.
+I start with a simulation over Illinois of another grass with C4 photosynthesis, using the model [BioCro, Miguez et al, 2009](github.com/ebimodeling/biocro). [The code used to run the model and add noise is on GitHub](https://github.com/ebimodeling/biocro_regional/edit/master/vignettes/regional_pecan_workflow.Rmd). 
+The model computes carbon uptake at hourly time steps.
 The 'genotypes' are based on five-hundred quasi-random parameterizations of a biophysical crop model.
 
-The virtual "Miscanthus" is encoded as a set of associated species and prior estimates of the phenotypes used to parameterize the simulation model. The concept of Plant Functional Type (PFT) is any group of one or more plants, originally functionally related species such as 'C4 crops' or 'hardwood trees'. We use PFT at the finest scale of genotype. In the actual research that generated these data, this was the clone _Miscanthus x giganteus_, and this is [PFT #123 in BETYdb](https://www.betydb.org/pfts/123). 
-
-## A Simulated Sorghum Breeding Population
-
-In make-believe land we take a bunch of data generated above and start to rename things
+The virtual crop is encoded as a set of associated species and prior estimates of the phenotypes used to parameterize the simulation model. 
 
-* run_id -> genotype #represents quasi-random set of traits
-* lat, lon -> To estimate "E"
+* lat, lon -> To estimate "Environmental effects"
    * for within site E effects, use a few points within 1/4 degree lat/lon 
    * for across site E effects, use southern, central, and northern Illinois.
 
 ### Accessing the TERRA Simulated Data Database
 
 
 ```{r}
-
-library(traits)
-
 terraref_test_url <- "https://terraref.ncsa.illinois.edu/bety-test/"
 ## note that this key accesses public data. In part 2 you will use your own key to access the actual data
 
diff --git a/traits/02-danforth-phenotyping-facility.Rmd b/traits/02-danforth-phenotyping-facility.Rmd
@@ -15,35 +15,6 @@ theme_set(theme_bw())
 library(traits)
 ```
 
-
-## BETYdb
-
-BETYdb stores trait data and agronomic metadata. An introduction to the API is in ../betydb.md
-
-Also see the full documentation for accessing data from BETYdb. 
-
-
-### Setting up an API key and establishing a connection
-
-An API key is like a password. It allows you to access data, and should be kept private. Therefore, we are not going to put it in code that we share. One way to do this is to place it in a simple text file. 
-
-* click file --> new --> text file
-* copy the api key that was sent when you registered into the file
-* file --> save save as '/home/rstudio/.betykey'
-
-For the purposes of the tutorial, you can assign it to the `mykey` variable in the console window.
-
-```{r traits}
-
-terraref_betyurl <- "https://terraref.ncsa.illinois.edu/bety/"
-mykey <- readLines('~/.betykey', warn = FALSE)
-
-sorghum_all <- betydb_search(query = 'Sorghum', 
-                         betyurl = terraref_betyurl, 
-                         key = mykey) 
-
-```
-
 ### Query data from the Danforth Phenotyping Facility
 
 First we will use the generic search to query the output from the Lemnatec indoor phenotyping system at the Danforth Center in St. Louis, MO.
diff --git a/traits/03-maricopa-field-scanner.Rmd b/traits/03-maricopa-field-scanner.Rmd
@@ -1,4 +1,9 @@
-
+---
+title: "Plot level data from the field scanner in Maricopa, AZ"
+author: "David LeBauer, Chris Black"
+date: "`r Sys.Date()`"
+output: md_document
+---
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = FALSE, cache = TRUE)
 library(dplyr)