|
1 | | ---- |
2 | | -title: "Accessing Trait Data Via the BETYdb API" |
3 | | -author: "David LeBauer" |
4 | | -date: "11/7/2017" |
5 | | -output: html_document |
6 | | ---- |
7 | | - |
8 | | - |
9 | | -## Using URLs to construct Queries |
10 | | - |
11 | | -The first step toward reproducible pipelines is to automate the process of searching the database and returning results. This is one of the key roles of an Application programming interface, or 'API'. You can learn to use the API in less than 20 minutes, starting now. |
12 | | - |
13 | | -### What is an API? |
14 | | - |
15 | | -An API is an 'Application Programming Interface'. An API is a way that you and your software can connect to and access data. |
16 | | - |
17 | | -All of our databases have web interfaces for humans to browse as well as APIs that are constructed as URLs. |
18 | | - |
19 | | - |
20 | | -### Using Your API key to Connect |
21 | | - |
22 | | -An API key is like a password. It allows you to access data, and should be kept private. |
23 | | -Therefore, we are not going to put it in code that we share. The one exception is the key 9999999999999999999999999999999999999999 that will allow you to access metadata tables (all tables except _traits_ and _yields_). It will also allow you to access all of the simulated data in the https://terraref.ncsa.illinois.edu/bety-test database. |
24 | | - |
25 | | -A common way of handling private API keys is to place it in a text file in your current directory. |
26 | | -Don't put it in a project directory where it might be inadvertently shared. |
27 | | - |
28 | | -Here is how to find and save your API key: |
29 | | - |
30 | | -* click file --> new --> text file |
31 | | -* copy the api key that was sent when you registered into the file |
32 | | -* file --> save as '.betykey' |
33 | | - |
34 | | -For the public key, you can call this file `.betykey_public`. |
35 | | - |
36 | | -### Ways to access API data |
37 | | - |
38 | | -1. Through a URL query |
39 | | -2. Using the bash shell |
40 | | -3. Using the R jsonlite package |
41 | | - |
42 | | - |
43 | | -### Accessing data using a URL query |
44 | | - |
45 | | - |
46 | | -## Components of a URL query |
47 | | - |
48 | | -* base url: `terraref.ncsa.illinois.edu/bety` |
49 | | -* path to the api: `/api/v1` |
50 | | -* api endpoint: `/search` or `traits` or `sites`. For BETYdb, these are the names of database tables. |
51 | | -* Query parameters: `genus=Sorghum` |
52 | | -* Authentication: `key=9999999999999999999999999999999999999999` is the public key for the TERRA REF traits database. |
53 | | - |
54 | | -## Constructing a URL query |
55 | | - |
56 | | -First, lets construct a query by putting together a URL. |
57 | | - |
58 | | -1. start with the database url: `terraref.ncsa.illinois.edu/bety` |
59 | | - * this url brings you to the home page |
60 | | -2. Add the path to the API, `/api/v1` |
61 | | - * now we have terraref.ncsa.illinois.edu/bety/api/v1, which points to the API documentation |
62 | | -3. Add the name of the table you want to query. Lets start with `variables` |
63 | | - * terraref.ncsa.illinois.edu/bety/api/v1/variables |
64 | | -4. add query terms by appending a `?` and combining with `&`, for example: |
65 | | - * `key=9999999999999999999999999999999999999999` |
66 | | - * `type=trait` where the variable type is 'trait' |
67 | | - * `name=~height` where the variable name contains 'height' |
68 | | -5. This is your complete query: |
69 | | - * `terraref.ncsa.illinois.edu/bety/api/v1/variables?type=trait&name=~height&key=9999999999999999999999999999999999999999` |
70 | | - * it will query all variables that are type trait and have 'height' in the name |
71 | | - * Does it return the expected values? |
72 | | - |
73 | | -## Your Turn |
74 | | - |
75 | | -> What will the URL https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=9999999999999999999999999999999999999999 return? |
76 | | -
|
77 | | -> Write a URL that will query the database for sites with "Field Scanner" in the name field. Hint: combine two terms with a `+` as in `Field+Scanner` |
78 | | -
|
79 | | -What do you see? Do you think that this is all of the records? What happens if you add `&limit=none`? |
80 | | - |
81 | | - |
82 | | - |
83 | | -#### Accessing data using the Shell |
84 | | - |
85 | | -Type the following command into a bash shell (the `-o` option names the output file): |
86 | | - |
87 | | -```sh |
88 | | -curl -o sorghum.json \ |
89 | | - "https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=9999999999999999999999999999999999999999" |
90 | | -``` |
91 | | - |
92 | | -If you want to write the query without exposing the key in plain text, you can construct it like this: |
93 | | - |
94 | | -```sh |
95 | | -curl -o sorghum.json \ |
96 | | - "https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=`cat .betykey_public`" |
97 | | -``` |
98 | | - |
99 | | -### Accessing API data using the R jsonlite package |
100 | | - |
101 | | -```{r text-api} |
102 | | -sorghum.json <- readLines( |
103 | | - paste0("https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=", |
104 | | - readLines('.betykey'))) |
105 | | -
|
106 | | -## print(sorghum.json) |
107 | | -## not a particularly useful format |
108 | | -## lets convert to a data frame |
109 | | -sorghum <- jsonlite::fromJSON(sorghum.json) |
110 | | -``` |
111 | | - |
112 | | -More on how to use the rOpenSci traits package coming up in the [next tutorial](03-access-r-traits.Rmd) |
| 1 | +# Accessing Trait Data Via the BETYdb API |
| 2 | + |
| 3 | +This tutorial teaches you how to query trait data through the BETYdb API from a web browser, from the command line tool `curl`, and from R. The API is the primary way to access the data from the command line. |
| 4 | + |
| 5 | +## What is an API? |
| 6 | + |
| 7 | +An API (Application Programming Interface) is a way for you and your software to connect to and access data. |
| 8 | + |
| 9 | +All of our databases have web interfaces for humans to browse, as well as APIs whose queries are written as URLs. |
| 10 | + |
| 11 | +## Tutorial Contents |
| 12 | + |
| 13 | +In this tutorial, we will describe three ways to access data using: |
| 14 | + |
| 15 | +1. A URL typed into your browser |
| 16 | +2. The command line, or terminal |
| 17 | +3. The R jsonlite package |
| 18 | + |
| 19 | +We also provide interfaces through the R `traits` package and the Python `terrautils` package, which return data in a more familiar, ready-to-analyze tabular format; these are described in later chapters. You can skip ahead to those chapters, but this chapter provides some insight into the methods that underlie those libraries. |
| 20 | + |
| 21 | +## Using URLs to construct Queries |
| 22 | + |
| 23 | +The first step toward reproducible pipelines is to automate the process of searching the database and returning results. This is one of the key roles of an API. You can learn to use the API in less than 20 minutes, starting now. |
| 24 | + |
| 25 | + |
| 26 | +### Using Your API key to Connect |
| 27 | + |
| 28 | +An API key is like a password: it allows you to access data and should be kept private. |
| 29 | +Therefore, we will not put it in code that we share. The one exception is the public key 9999999999999999999999999999999999999999, which allows you to access the metadata tables (all tables except _traits_ and _yields_). It also allows you to access all of the simulated data in the https://terraref.ncsa.illinois.edu/bety-test database. |
| 30 | + |
| 31 | +A common way of handling a private API key is to store it in a text file in your current working directory. |
| 32 | +Don't put it in a project directory where it might be inadvertently shared (for example, by committing it to a public repository). |
| 33 | + |
| 34 | +Here is how to find and save your API key: |
| 35 | + |
| 36 | +* Click File --> New --> Text File |
| 37 | +* Paste the API key that was sent to you when you registered into the file |
| 38 | +* Click File --> Save As and name the file `.betykey` |
| 39 | + |
| 40 | +For the public key, you can call this file `.betykey_public`. |
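If you prefer the terminal to a text editor, the key files can also be created from a shell. The following is a minimal bash sketch, not part of the BETYdb tooling. It writes the public key quoted above to `.betykey_public` and, if you have already saved a private key to `.betykey`, runs `chmod 600` so that only your own user account can read that file.

```sh
# Write the public key quoted in this tutorial to .betykey_public
# (printf adds a single trailing newline).
printf '%s\n' 9999999999999999999999999999999999999999 > .betykey_public

# If a private key file already exists, make it readable and writable
# only by its owner.
if [ -f .betykey ]; then
  chmod 600 .betykey
fi

# Command substitution strips the trailing newline, so the key can be
# placed directly into a URL.
key=$(cat .betykey_public)
echo "public key has ${#key} characters"
```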
| 41 | + |
| 42 | + |
| 43 | +## Accessing data using a URL query |
| 44 | + |
| 45 | + |
| 46 | +### Components of a URL query |
| 47 | + |
| 48 | +* Base URL: `terraref.ncsa.illinois.edu/bety` |
| 49 | +* Path to the API: `/api/v1` |
| 50 | +* API endpoint: `/search`, `/traits`, or `/sites`. For BETYdb, most endpoints (such as `/traits` and `/sites`) are the names of database tables. |
| 51 | +* Query parameters: `genus=Sorghum` |
| 52 | +* Authentication: `key=9999999999999999999999999999999999999999` is the public key for the TERRA REF traits database. |
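To see how these pieces fit together, the bash sketch below stores each component in its own variable and concatenates them. The variable names are illustrative only. The scheme `https://` is prepended to the base URL so that the result can also be passed to tools such as `curl`. The assembled URL is the same one used in the species question in the "Your Turn" section.

```sh
# Each component of the query, as listed above (variable names are illustrative).
base_url="https://terraref.ncsa.illinois.edu/bety"   # base URL, with scheme
api_path="/api/v1"                                    # path to the API
endpoint="/species"                                   # endpoint (a table name)
params="genus=Sorghum"                                # query parameters
key="9999999999999999999999999999999999999999"       # public key

# '?' starts the query string and '&' separates its terms.
url="${base_url}${api_path}${endpoint}?${params}&key=${key}"
echo "$url"
```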
| 53 | + |
| 54 | +### Constructing a URL query |
| 55 | + |
| 56 | +First, let's construct a query by assembling a URL. |
| 57 | + |
| 58 | +1. Start with the database URL: `terraref.ncsa.illinois.edu/bety` |
| 59 | + * this URL brings you to the home page |
| 60 | +2. Add the path to the API, `/api/v1` |
| 61 | + * now we have `terraref.ncsa.illinois.edu/bety/api/v1`, which points to the API documentation |
| 62 | +3. Add the name of the table you want to query. Let's start with `variables` |
| 63 | + * `terraref.ncsa.illinois.edu/bety/api/v1/variables` |
| 64 | +4. Add query terms by appending a `?` and joining the terms with `&`, for example: |
| 65 | + * `key=9999999999999999999999999999999999999999` |
| 66 | + * `type=trait` where the variable type is 'trait' |
| 67 | + * `name=~height` where the variable name contains 'height' |
| 68 | +5. This is your complete query: |
| 69 | + * `terraref.ncsa.illinois.edu/bety/api/v1/variables?type=trait&name=~height&key=9999999999999999999999999999999999999999` |
| 70 | + * it returns all variables of type 'trait' that have 'height' in their name |
| 71 | + * Does it return the expected values? |
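The steps above can be wrapped in a small shell function. `build_query` below is a hypothetical helper written for this sketch, not part of BETYdb or any package. It takes a table name followed by any number of `field=value` terms and joins the terms with `&`. Called with the public key, it reproduces the complete query from step 5, with `https://` prepended.

```sh
# build_query TABLE TERM...  (hypothetical helper for illustration; requires bash)
# Prints the API URL for TABLE with the TERMs joined by '&'.
build_query() {
  local table=$1
  shift
  # "$*" joins the remaining arguments using the first character of IFS.
  local IFS='&'
  printf 'https://terraref.ncsa.illinois.edu/bety/api/v1/%s?%s\n' "$table" "$*"
}

public_key=9999999999999999999999999999999999999999

# Quote 'name=~height' so the shell does not attempt tilde expansion on it.
build_query variables type=trait 'name=~height' "key=${public_key}"
```

Because the key is just another term, you can pass `"key=$(cat .betykey_public)"` instead of typing the key, as the `curl` examples in this tutorial do.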
| 72 | + |
| 73 | +## Your Turn |
| 74 | + |
| 75 | +> What will the URL https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=9999999999999999999999999999999999999999 return? |
| 76 | +
|
| 77 | +> Write a URL that will query the database for sites with "Field Scanner" in the name field. Hint: join the two words with a `+`, as in `Field+Scanner`. |
| 78 | +
|
| 79 | +What do you see? Do you think that this is all of the records? What happens if you add `&limit=none`? |
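In the hint, `+` stands in for the space between the two words, because a URL cannot contain a literal space. Characters that have a special meaning in a query string, such as `&`, `?`, `#`, or `+` itself, also need to be percent-encoded when they appear inside a search value. The `urlencode` function below is a hypothetical bash helper written for this sketch. It assumes ASCII input, leaves letters, digits, and `.~_-` unchanged, writes spaces as `+`, and percent-encodes everything else.

```sh
# urlencode VALUE  (hypothetical helper; assumes ASCII input, requires bash)
urlencode() {
  local s=$1 out='' c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;           # unreserved characters: keep as-is
      ' ') out+='+' ;;                      # space becomes '+', as in the hint
      *) printf -v hex '%%%02X' "'$c"       # character code in hex, e.g. '&' -> %26
         out+=$hex ;;
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'Field Scanner'   # prints Field+Scanner
urlencode 'a&b'             # prints a%26b
```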
| 80 | + |
| 81 | +## Accessing data using the Command Line Terminal |
| 82 | + |
| 83 | +Type the following command into a bash shell (the `-o` option names the output file): |
| 84 | + |
| 85 | +```sh |
| 86 | +curl -o sorghum.json \ |
| 87 | + "https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=9999999999999999999999999999999999999999" |
| 88 | +``` |
| 89 | + |
| 90 | +If you want to write the query without exposing the key in plain text, you can construct it like this: |
| 91 | + |
| 92 | +```sh |
| 93 | +curl -o sorghum.json \ |
| 94 | + "https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=`cat .betykey_public`" |
| 95 | +``` |
| 96 | + |
| 97 | +## Using the R jsonlite package to access the API with a URL query |
| 98 | + |
| 99 | +```{r text-api, warning = FALSE} |
| 100 | +sorghum.json <- readLines( |
| 101 | + paste0("https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=", |
| 102 | + readLines('traits/.betykey'))) |
| 103 | +
|
| 104 | +## print(sorghum.json) |
| 105 | +## the raw JSON text is not a particularly useful format, |
| 106 | +## so let's convert it to a data frame |
| 107 | +sorghum <- jsonlite::fromJSON(sorghum.json) |
| 108 | +``` |
| 109 | + |
| 110 | +More on how to use the rOpenSci traits package is coming up in the [next tutorial](03-access-r-traits.Rmd). |