Commit 0382761

Merge pull request #95 from terraref/image_tutorial
Image tutorial
2 parents 4155a8d + c0b8f82 commit 0382761

1 file changed

Lines changed: 123 additions & 55 deletions

@@ -1,21 +1,30 @@
11
# Generating file lists by plot
22

3-
The terrautils python package has a new products module that aid in connecting
4-
plot boundaries stored within betydb with file-based data products available
3+
## Pre-requisites:
4+
5+
* if you have not already done so, you will need to 1) sign up for the [beta user program](terraref.org/beta) and 2)
6+
sign up and be approved for access to the [sensor data portal](terraref.ncsa.illinois.edu/clowder) in order to get
7+
the API key that will be used in this tutorial.
8+
9+
The terrautils python package has a new `products` module that aids in connecting
10+
plot boundaries stored within betydb with the file-based data products available
511
from the workbench or Globus.
612

13+
* if you are using RStudio and want to run the Python code chunks, the R package "reticulate" is required
14+
* use `pip3 install terrautils` to install the terrautils Python library
15+
716
## Getting started
817

9-
After installing terrautils, you should be able to import the *product* module.
18+
After installing terrautils, you should be able to import the `products` module.
1019
```{python}
1120
from terrautils.products import get_sensor_list, unique_sensor_names
1221
from terrautils.products import get_file_listing, extract_file_paths
1322
```
1423

15-
The `get_sensor_list` and `get_file_listing` functions both require connection, url,
16-
and key parameters. *Connection* can be 'None', the *url* (called host in the
17-
code) should be something like https://terraref.ncsa.illinois.edu/clowder/.
18-
The *key* is a unique access key for the Clowder api.
24+
The `get_sensor_list` and `get_file_listing` functions both require the *connection*,
25+
*url*, and *key* parameters. The *connection* can be 'None'. The *url* (called host in the
26+
code) should be something like `https://terraref.ncsa.illinois.edu/clowder/`.
27+
The *key* is a unique access key for the Clowder API.
1928

2029
## Getting the sensor list
2130

@@ -26,116 +35,160 @@ a plot id number. The utility function `unique_sensor_names` accepts the
2635
sensor list and provides a list of names suitable for use in the
2736
`get_file_listing` function.
2837

38+
To use this tutorial you will need to sign up for Clowder, have your
39+
account approved, and then get an API key from the [Clowder web interface](https://terraref.ncsa.illinois.edu/clowder).
40+
41+
```{python}
42+
url = 'https://terraref.ncsa.illinois.edu/clowder/'
43+
key = 'ENTER YOUR KEY HERE'
44+
```
45+
2946
```{python}
3047
sensors = get_sensor_list(None, url, key)
3148
names = unique_sensor_names(sensors)
49+
print(names)
3250
```
3351

52+
3453
Names will now contain a list of sensor names available in the Clowder
35-
geostreams API. The currently available sensors are:
54+
geostreams API. The list of returned sensor names could be something like the
55+
following:
3656

37-
* IR Surface Temperature
38-
* Thermal IR GeoTIFFs Datasets
3957
* flirIrCamera Datasets
40-
* (EL) sensor_weather_station
41-
* Irrigation Observations
42-
* Canopy Cover
43-
* Energy Farm Observations SE
44-
* (EL) sensor_par
45-
* scanner3DTop Datasets
46-
* Weather Observations
47-
* Energy Farm Observations NE
58+
* IR Surface Temperature
4859
* RGB GeoTIFFs Datasets
49-
* (EL) sensor_co2
5060
* stereoTop Datasets
51-
* Energy Farm Observations CEN
61+
* scanner3DTop Datasets
62+
* Thermal IR GeoTIFFs Datasets
63+
* ...
5264

5365
## Getting a list of files
5466

5567
The geostreams API can be used to get a list of datasets that overlap a
5668
specific plot boundary and, optionally, limited by a time range. Iterating
5769
over the datasets allows the paths to all the files to be extracted.
5870

59-
```{python}
71+
```{python eval = FALSE}
6072
sensor = 'Thermal IR GeoTIFFs Datasets'
6173
sitename = 'MAC Field Scanner Season 1 Field Plot 101 W'
74+
key = 'INSERT YOUR KEY HERE'
6275
datasets = get_file_listing(None, url, key, sensor, sitename)
6376
files = extract_file_paths(datasets)
6477
```
6578

6679
Datasets can be further filtered using the *since* and *until* parameters
6780
of `get_file_listing` with a date string.
6881

69-
```{python}
82+
```{python eval=FALSE}
7083
dataset = get_file_listing(None, url, key, sensor, sitename,
7184
since='2016-06-01', until='2016-06-10')
7285
```
7386

7487

75-
# Alternative method
88+
# Querying the API
89+
90+
<!--
91+
TODO: move this to a separate tutorial page focused on using curl
92+
-->
93+
94+
The source files behind the data are available for downloading through the API. By executing a series
95+
of requests against the API it's possible to determine the files of interest and then download them.
96+
97+
Each of the API URLs has the same beginning (https://terraref.ncsa.illinois.edu/clowder/api),
98+
followed by the data needed for a specific request. As we step through the process you will be able
99+
to see how the end of the URL changes depending upon the request.
100+
101+
Below is what the API looks like as a URL. Try pasting it into your browser.
102+
103+
[https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=MAC Field Scanner Season 1 Field Plot 101 W](https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=MAC%20Field%20Scanner%20Season%201%20Field%20Plot%20101%20W)
104+
105+
This will return data for the requested plot including its id. This id (or identifier) can then be used for
106+
additional queries against the API.
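
The plot name contains spaces, which some tools do not accept inside a URL. As a minimal sketch, the same query URL can be built with percent-encoded spaces using only the Python standard library. The helper name `sensors_url` and the constant `API_BASE` are illustrative and are not part of terrautils or the Clowder API.

```python
from urllib.parse import quote, urlencode

# Common prefix of every API URL used in this tutorial.
API_BASE = "https://terraref.ncsa.illinois.edu/clowder/api"


def sensors_url(plot_name):
    """Return the geostreams sensors URL for a plot name (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"sensor_name": plot_name}, quote_via=quote)
    return f"{API_BASE}/geostreams/sensors?{query}"


print(sensors_url("MAC Field Scanner Season 1 Field Plot 101 W"))
```

The printed URL can be pasted into a browser or handed to any HTTP client.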
76107

77-
The following method demonstrates the same approach using the Clowder API. This
78-
approach is useful for understanding the data layout and when the Python
79-
terrautils package is not available.
108+
In the examples below we will be using **curl** on the command line to make our API calls. Since the
109+
API is accessed through URLs, it's possible to use the URLs in software programs or with a programming language
110+
to retrieve its data.
111+
112+
## A Word of Caution
113+
114+
We are no longer using the Python terrautils package, a library of helper functions that simplify interactions with the Clowder API. One of the ways it makes the interface easier is by using function names that make sense in the scope of the project. The API and the Clowder database use different names, and _this is confusing_ because the same names refer to different parts of the database.
115+
116+
The names and meanings of variables in this section don't necessarily match the ones in the section
117+
above and it may be easy to get them confused. The API queries the database directly and thereby reflects
118+
the database structure. This is the main reason for the naming differences between the API and the terraref
119+
client.
120+
121+
For example, the Clowder API's use of the term *SENSOR_NAME* is equivalent to *sitename* above.
80122

81123
## Finding plot ID
82124

83-
```
84-
SENSOR_NAME = "MAC Field Scanner Season 1 Field Plot 101 W"
85-
GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name={SENSOR_NAME}
125+
We can query the API to find the identifier associated with the name of a plot. For this example
126+
we use the variable name of SENSOR_NAME to indicate the name of the plot.
127+
128+
``` {sh eval=FALSE}
129+
SENSOR_NAME="MAC Field Scanner Season 1 Field Plot 101 W"
130+
curl -o plot.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=${SENSOR_NAME}"
86131
```
87132

88-
This returns a JSON object with an 'id' parameter. You can use this ID parameter to specify the right data stream.
133+
This creates a file named *plot.json* containing the JSON object returned by the API. The JSON object has an
134+
'id' parameter. This ID parameter can be used to specify the correct data stream.
89135

90136
## Finding stream ID within a plot
91137

92-
The names are formatted as "<Sensor Group> Datasets (<Sensor ID>)".
138+
Use the sensor ID returned in the JSON from the previous call to build the stream name, and then query for
139+
the stream ID. The names of streams are formatted as "<Sensor Group> Datasets (<Sensor ID>)".
93140

94-
```
95-
SENSOR_ID = 3355
96-
STREAM_NAME = "Thermal IR GeoTIFFs Datasets ({SENSOR_ID})"
97-
GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/streams?stream_name={STREAM_NAME}
141+
``` {sh eval=FALSE}
142+
SENSOR_ID=3355
143+
STREAM_NAME="Thermal IR GeoTIFFs Datasets (${SENSOR_ID})"
144+
curl -o stream.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/streams?stream_name=${STREAM_NAME}"
98145
```
99146

100-
This returns a JSON object with an 'id' parameter. You can use this ID parameter to get the right datapoints.
147+
A file named *stream.json* will be created containing the returned JSON object. This JSON object has an 'id' parameter that
148+
contains the stream ID. You can use this ID parameter to get the datasets, and then datapoints, of interest.
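
The stream naming rule described above can be sketched in Python. The helper names `stream_name` and `streams_url` are illustrative, and the sensor group and sensor ID are the values used in this example; real IDs come from the previous API call.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://terraref.ncsa.illinois.edu/clowder/api"


def stream_name(sensor_group, sensor_id):
    """Format a stream name as '<Sensor Group> Datasets (<Sensor ID>)'."""
    return f"{sensor_group} Datasets ({sensor_id})"


def streams_url(name):
    """Return the geostreams streams URL for a stream name (hypothetical helper)."""
    # Spaces and parentheses are percent-encoded so the URL is safe to pass around.
    query = urlencode({"stream_name": name}, quote_via=quote)
    return f"{API_BASE}/geostreams/streams?{query}"


name = stream_name("Thermal IR GeoTIFFs", 3355)
print(name)
print(streams_url(name))
```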
101149

102-
## Listing Clowder file IDs for that plot & sensor stream
150+
## Listing Clowder dataset IDs for that plot & sensor stream
103151

104-
```
105-
STREAM_ID = "11586"
106-
GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id={STREAM_ID}
152+
We now have a stream ID that we can use to list our datasets. The datasets in turn contain files of interest.
153+
154+
``` {sh eval=FALSE}
155+
STREAM_ID=11586
156+
curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}"
107157
```
108158

109-
This returns a list of datapoint JSON objects, each with a 'properties' parameter that looks like:
159+
After the call succeeds, a file named *datasets.json* is created containing the returned JSON object. As part of the
160+
JSON object there are one or more `properties` fields containing *source_dataset* parameters.
110161

111-
```{python}
162+
```{javascript eval=FALSE}
112163
properties: {
113164
dataset_name: "Thermal IR GeoTIFFs - 2016-05-09__12-07-57-990",
114165
source_dataset: "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
115166
},
116167
```
117168

118-
The source_dataset URL can be used to view the dataset in Clowder.
169+
The URL of each **source_dataset** can be used to view the dataset in Clowder.
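
As a rough sketch, the *source_dataset* URLs can be collected from the saved response with a few lines of Python. This assumes the response is a JSON list of datapoint objects, each with a `properties` field as shown above. The `sample` data below repeats the example values from this tutorial and is not a live API response.

```python
import json


def source_datasets(datapoints):
    """Return the source_dataset URL of every datapoint that has one."""
    return [
        point["properties"]["source_dataset"]
        for point in datapoints
        if "source_dataset" in point.get("properties", {})
    ]


# Illustrative stand-in for the contents of datasets.json.
sample = json.loads("""
[
  {
    "properties": {
      "dataset_name": "Thermal IR GeoTIFFs - 2016-05-09__12-07-57-990",
      "source_dataset": "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
    }
  },
  {"properties": {}}
]
""")

for url in source_datasets(sample):
    print(url)
```

To run this against the real response, load the file written by curl with `json.load(open("datasets.json"))` instead of using `sample`.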
119170

120-
You can also filter the datapoints by date:
171+
The datasets can also be filtered by date. The following filters out datasets that are outside of the range of January 2, 2017 through June 10, 2017.
121172

122-
```
123-
GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id={STREAM_ID}&since=2017-01-02&until=2017-06-10
173+
``` {sh eval=FALSE}
174+
curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}&since=2017-01-02&until=2017-06-10"
124175
```
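
The date-filtered query can also be assembled programmatically. The sketch below builds the same datapoints URL with optional *since* and *until* values. The helper name `datapoints_url` is illustrative, and the stream ID is the one used in this example.

```python
from urllib.parse import urlencode

API_BASE = "https://terraref.ncsa.illinois.edu/clowder/api"


def datapoints_url(stream_id, since=None, until=None):
    """Return the datapoints URL for a stream, optionally limited to a date range."""
    params = {"stream_id": stream_id}
    # Only include the date filters that were actually supplied.
    if since is not None:
        params["since"] = since
    if until is not None:
        params["until"] = until
    return f"{API_BASE}/geostreams/datapoints?{urlencode(params)}"


print(datapoints_url(11586))
print(datapoints_url(11586, since="2017-01-02", until="2017-06-10"))
```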
125176

126-
## Getting ROGER file path from dataset
177+
## Getting file paths from dataset
127178

128-
Given a source dataset URL, we can call the API to get the files and their paths.
179+
Now that we know what the dataset URLs are, we can use the URLs to query the API for file IDs in addition to their names and paths.
129180

130-
```
131-
SOURCE_DATASET = "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
132-
# Add /api after /clowder, and add /files at the end of the URL
133-
GET "https://terraref.ncsa.illinois.edu/clowder/api/datasets/59fc9e7d4f0c3383c73d2905/files"
181+
Note that the URL has changed from our previous examples now that we're using the URLs returned by the previous call.
182+
183+
``` {sh eval=FALSE}
184+
SOURCE_DATASET="https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
185+
curl -o files.json -X GET "$(echo "${SOURCE_DATASET}" | sed 's|/clowder/|/clowder/api/|')/files"
134186
```
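
The earlier revision of this section stated the rule for turning a dataset page URL into an API URL: add /api after /clowder, and add /files at the end. A small Python sketch of that rule follows. The helper name `files_api_url` is illustrative, and it assumes the dataset URL contains `/clowder/` as in the example above.

```python
def files_api_url(source_dataset):
    """Convert a Clowder dataset page URL into the API URL listing its files."""
    head, sep, tail = source_dataset.partition("/clowder/")
    if not sep:
        raise ValueError("expected a URL containing '/clowder/'")
    # Insert 'api/' after '/clowder/' and append '/files'.
    return f"{head}/clowder/api/{tail.rstrip('/')}/files"


print(files_api_url(
    "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
))
```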
135187

136-
This returns a list of files in the dataset and their paths if available:
188+
As before, we will have a file containing the returned JSON, named *files.json* in this case. The returned JSON consists of a list
189+
of the files in the dataset with their IDs, and other data if available:
137190

138-
```
191+
``` {javascript eval=FALSE}
139192
[
140193
{
141194
size: "346069",
@@ -156,4 +209,19 @@ This returns a list of files in the dataset and their paths if available:
156209
]
157210
```
158211

159-
Depending on permissions you may need to provide authentication to get this list.
212+
## Retrieving the files
213+
214+
Given that a large number of files may be contained in a dataset, it may be desirable to automate the process of pulling down files
215+
to the local system.
216+
217+
For each file to be retrieved, the unique file ID is needed in the URL.
218+
219+
``` {sh eval=FALSE}
220+
FILE_NAME="ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif"
221+
FILE_ID=59fc9e844f0c3383c73d2980
222+
curl -o "${FILE_NAME}" -X GET "https://terraref.ncsa.illinois.edu/clowder/api/files/${FILE_ID}"
223+
```
224+
225+
This call will cause the server to return the contents of the file identified in the URL. This file is then stored locally in *ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif*.
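
One possible way to automate the retrieval is sketched below in Python, using only the standard library. The helper names `file_url` and `download` are illustrative. The sketch assumes the file is readable without authentication; depending on permissions, a request may need to carry credentials, which is not shown here.

```python
import shutil
from urllib.request import urlopen

API_BASE = "https://terraref.ncsa.illinois.edu/clowder/api"


def file_url(file_id):
    """Return the API URL that serves the contents of one file."""
    return f"{API_BASE}/files/{file_id}"


def download(file_id, file_name):
    """Stream one file from the API to a local path (requires network access)."""
    with urlopen(file_url(file_id)) as response, open(file_name, "wb") as out:
        shutil.copyfileobj(response, out)


print(file_url("59fc9e844f0c3383c73d2980"))
```

Calling `download("59fc9e844f0c3383c73d2980", "ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif")` would mirror the curl command above. Looping over the file IDs found in *files.json* would fetch a whole dataset.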
226+
227+
