Commit 2e10a31

Made changes and updates per tutorial review
1 parent bbb79eb commit 2e10a31

1 file changed

Lines changed: 91 additions & 51 deletions

# Generating file lists by plot

**Note:** you may need access permissions to perform the steps
described in this tutorial.

The terrautils Python package has a new `products` module that aids in connecting
plot boundaries stored in BETYdb with the file-based data products available
from the workbench or Globus.

## Getting started

After installing terrautils, you should be able to import the `products` module.

```{python}
from terrautils.products import get_sensor_list, unique_sensor_names
from terrautils.products import get_file_listing, extract_file_paths
```

The `get_sensor_list` and `get_file_listing` functions both require the *connection*,
*url*, and *key* parameters. The *connection* can be `None`. The *url* (called *host* in the
code) should be something like `https://terraref.ncsa.illinois.edu/clowder/`.
The *key* is your unique access key for the Clowder API.

## Getting the sensor list
@@ -32,23 +35,12 @@ names = unique_sensor_names(sensors)
```

`names` will now contain a list of the sensor names available in the Clowder
geostreams API. The returned list might look something like the following:

* IR Surface Temperature
* Thermal IR GeoTIFFs Datasets
* flirIrCamera Datasets

## Getting a list of files

@@ -72,41 +64,72 @@ dataset = get_file_listing(None, url, key, sensor, sitename,
```

# Querying the API

The source files behind the data products can be downloaded through the Clowder API. By making a series
of requests against the API, you can first identify the files of interest and then download them.

Every API URL begins with the same base, `https://terraref.ncsa.illinois.edu/clowder/api`,
followed by the path and parameters for a specific request. As we step through the process, you will
see how the end of the URL changes from one request to the next.

Below is an example request written as a URL. Try pasting it into your browser:

`https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=MAC Field Scanner Season 1 Field Plot 101 W`

This returns data for the requested plot, including its id. This id (or identifier) can then be used in
further queries against the API.

In the examples below we use **curl** on the command line to make the API calls. Because the
API is accessed through plain URLs, the same requests can also be made from any program or programming language.
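Because every request shares the same base URL, building request URLs in code mostly comes down to appending a path and URL-encoding the query parameters, which matters here because plot names contain spaces. The following is a minimal Python sketch using only the standard library; the helper name `api_url` is hypothetical and is not part of terrautils or Clowder.

```python
from urllib.parse import quote, urlencode

# Base URL shared by every request in this section
API_BASE = "https://terraref.ncsa.illinois.edu/clowder/api"


def api_url(path, **params):
    """Build a Clowder API URL (hypothetical helper).

    path   -- the request-specific part, e.g. "geostreams/sensors"
    params -- query parameters; values are percent-encoded so that
              spaces and parentheses in plot or stream names are safe
    """
    url = f"{API_BASE}/{path.lstrip('/')}"
    if params:
        # quote_via=quote encodes spaces as %20 rather than '+'
        url += "?" + urlencode(params, quote_via=quote)
    return url


print(api_url("geostreams/sensors",
              sensor_name="MAC Field Scanner Season 1 Field Plot 101 W"))
```

The printed URL is the example request above with its spaces encoded as `%20`, which is what a browser does when you paste the unencoded form.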

## A Word of Caution

The variable names in this section don't necessarily match those in the section
above. This mismatch is unintentional and stems from the separate histories of the two approaches. The
names are meaningful and consistent within their respective domains, but not across them.

For example, what the Clowder API calls a sensor name (*SENSOR_NAME* below) is equivalent to `sitename` above.

## Finding plot ID

We can query the API to find the identifier associated with the name of a plot. In this example
the variable SENSOR_NAME holds the name of the plot.

``` {sh eval=FALSE}
SENSOR_NAME="MAC Field Scanner Season 1 Field Plot 101 W"
# -G appends the URL-encoded sensor_name (it contains spaces) as a query string
curl -o plot.json -G "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors" \
  --data-urlencode "sensor_name=${SENSOR_NAME}"
```

This creates a file named *plot.json* containing the JSON returned by the API. The JSON has an
'id' parameter, which can be used to specify the correct data stream.
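If you are scripting these steps, the 'id' can be read from *plot.json* with Python's standard `json` module. This is a sketch under an assumption: the full response is not shown here, so the helpers below (`extract_id` and `read_id` are hypothetical names) accept either a single JSON object or a list of objects, in which case the first entry is used.

```python
import json


def extract_id(data):
    """Return the 'id' from a parsed API response.

    Assumes the response is either one JSON object or a list of objects;
    for a list, the first object's 'id' is used.
    """
    if isinstance(data, list):
        if not data:
            raise ValueError("empty response; check the plot name")
        data = data[0]
    return data["id"]


def read_id(path):
    """Load a JSON file saved by curl (e.g. plot.json) and return its 'id'."""
    with open(path, encoding="utf-8") as f:
        return extract_id(json.load(f))
```

For example, `read_id("plot.json")` would return the plot's id, and the same helper can be pointed at *stream.json* in the next step.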

## Finding stream ID within a plot

Next, use the sensor ID from *plot.json* (its 'id' value) together with a sensor group name to look up
the stream ID. Stream names are formatted as "<Sensor Group> Datasets (<Sensor ID>)".

``` {sh eval=FALSE}
SENSOR_ID=3355
STREAM_NAME="Thermal IR GeoTIFFs Datasets (${SENSOR_ID})"
# -G appends the URL-encoded stream_name (spaces, parentheses) as a query string
curl -o stream.json -G "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/streams" \
  --data-urlencode "stream_name=${STREAM_NAME}"
```

A file named *stream.json* will be created containing the returned JSON. Its 'id' parameter
contains the stream ID, which you can use to list the datapoints, and through them the datasets, of interest.
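The stream lookup can also be sketched in Python. The helper `stream_name` (a hypothetical name) simply applies the "<Sensor Group> Datasets (<Sensor ID>)" pattern described above; judging from the curl example, the sensor group for thermal images is assumed to be "Thermal IR GeoTIFFs". `stream_lookup_url` is likewise hypothetical.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://terraref.ncsa.illinois.edu/clowder/api"


def stream_name(sensor_group, sensor_id):
    """Format a stream name as "<Sensor Group> Datasets (<Sensor ID>)"."""
    return f"{sensor_group} Datasets ({sensor_id})"


def stream_lookup_url(sensor_group, sensor_id):
    """Build the geostreams/streams query URL for one plot and sensor group."""
    # Percent-encode the name, including its spaces and parentheses
    query = urlencode({"stream_name": stream_name(sensor_group, sensor_id)},
                      quote_via=quote)
    return f"{API_BASE}/geostreams/streams?{query}"


print(stream_name("Thermal IR GeoTIFFs", 3355))
```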

## Listing Clowder dataset IDs for that plot & sensor stream

We now have a stream ID that we can use to list our datasets. The datasets in turn contain the files of interest.

``` {sh eval=FALSE}
STREAM_ID=11586
curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}"
```

After the call succeeds, a file named *datasets.json* is created containing the returned JSON. The JSON
includes one or more `properties` fields containing *source_dataset* parameters, which look like:

```{python}
properties: {
@@ -115,27 +138,29 @@ properties: {
},
```

The URL of each **source_dataset** can be used to view the dataset in Clowder.
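When many datapoints are returned, collecting the *source_dataset* URLs by hand gets tedious. A minimal Python sketch, assuming *datasets.json* holds a list of datapoint objects, each with a `properties` object like the one shown above; only the `source_dataset` key is relied on. Duplicates are dropped in case several datapoints reference the same dataset. `source_datasets` and `load_source_datasets` are hypothetical helper names.

```python
import json


def source_datasets(datapoints):
    """Return the unique source_dataset URLs, in the order first seen.

    Assumes `datapoints` is a list of dicts shaped like
    {"properties": {"source_dataset": "<url>", ...}, ...};
    entries without a source_dataset are skipped.
    """
    seen = set()
    urls = []
    for point in datapoints:
        url = (point.get("properties") or {}).get("source_dataset")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def load_source_datasets(path="datasets.json"):
    """Read a datapoints response saved by curl and extract its dataset URLs."""
    with open(path, encoding="utf-8") as f:
        return source_datasets(json.load(f))
```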

The datasets can also be filtered by date. The following restricts the results to the date range from January 2, 2017 to June 10, 2017.

``` {sh eval=FALSE}
curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}&since=2017-01-02&until=2017-06-10"
```
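In a script it can be less error-prone to pass the dates as `datetime.date` values than to type the query string by hand. The sketch below (the helper name `datapoints_url` is hypothetical) builds the same request URL as the curl command above, adding `since` and `until` only when they are given and formatting them as YYYY-MM-DD as in the example.

```python
from datetime import date
from urllib.parse import urlencode

API_BASE = "https://terraref.ncsa.illinois.edu/clowder/api"


def datapoints_url(stream_id, since=None, until=None):
    """Build a geostreams/datapoints URL, optionally limited to a date range."""
    params = {"stream_id": stream_id}
    if since is not None:
        params["since"] = since.isoformat()  # e.g. "2017-01-02"
    if until is not None:
        params["until"] = until.isoformat()
    return f"{API_BASE}/geostreams/datapoints?{urlencode(params)}"


print(datapoints_url(11586, since=date(2017, 1, 2), until=date(2017, 6, 10)))
```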

## Getting file paths from dataset

Now that we know the dataset URLs, we can use them to query the API for the IDs, names, and paths of the files in each dataset.

Note that the request URL is no longer built from the base API URL used in the previous examples. Instead, it is derived from a
*source_dataset* URL returned by the previous call: insert `/api` after `/clowder`, and append `/files` to the end.
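The URL rewrite for this step is a small string transformation. A Python sketch follows; the helper name `files_api_url` is hypothetical, and it assumes every *source_dataset* URL contains `/clowder/` as in the example.

```python
def files_api_url(source_dataset):
    """Turn a Clowder dataset page URL into the API URL listing its files.

    ".../clowder/datasets/<id>" -> ".../clowder/api/datasets/<id>/files"
    """
    marker = "/clowder/"
    if marker not in source_dataset:
        raise ValueError(f"unexpected dataset URL: {source_dataset}")
    # Insert "api/" after the first "/clowder/" and append "/files"
    api = source_dataset.replace(marker, marker + "api/", 1)
    return api.rstrip("/") + "/files"


print(files_api_url(
    "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"))
```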
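
``` {sh eval=FALSE}
SOURCE_DATASET="https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
# Keep only the dataset ID (the part after the last "/") and build the API URL from it
DATASET_ID="${SOURCE_DATASET##*/}"
curl -o files.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/datasets/${DATASET_ID}/files"
```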

As before, the returned JSON is saved to a file, in this case named *files.json*. It consists of a list
of the files in the dataset with their IDs, plus other data if available:

``` {python}
[
{
size: "346069",
@@ -156,4 +181,19 @@ This returns a list of files in the dataset and their paths if available:
]
```
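To script the download step, you need each file's ID and a local name to save it under. The sample above is truncated, so the field names used here are an assumption: the sketch expects each entry to carry an `id` key and usually a `filename` key, and falls back to the ID as the local name when `filename` is missing. `file_entries` and `load_file_entries` are hypothetical helper names.

```python
import json


def file_entries(files):
    """Return (file_id, local_name) pairs from a dataset's file listing.

    Assumes `files` is a list of dicts with an "id" key and, usually,
    a "filename" key; entries without an "id" are skipped.
    """
    entries = []
    for item in files:
        file_id = item.get("id")
        if not file_id:
            continue
        entries.append((file_id, item.get("filename") or file_id))
    return entries


def load_file_entries(path="files.json"):
    """Read a file listing saved by curl and return its (id, name) pairs."""
    with open(path, encoding="utf-8") as f:
        return file_entries(json.load(f))
```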

## Retrieving the files

Because a dataset may contain a large number of files, it may be desirable to automate downloading them
to the local system.

Each file is retrieved by placing its unique file ID in the URL.

``` {sh eval=FALSE}
FILE_NAME="ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif"
FILE_ID=59fc9e844f0c3383c73d2980
curl -o "${FILE_NAME}" -X GET "https://terraref.ncsa.illinois.edu/clowder/api/files/${FILE_ID}"
```

The server returns the contents of the file identified in the URL, and curl saves them locally as *ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif*.
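To automate the download, the single curl call above can be repeated for every file in *files.json*. The following Python sketch uses only the standard library. It assumes each file entry has `id` and `filename` keys (the listing shown earlier is truncated, so this is an assumption) and that the files can be fetched without authentication; depending on your permissions you may need to add credentials, which this sketch does not handle. `download_plan` and `download_all` are hypothetical names.

```python
import json
import os
import shutil
import urllib.request

API_BASE = "https://terraref.ncsa.illinois.edu/clowder/api"


def download_plan(files, dest_dir="."):
    """Pair each file's download URL with a local path (no network access).

    Assumes each entry in `files` has "id" and "filename" keys.
    """
    plan = []
    for item in files:
        # basename() keeps a path inside a filename from escaping dest_dir
        name = os.path.basename(item["filename"])
        plan.append((f"{API_BASE}/files/{item['id']}", os.path.join(dest_dir, name)))
    return plan


def download_all(listing_path="files.json", dest_dir="."):
    """Download every file listed in a file listing saved by curl."""
    with open(listing_path, encoding="utf-8") as f:
        files = json.load(f)
    os.makedirs(dest_dir, exist_ok=True)
    for url, path in download_plan(files, dest_dir):
        tmp = path + ".part"
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)  # stream the body to disk
        os.replace(tmp, path)  # only complete downloads get the final name
```

For example, `download_all("files.json", "downloads")` would perform the curl request above once per listed file, saving each under its listed name in *downloads*.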