# Generating file lists by plot

**Note:** you may need to get access permissions to perform the steps
described in this tutorial.

The terrautils python package has a new `products` module that aids in connecting
plot boundaries stored within betydb with the file-based data products available
from the workbench or Globus.

## Getting started

After installing terrautils, you should be able to import the `products` module.
``` {python}
from terrautils.products import get_sensor_list, unique_sensor_names
from terrautils.products import get_file_listing, extract_file_paths
```

The `get_sensor_list` and `get_file_listing` functions both require the *connection*,
*url*, and *key* parameters. The *connection* can be `None`. The *url* (called host in the
code) should be something like `https://terraref.ncsa.illinois.edu/clowder/`.
The *key* is a unique access key for the Clowder API.

## Getting the sensor list

``` {python}
sensors = get_sensor_list(None, url, key)
names = unique_sensor_names(sensors)
```

Names will now contain a list of sensor names available in the Clowder
geostreams API. The list of returned sensor names could be something like the
following:

* IR Surface Temperature
* Thermal IR GeoTIFFs Datasets
* flirIrCamera Datasets

## Getting a list of files

``` {python}
dataset = get_file_listing(None, url, key, sensor, sitename,
                           ...)
```


# Querying the API

The source files behind the data are available for downloading through the API. By executing a series
of requests against the API it's possible to determine the files of interest and then download them.

Each of the API URLs has the same beginning (https://terraref.ncsa.illinois.edu/clowder/api),
followed by the data needed for a specific request. As we step through the process you will be able
to see how the end of the URL changes depending upon the request.

Below is what the API looks like as a URL. Try pasting it into your browser.

https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=MAC Field Scanner Season 1 Field Plot 101 W

This will return data for the requested plot, including its id. This id (or identifier) can then be used for
additional queries against the API.

In the examples below we will be using **curl** on the command line to make our API calls. Since the
API is accessed through URLs, it's possible to use the same URLs in software programs or with a programming
language to retrieve the data.
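The same URLs can be built in a language like Python. The helper below is hypothetical (not part of terrautils) and simply shows how to percent-encode a plot name into the sensor query:

```python
from urllib.parse import urlencode

BASE = "https://terraref.ncsa.illinois.edu/clowder/api"

def sensors_url(sensor_name):
    """Build the sensor-search URL, encoding the spaces in the plot name."""
    return BASE + "/geostreams/sensors?" + urlencode({"sensor_name": sensor_name})

print(sensors_url("MAC Field Scanner Season 1 Field Plot 101 W"))
```

Encoding matters here because plot names contain spaces, which are not legal in a raw URL.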

## A Word of Caution

The names of variables in this section don't necessarily match the ones in the section
above. This is unintentional and is due to the legacy behind each of these approaches. The
names are meaningful and consistent in their respective domains, but not between each other.

For example, the Clowder API's use of the term *SENSOR_NAME* is equivalent to *site_name* above.

## Finding plot ID

We can query the API to find the identifier associated with the name of a plot. For this example
we use the variable SENSOR_NAME to hold the name of the plot. Because the name contains spaces,
we let curl encode it with `--data-urlencode`.

``` {sh eval=FALSE}
SENSOR_NAME="MAC Field Scanner Season 1 Field Plot 101 W"
curl -o plot.json -G --data-urlencode "sensor_name=${SENSOR_NAME}" "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors"
```

This creates a file named *plot.json* containing the JSON object returned by the API. The JSON object has an
'id' parameter. This ID parameter can be used to specify the correct data stream.
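The id can be pulled out of *plot.json* with a few lines of Python. The record layout below is an assumption based on the fields described in this tutorial, so check the actual file before relying on it:

```python
import json

# A fragment shaped like the (assumed) response: a list of sensor records.
sample = '[{"id": 3355, "name": "MAC Field Scanner Season 1 Field Plot 101 W"}]'

records = json.loads(sample)
sensor_id = records[0]["id"]
print(sensor_id)  # -> 3355
```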

## Finding stream ID within a plot

Use the sensor id returned in the JSON from the previous call to look up the stream id. The
names of streams are formatted as "<Sensor Group> Datasets (<Sensor ID>)".

``` {sh eval=FALSE}
SENSOR_ID=3355
STREAM_NAME="Thermal IR GeoTIFFs Datasets (${SENSOR_ID})"
curl -o stream.json -G --data-urlencode "stream_name=${STREAM_NAME}" "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/streams"
```

A file named *stream.json* will be created containing the returned JSON object. This JSON object has an 'id' parameter that
contains the stream ID. You can use this ID parameter to get the datasets, and then the datapoints, of interest.
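Putting the two pieces together in Python: the stream name follows the pattern above, and the stream id can then be read from a *stream.json*-shaped record (the JSON layout here is an assumption):

```python
import json

sensor_id = 3355  # the plot's id from the previous step

# Build the stream name using the "<Sensor Group> Datasets (<Sensor ID>)" pattern.
stream_name = "Thermal IR GeoTIFFs Datasets ({})".format(sensor_id)

# A fragment shaped like stream.json (assumed layout).
sample = '[{"id": 11586, "name": "Thermal IR GeoTIFFs Datasets (3355)"}]'
stream_id = json.loads(sample)[0]["id"]
print(stream_name, stream_id)
```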

## Listing Clowder dataset IDs for that plot & sensor stream

We now have a stream ID that we can use to list our datasets. The datasets in turn contain the files of interest.

``` {sh eval=FALSE}
STREAM_ID=11586
curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}"
```

After the call succeeds, a file named *datasets.json* is created containing the returned JSON object. As part of the
JSON object there are one or more `properties` fields containing *source_dataset* parameters.

``` {python}
properties: {
  source_dataset: "https://terraref.ncsa.illinois.edu/clowder/datasets/...",
},
```
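In Python, the *source_dataset* URLs can be collected from a *datasets.json*-shaped list of datapoints (the layout is an assumption based on the fields shown above):

```python
import json

# A fragment shaped like datasets.json (assumed layout): a list of datapoints,
# each with a properties.source_dataset URL.
sample = '''[
  {"properties": {"source_dataset":
    "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"}}
]'''

datapoints = json.loads(sample)
dataset_urls = [dp["properties"]["source_dataset"] for dp in datapoints]
print(dataset_urls)
```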

The URL of each **source_dataset** can be used to view the dataset in Clowder.

The datasets can also be filtered by date. The following filters out datasets that fall outside of the range of January 2, 2017 through June 10, 2017.

``` {sh eval=FALSE}
curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}&since=2017-01-02&until=2017-06-10"
```

## Getting file paths from dataset

Now that we know what the dataset URLs are, we can use them to query the API for file IDs in addition to the file names and paths.

Note that the form of the URL changes here: the *source_dataset* value links to the dataset's Clowder web page, so to call the API we add `/api` after `/clowder` and `/files` at the end of the URL.

``` {sh eval=FALSE}
SOURCE_DATASET="https://terraref.ncsa.illinois.edu/clowder/api/datasets/59fc9e7d4f0c3383c73d2905"
curl -o files.json -X GET "${SOURCE_DATASET}/files"
```

As before, we will have a file containing the returned JSON, named *files.json* in this case. The returned JSON consists of a list
of the files in the dataset with their IDs, and other data if available:

``` {python}
[
  {
    size: "346069",
    ...
  },
  ...
]
```
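A short Python sketch for pulling the file IDs and names out of a *files.json*-shaped list (the field names follow the example above and the retrieval step below, but the full layout is an assumption):

```python
import json

# A fragment shaped like files.json (assumed layout).
sample = '''[
  {"id": "59fc9e844f0c3383c73d2980",
   "filename": "ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif",
   "size": "346069"}
]'''

files = json.loads(sample)
for f in files:
    print(f["id"], f["filename"])
```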

## Retrieving the files

Given that a large number of files may be contained in a dataset, it may be desirable to automate the process of pulling down files
to the local system.

For each file to be retrieved, the unique file ID is needed in the URL.

``` {sh eval=FALSE}
FILE_NAME="ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif"
FILE_ID=59fc9e844f0c3383c73d2980
curl -o "${FILE_NAME}" -X GET "https://terraref.ncsa.illinois.edu/clowder/api/files/${FILE_ID}"
```

This call will cause the server to return the contents of the file identified in the URL. The file is then stored locally as *ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif*.
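For many files, the same retrieval can be scripted in Python. This is a sketch only: the download line is commented out because it needs network access and, depending on permissions, authentication.

```python
from urllib.request import urlretrieve  # only needed for the actual download

file_id = "59fc9e844f0c3383c73d2980"
file_name = "ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif"
url = "https://terraref.ncsa.illinois.edu/clowder/api/files/" + file_id

# Uncomment to fetch the file over the network:
# urlretrieve(url, file_name)
print(url)
```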