Made changes and updates per tutorial review

Chris-Schnaufer · Chris-Schnaufer · commit 2e10a312315c · 2019-01-23T16:08:16.000-07:00
diff --git a/sensors/06-list-datasets-by-plot.Rmd b/sensors/06-list-datasets-by-plot.Rmd
@@ -1,20 +1,23 @@
 # Generating file lists by plot
 
-The terrautils python package has a new products module that aid in connecting
-plot boundaries stored within betydb with file-based data products available
+**Note:** you may need to get access permissions to perform the steps 
+described in this tutorial. 
+
+The terrautils python package has a new `products` module that aids in connecting
+plot boundaries stored within betydb with the file-based data products available
 from the workbench or Globus.
 
 ## Getting started
 
-After installing terrautils, you should be able to import the *product* module.
+After installing terrautils, you should be able to import the `products` module.
 ```{python}
 from terrautils.products import get_sensor_list, unique_sensor_names
 from terrautils.products import get_file_listing, extract_file_paths
 ```
 
-The `get_sensor_list` and `get_file_listing` functions both require connection, url,
-and key parameters. *Connection* can be 'None', the *url* (called host in the
-code) should be something like https://terraref.ncsa.illinois.edu/clowder/.
+The `get_sensor_list` and `get_file_listing` functions both require the *connection*,
+*url*, and *key* parameters. The *connection* can be `None`. The *url* (called host in the
+code) should be something like `https://terraref.ncsa.illinois.edu/clowder/`.
 The *key* is a unique access key for the Clowder api.
 
 ## Getting the sensor list
@@ -32,23 +35,12 @@ names = unique_sensor_names(sensors)
 ```
 
 Names will now contain a list of sensor names available in the Clowder
-geostreams API. The currently available sensors are:
+geostreams API. The list of returned sensor names could be something like the 
+following:
 
 * IR Surface Temperature
 * Thermal IR GeoTIFFs Datasets
 * flirIrCamera Datasets
-* (EL) sensor_weather_station
-* Irrigation Observations
-* Canopy Cover
-* Energy Farm Observations SE
-* (EL) sensor_par
-* scanner3DTop Datasets
-* Weather Observations
-* Energy Farm Observations NE
-* RGB GeoTIFFs Datasets
-* (EL) sensor_co2
-* stereoTop Datasets
-* Energy Farm Observations CEN
 
 ## Getting a list of files
 
@@ -72,41 +64,72 @@ dataset = get_file_listing(None, url, key, sensor, sitename,
 ```
 
 
-# Alternative method
+# Querying the API
+
+The source files behind the data are available for downloading through the API. By executing a series
+of requests against the API it's possible to determine the files of interest and then download them.
+
+Each of the API URLs has the same beginning (https://terraref.ncsa.illinois.edu/clowder/api),
+followed by the data needed for a specific request. As we step through the process you will be able
+to see how the end of the URL changes depending upon the request.
+
+Below is what the API looks like as a URL. Try pasting it into your browser.
+
+https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=MAC Field Scanner Season 1 Field Plot 101 W
+
+This will return data for the requested plot including its id. This id (or identifier) can then be used for 
+additional queries against the API.
+
+In the examples below we will be using **curl** on the command line to make our API calls. Since the
+API is accessed through URLs, it's possible to use the URLs in software programs or with a programming language
+to retrieve its data. 
 
-The following method demonstrates the same approach using the Clowder API. This
-approach is useful for understanding the data layout and when the Python
-terrautils package is not available.
+## A Word of Caution
+
+The names of variables in this section don't necessarily match the ones in the section 
+above. This is unintentional and is due to the legacy behind each of these approaches. The
+names are meaningful and consistent in their respective domains, but not between each other.
+
+For example, the Clowder API's use of the term *SENSOR_NAME* is equivalent to *sitename* above.
 
 ## Finding plot ID
 
-```
-SENSOR_NAME = "MAC Field Scanner Season 1 Field Plot 101 W"
-GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name={SENSOR_NAME}
+We can query the API to find the identifier associated with the name of a plot. For this example
+we use the variable name of SENSOR_NAME to indicate the name of the plot.
+
+``` {sh eval=FALSE}
+SENSOR_NAME="MAC Field Scanner Season 1 Field Plot 101 W"
+curl -o plot.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=${SENSOR_NAME}"
 ```
 
-This returns a JSON object with an 'id' parameter. You can use this ID parameter to specify the right data stream.
+This creates a file named *plot.json* containing the JSON object returned by the API. The JSON object has an 
+'id' parameter. This ID parameter can be used to specify the correct data stream.
 
 ## Finding stream ID within a plot
 
-The names are formatted as "<Sensor Group> Datasets (<Sensor ID>)".
+Use the sensor ID returned in the JSON from the previous call to get
+the stream ID. The names of streams are formatted as "<Sensor Group> Datasets (<Sensor ID>)".
 
-```
-SENSOR_ID = 3355
-STREAM_NAME = "Thermal IR GeoTIFFs Datasets ({SENSOR_ID})"
-GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/streams?stream_name={STREAM_NAME}
+``` {sh eval=FALSE}
+SENSOR_ID=3355
+STREAM_NAME="Thermal IR GeoTIFFs Datasets (${SENSOR_ID})"
+curl -o stream.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/streams?stream_name=${STREAM_NAME}"
 ```
 
-This returns a JSON object with an 'id' parameter. You can use this ID parameter to get the right datapoints.
+A file named *stream.json* will be created containing the returned JSON object. This JSON object has an 'id' parameter that
+contains the stream ID. You can use this ID parameter to get the datasets, and then datapoints, of interest.
 
-## Listing Clowder file IDs for that plot & sensor stream
+## Listing Clowder dataset IDs for that plot & sensor stream
 
-```
-STREAM_ID = "11586"
-GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id={STREAM_ID}
+We now have a stream ID that we can use to list our datasets. The datasets in turn contain files of interest.
+
+``` {sh eval=FALSE}
+STREAM_ID=11586
+curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}"
 ```
 
-This returns a list of datapoint JSON objects, each with a 'properties' parameter that looks like:
+After the call succeeds, a file named *datasets.json* is created containing the returned JSON object. As part of the
+JSON object there are one or more `properties` fields containing *source_dataset* parameters.
 
 ```{python}
 properties: {
@@ -115,27 +138,29 @@ properties: {
 },
 ```
 
-The source_dataset URL can be used to view the dataset in Clowder.
+The URL of each **source_dataset** can be used to view the dataset in Clowder.
 
-You can also filter the datapoints by date:
+The datasets can also be filtered by date. The following filters out datasets that are outside of the range of January 2, 2017 through June 10, 2017.
 
-```
-GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id={STREAM_ID}&since=2017-01-02&until=2017-06-10
+``` {sh eval=FALSE}
+curl -o datasets.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=${STREAM_ID}&since=2017-01-02&until=2017-06-10"
 ```
 
-## Getting ROGER file path from dataset
+## Getting file paths from dataset
 
-Given a source dataset URL, we can call the API to get the files and their paths.
+Now that we know what the dataset URLs are, we can use the URLs to query the API for file IDs in addition to their names and paths.
 
-```
-SOURCE_DATASET = "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
-# Add /api after /clowder, and add /files at the end of the URL
-GET "https://terraref.ncsa.illinois.edu/clowder/api/datasets/59fc9e7d4f0c3383c73d2905/files"
+Note that the URL has changed from our previous examples: start from a dataset URL returned by the previous call, add `/api` after `/clowder`, and add `/files` at the end.
+
+``` {sh eval=FALSE}
+SOURCE_DATASET="https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
+curl -o files.json -X GET "https://terraref.ncsa.illinois.edu/clowder/api/datasets/59fc9e7d4f0c3383c73d2905/files"
 ```
 
-This returns a list of files in the dataset and their paths if available:
+As before, we will have a file containing the returned JSON, named *files.json* in this case. The returned JSON consists of a list 
+of the files in the dataset with their IDs, and other data if available:
 
-```
+``` {python}
 [
     {
         size: "346069",
@@ -156,4 +181,19 @@ This returns a list of files in the dataset and their paths if available:
 ]
 ```
 
-Depending on permissions you may need to provide authentication to get this list.
+## Retrieving the files
+
+Given that a large number of files may be contained in a dataset, it may be desirable to automate the process of pulling down files
+to the local system.
+
+For each file to be retrieved, the unique file ID is needed in the URL.
+
+``` {sh eval=FALSE}
+FILE_NAME="ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif"
+FILE_ID=59fc9e844f0c3383c73d2980
+curl -o "${FILE_NAME}" -X GET "https://terraref.ncsa.illinois.edu/clowder/api/files/${FILE_ID}"
+```
+
+This call will cause the server to return the contents of the file identified in the URL. This file is then stored locally in *ir_geotiff_L1_ua-mac_2016-05-09__12-07-57-990.tif*.
+
+
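The curl walkthrough added in this diff could also be scripted. The sketch below is in Python and only builds the request URLs for each step, percent-encoding the plot and stream names (which contain spaces and parentheses). The helper names (`sensor_url`, `stream_url`, `datapoints_url`, `dataset_files_url`, `file_url`, `fetch_json`) are hypothetical and are not part of terrautils or Clowder. It also assumes the endpoints accept unauthenticated GET requests, which may not hold depending on permissions.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Common beginning of every API URL used in the tutorial.
API_ROOT = "https://terraref.ncsa.illinois.edu/clowder/api"


def _query(params):
    """Percent-encode query parameters (spaces become %20)."""
    return urlencode(params, quote_via=quote)


def sensor_url(sensor_name):
    """URL that looks up a plot by its name (step: finding plot ID)."""
    return f"{API_ROOT}/geostreams/sensors?" + _query({"sensor_name": sensor_name})


def stream_url(sensor_group, sensor_id):
    """URL that looks up a stream named '<Sensor Group> Datasets (<Sensor ID>)'."""
    stream_name = f"{sensor_group} Datasets ({sensor_id})"
    return f"{API_ROOT}/geostreams/streams?" + _query({"stream_name": stream_name})


def datapoints_url(stream_id, since=None, until=None):
    """URL that lists datapoints for a stream, optionally filtered by date."""
    params = {"stream_id": stream_id}
    if since:
        params["since"] = since
    if until:
        params["until"] = until
    return f"{API_ROOT}/geostreams/datapoints?" + _query(params)


def dataset_files_url(source_dataset):
    """Turn a source_dataset URL into its file-listing API URL.

    Adds /api after /clowder and /files at the end.
    """
    api_url = source_dataset.rstrip("/").replace(
        "/clowder/datasets/", "/clowder/api/datasets/", 1
    )
    return api_url + "/files"


def file_url(file_id):
    """URL that returns the contents of a single file."""
    return f"{API_ROOT}/files/{file_id}"


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (assumes no access key is needed)."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URLs only; no network request is made here.
    print(sensor_url("MAC Field Scanner Season 1 Field Plot 101 W"))
    print(stream_url("Thermal IR GeoTIFFs", 3355))
    print(datapoints_url(11586, since="2017-01-02", until="2017-06-10"))
    print(dataset_files_url(
        "https://terraref.ncsa.illinois.edu/clowder/datasets/59fc9e7d4f0c3383c73d2905"
    ))
    print(file_url("59fc9e844f0c3383c73d2980"))
```

Keeping URL construction separate from `fetch_json` makes it possible to check each URL before any request is sent, and to pass the same URLs to curl if preferred.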