
Commit bcc4a5b

Merge pull request #96 from terraref/images_vignette
Images vignette
2 parents 589f4c7 + f4f25d3 commit bcc4a5b

1 file changed

Lines changed: 116 additions & 1 deletion

File tree

vignettes/03-get-images-python.Rmd

@@ -1 +1,116 @@
---
title: "Get Source Image Files"
output: html_document
---

# Objective: Demonstrate how to locate and retrieve RGB image files

This vignette shows how to locate and retrieve image files associated with growing Season 6
at the University of Arizona's [Maricopa Agricultural Center](http://cals-mac.arizona.edu/)
using Python. The files are stored online in the data management system Clowder,
which is accessed through an API. We will work with the image files generated during the
month of May by limiting the requests to that time period.

After completing this vignette you should be able to search for and retrieve other
files through the API.

As an added bonus, we've also included an example of how to retrieve the list of available
sensor names through the API. Using the sensor names returned, it's possible to retrieve
other files containing the data those sensors have collected.

**Requirements**

* Python 3
* the terrautils library
  * this can be installed from PyPI by running `pip install terrautils` in the terminal
* an API key to access these data

The API key is a string that is generated on request through your Clowder account. Existing
API keys will work with this vignette. To get a new API key you must first register
with Clowder at "https://terraref.ncsa.illinois.edu/clowder/". First click the `Login` button and
wait for the login screen to appear, then select the `Sign up` button and enter an email
address you have access to. An email is sent to that address with instructions for
completing the registration process. Once registration is complete, log
into Clowder and select the `View profile` menu option from the drop-down near the search
control. Clicking the `+ Add` button under the "User API Keys" heading on the profile page
generates a new key.
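Rather than pasting the key into every script, one option is to read it from an environment variable. This is a minimal sketch: the `CLOWDER_KEY` variable name and the helper function are our own convention, not something Clowder or terrautils requires.

```{python eval=FALSE}
import os

def get_clowder_key(env_var='CLOWDER_KEY'):
    """Read the Clowder API key from an environment variable.

    'CLOWDER_KEY' is a naming convention chosen for this vignette;
    neither Clowder nor terrautils requires it.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError('Set %s to your Clowder API key' % env_var)
    return key
```

`key = get_clowder_key()` can then stand in for the hard-coded `'YOUR_KEY_GOES_HERE'` string used in the examples below.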

## Locating the images

To begin looking for files, a sensor name and a site name are needed. We will be using
'RGB GeoTIFFs Datasets' as the sensor name and '' (an empty string) as the site name. Later in
this vignette we show how to retrieve the list of available sensors.

As mentioned in the overview, the url string points to the API to use. In this case
we'll be using "https://terraref.ncsa.illinois.edu/clowder/api", and the key will be the
one you created for your Clowder account.

```{python eval=FALSE}
from terrautils.products import get_file_listing

url = 'https://terraref.ncsa.illinois.edu/clowder/api'
key = 'YOUR_KEY_GOES_HERE'
sensor = 'RGB GeoTIFFs Datasets'
sitename = ''
files = get_file_listing(None, url, key, sensor, sitename,
                         since='2018-05-01', until='2018-05-31')
```

The `files` variable now contains an array of all the files in the datasets that match the
sensor in the plot for the month of May. When performing your own queries it's possible that
no matches are found, in which case the `files` array will be empty.

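It's worth guarding against an empty result before moving on. A small sketch (the helper name `check_listing` is ours, not part of terrautils):

```{python eval=FALSE}
def check_listing(files):
    """Return the file listing unchanged, or raise when it is empty."""
    if not files:
        raise ValueError('No files matched the sensor/site/date query; '
                         'double-check the sensor name and date range')
    print('Found %d matching file(s)' % len(files))
    return files
```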
## Retrieving the images

Now that we have a list of files we can retrieve them one by one. We do this by building a URL
that identifies the file to retrieve, making the API call to fetch the file contents, and writing
the contents to disk.

To build the correct URL we start with the one defined before and append the keyword '/files/'
followed by the ID of each file. Assuming we have a file ID of '111', the final URL for retrieving
the file would be:

```{sh eval=FALSE}
https://terraref.ncsa.illinois.edu/clowder/api/files/111
```
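The same URL can be assembled in Python with plain string concatenation. A small sketch, where '111' is just the placeholder ID from above:

```{python eval=FALSE}
# Build a file-download URL from the API base URL and a file ID.
# '111' is only a placeholder; real IDs come from the file listing.
url = 'https://terraref.ncsa.illinois.edu/clowder/api'
file_id = '111'
file_url = url + '/files/' + file_id
print(file_url)  # https://terraref.ncsa.illinois.edu/clowder/api/files/111
```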

By looping through each of the files returned in the previous example, and using their ID and
filename, we can retrieve the files from the server and store them locally.

We stream the data returned from our server request (`stream=True` in the code below) because
the files are likely to be large. If the `stream=True` parameter were omitted, the file's entire
contents would be loaded into memory in the `r` variable before being written to the local file.

```{python eval=FALSE}
import requests

# We are using the same `url` and `key` variables declared in the previous example.
fileurl = url + '/files/'
params = {'key': key}

for f in files:
    r = requests.get(fileurl + f.id, params=params, stream=True)
    with open(f.filename, 'wb') as o:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                o.write(chunk)
```
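The chunked-write pattern can be tried in isolation, with no network access, by substituting an in-memory stream for the HTTP response. A self-contained sketch, where `io.BytesIO` merely stands in for the streamed response body:

```{python eval=FALSE}
import io

def write_in_chunks(source, dest, chunk_size=1024):
    """Copy a binary stream to a writable file object chunk by chunk,
    mirroring the r.iter_content() loop above."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)

# Stand-in for a streamed HTTP response body: 3000 bytes of data.
source = io.BytesIO(b'x' * 3000)
dest = io.BytesIO()
write_in_chunks(source, dest)
print(len(dest.getvalue()))  # 3000
```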

The images are now stored on the local file system.

## Retrieving sensor names

In this section we retrieve the names of the different sensor types that are available. This will
allow you to retrieve files other than those containing RGB image data.

```{python eval=FALSE}
# We are using the same `url` and `key` variables declared in the previous example.
from terrautils.products import get_sensor_list, unique_sensor_names

sensors = get_sensor_list(None, url, key)
names = unique_sensor_names(sensors)
```

The variable `names` now contains the list of all available sensors. Using these sensor
names it's possible to use the search above to locate and then retrieve additional data files.
Substitute the new sensor name for 'RGB GeoTIFFs Datasets' where the variable `sensor` is
assigned above.

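Since `names` is a plain list of strings, standard list tools apply when narrowing it down. A small sketch; the example names below are made up for illustration and are not guaranteed to match actual TERRA-REF sensor names:

```{python eval=FALSE}
# Hypothetical sensor names for illustration only; real names come from
# unique_sensor_names() above.
names = ['RGB GeoTIFFs Datasets', 'Thermal IR GeoTIFFs Datasets', 'EnvironmentLogger']

# Keep only the GeoTIFF-producing sensors.
geotiff_sensors = [n for n in names if 'GeoTIFFs' in n]
print(geotiff_sensors)  # ['RGB GeoTIFFs Datasets', 'Thermal IR GeoTIFFs Datasets']
```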
