Commit b0f05eb

adding some additional information regarding xarray and annotated data
1 parent e1a7c6d commit b0f05eb

1 file changed

Lines changed: 96 additions & 24 deletions

File tree

lithology/introduction_to_lithology.ipynb

@@ -17,6 +17,8 @@
1717
"\n",
1818
"In this tutorial we will first use LithoLayers to erode either dipping layers or an anticline. Then we will use Lithology to create inverted topography. \n",
1919
"\n",
20+
"We will also use [xarray](https://xarray.pydata.org/en/stable/) to store and annotate our model output. \n",
21+
"\n",
2022
"To start, we will import the necessary modules. A note: this tutorial uses the [HoloViews package](http://holoviews.org) for visualization. This package is a great tool for dealing with multidimensional annotated data (e.g. an xarray dataset). If you get an error on import, consider updating dask (this is what the author needed to do in April 2018). You will also need to have the [Bokeh](https://bokeh.pydata.org/en/latest/) and [Matplotlib](https://matplotlib.org) packages installed."
2123
]
2224
},
@@ -115,9 +117,7 @@
115117
"\n",
116118
"Next, let's instantiate a FlowAccumulator and a FastscapeEroder to create a simple landscape evolution model. \n",
117119
"\n",
118-
"We will point the FastscapeEroder to the model grid field `'K_sp'` so that it will respond to the spatially variable erodibilities created by LithoLayers. \n",
119-
"\n",
120-
"We will also instantiate an xarray dataset used to store the output of our model through time for visualization. "
120+
"We will point the FastscapeEroder to the model grid field `'K_sp'` so that it will respond to the spatially variable erodibilities created by LithoLayers. "
121121
]
122122
},
123123
{
@@ -133,39 +133,95 @@
133133
"dt = 1000\n",
134134
"\n",
135135
"fa = FlowAccumulator(mg)\n",
136-
"sp = FastscapeEroder(mg, K_sp='K_sp')\n",
136+
"sp = FastscapeEroder(mg, K_sp='K_sp')"
137+
]
138+
},
139+
{
140+
"cell_type": "markdown",
141+
"metadata": {},
142+
"source": [
143+
"Before we run the model we will also instantiate an xarray dataset used to store the output of our model through time for visualization. \n",
137144
"\n",
138-
"out_fields = ['topographic__elevation',\n",
139-
" 'rock_type__id']\n",
145+
"The next block may look intimidating, but I'll walk you through what it does. \n",
146+
"\n",
147+
"[xarray](https://xarray.pydata.org/en/stable/) allows us to create a container for our data and label it with information like units, dimensions, short and long names, etc. xarray combines the tools for dealing with N-dimensional data provided by Python packages such as [numpy](http://www.numpy.org), the labeling and named-indexing power of the [pandas](https://pandas.pydata.org) package, and the data model of the [NetCDF file](https://www.unidata.ucar.edu/software/netcdf/).\n",
140148
"\n",
141-
"ds = xr.Dataset(data_vars={'topographic__elevation' : (('time', 'y', 'x'), \n",
142-
" np.empty((nts, mg.shape[0], mg.shape[1])),\n",
143-
" {'units' : 'meters',\n",
149+
"This means that we can use xarray to make a \"self-referential\" dataset that contains all of the variables and attributes that describe what each part is and how it was made. In this application, we won't make a fully self-referential dataset, but if you are interested in this, check out the [NetCDF best practices](https://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html). \n",
150+
"\n",
151+
"Important for our application is that later on we will use the [HoloViews package](http://holoviews.org) for visualization. This package is a great tool for dealing with multidimensional annotated data and will do things like automatically create nice axis labels with units. However, in order for it to work, we must first annotate our data to include this information.\n",
152+
"\n",
153+
"Here we create an xarray Dataset with two variables `'topographic__elevation'` and `'rock_type__id'` and three dimensions `'x'`, `'y'`, and `'time'`. \n",
154+
"\n",
155+
"We pass xarray two dictionaries, one with information about the data variables (`data_vars`) and one with information about the coordinate system (`coords`). For each data variable or coordinate, we pass a tuple of three items: `(dims, data, attrs)`. The first element is a tuple of dimension names, the second is the data, and the third is a dictionary of attributes. "
156+
]
157+
},
158+
{
159+
"cell_type": "code",
160+
"execution_count": null,
161+
"metadata": {
162+
"collapsed": true
163+
},
164+
"outputs": [],
165+
"source": [
166+
"ds = xr.Dataset(data_vars={'topographic__elevation' : (('time', 'y', 'x'), # tuple of dimensions\n",
167+
" np.empty((nts, mg.shape[0], mg.shape[1])), # n-d array of data\n",
168+
" {'units' : 'meters', # dictionary with data attributes\n",
144169
" 'long_name': 'Topographic Elevation'}),\n",
145170
" 'rock_type__id': (('time', 'y', 'x'), \n",
146171
" np.empty((nts, mg.shape[0], mg.shape[1])),\n",
147172
" {'units' : '-',\n",
148-
" 'long_name' : 'Rock Type ID Code'})\n",
149-
" },\n",
150-
" coords={'x': (('x'), \n",
151-
" mg.x_of_node.reshape(mg.shape)[0,:],\n",
152-
" {'units' : 'meters'}),\n",
173+
" 'long_name' : 'Rock Type ID Code'})},\n",
174+
" coords={'x': (('x'), # tuple of dimensions\n",
175+
" mg.x_of_node.reshape(mg.shape)[0,:], # 1-d array of coordinate data\n",
176+
" {'units' : 'meters'}), # dictionary with data attributes\n",
153177
" 'y': (('y'), \n",
154178
" mg.y_of_node.reshape(mg.shape)[:, 1],\n",
155179
" {'units' : 'meters'}),\n",
156180
" 'time': (('time'), \n",
157181
" dt*np.arange(nts)/1e6,\n",
158182
" {'units': 'millions of years since model start',\n",
159-
" 'standard_name' : 'time'})\n",
160-
" }\n",
161-
" )\n"
183+
" 'standard_name' : 'time'})})"
184+
]
185+
},
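The `(dims, data, attrs)` pattern in the cell above can be reduced to a minimal, self-contained sketch. The array shapes and coordinate values here are arbitrary stand-ins, not the notebook's grid:

```python
import numpy as np
import xarray as xr

# Stand-in sizes (assumption: the notebook uses nts and mg.shape instead).
nts, ny, nx = 3, 4, 5

# One data variable and three coordinates, each given as (dims, data, attrs).
ds = xr.Dataset(
    data_vars={'topographic__elevation': (('time', 'y', 'x'),
                                          np.zeros((nts, ny, nx)),
                                          {'units': 'meters',
                                           'long_name': 'Topographic Elevation'})},
    coords={'x': (('x'), np.arange(nx, dtype=float), {'units': 'meters'}),
            'y': (('y'), np.arange(ny, dtype=float), {'units': 'meters'}),
            'time': (('time'), np.arange(nts, dtype=float),
                     {'units': 'millions of years since model start'})})

print(ds.topographic__elevation.attrs['units'])  # meters
```

The attributes attached here are exactly what HoloViews later reads to label axes automatically.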
186+
{
187+
"cell_type": "markdown",
188+
"metadata": {},
189+
"source": [
190+
"We can print the data set to get some basic information about it."
191+
]
192+
},
193+
{
194+
"cell_type": "code",
195+
"execution_count": null,
196+
"metadata": {},
197+
"outputs": [],
198+
"source": [
199+
"print(ds)"
162200
]
163201
},
164202
{
165203
"cell_type": "markdown",
166204
"metadata": {},
167205
"source": [
168-
"Here we run the model. In each time step we first run the FlowAccumulator to direct flow and accumulate drainage area. Then the FastscapeEroder erodes the topography based on the stream power equation using the erodibility value in the field `'K_sp'`. We create an uplift field that uplifts only the model grid's core nodes. After uplifting these core nodes, we update LithoLayers. Importantly, we must tell LithoLayers how much it has been advected upward by uplift. \n",
206+
"We can also print a single variable to get more detailed information about it. \n",
207+
"\n",
208+
"Since we initialized the dataset with empty arrays for the two data variables, the data values are just uninitialized placeholders at this point. "
209+
]
210+
},
211+
{
212+
"cell_type": "code",
213+
"execution_count": null,
214+
"metadata": {},
215+
"outputs": [],
216+
"source": [
217+
"ds.topographic__elevation"
218+
]
219+
},
220+
{
221+
"cell_type": "markdown",
222+
"metadata": {},
223+
"source": [
224+
"Next, we run the model. In each time step we first run the FlowAccumulator to direct flow and accumulate drainage area. Then the FastscapeEroder erodes the topography based on the stream power equation using the erodibility value in the field `'K_sp'`. We create an uplift field that uplifts only the model grid's core nodes. After uplifting these core nodes, we update LithoLayers. Importantly, we must tell LithoLayers how much it has been advected upward by uplift. \n",
169225
"\n",
170226
"`lith.run_one_step` has an optional argument `rock_id` to use when some material may be deposited. Since we are using the FastscapeEroder which is fully detachment limited, we don't need to set this. \n",
171227
"\n",
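The save-to-dataset step inside the loop can be sketched on its own. Here the Landlab grid and components are replaced by plain numpy stand-ins (an assumption for the sketch; the notebook reads `mg.at_node[field]` and reshapes with `mg.shape`), since the point is only the indexing pattern that writes each output field into time slice `i` of the dataset:

```python
import numpy as np
import xarray as xr

# Stand-ins for the notebook's grid fields and run parameters.
nts, shape, dt = 3, (4, 5), 1000
fields = {'topographic__elevation': np.zeros(shape[0] * shape[1]),
          'rock_type__id': np.ones(shape[0] * shape[1])}

ds = xr.Dataset(
    {of: (('time', 'y', 'x'), np.empty((nts, *shape))) for of in fields},
    coords={'time': dt * np.arange(nts) / 1e6})

out_fields = list(fields)
for i in range(nts):
    # ... run FlowAccumulator, FastscapeEroder, uplift, and lith here ...
    fields['topographic__elevation'] += 1.0  # fake evolution for the sketch
    for of in out_fields:
        # Write the current node field into time slice i of the dataset.
        ds[of][i, :, :] = fields[of].reshape(shape)
```

After the loop, every time slice of the dataset holds a snapshot of the model state, ready for HoloViews.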
@@ -180,6 +236,9 @@
180236
},
181237
"outputs": [],
182238
"source": [
239+
"out_fields = ['topographic__elevation',\n",
240+
" 'rock_type__id']\n",
241+
"\n",
183242
"for i in range(nts):\n",
184243
" fa.run_one_step()\n",
185244
" sp.run_one_step(dt = dt)\n",
@@ -202,7 +261,9 @@
202261
{
203262
"cell_type": "code",
204263
"execution_count": null,
205-
"metadata": {},
264+
"metadata": {
265+
"collapsed": true
266+
},
206267
"outputs": [],
207268
"source": [
208269
"imshow_grid(mg, 'topographic__elevation', cmap='viridis')"
@@ -222,7 +283,9 @@
222283
{
223284
"cell_type": "code",
224285
"execution_count": null,
225-
"metadata": {},
286+
"metadata": {
287+
"collapsed": true
288+
},
226289
"outputs": [],
227290
"source": [
228291
"hvds_topo = hv.Dataset(ds.topographic__elevation)\n",
@@ -242,7 +305,9 @@
242305
{
243306
"cell_type": "code",
244307
"execution_count": null,
245-
"metadata": {},
308+
"metadata": {
309+
"collapsed": true
310+
},
246311
"outputs": [],
247312
"source": [
248313
"%opts Image style(interpolation='bilinear', cmap='viridis') plot[colorbar=True]\n",
@@ -362,7 +427,9 @@
362427
{
363428
"cell_type": "code",
364429
"execution_count": null,
365-
"metadata": {},
430+
"metadata": {
431+
"collapsed": true
432+
},
366433
"outputs": [],
367434
"source": [
368435
"imshow_grid(mg2, 'topographic__elevation', cmap='viridis')"
@@ -380,7 +447,9 @@
380447
{
381448
"cell_type": "code",
382449
"execution_count": null,
383-
"metadata": {},
450+
"metadata": {
451+
"collapsed": true
452+
},
384453
"outputs": [],
385454
"source": [
386455
"volcanic_deposits = np.zeros(mg2.size('node'))\n",
@@ -436,6 +505,7 @@
436505
"cell_type": "code",
437506
"execution_count": null,
438507
"metadata": {
508+
"collapsed": true,
439509
"scrolled": true
440510
},
441511
"outputs": [],
@@ -453,7 +523,9 @@
453523
{
454524
"cell_type": "code",
455525
"execution_count": null,
456-
"metadata": {},
526+
"metadata": {
527+
"collapsed": true
528+
},
457529
"outputs": [],
458530
"source": [
459531
"hvds_topo2 = hv.Dataset(ds2.topographic__elevation)\n",
