|
7 | 7 | "source": [ |
8 | 8 | "# Downloading Subsets of ECCO Datasets\n", |
9 | 9 | "\n", |
10 | | - "Andrew Delman, updated 2023-12-22.\n", |
| 10 | + "Andrew Delman, updated 2024-10-18.\n", |
11 | 11 | "\n", |
12 | | - "The [previous tutorial](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html) went through the steps needed to download ECCO datasets using Python code, and introduced the [ecco_download](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html#ECCO_download-module:-the-quick-and-easy-method) module with the useful `ecco_podaac_download` function to download datasets with a single function call.\n", |
| 12 | + "Previous tutorials on the *ecco_access* package introduced [some use cases](https://ecco-v4-python-tutorial.readthedocs.io/ECCO_access_intro.html) and demonstrated the various [access modes](https://ecco-v4-python-tutorial.readthedocs.io/ECCO_access_modes.html). This tutorial goes into more detail on how to use the 'download_subset' mode.\n", |
13 | 13 | "\n", |
14 | | - "But what if you don't want to download the entire global domain of ECCO? The [NASA Earthdata search](https://search.earthdata.nasa.gov/) interface and the [podaac_data_downloader](https://github.com/podaac/data-subscriber/blob/main/Downloader.md) utility both provide lat/lon subsetting, but this can't be used for the native llc90 grid of ECCO files. However, PO.DAAC does also make its datasets available through [OPeNDAP](https://podaac.jpl.nasa.gov/OPeNDAP-in-the-Cloud), and this enables spatial subsetting of the ECCO datasets. A new update to the [ecco_download](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html#ECCO_download-module:-the-quick-and-easy-method) module includes the `ecco_podaac_download_subset` function which exploits OPeNDAP capabilities so they can be invoked easily from your Python script or notebook. Here are some ways this function can be used to subset ECCO files prior to download, along with possible use cases:\n", |
| 14 | + "But what if you don't want to download the entire global domain of ECCO? The [NASA Earthdata search](https://search.earthdata.nasa.gov/) interface and the [podaac_data_downloader](https://github.com/podaac/data-subscriber/blob/main/Downloader.md) utility both provide lat/lon subsetting, but this can't be used for the native llc90 grid of ECCO files. However, PO.DAAC also makes its datasets available through [OPeNDAP](https://podaac.jpl.nasa.gov/OPeNDAP-in-the-Cloud), and this enables spatial subsetting of the ECCO datasets. The access mode 'download_subset' in the `ecco_access` package exploits OPeNDAP capabilities so that datasets can be subset prior to downloading, with a function call from your Python script or notebook. Here are some ways this mode can be used to subset ECCO files prior to download, along with possible use cases:\n", |
15 | 15 | "\n", |
16 | 16 | "\\- Regional subsetting (e.g., budget analyses that span many time granules but only a single tile or 2 adjacent tiles)\n", |
17 | 17 | "\n", |
|
21 | 21 | "\n", |
22 | 22 | "\\- Time subsetting in non-continuous ranges (e.g., downloading boreal summer files from multiple years)\n", |
23 | 23 | "\n", |
24 | | - "> Currently the `ecco_download` module is a [standalone download](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ecco_access/ecco_download.py). However, we hope to include it in the `ecco_v4_py` package soon so that it does not need to be downloaded or imported into your workspace separately. Stay tuned!\n", |
25 | 24 | "\n", |
26 | 25 | "## Getting Started\n", |
27 | 26 | "\n", |
28 | | - "Before using the `ecco_download` module, you need your NASA Earthdata login credentials in your local `netrc` file--if you don't yet, follow the steps [here](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html#Earthdata-Login-Requirements). \n", |
| 27 | + "Before using `ecco_access`, you need to [make it accessible to your Python path](https://ecco-v4-python-tutorial.readthedocs.io/ECCO_access_intro.html#Setting-up-ecco_access). You will also need to have your NASA Earthdata login credentials in your local `netrc` file; if you haven't set them up yet, follow the steps [here](https://ecco-v4-python-tutorial.readthedocs.io/ECCO_access_intro.html#Setting-up-Earthdata-login-credentials).\n", |
29 | 28 | "\n", |
30 | | - "Let's look at the syntax of the `ecco_podaac_download_subset` function:" |
| 29 | + ">**Note**: The parameters that are used for subsetting with mode = 'download_subset' are the same as those that are used with the function `ecco_podaac_download_subset`. The help documentation displayed below for `ecco_podaac_download_subset` provides a list of parameters that can also be passed to `ecco_podaac_access` and `ecco_podaac_to_xrdataset` when mode = 'download_subset' is used.\n", |
| 30 | + "\n", |
| 31 | + "Let's look at the syntax of the `ecco_podaac_download_subset` function (which is invoked with mode = 'download_subset'):" |
31 | 32 | ] |
32 | 33 | }, |
33 | 34 | { |
|
148 | 149 | } |
149 | 150 | ], |
150 | 151 | "source": [ |
151 | | - "from ecco_download import *\n", |
| 152 | + "import numpy as np\n", |
| 153 | + "import xarray as xr\n", |
| 154 | + "from os.path import join,expanduser\n", |
| 155 | + "\n", |
| 156 | + "import ecco_access as ea\n", |
152 | 157 | "\n", |
153 | | - "help(ecco_podaac_download_subset)" |
| 158 | + "help(ea.ecco_podaac_download_subset)" |
154 | 159 | ] |
155 | 160 | }, |
156 | 161 | { |
157 | 162 | "cell_type": "markdown", |
158 | 163 | "id": "4af72661", |
159 | 164 | "metadata": {}, |
160 | 165 | "source": [ |
161 | | - "There are a lot of options with this function! If you have used the `ecco_podaac_download` function, you'll notice the first few options are the same; most importantly, we need to provide a StartDate, EndDate, and ShortName every time the function is called, otherwise it will return an error. The ShortName of each ECCO dataset along with the associated variables and brief descriptions can be found [here](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html#Dataset-ShortNames-and-variables-associated-with-them).\n", |
| 166 | + "There are a lot of options in this mode! The ShortName of each ECCO dataset along with the associated variables and brief descriptions can be found [here](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html#Dataset-ShortNames-and-variables-associated-with-them). If you are instead using one of the top-level *ecco_access* functions (`ecco_podaac_access` and `ecco_podaac_to_xrdataset`) then you can also enter a query that searches the [ECCO variable lists](#Dataset-ShortNames-and-variables-associated-with-them) to help you find the dataset that you want.\n", |
162 | 167 | "\n", |
163 | | - "A few use cases are probably the best way to see what this function can do, so let's try some.\n", |
| 168 | + "A few use cases are probably the best way to see what `ecco_podaac_download_subset` and mode = 'download_subset' can do, so let's try some.\n", |
164 | 169 | "\n", |
165 | 170 | "\n", |
166 | 171 | "## Example 1: Downloading monthly SSH in the North Atlantic\n", |
|
191 | 196 | } |
192 | 197 | ], |
193 | 198 | "source": [ |
194 | | - "ecco_podaac_download(ShortName='ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4',StartDate='2000-01',EndDate='2000-12')" |
| 199 | + "user_home_dir = expanduser('~')\n", |
| 200 | + "# change download_root_dir as desired\n", |
| 201 | + "download_root_dir = join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC')\n", |
| 202 | + "\n", |
| 203 | + "SSH_mon_shortname = 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4'\n", |
| 204 | + "files_dict = ea.ecco_podaac_access(SSH_mon_shortname,\\\n", |
| 205 | + " StartDate='2000-01',EndDate='2000-12',\\\n", |
| 206 | + " mode='download',\\\n", |
| 207 | + " download_root_dir=download_root_dir)" |
195 | 208 | ] |
196 | 209 | }, |
197 | 210 | { |
|
1674 | 1687 | } |
1675 | 1688 | ], |
1676 | 1689 | "source": [ |
1677 | | - "import xarray as xr\n", |
1678 | | - "from os.path import join,expanduser\n", |
1679 | | - "\n", |
1680 | | - "ds_SSH_mon_2000 = xr.open_mfdataset(join(expanduser('~'),'Downloads','ECCO_V4r4_PODAAC',\\\n", |
1681 | | - " 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4',\\\n", |
1682 | | - " '*2000*.nc'))\n", |
| 1690 | + "ds_SSH_mon_2000 = xr.open_mfdataset(files_dict[SSH_mon_shortname],\\\n", |
| 1691 | + " parallel=True,\\\n", |
| 1692 | + " compat='override',data_vars='minimal',coords='minimal')\n", |
1683 | 1693 | "ds_SSH_mon_2000" |
1684 | 1694 | ] |
1685 | 1695 | }, |
|
1688 | 1698 | "id": "b79909cf", |
1689 | 1699 | "metadata": {}, |
1690 | 1700 | "source": [ |
1691 | | - "Note that there are four data variables in these files, but perhaps we only need one, the \"dynamic sea surface height anomaly\" (`SSH`). The function `ecco_podaac_download_subset` can be used to download only that data variable (along with the dimension and coordinate information).\n", |
| 1701 | + "Note that there are four data variables in these files, but perhaps we only need one, the \"dynamic sea surface height anomaly\" (`SSH`). The function `ecco_podaac_download_subset` (and mode = 'download_subset') can be used to download only that data variable (along with the dimension and coordinate information).\n", |
1692 | 1702 | "\n", |
1693 | 1703 | "Furthermore, we only need to look at one region, the North Atlantic. So most likely we don't need the entire 13-tile global domain of ECCO--but what tiles do we need? Let's use a simple function to find out. *Note: you need the ECCO native grid file downloaded for the script below; if you don't have it downloaded yet, use the code commented out at the top.*" |
1694 | 1704 | ] |
|
1708 | 1718 | } |
1709 | 1719 | ], |
1710 | 1720 | "source": [ |
1711 | | - "# # Download ECCO native grid file\n", |
1712 | | - "# ecco_podaac_download(ShortName='ECCO_L4_GEOMETRY_LLC0090GRID_V4R4',StartDate='1992-01-01',EndDate='2017-12-31')\n", |
1713 | | - "\n", |
1714 | | - "import numpy as np\n", |
1715 | | - "import xarray as xr\n", |
1716 | | - "from os.path import join,expanduser\n", |
1717 | | - "\n", |
1718 | | - "# assumes grid file is in directory ~/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_GEOMETRY_LLC0090GRID_V4R4/\n", |
1719 | | - "# change if your grid file location is different\n", |
1720 | | - "grid_file_path = join(expanduser('~'),'Downloads','ECCO_V4r4_PODAAC',\\\n", |
1721 | | - " 'ECCO_L4_GEOMETRY_LLC0090GRID_V4R4',\\\n", |
1722 | | - " 'GRID_GEOMETRY_ECCO_V4r4_native_llc0090.nc')\n", |
1723 | | - "ds_grid = xr.open_dataset(grid_file_path)\n", |
| 1721 | + "# load grid file\n", |
| 1722 | + "grid_shortname = 'ECCO_L4_GEOMETRY_LLC0090GRID_V4R4'\n", |
| 1723 | + "ds_grid = ea.ecco_podaac_to_xrdataset(grid_shortname,\\\n", |
| 1724 | + " mode='download',\\\n", |
| 1725 | + " download_root_dir=download_root_dir).compute()\n", |
1724 | 1726 | "\n", |
1725 | 1727 | "# find llc90 tiles in given bounding box\n", |
1726 | 1728 | "def llc90_tiles_find(ds_grid,latsouth,latnorth,longwest,longeast):\n", |
|
1750 | 1752 | "id": "c4693f6c", |
1751 | 1753 | "metadata": {}, |
1752 | 1754 | "source": [ |
1753 | | - "Seeing that the identified region is contained in tiles 2 and 10, we only need to download those two tiles. Let's repeat the SSH download above using `ecco_podaac_download_subset` to select for the data variable `SSH` and the tiles 2 and 10. \n", |
| 1755 | + "Seeing that the identified region is contained in tiles 2 and 10, we only need to download those two tiles. Let's repeat the SSH download above using mode = 'download_subset' to select the data variable `SSH` and tiles 2 and 10.\n", |
1754 | 1756 | "\n", |
1755 | 1757 | "> Because of OPeNDAP syntax we need to express the selected tiles as a range \\[2,13,8\\], with a \"start\" of 2 and a \"stride\" of 8; the \"end\" can be any integer greater than 10, but no larger than 18." |
1756 | 1758 | ] |
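To see which indices a `[start, end, stride]` triple selects, the range can be expanded in plain Python. This is a minimal sketch that assumes the "end" is exclusive, as in Python's `range()`, which is what the note above implies; the helper `opendap_isel_to_indices` is hypothetical and not part of `ecco_access`.

```python
def opendap_isel_to_indices(isel):
    """Expand a [start, end, stride] triple into the indices it selects.

    Assumption: the end is exclusive, like Python's range().
    Hypothetical helper for illustration; not part of ecco_access.
    """
    start, end, stride = isel
    return list(range(start, end, stride))


print(opendap_isel_to_indices([2, 13, 8]))  # tiles 2 and 10
print(opendap_isel_to_indices([2, 18, 8]))  # still tiles 2 and 10
print(opendap_isel_to_indices([2, 19, 8]))  # would also request index 18
```

Under that assumption, any "end" from 11 through 18 gives the same two tiles, which is why the exact value chosen does not matter within that interval.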
|
1781 | 1783 | } |
1782 | 1784 | ], |
1783 | 1785 | "source": [ |
1784 | | - "ecco_podaac_download_subset(ShortName='ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4',\\\n", |
1785 | | - " StartDate='2000-01',EndDate='2000-12',\\\n", |
1786 | | - " vars_to_include=['SSH'],\\\n", |
1787 | | - " tile_isel=[2,13,8],\\\n", |
1788 | | - " subset_file_id='SSHonly_NAtl')" |
| 1786 | + "# subsetting prior to download with mode = 'download_subset'\n", |
| 1787 | + "files_dict = ea.ecco_podaac_access(SSH_mon_shortname,\\\n", |
| 1788 | + " StartDate='2000-01',EndDate='2000-12',\\\n", |
| 1789 | + " mode='download_subset',\\\n", |
| 1790 | + " vars_to_include=['SSH'],\\\n", |
| 1791 | + " tile_isel=[2,13,8],\\\n", |
| 1792 | + " subset_file_id='SSHonly_NAtl')" |
1789 | 1793 | ] |
1790 | 1794 | }, |
1791 | 1795 | { |
|
2943 | 2947 | } |
2944 | 2948 | ], |
2945 | 2949 | "source": [ |
2946 | | - "ds_SSH_mon_2000_sub = xr.open_mfdataset(join(expanduser('~'),'Downloads','ECCO_V4r4_PODAAC',\\\n", |
2947 | | - " 'ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4',\\\n", |
2948 | | - " '*2000*SSHonly_NAtl.nc'))\n", |
| 2950 | + "ds_SSH_mon_2000_sub = xr.open_mfdataset(files_dict[SSH_mon_shortname],\\\n", |
| 2951 | + " parallel=True,\\\n", |
| 2952 | + " compat='override',data_vars='minimal',coords='minimal')\n", |
2949 | 2953 | "ds_SSH_mon_2000_sub" |
2950 | 2954 | ] |
2951 | 2955 | }, |
|
3285 | 3289 | "from os.path import join,expanduser\n", |
3286 | 3290 | "import matplotlib.pyplot as plt\n", |
3287 | 3291 | "\n", |
| 3292 | + "import ecco_access as ea\n", |
| 3293 | + "\n", |
3288 | 3294 | "\n", |
3289 | 3295 | "# find llc90 tiles and indices in given bounding box\n", |
3290 | 3296 | "def llc90_tiles_indices_find(ds_grid,latsouth,latnorth,longwest,longeast):\n", |
|
3336 | 3342 | } |
3337 | 3343 | ], |
3338 | 3344 | "source": [ |
3339 | | - "ecco_podaac_download_subset(ShortName='ECCO_L4_TEMP_SALINITY_LLC0090GRID_DAILY_V4R4',\\\n", |
3340 | | - " vars_to_include=['THETA'],\\\n", |
3341 | | - " times_to_include=['2004-08','2004-09','2004-10',\\\n", |
3342 | | - " '2005-08','2005-09','2005-10',\\\n", |
3343 | | - " '2006-08','2006-09','2006-10'],\\\n", |
3344 | | - " k_isel=[0,1,1],\\\n", |
3345 | | - " tile_isel=[10,11,1],\\\n", |
3346 | | - " j_isel=[28,48,1],\\\n", |
3347 | | - " i_isel=[66,82,1],\\\n", |
3348 | | - " subset_file_id='SST_GoM')" |
| 3345 | + "# use ecco_podaac_to_xrdataset to download requested data,\n", |
| 3346 | + "# and open it in the workspace as an xarray Dataset\n", |
| 3347 | + "ds_SST_GoM = ea.ecco_podaac_to_xrdataset('ECCO_L4_TEMP_SALINITY_LLC0090GRID_DAILY_V4R4',\\\n", |
| 3348 | + " mode='download_subset',vars_to_include=['THETA'],\\\n", |
| 3349 | + " times_to_include=['2004-08','2004-09','2004-10',\\\n", |
| 3350 | + " '2005-08','2005-09','2005-10',\\\n", |
| 3351 | + " '2006-08','2006-09','2006-10'],\\\n", |
| 3352 | + " k_isel=[0,1,1],\\\n", |
| 3353 | + " tile_isel=[10,11,1],\\\n", |
| 3354 | + " j_isel=[28,48,1],\\\n", |
| 3355 | + " i_isel=[66,82,1],\\\n", |
| 3356 | + " subset_file_id='SST_GoM')" |
3349 | 3357 | ] |
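For longer non-continuous selections, the `times_to_include` list can be generated rather than typed out. This sketch rebuilds the August to October list for 2004 through 2006 used in the call above; the variable names `years` and `months` are illustrative only.

```python
# Build 'YYYY-MM' strings for selected months across several years.
years = range(2004, 2007)   # 2004, 2005, 2006
months = (8, 9, 10)         # August, September, October

times_to_include = [f"{year}-{month:02d}" for year in years for month in months]
print(times_to_include)
```

The resulting list can then be passed as the `times_to_include` argument in place of the hand-written one.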
3350 | 3358 | }, |
3351 | 3359 | { |
|
4107 | 4115 | } |
4108 | 4116 | ], |
4109 | 4117 | "source": [ |
4110 | | - "ds_SST_GoM = xr.open_mfdataset(join(expanduser('~'),'Downloads','ECCO_V4r4_PODAAC',\\\n", |
4111 | | - " 'ECCO_L4_TEMP_SALINITY_LLC0090GRID_DAILY_V4R4',\\\n", |
4112 | | - " '*SST_GoM.nc'),\\\n", |
4113 | | - " compat='override',data_vars='minimal',coords='minimal')\n", |
4114 | | - " # the last three options are recommended for merging a large number of individual files\n", |
4115 | | - "\n", |
4116 | 4118 | "ds_SST_GoM = ds_SST_GoM.compute() # .compute() loads the dataset into workspace memory\n", |
4117 | 4119 | "ds_SST_GoM" |
4118 | 4120 | ] |
|
4358 | 4360 | "import xarray as xr\n", |
4359 | 4361 | "from os.path import join,expanduser\n", |
4360 | 4362 | "\n", |
| 4363 | + "import ecco_access as ea\n", |
| 4364 | + "\n", |
| 4365 | + "\n", |
4361 | 4366 | "# assumes grid file is in directory ~/Downloads/ECCO_V4r4_PODAAC/ECCO_L4_GEOMETRY_LLC0090GRID_V4R4/\n", |
4362 | 4367 | "# change if your grid file location is different\n", |
4363 | 4368 | "grid_file_path = join(expanduser('~'),'Downloads','ECCO_V4r4_PODAAC',\\\n", |
|
4430 | 4435 | } |
4431 | 4436 | ], |
4432 | 4437 | "source": [ |
4433 | | - "ecco_podaac_download_subset(ShortName='ECCO_L4_OCEAN_3D_SALINITY_FLUX_LLC0090GRID_MONTHLY_V4R4',\\\n", |
| 4438 | + "ea.ecco_podaac_access('ECCO_L4_OCEAN_3D_SALINITY_FLUX_LLC0090GRID_MONTHLY_V4R4',\\\n", |
4434 | 4439 | " StartDate='1992',EndDate='2017',\\\n", |
4435 | 4440 | " k_isel=[0,17,1],\\\n", |
4436 | 4441 | " tile_isel=[8,13,3],\\\n", |
4437 | 4442 | " download_or_list='list',\\\n", |
4438 | 4443 | " list_filename='TPac_salbudget_download.txt',\\\n", |
4439 | | - " subset_file_id='TPac')" |
| 4444 | + " mode='download_subset',subset_file_id='TPac',\\\n", |
| 4445 | + " return_granules=False)" |
4440 | 4446 | ] |
4441 | 4447 | }, |
4442 | 4448 | { |
4443 | 4449 | "cell_type": "markdown", |
4444 | 4450 | "id": "fb2daad0", |
4445 | 4451 | "metadata": {}, |
4446 | 4452 | "source": [ |
4447 | | - "Note that if the file specified by `list_filename` already exists, the file is not overwritten; the download URLs are just appended to the end of the list. This is helpful for putting URLs from multiple `ecco_podaac_download_subset` requests in a single text file. Then the files can be downloaded in a single call to `wget`, e.g., using the shell script [wget_download_fromlist.sh](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ecco_access/Downloading_ECCO_datasets_from_PODAAC/wget_download_fromlist.sh)." |
| 4453 | + "Note that if the file specified by `list_filename` already exists, the file is not overwritten; the download URLs are just appended to the end of the list. This is helpful for putting URLs from multiple requests in a single text file. Then the files can be downloaded in a single call to `wget`, e.g., using the shell script [wget_download_fromlist.sh](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ecco_access/Downloading_ECCO_datasets_from_PODAAC/wget_download_fromlist.sh)." |
4448 | 4454 | ] |
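Because URLs are appended rather than overwritten, repeating a request can leave duplicate entries in the list file. Below is a minimal sketch for reading such a file and dropping repeats before downloading. It assumes the file holds one URL per line; the helper `read_url_list` and the example URLs are hypothetical and not part of `ecco_access`.

```python
import os
import tempfile


def read_url_list(list_path):
    """Return the unique, non-empty lines of a URL list file, in file order.

    Assumption: the list file contains one URL per line.
    """
    seen = set()
    urls = []
    with open(list_path) as f:
        for line in f:
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Demonstration with a throwaway file containing one repeated entry
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_list.txt')
    with open(demo_path, 'w') as f:
        f.write('https://example.com/a.nc\n'
                'https://example.com/b.nc\n'
                'https://example.com/a.nc\n')
    demo_urls = read_url_list(demo_path)

print(demo_urls)
```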
4449 | 4455 | } |
4450 | 4456 | ], |
|
4464 | 4470 | "name": "python", |
4465 | 4471 | "nbconvert_exporter": "python", |
4466 | 4472 | "pygments_lexer": "ipython3", |
4467 | | - "version": "3.9.13" |
| 4473 | + "version": "3.11.9" |
4468 | 4474 | } |
4469 | 4475 | }, |
4470 | 4476 | "nbformat": 4, |
|