Commit 88c0b95

Merge pull request #76 from andrewdelman/cloud_compatibility
minor changes to ecco_s3_retrieve and tutorials
2 parents f50da80 + 51f2d0d commit 88c0b95

4 files changed

Lines changed: 48 additions & 31 deletions

ECCO-ACCESS/Cloud_access_to_ECCO_datasets/Tutorial_AWS_Cloud_getting_started.ipynb

Lines changed: 19 additions & 11 deletions
@@ -5105,7 +5105,8 @@
  "text": [
  "Help on function ecco_podaac_s3_get in module ecco_s3_retrieve:\n",
  "\n",
- "ecco_podaac_s3_get(ShortName, StartDate, EndDate, download_root_dir=None, n_workers=6, force_redownload=False, return_downloaded_files=False)\n",
+ "ecco_podaac_s3_get(ShortName, StartDate, EndDate, download_root_dir=None, n_workers=6,\n",
+ "    force_redownload=False, return_downloaded_files=False)\n",
  "    This routine downloads ECCO datasets from PO.DAAC, to be stored locally on a AWS EC2 instance running in region us-west-2. \n",
  "    It is adapted from the ecco_podaac_download function in the ecco_download.py module, and is the AWS Cloud equivalent of \n",
  "    ecco_podaac_download.\n",
@@ -5316,11 +5317,15 @@
  "text": [
  "Help on function ecco_podaac_s3_get_diskaware in module ecco_s3_retrieve:\n",
  "\n",
- "ecco_podaac_s3_get_diskaware(ShortNames, StartDate, EndDate, max_avail_frac=0.5, snapshot_interval=None, download_root_dir=None, n_workers=6, force_redownload=False)\n",
- "    This function estimates the storage footprint of ECCO datasets, given ShortName(s), a date range, and which files (if any) are already present.\n",
- "    If the current instance's available storage is at least twice the footprint of the new files, they are downloaded and stored locally on the instance \n",
- "    using ecco_podaac_s3_get (hosting files locally typically speeds up loading and computation).\n",
- "    Otherwise, the files are \"opened\" using ecco_podaac_s3_open so that they can be accessed directly on S3 without occupying local storage.\n",
+ "ecco_podaac_s3_get_diskaware(ShortNames, StartDate, EndDate, max_avail_frac=0.5, \n",
+ "    snapshot_interval=None, download_root_dir=None, n_workers=6, force_redownload=False)\n",
+ "    This function estimates the storage footprint of ECCO datasets, given ShortName(s), a date range, and which \n",
+ "    files (if any) are already present.\n",
+ "    If the footprint of the files to be downloaded (not including files already on the instance or re-downloads) \n",
+ "    is <= the max_avail_frac specified of the instance's available storage, they are downloaded and stored locally \n",
+ "    on the instance (hosting files locally typically speeds up loading and computation).\n",
+ "    Otherwise, the files are \"opened\" using ecco_podaac_s3_open so that they can be accessed directly \n",
+ "    on S3 without occupying local storage.\n",
  "    \n",
  "    Parameters\n",
  "    ----------\n",
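The decision rule stated in this updated docstring can be sketched in a few lines of Python (the helper name `diskaware_decision` and the use of `shutil.disk_usage` here are illustrative assumptions, not part of `ecco_s3_retrieve`):

```python
import shutil

def diskaware_decision(est_footprint_bytes, download_dir=".", max_avail_frac=0.5):
    # Bytes currently free on the filesystem containing download_dir.
    avail = shutil.disk_usage(download_dir).free
    # Per the docstring: download locally only if the footprint of the
    # files still to be fetched fits within max_avail_frac of the
    # available storage; otherwise open the files directly on S3.
    return "download" if est_footprint_bytes <= max_avail_frac * avail else "open_on_s3"
```

For example, with the default `max_avail_frac=0.5`, a request whose new files occupy less than half of the instance's free space would be downloaded, while a larger request would be opened on S3 instead.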
@@ -5336,11 +5341,14 @@
  "    \n",
  "    max_avail_frac: float, maximum fraction of remaining available disk space to use in storing current ECCO datasets.\n",
  "                    This determines whether the dataset files are stored on the current instance, or opened on S3.\n",
- "                    Valid range is [0,0.9]. If number provided is outside this range, it is replaced by the closer endpoint of the range.\n",
+ "                    Valid range is [0,0.9]. If number provided is outside this range, it is replaced by the closer \n",
+ "                    endpoint of the range.\n",
  "    \n",
- "    snapshot_interval: ('monthly', 'daily', or None), if snapshot datasets are included in ShortNames, this determines whether\n",
- "                       snapshots are included for only the beginning/end of each month ('monthly'), or for every day ('daily').\n",
- "                       If None or not specified, defaults to 'daily' if any daily mean ShortNames are included and 'monthly' otherwise.\n",
+ "    snapshot_interval: ('monthly', 'daily', or None), if snapshot datasets are included in ShortNames, \n",
+ "                       this determines whether snapshots are included for only the beginning/end of each month \n",
+ "                       ('monthly'), or for every day ('daily').\n",
+ "                       If None or not specified, defaults to 'daily' if any daily mean ShortNames are included \n",
+ "                       and 'monthly' otherwise.\n",
  "    \n",
  "    download_root_dir: str, defines parent directory to download files to.\n",
  "                       Files will be downloaded to directory download_root_dir/ShortName/.\n",
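The clamping behavior documented for `max_avail_frac` reduces to a one-line min/max sketch (the helper name is hypothetical, not defined in the module):

```python
def clamp_max_avail_frac(frac):
    # Values outside the documented valid range [0, 0.9] are replaced
    # by the nearer endpoint of the range.
    return min(max(frac, 0.0), 0.9)
```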
@@ -5373,7 +5381,7 @@
  "id": "887f8436-98d3-4b09-a7d1-936810717592",
  "metadata": {},
  "source": [
- "The syntax of this function is similar to `ecco_podaac_s3_get`, but there are two arguments specific to this function: **max_avail_frac** and **snapshot_interval**. **max_avail_frac** sets the storage threshold for whether the specified dataset(s) will be downloaded to the user's instance vs. opened from S3. For example, the default max_avail_frac = 0.5 will download the datasets if they will occupy less than 50% of the instance's remaining available memory. **snapshot_interval** applies only if there are snapshot datasets included in ShortNames, e.g., it could be useful to specify snapshot_interval = 'monthly' if you want to limit the size of the potential download.\n",
+ "The syntax of this function is similar to `ecco_podaac_s3_get`, but there are two arguments specific to this function: **max_avail_frac** and **snapshot_interval**. **max_avail_frac** sets the storage threshold for whether the specified dataset(s) will be downloaded to the user's instance vs. opened from S3. For example, the default max_avail_frac = 0.5 will download the datasets if they will occupy <= 50% of the instance's remaining available storage. **snapshot_interval** applies only if there are snapshot datasets included in ShortNames, e.g., it could be useful to specify snapshot_interval = 'monthly' if you want to limit the size of the potential download.\n",
  "\n",
  "Now let's repeat the calculation that was done above by invoking this function, first removing the files if they are already on disk to replicate the previous example in Method 2 as closely as possible."
  ]
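The documented default for `snapshot_interval` can be sketched as follows (a hypothetical helper; it assumes the averaging interval is encoded in the ShortName with a `DAILY` token, as in `ECCO_L4_TEMP_SALINITY_LLC0090GRID_DAILY_V4R4`):

```python
def default_snapshot_interval(ShortNames):
    # Sketch of the documented default: 'daily' if any daily-mean
    # dataset is requested, otherwise 'monthly'.
    return "daily" if any("DAILY" in sn for sn in ShortNames) else "monthly"
```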

ECCO-ACCESS/ecco_s3_retrieve.py

Lines changed: 23 additions & 15 deletions
@@ -33,7 +33,8 @@ def ecco_podaac_s3_query(ShortName,StartDate,EndDate):
 
    Returns
    -------
-   s3_files_list: str or list, opened file(s) on S3 that can be passed directly to xarray (open_dataset or open_mfdataset)
+   s3_files_list: str or list, opened file(s) on S3 that can be passed directly to xarray
+                  (open_dataset or open_mfdataset)
 
    """

@@ -149,7 +150,8 @@ def get_granules(params: dict):
    # actually log in with this command:
    setup_earthdata_login_auth()
 
-   # Query the NASA Common Metadata Repository to find the URL of every granule associated with the desired ECCO Dataset and date range of interest.
+   # Query the NASA Common Metadata Repository to find the URL of every granule associated with the desired
+   # ECCO Dataset and date range of interest.
 
    # create a Python dictionary with our search criteria: `ShortName` and `temporal`
    input_search_params = {'ShortName': ShortName,
@@ -440,15 +442,18 @@ def ecco_podaac_s3_get(ShortName,StartDate,EndDate,download_root_dir=None,n_work
 ###================================================================================================================
 
 
-def ecco_podaac_s3_get_diskaware(ShortNames,StartDate,EndDate,max_avail_frac=0.5,snapshot_interval=None,download_root_dir=None,n_workers=6,\
-                                 force_redownload=False):
+def ecco_podaac_s3_get_diskaware(ShortNames,StartDate,EndDate,max_avail_frac=0.5,snapshot_interval=None,\
+                                 download_root_dir=None,n_workers=6,force_redownload=False):
 
    """
 
-   This function estimates the storage footprint of ECCO datasets, given ShortName(s), a date range, and which files (if any) are already present.
-   If the current instance's available storage is at least twice the footprint of the new files, they are downloaded and stored locally on the instance
-   using ecco_podaac_s3_get (hosting files locally typically speeds up loading and computation).
-   Otherwise, the files are "opened" using ecco_podaac_s3_open so that they can be accessed directly on S3 without occupying local storage.
+   This function estimates the storage footprint of ECCO datasets, given ShortName(s), a date range, and which
+   files (if any) are already present.
+   If the footprint of the files to be downloaded (not including files already on the instance or re-downloads)
+   is <= the max_avail_frac specified of the instance's available storage, they are downloaded and stored locally
+   on the instance (hosting files locally typically speeds up loading and computation).
+   Otherwise, the files are "opened" using ecco_podaac_s3_open so that they can be accessed directly
+   on S3 without occupying local storage.
 
    Parameters
    ----------
@@ -464,11 +469,14 @@ def ecco_podaac_s3_get_diskaware(ShortNames,StartDate,EndDate,max_avail_frac=0.5
 
    max_avail_frac: float, maximum fraction of remaining available disk space to use in storing current ECCO datasets.
                    This determines whether the dataset files are stored on the current instance, or opened on S3.
-                   Valid range is [0,0.9]. If number provided is outside this range, it is replaced by the closer endpoint of the range.
+                   Valid range is [0,0.9]. If number provided is outside this range, it is replaced by the closer
+                   endpoint of the range.
 
-   snapshot_interval: ('monthly', 'daily', or None), if snapshot datasets are included in ShortNames, this determines whether
-                      snapshots are included for only the beginning/end of each month ('monthly'), or for every day ('daily').
-                      If None or not specified, defaults to 'daily' if any daily mean ShortNames are included and 'monthly' otherwise.
+   snapshot_interval: ('monthly', 'daily', or None), if snapshot datasets are included in ShortNames,
+                      this determines whether snapshots are included for only the beginning/end of each month
+                      ('monthly'), or for every day ('daily').
+                      If None or not specified, defaults to 'daily' if any daily mean ShortNames are included
+                      and 'monthly' otherwise.
 
    download_root_dir: str, defines parent directory to download files to.
                       Files will be downloaded to directory download_root_dir/ShortName/.
@@ -484,8 +492,8 @@ def ecco_podaac_s3_get_diskaware(ShortNames,StartDate,EndDate,max_avail_frac=0.5
 
    Returns
    -------
-   retrieved_files: dict, with keys: ShortNames and values: downloaded or opened file(s) with path on local instance or on S3,
-                    that can be passed directly to xarray (open_dataset or open_mfdataset).
+   retrieved_files: dict, with keys: ShortNames and values: downloaded or opened file(s) with path on local instance
+                    or on S3, that can be passed directly to xarray (open_dataset or open_mfdataset).
 
    """

@@ -600,4 +608,4 @@ def ecco_podaac_s3_get_diskaware(ShortNames,StartDate,EndDate,max_avail_frac=0.5
 
            retrieved_files[curr_shortname] = open_files
 
-    return retrieved_files
\ No newline at end of file
+    return retrieved_files

Tutorials_as_Jupyter_Notebooks/ECCO_v4_Gradient_calc_on_native_grid.ipynb

Lines changed: 6 additions & 3 deletions
@@ -169,9 +169,12 @@
  "        max_avail_frac=0.5,\\\n",
  "        download_root_dir=ECCO_dir)\n",
  "    ecco_grid = xr.open_mfdataset(files_nested_list[ShortNames_list[0]])\n",
- "    ecco_vars_TS = xr.open_mfdataset(files_nested_list[ShortNames_list[1]],compat='override',data_vars='minimal',coords='minimal')\n",
- "    ecco_vars_vel = xr.open_mfdataset(files_nested_list[ShortNames_list[2]],compat='override',data_vars='minimal',coords='minimal')\n",
- "    ecco_vars_atm = xr.open_mfdataset(files_nested_list[ShortNames_list[3]],compat='override',data_vars='minimal',coords='minimal')\n",
+ "    ecco_vars_TS = xr.open_mfdataset(files_nested_list[ShortNames_list[1]],\\\n",
+ "        compat='override',data_vars='minimal',coords='minimal')\n",
+ "    ecco_vars_vel = xr.open_mfdataset(files_nested_list[ShortNames_list[2]],\\\n",
+ "        compat='override',data_vars='minimal',coords='minimal')\n",
+ "    ecco_vars_atm = xr.open_mfdataset(files_nested_list[ShortNames_list[3]],\\\n",
+ "        compat='override',data_vars='minimal',coords='minimal')\n",
  "else:\n",
  "    ecco_grid = xr.open_mfdataset(glob.glob(join(ECCO_dir,ShortNames_list[0],'*.nc'))[0])\n",
  "    ecco_vars_TS = xr.open_mfdataset(glob.glob(join(ECCO_dir,ShortNames_list[1],'*2000-*.nc')),\\\n",

Tutorials_as_Jupyter_Notebooks/ECCO_v4_data_structure_basics.ipynb

Lines changed: 0 additions & 2 deletions
@@ -15,8 +15,6 @@
  "\n",
  "The ECCO version 4 release 4 (v4r4) files are provided as NetCDF files. This tutorial shows you how to download and open these files using Python code, and takes a look at the structure of these files. The ECCO output is available as a number of **datasets** that each contain a few variables. Each dataset consists of files corresponding to a single time coordinate (monthly mean, daily mean, or snapshot). Each dataset file that represents a single time is called a **granule**.\n",
  "\n",
- "or alternatively use *wget* to obtain the files.\n",
- "\n",
  "In this first tutorial we will start slowly, providing detail at every step. Later tutorials will assume knowledge of some basic operations introduced here.\n",
  "\n",
  "Let's get started.\n",
