Commit bb38954 (merge of 2 parents: 2fafadd + 6617999)

Merge pull request #78 from andrewdelman/cloud_compatibility

In-cloud access tutorial updates

28 files changed: 13,473 additions & 5,908 deletions

Cloud_Setup/jupyter_env_setup.sh

Lines changed: 7 additions & 12 deletions

@@ -29,29 +29,22 @@ echo -e "${red_start}Installed wget${nocolor_start}"
 sudo dnf install tmux -y
 echo -e "${red_start}Installed tmux${nocolor_start}"
 
-# retrieve and install miniforge in /tmp/
-# assuming EBS volume is already attached to instance
+# retrieve and install miniforge
 echo -e "${red_start}Starting Miniforge3 installation${nocolor_start}"
 mkdir -p /tmp
 wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh" -O /tmp/Miniforge3.sh
-bash /tmp/Miniforge3.sh -b -p /tmp/conda
+bash /tmp/Miniforge3.sh -b -p ~/conda
 rm -f /tmp/Miniforge.sh
-source "/tmp/conda/etc/profile.d/conda.sh"
-source "/tmp/conda/etc/profile.d/mamba.sh"
+source ~/conda/bin/activate
 
 echo -e "${red_start}Completed Miniforge3 installation${nocolor_start}"
 
 # add conda and mamba to path
 mamba init
 
-# set paths to environment and package directories
-printf '\n# set conda environment and package directories' >> ~/.bashrc
-printf '\nexport CONDA_ENVS_PATH=/tmp/conda/envs' >> ~/.bashrc
-printf '\nexport CONDA_PKGS_DIRS=/tmp/conda/pkgs' >> ~/.bashrc
-source ~/.bashrc
+# # set paths to environment and package directories
 
-# create jupyter environment under /tmp/conda/envs/
-# (in EBS storage to save space in home directory)
+# create jupyter environment
 mamba create --name jupyter python=3.11 -y
 echo -e "${red_start}Created jupyter environment${nocolor_start}"

@@ -94,6 +87,7 @@ mamba install notebook -y
 mamba install progressbar -y
 mamba install gsw -y
 mamba install nco -y
+mamba install pympler -y
 
 # install remaining packages using pip
 # (mamba installs tend to get killed on t2.micro)

@@ -106,6 +100,7 @@ pip install ecco_v4_py
 
 echo -e "${red_start}Completed Python package installations${nocolor_start}"
 
+
 echo -e "${red_start}Setting up NASA Earthdata authentication${nocolor_start}"
 # NASA Earthdata authentication
 # check if credentials are already archived in ~/.netrc, and if not then prompt the user for them
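The setup script installs a stack of conda and pip packages (notebook, progressbar, gsw, nco, and now pympler, among others). A small sanity check, not part of the commit, can confirm afterwards that the importable ones actually resolved in the `jupyter` environment; the helper below and the package list are illustrative only:

```python
import importlib.util

def missing_packages(packages):
    """Return the subset of package names that cannot be imported on this system."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# e.g. after running the setup script, inside the jupyter env:
#   missing_packages(['notebook', 'progressbar', 'gsw', 'pympler'])
# stdlib names resolve everywhere, so this prints []
print(missing_packages(['json', 'os']))
```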

Cloud_Setup/jupyter_lab_start.sh

Lines changed: 2 additions & 2 deletions

@@ -7,7 +7,7 @@ red_start='\033[0;31m'
 blue_start='\033[0;34m'
 nocolor_start='\033[0m'
 
-source /tmp/conda/bin/activate
+source ~/conda/bin/activate
 conda activate jupyter
 
 # Start configuration for Jupyter lab

@@ -20,7 +20,7 @@ jlab_start="jupyter Space lab Space --no-browser Space --autoreload Space --port
 tmux new -d -s jupyterlab
 
 # Execute commands in tmux window using send-keys
-tmux send-keys -t jupyterlab source Space /tmp/conda/bin/activate Enter
+tmux send-keys -t jupyterlab source Space ~/conda/bin/activate Enter
 tmux send-keys -t jupyterlab conda Space activate Space jupyter Enter
 tmux send-keys -t jupyterlab ${jlab_start} Enter

ECCO-ACCESS/Downloading_ECCO_datasets_from_PODAAC/Tutorial_Python3_Downloading_ECCO_Subsets.ipynb

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@
 "\n",
 "\\- Time subsetting in non-continuous ranges (e.g., downloading boreal summer files from multiple years)\n",
 "\n",
-"> Currently the `ecco_download` module is a [standalone download](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/Downloading_ECCO_datasets_from_PODAAC/ecco_download.py). However, we hope to include it in the `ecco_v4_py` package soon so that it does not need to be downloaded or imported into your workspace separately. Stay tuned!\n",
+"> Currently the `ecco_download` module is a [standalone download](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/ecco_download.py). However, we hope to include it in the `ecco_v4_py` package soon so that it does not need to be downloaded or imported into your workspace separately. Stay tuned!\n",
 "\n",
 "## Getting Started\n",
 "\n",

ECCO-ACCESS/Downloading_ECCO_datasets_from_PODAAC/Tutorial_Python3_Jupyter_Notebook_Downloading_ECCO_Datasets_from_PODAAC.ipynb

Lines changed: 3 additions & 3 deletions

@@ -7,15 +7,15 @@
 "source": [
 "# Using Python to Download ECCO Datasets\n",
 "\n",
-"**Note: This notebook was modified by Andrew Delman (updated 2023-12-22) from the tutorial on the** [ECCO-GROUP Github](https://github.com/ECCO-GROUP/ECCO-ACCESS/blob/master/PODAAC/Downloading_ECCO_datasets_from_PODAAC/Tutorial_Python3_Jupyter_Notebook_Downloading_ECCO_Datasets_from_PODAAC.ipynb) **by Jack McNelis and Ian Fenty, Version 1.1 dated 2021-06-25.**\n",
+"**Note: This notebook was modified by Andrew Delman (updated 2024-04-04) from the tutorial on the** [ECCO-GROUP Github](https://github.com/ECCO-GROUP/ECCO-ACCESS/blob/master/PODAAC/Downloading_ECCO_datasets_from_PODAAC/Tutorial_Python3_Jupyter_Notebook_Downloading_ECCO_Datasets_from_PODAAC.ipynb) **by Jack McNelis and Ian Fenty, Version 1.1 dated 2021-06-25.**\n",
 "\n",
 "This Jupyter notebook provides instructions and Python code for downloading a set of granules (files) for an ECCO \"Dataset\" hosted by PO.DAAC. The focus is on downloading datasets in the lat-lon-cap 90 (llc90) native grid of the ECCO v4 simulations, since the tutorials mostly use output on the native grid. If you're new to this grid geometry, don't worry! The ecco_v4_py package discussed in the previous tutorial will help you load the ECCO output, make computations, and plot the results while hardly needing to interact with the model grid.\n",
 "\n",
 "The example ECCO Dataset used in this tutorial is \"ECCO Sea Surface Height - Daily Mean llc90 Grid (Version 4 Release 4)\" which provides daily mean sea surface height on the native llc90 grid ([10.5067/ECL5D-SSH44](https://doi.org/10.5067/ECL5D-SSH44)).\n",
 "\n",
 "These data can also be accessed directly through [NASA Earthdata search](https://search.earthdata.nasa.gov/search?fpj=ECCO). You will need to set up a NASA Earthdata account if you do not have one already. There is [a nice graphical interface](https://www.ecco-group.org/datasets.htm) to sort through the ECCO datasets available from PO.DAAC.\n",
 "\n",
-"> Tip: if you are already familiar with Python and ECCO output, and have edited your `netrc` file as described [below](#Earthdata-Login-Requirements), you can download the [ECCO_download](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/Downloading_ECCO_datasets_from_PODAAC/ecco_download.py) module. Then import it to your code using `from ecco_download import *` and call the function `ecco_podaac_download` to start downloading. You will need to know the ShortName of the dataset you want, which you can look up using the variable lists [here](https://github.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/tree/master/varlist). To see the syntax of the `ecco_podaac_download` function use `help(ecco_podaac_download)`, or see the end of this tutorial for an example.\n",
+"> Tip: if you are already familiar with Python and ECCO output, and have edited your `netrc` file as described [below](#Earthdata-Login-Requirements), you can download the [ECCO_download](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/ecco_download.py) module. Then import it to your code using `from ecco_download import *` and call the function `ecco_podaac_download` to start downloading. You will need to know the ShortName of the dataset you want, which you can look up using the variable lists [here](https://github.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/tree/master/varlist). To see the syntax of the `ecco_podaac_download` function use `help(ecco_podaac_download)`, or see the end of this tutorial for an example.\n",
 "\n",
 "\n",
 "## Getting Started\n",

@@ -938,7 +938,7 @@
 "\n",
 "If you've made it this far, that means you can now download and plot any available ECCOv4r4 variable on your local machine. Woohoo! But to make it easier in the future, you can also download the following Python module that runs the downloading routines contained in this notebook.\n",
 "\n",
-"[ecco_download module](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/Downloading_ECCO_datasets_from_PODAAC/ecco_download.py)\n",
+"[ecco_download module](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/ecco_download.py)\n",
 "\n",
 "You can save this file either in the same directory where you store the tutorial notebooks, or a different directory that you then add to your path using sys.path.append. Then you can download using the `ecco_podaac_download` function. To see the syntax of how this is used, let's invoke the module to download daily SSH data for the week 2000-01-08 to 2000-01-14:"
 ]

ECCO-ACCESS/ecco_s3_retrieve.py

Lines changed: 40 additions & 19 deletions

@@ -86,19 +86,37 @@ def get_results(params: dict, headers: dict=None):
                             headers=headers).json()
     return response
 
-def get_granules(params: dict):
-    response = get_results(params=params)
-    if 'feed' in response.keys():
-        s3_files_list = []
-        for curr_entry in response['feed']['entry']:
-            for curr_link in curr_entry['links']:
-                if "direct download access via S3" in curr_link['title']:
-                    s3_files_list.append(curr_link['href'])
-                    break
-    elif 'errors' in response.keys():
-        raise Exception(response['errors'][0])
+def get_granules(params: dict, ShortName: str, SingleDay_flag: bool):
+    time_start = np.array([]).astype('datetime64[ns]')
+    s3_files_list = []
+    completed_query = False
+    while completed_query == False:
+        response = get_results(params=params)
+        if 'feed' in response.keys():
+            for curr_entry in response['feed']['entry']:
+                time_start = np.append(time_start,np.datetime64(curr_entry['time_start'],'ns'))
+                for curr_link in curr_entry['links']:
+                    if "direct download access via S3" in curr_link['title']:
+                        s3_files_list.append(curr_link['href'])
+                        break
+        elif 'errors' in response.keys():
+            raise Exception(response['errors'][0])
+
+        if len(response['feed']['entry']) < 2000:
+            completed_query = True
+        else:
+            # do another CMR search since previous search hit the allowed maximum
+            # number of entries (2000)
+            params['temporal'] = str(np.datetime64(response['feed']['entry'][-1]['time_end'],'D')\
+                                     + np.timedelta64(1,'D'))+params['temporal'][10:]
+
+    # reduce granule list to single day if only one day in requested range
+    if (('MONTHLY' in ShortName) or ('DAILY' in ShortName)):
+        if ((SingleDay_flag == True) and (len(s3_files_list) > 1)):
+            day_index = np.argmin(np.abs(time_start - np.datetime64(StartDate,'D')))
+            s3_files_list = s3_files_list[day_index:(day_index+1)]
 
-    return s3_files_list
+    return s3_files_list
 
 
 # # Adjust StartDate and EndDate to CMR query values
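The new loop in `get_granules` works around CMR's cap on entries returned per query: whenever a full page (2000 entries) comes back, the `temporal` parameter is advanced past the last returned granule and the search repeats. That pattern can be sketched in miniature; the in-memory stub (`fake_get_results`, `ALL_ENTRIES`) and a cap of 3 standing in for 2000 are invented for illustration:

```python
import numpy as np

PAGE_LIMIT = 3  # stand-in for CMR's 2000-entry-per-query cap

# hypothetical catalog: seven daily granules with ISO start dates and S3 hrefs
ALL_ENTRIES = [{'time_start': str(np.datetime64('2000-01-01', 'D') + np.timedelta64(i, 'D')),
                'href': 's3://bucket/file_%d.nc' % i} for i in range(7)]

def fake_get_results(start_date):
    # return at most PAGE_LIMIT entries on/after start_date (ISO strings sort correctly)
    matching = [e for e in ALL_ENTRIES if e['time_start'] >= start_date]
    return matching[:PAGE_LIMIT]

def collect_granules(start_date):
    s3_files_list = []
    completed_query = False
    while not completed_query:
        entries = fake_get_results(start_date)
        s3_files_list += [e['href'] for e in entries]
        if len(entries) < PAGE_LIMIT:
            # a short page means we have everything
            completed_query = True
        else:
            # page was full: advance the temporal cursor past the last entry and re-query
            start_date = str(np.datetime64(entries[-1]['time_start'], 'D') + np.timedelta64(1, 'D'))
    return s3_files_list

print(len(collect_granules('2000-01-01')))  # → 7, gathered across three pages
```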
@@ -130,12 +148,12 @@ def get_granules(params: dict)
                 +'Program will exit now !\n')
 
 
-    # for monthly and daily datasets, do not include the month or day before
+    SingleDay_flag = False
     if (('MONTHLY' in ShortName) or ('DAILY' in ShortName)):
         if np.datetime64(EndDate,'D') - np.datetime64(StartDate,'D') \
             > np.timedelta64(1,'D'):
+            # for monthly and daily datasets, do not include the month or day before
             StartDate = str(np.datetime64(StartDate,'D') + np.timedelta64(1,'D'))
-            SingleDay_flag = False
         else:
             # for single day ranges we need to make the adjustment
             # after the CMR request
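When `SingleDay_flag` survives to the post-query step, `get_granules` keeps only the granule nearest the requested day, using `np.argmin` over the collected start times. The selection step in isolation, with made-up granule dates:

```python
import numpy as np

# hypothetical granule start times (as collected from CMR) and a requested single day
time_start = np.array(['1999-12-28', '2000-01-04', '2000-01-11'], dtype='datetime64[ns]')
StartDate = '2000-01-05'

# index of the granule whose start time is closest to the requested day
day_index = int(np.argmin(np.abs(time_start - np.datetime64(StartDate, 'D'))))
print(day_index)  # → 1 (2000-01-04 is 1 day away; the others are 8 and 6 days away)
```

Slicing the granule list as `s3_files_list[day_index:(day_index+1)]`, as the diff does, then yields a one-element list rather than a bare string.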
@@ -162,8 +180,9 @@ def get_granules(params: dict)
     ### Query CMR for the desired ECCO Dataset
 
     # grans means 'granules', PO.DAAC's term for individual files in a dataset
-    s3_files_list = get_granules(input_search_params)
-
+    s3_files_list = get_granules(input_search_params,ShortName,SingleDay_flag)
+
+
     return s3_files_list
@@ -499,8 +518,9 @@ def ecco_podaac_s3_get_diskaware(ShortNames,StartDate,EndDate,max_avail_frac=0.5
 
         pass
 
-    import shutil
-
+    import shutil
+
+
     # force max_avail_frac to be within limits [0,0.9]
     max_avail_frac = np.fmin(np.fmax(max_avail_frac,0),0.9)
@@ -529,9 +549,10 @@
 
     # for snapshot datasets with monthly snapshot_interval, only include snapshots at beginning/end of months
     if (('SNAPSHOT' in curr_shortname) and (snapshot_interval == 'monthly')):
+        import re
         s3_files_list_copy = list(tuple(s3_files_list))
         for s3_file in s3_files_list:
-            snapshot_date = re.findall("_[0-9]{4}-[0-9]{2}-[0-9]{2}",url)[0][1:]
+            snapshot_date = re.findall("_[0-9]{4}-[0-9]{2}-[0-9]{2}",s3_file)[0][1:]
             if snapshot_date[8:] != '01':
                 s3_files_list_copy.remove(s3_file)
         s3_files_list = s3_files_list_copy
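The last hunk fixes a stale variable (`url` → `s3_file`) in the monthly-snapshot filter, which keeps only granules dated the first of a month. The filter logic in isolation, with hypothetical S3 keys:

```python
import re

# hypothetical S3 keys for snapshot granules; only the _YYYY-MM-DD date matters
s3_files = [
    's3://bucket/ECCO_SNAP/SNAP_2000-01-01.nc',
    's3://bucket/ECCO_SNAP/SNAP_2000-01-15.nc',
    's3://bucket/ECCO_SNAP/SNAP_2000-02-01.nc',
]

# extract "_YYYY-MM-DD", drop the leading underscore, and keep day == '01'
monthly = [f for f in s3_files
           if re.findall("_[0-9]{4}-[0-9]{2}-[0-9]{2}", f)[0][1:][8:] == '01']
print(monthly)  # → the 2000-01-01 and 2000-02-01 keys; the mid-month snapshot is dropped
```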
