Commit f50da80

Merge pull request #75 from andrewdelman/cloud_compatibility
Cloud compatibility updates
2 parents b93df31 + b727bce · commit f50da80

9 files changed

Lines changed: 4796 additions & 1872 deletions

Cloud_Setup/JPL_setup_instructions.md

Lines changed: 18 additions & 6 deletions
@@ -8,7 +8,7 @@ As a JPL user, more than likely you will be added as a user to an existing proje

 Once you are a user on a JPL AWS account, make sure you are connected to the JPL network (with VPN if not at the lab), and go to the JPL AWS console [sign-in page](https://sso3.jpl.nasa.gov/awsconsole). Then bookmark the sign-in page, as you will be using it again and it is not the easiest to find. Once you have signed in, you should be at a screen with the title Console Home. First, let's make sure you are in the most optimal AWS "region" for accessing PO.DAAC datasets, which are hosted in region *us-west-2 (Oregon)*. In the upper-right corner of the page just to the left of your username, there is a drop-down menu with a place name on it. Select the **US West (Oregon) us-west-2** region.

-Now let's start a new EC2 instance. We will need to do this using an Amazon Machine Image (AMI) generated by the [JPL Cloud Computing Team](https://wiki.jpl.nasa.gov/display/cloudcomputing/OS+Pipeline). Click on **Services** in the upper-left corner next to the AWS logo, then **Compute** --> **EC2**, then from the menu on the left **Images** --> **AMIs**. A list of JPL-specific AMIs should appear on the screen (if not make sure **Private images** is selected as a filter on the top left). It is recommended to use a recently-generated JPL AMI, as these AMIs are automatically deprecated after 2 years. Use the arrows next to **AMI name** or **Creation date** to see the newest AMIs first. Select an AMI and click **Launch instance from AMI** in the upper-right corner. There are some settings on this screen to configure before launching the new instance:
+Now let's start a new EC2 instance. We will need to do this using an Amazon Machine Image (AMI) generated by the JPL Cloud Computing Team (see [here](https://wiki.jpl.nasa.gov/display/cloudcomputing/OS+Pipeline) for more info). In the AWS console, click on **Services** in the upper-left corner next to the AWS logo, then **Compute** --> **EC2**, then from the menu on the left **Images** --> **AMIs**. A list of JPL-specific AMIs should appear on the screen (if not make sure **Private images** is selected as a filter on the top left). It is recommended to use a recently-generated JPL AMI, as these AMIs are automatically deprecated after 2 years. Use the arrows next to **AMI name** or **Creation date** to see the newest AMIs first. Select an AMI and click **Launch instance from AMI** in the upper-right corner. There are some settings on this screen to configure before launching the new instance:

 *Name and tags*: Whatever you want (e.g., ECCO tutorials).

@@ -20,7 +20,7 @@ Now let's start a new EC2 instance. We will need to do this using an Amazon Mach

 *Network settings*: Look at **Select existing security group** to see if you can use a security group that has VPC: vpc-0161fa19cefbd9635. If not, you can try **Create security group** and make sure that the boxes for allowing SSH, HTTPS, and HTTP traffic are checked. If you have issues launching or accessing your instance, you may need to consult with another JPL user or submit a ticket to [CloudHelp](https://goto.jpl.nasa.gov/cloudhelp).

-*Configure storage*: Specify a storage volume with at least **15 GiB gp3** as your root volume. This is important, since the python/conda installation with the packages we need will occupy ~7.5 GB, and we need some workspace as a buffer. If you are in Free tier then you can request up to 30 GB across all your instances, so you can use up the full amount in a single instance or split it across two instances with 15 GB each.
+*Configure storage*: Specify a storage volume with at least **16 GiB gp3** as your root volume. This is important, since the python/conda installation with the packages we need will occupy ~7.5 GB, and we need some workspace as a buffer. If you get an error message about having too little storage when you launch your instance, you need to edit your instance config to have at least the minimum amount of storage for the AMI you are using.

 *Advanced details*: You need to include an IAM profile with your instance. Check the *IAM instance profile* dropdown menu to see if there is one associated with your security group (might have a title like **SRV-standard-instance-profile**). If you can not select an IAM profile, check with other account users or [CloudHelp](https://goto.jpl.nasa.gov/cloudhelp).

@@ -35,7 +35,7 @@ JPL does not enable `ssh` access to AWS instances by default, instead preferring
 - *Initial set up and download GitHub repository*: Copy the following commands and paste in your SSM window (using shift-insert or right-click then **Paste**):

 ```
-cd ~ && sudo dnf update -y && sudo dnf install git -y && git clone https://github.com/andrewdelman/ECCO-v4-Python-Tutorial-adelman.git
+cd ~ && sudo dnf update -y && sudo dnf install git -y && git clone https://github.com/ECCO-GROUP/ECCO-v4-Python-Tutorial.git
 ```

 - *Enable ssh access*: There is a script in the GitHub repository to enable ssh access `sshd_enable.sh`. You want to run it as the *root* user, otherwise you will not have the necessary permissions. Again, copy and paste the following in your SSM window:
@@ -46,7 +46,7 @@ sudo ~/ECCO-v4-Python-Tutorial/Cloud_Setup/sshd_enable.sh

 The script will ask if you want to move the git repo and change its ownership. Answer **Y** and enter **jpluser** for user name.

-Once the script is completed, you should be able to ssh into your new instance. You can close the SSM window and from your machine's terminal window, connect to the instance's *private* IPv4 address (given on the AWS instance summary page) with user name **jpluser**. For example, if the private IPv4 address is 100.104.70.37, then:
+Once the script is completed, you should be able to ssh into your new instance. You can **Terminate** the SSM window. Then from your machine's terminal window, connect to the instance's *private* IPv4 address (given on the AWS instance summary page) with user name **jpluser**. For example, if the key file is `~/.ssh/aws_ec2_jupyter.pem` and the private IPv4 address is 100.104.70.37, then:

 ```
 ssh -i "~/.ssh/aws_ec2_jupyter.pem" jpluser@100.104.70.37 -L 9889:localhost:9889
@@ -56,9 +56,21 @@ The `-L` option indicates a tunnel from the local machine's port 9889 to the ins

 ### Step 3b: Set up conda environment

-Now you need to install software (conda/miniconda/miniforge) to run Python, and then install Python packages and the Jupyter interface to run these tutorial notebooks. A shell script to expedite this process is provided on the tutorial Github page, and here we will walk through setting this up.
+Now you need to install software (conda/miniconda/miniforge) to run Python, and then install Python packages and the Jupyter interface to run these tutorial notebooks. A shell script to expedite this process `jupyter_env_setup.sh` is provided on the tutorial Github page. This script handles most of our environment setup, by doing the following:

-Now we will execute a shell script that will set up a conda environment called `jupyter`, and allow the user to input their NASA Earthdata username and password (which are written to the `~/.netrc` file on the instance). Copy, paste, and execute the following two commands on your instance:
+1. Installing `wget` (which allows us to download from internet websites)
+
+1. Installing `tmux` (which allows us to persist tasks on a remote machine even when disconnected).
+
+1. Downloading `Miniforge.sh` from *conda-forge* which enables us to install `conda` and `mamba` (a faster, C-based `conda`) in the `/tmp` directory.
+
+1. Creating a new conda environment called `jupyter` that will contain the packages we need to run the notebooks.
+
+1. Installing Python packages using a combination of `mamba` and `pip` (the latter works better when memory is limited).
+
+1. Querying the user for their NASA Earthdata username and password (if these are already archived in a `~/.netrc` file this step is skipped).
+
+To run `jupyter_env_setup.sh`, copy, paste, and execute the following two commands on the instance:

 ```
 sudo chmod 755 ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh && ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh

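Note on the Earthdata credentials step above: the exact lines `jupyter_env_setup.sh` writes are not shown in this diff, but a standard NASA Earthdata `~/.netrc` entry (the form Earthdata-authenticated download tools generally expect) looks like the following sketch, with placeholder values:

```
machine urs.earthdata.nasa.gov
    login YOUR_EARTHDATA_USERNAME
    password YOUR_EARTHDATA_PASSWORD
```

With such an entry already present (and permissions restricted, e.g. `chmod 600 ~/.netrc`), the credential prompt is skipped on later runs, as the last step in the list above describes.
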
Cloud_Setup/jupyter_lab_start.sh

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ tmux send-keys -t jupyterlab ${jlab_start} Enter
 # Print info about tmux session
 echo -e "${red_start}Started Jupyter lab in tmux session jupyterlab"
 echo -e "${red_start}Access from your local machine in a browser window at"
-echo -e "${blue_start}http://127.0.0.1:9889/"
+echo -e "${blue_start}http://127.0.0.1:9889/ ${red_start}or ${blue_start}http://localhost:9889/"
 echo -e "${red_start}tmux session can be accessed with"
 echo -e "${blue_start}tmux a -t jupyterlab"
 echo -e "${red_start}and detached from current window by pressing keys"
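
For context on the change above (an illustration, not part of the commit): `localhost` conventionally resolves to `127.0.0.1`, so once the ssh tunnel from the setup instructions (`-L 9889:localhost:9889`) is open, both printed URLs reach the same forwarded port. A quick check from the local machine:

```
# with the ssh tunnel open, both of these should return an HTTP status line
curl -sI http://127.0.0.1:9889/ | head -n 1
curl -sI http://localhost:9889/ | head -n 1
```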

Cloud_Setup/sshd_enable.sh

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ echo '$ ssh -i "~/.ssh/your_key_pair.pem" jpluser@private_ip_address'

 # move git repo to ssh user's directory and change ownership (if requested)
 read -p 'Move ECCO-v4-Python-Tutorial repo to different user? (Y/[N]) ' move_opt
-if [ $move_opt == "Y"] || [ $move_opt == "y" ]; then
+if [ $move_opt == "Y" ] || [ $move_opt == "y" ]; then
 read -p 'User name of new owner [jpluser for JPL]: ' ssh_user
 cd /home
 mv ./ssm-user/ECCO-v4-Python-Tutorial ./${ssh_user}/
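
The one-character fix above matters because `[` is an ordinary command and the closing `]` must be passed as its own final argument: written as `"Y"]`, the bracket is glued to the string, so the test aborts with a "missing `]`" error instead of evaluating the comparison. A standalone illustration (not code from the repo; quoting `$move_opt` is an extra safeguard used only in this sketch):

```
#!/bin/bash
move_opt="N"
# broken: no space before ']', so the test command errors out and returns non-zero
[ "$move_opt" == "Y"] && echo "this branch is never reached"
# fixed: ']' is a separate word, so the comparison is actually evaluated
[ "$move_opt" == "Y" ] || echo "comparison evaluated (false for N)"
```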

ECCO-ACCESS/Cloud_access_to_ECCO_datasets/Tutorial_AWS_Cloud_getting_started.ipynb

Lines changed: 3143 additions & 388 deletions
Large diffs are not rendered by default.

ECCO-ACCESS/ecco_download.py

Lines changed: 9 additions & 2 deletions
@@ -20,6 +20,10 @@ def ecco_podaac_download(ShortName,StartDate,EndDate,download_root_dir=None,n_wo
 ECCOv4r4 date range is '1992-01-01' to '2017-12-31'.
 For 'SNAPSHOT' datasets, an additional day is added to EndDate to enable closed budgets
 within the specified date range.
+
+download_root_dir: str, defines parent directory to download files to.
+    Files will be downloaded to directory download_root_dir/ShortName/.
+    If not specified, parent directory defaults to '~/Downloads/ECCO_V4r4_PODAAC/'.

 n_workers: int, number of workers to use in concurrent downloads. Benefits typically taper off above 5-6.
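
As a usage illustration of the newly documented parameter (the `ShortName`, dates, and path below are placeholders, not values taken from this commit, and the call assumes `ecco_download.py` is importable):

```
from ecco_download import ecco_podaac_download

# granules land in ~/ECCO_data/<ShortName>/ instead of the default
# ~/Downloads/ECCO_V4r4_PODAAC/<ShortName>/
ecco_podaac_download(ShortName='ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4',
                     StartDate='1994-01-01',
                     EndDate='1994-12-31',
                     download_root_dir='~/ECCO_data',
                     n_workers=4)
```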
@@ -168,6 +172,7 @@ def download_files_concurrently(dls, download_dir, n_workers, force=False):
 print(f'total downloaded: {np.round(total_download_size_in_bytes/1e6,2)} Mb')
 print(f'avg download speed: {np.round(total_download_size_in_bytes/1e6/total_time_download,2)} Mb/s')
 print('Time spent = ' + str(total_time_download) + ' seconds')
+print('\n')

 # return list of downloaded files
 downloaded_files = []
@@ -309,6 +314,7 @@ def download_files_concurrently(dls, download_dir, n_workers, force=False):
 print(f'total downloaded: {np.round(total_download_size_in_bytes/1e6,2)} Mb')
 print(f'avg download speed: {np.round(total_download_size_in_bytes/1e6/total_time_download,2)} Mb/s')
 print('Time spent = ' + str(total_time_download) + ' seconds')
+print('\n')

 if return_downloaded_files == True:
 if len(downloaded_files) == 1:
@@ -694,7 +700,7 @@ def download_file(url: str, output_file: str, force: bool=False):
 # if the file has already been downloaded, skip
 if isfile(output_file) and force is False:
 print(output_filename + ' already exists, and force=False, not re-downloading')
-return 0
+return output_file,0

 with requests.get(url) as r:
 if not r.status_code // 100 == 2:
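
The changed early return above appears to bring the "already downloaded" path in line with the normal download path, so callers can unpack the result the same way on both paths. A sketch of the calling pattern this enables (URL and filename are placeholders; the meaning of the second element on the success path is assumed, not confirmed by this diff):

```
from ecco_download import download_file

url = 'https://archive.podaac.earthdata.nasa.gov/path/to/granule.nc'  # placeholder
fname, status = download_file(url, output_file='granule.nc', force=False)
if status == 0:
    print(fname + ' was already present; download skipped')
```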
@@ -1007,7 +1013,8 @@ def download_wrapper(url: str, url_append: str, download_dir: str, subset_file_i
 print('\n=====================================')
 print(f'total downloaded: {np.round(total_download_size_in_bytes/1e6,2)} Mb')
 print(f'avg download speed: {np.round(total_download_size_in_bytes/1e6/total_time_download,2)} Mb/s')
-print('Time spent = ' + str(total_time_download) + ' seconds')
+print('Time spent = ' + str(total_time_download) + ' seconds')
+print('\n')

 # Display dates of granules that were not downloaded successfully
 status_codes_bad = (status_codes < 0).nonzero()[0]

0 commit comments
