Commit f50da80

Merge pull request #75 from andrewdelman/cloud_compatibility
Cloud compatibility updates
2 parents b93df31 + b727bce · commit f50da80

9 files changed

Lines changed: 4796 additions & 1872 deletions

Cloud_Setup/JPL_setup_instructions.md

Lines changed: 18 additions & 6 deletions
@@ -8,7 +8,7 @@ As a JPL user, more than likely you will be added as a user to an existing proje

 Once you are a user on a JPL AWS account, make sure you are connected to the JPL network (with VPN if not at the lab), and go to the JPL AWS console [sign-in page](https://sso3.jpl.nasa.gov/awsconsole). Then bookmark the sign-in page, as you will be using it again and it is not the easiest to find. Once you have signed in, you should be at a screen with the title Console Home. First, let's make sure you are in the most optimal AWS "region" for accessing PO.DAAC datasets, which are hosted in region *us-west-2 (Oregon)*. In the upper-right corner of the page just to the left of your username, there is a drop-down menu with a place name on it. Select the **US West (Oregon) us-west-2** region.

-Now let's start a new EC2 instance. We will need to do this using an Amazon Machine Image (AMI) generated by the [JPL Cloud Computing Team](https://wiki.jpl.nasa.gov/display/cloudcomputing/OS+Pipeline). Click on **Services** in the upper-left corner next to the AWS logo, then **Compute** --> **EC2**, then from the menu on the left **Images** --> **AMIs**. A list of JPL-specific AMIs should appear on the screen (if not make sure **Private images** is selected as a filter on the top left). It is recommended to use a recently-generated JPL AMI, as these AMIs are automatically deprecated after 2 years. Use the arrows next to **AMI name** or **Creation date** to see the newest AMIs first. Select an AMI and click **Launch instance from AMI** in the upper-right corner. There are some settings on this screen to configure before launching the new instance:
+Now let's start a new EC2 instance. We will need to do this using an Amazon Machine Image (AMI) generated by the JPL Cloud Computing Team (see [here](https://wiki.jpl.nasa.gov/display/cloudcomputing/OS+Pipeline) for more info). In the AWS console, click on **Services** in the upper-left corner next to the AWS logo, then **Compute** --> **EC2**, then from the menu on the left **Images** --> **AMIs**. A list of JPL-specific AMIs should appear on the screen (if not make sure **Private images** is selected as a filter on the top left). It is recommended to use a recently-generated JPL AMI, as these AMIs are automatically deprecated after 2 years. Use the arrows next to **AMI name** or **Creation date** to see the newest AMIs first. Select an AMI and click **Launch instance from AMI** in the upper-right corner. There are some settings on this screen to configure before launching the new instance:

 *Name and tags*: Whatever you want (e.g., ECCO tutorials).

@@ -20,7 +20,7 @@ Now let's start a new EC2 instance. We will need to do this using an Amazon Mach

 *Network settings*: Look at **Select existing security group** to see if you can use a security group that has VPC: vpc-0161fa19cefbd9635. If not, you can try **Create security group** and make sure that the boxes for allowing SSH, HTTPS, and HTTP traffic are checked. If you have issues launching or accessing your instance, you may need to consult with another JPL user or submit a ticket to [CloudHelp](https://goto.jpl.nasa.gov/cloudhelp).

-*Configure storage*: Specify a storage volume with at least **15 GiB gp3** as your root volume. This is important, since the python/conda installation with the packages we need will occupy ~7.5 GB, and we need some workspace as a buffer. If you are in Free tier then you can request up to 30 GB across all your instances, so you can use up the full amount in a single instance or split it across two instances with 15 GB each.
+*Configure storage*: Specify a storage volume with at least **16 GiB gp3** as your root volume. This is important, since the python/conda installation with the packages we need will occupy ~7.5 GB, and we need some workspace as a buffer. If you get an error message about having too little storage when you launch your instance, you need to edit your instance config to have at least the minimum amount of storage for the AMI you are using.

 *Advanced details*: You need to include an IAM profile with your instance. Check the *IAM instance profile* dropdown menu to see if there is one associated with your security group (might have a title like **SRV-standard-instance-profile**). If you can not select an IAM profile, check with other account users or [CloudHelp](https://goto.jpl.nasa.gov/cloudhelp).

@@ -35,7 +35,7 @@ JPL does not enable `ssh` access to AWS instances by default, instead preferring
 - *Initial set up and download GitHub repository*: Copy the following commands and paste in your SSM window (using shift-insert or right-click then **Paste**):

 ```
-cd ~ && sudo dnf update -y && sudo dnf install git -y && git clone https://github.com/andrewdelman/ECCO-v4-Python-Tutorial-adelman.git
+cd ~ && sudo dnf update -y && sudo dnf install git -y && git clone https://github.com/ECCO-GROUP/ECCO-v4-Python-Tutorial.git
 ```

 - *Enable ssh access*: There is a script in the GitHub repository to enable ssh access `sshd_enable.sh`. You want to run it as the *root* user, otherwise you will not have the necessary permissions. Again, copy and paste the following in your SSM window:
@@ -46,7 +46,7 @@ sudo ~/ECCO-v4-Python-Tutorial/Cloud_Setup/sshd_enable.sh

 The script will ask if you want to move the git repo and change its ownership. Answer **Y** and enter **jpluser** for user name.

-Once the script is completed, you should be able to ssh into your new instance. You can close the SSM window and from your machine's terminal window, connect to the instance's *private* IPv4 address (given on the AWS instance summary page) with user name **jpluser**. For example, if the private IPv4 address is 100.104.70.37, then:
+Once the script is completed, you should be able to ssh into your new instance. You can **Terminate** the SSM window. Then from your machine's terminal window, connect to the instance's *private* IPv4 address (given on the AWS instance summary page) with user name **jpluser**. For example, if the key file is `~/.ssh/aws_ec2_jupyter.pem` and the private IPv4 address is 100.104.70.37, then:

 ```
 ssh -i "~/.ssh/aws_ec2_jupyter.pem" jpluser@100.104.70.37 -L 9889:localhost:9889
@@ -56,9 +56,21 @@ The `-L` option indicates a tunnel from the local machine's port 9889 to the ins

 ### Step 3b: Set up conda environment

-Now you need to install software (conda/miniconda/miniforge) to run Python, and then install Python packages and the Jupyter interface to run these tutorial notebooks. A shell script to expedite this process is provided on the tutorial Github page, and here we will walk through setting this up.
+Now you need to install software (conda/miniconda/miniforge) to run Python, and then install Python packages and the Jupyter interface to run these tutorial notebooks. A shell script to expedite this process `jupyter_env_setup.sh` is provided on the tutorial Github page. This script handles most of our environment setup, by doing the following:

-Now we will execute a shell script that will set up a conda environment called `jupyter`, and allow the user to input their NASA Earthdata username and password (which are written to the `~/.netrc` file on the instance). Copy, paste, and execute the following two commands on your instance:
+1. Installing `wget` (which allows us to download from internet websites)
+
+1. Installing `tmux` (which allows us to persist tasks on a remote machine even when disconnected).
+
+1. Downloading `Miniforge.sh` from *conda-forge* which enables us to install `conda` and `mamba` (a faster, C-based `conda`) in the `/tmp` directory.
+
+1. Creating a new conda environment called `jupyter` that will contain the packages we need to run the notebooks.
+
+1. Installing Python packages using a combination of `mamba` and `pip` (the latter works better when memory is limited).
+
+1. Querying the user for their NASA Earthdata username and password (if these are already archived in a `~/.netrc` file this step is skipped).
+
+To run `jupyter_env_setup.sh`, copy, paste, and execute the following two commands on the instance:

 ```
 sudo chmod 755 ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh && ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh

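Note on the Earthdata credentials step above: the exact lines `jupyter_env_setup.sh` writes are not shown in this diff, but a standard NASA Earthdata `~/.netrc` entry (the form Earthdata-authenticated download tools generally expect) looks like the following sketch, with placeholder values:

```
machine urs.earthdata.nasa.gov
    login YOUR_EARTHDATA_USERNAME
    password YOUR_EARTHDATA_PASSWORD
```

With such an entry already present (and permissions restricted, e.g. `chmod 600 ~/.netrc`), the credential prompt is skipped on later runs, as the last step in the list above describes.
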
Cloud_Setup/jupyter_lab_start.sh

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ tmux send-keys -t jupyterlab ${jlab_start} Enter
 # Print info about tmux session
 echo -e "${red_start}Started Jupyter lab in tmux session jupyterlab"
 echo -e "${red_start}Access from your local machine in a browser window at"
-echo -e "${blue_start}http://127.0.0.1:9889/"
+echo -e "${blue_start}http://127.0.0.1:9889/ ${red_start}or ${blue_start}http://localhost:9889/"
 echo -e "${red_start}tmux session can be accessed with"
 echo -e "${blue_start}tmux a -t jupyterlab"
 echo -e "${red_start}and detached from current window by pressing keys"
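
For context on the change above (an illustration, not part of the commit): `localhost` conventionally resolves to `127.0.0.1`, so once the ssh tunnel from the setup instructions (`-L 9889:localhost:9889`) is open, both printed URLs reach the same forwarded port. A quick check from the local machine:

```
# with the ssh tunnel open, both of these should return an HTTP status line
curl -sI http://127.0.0.1:9889/ | head -n 1
curl -sI http://localhost:9889/ | head -n 1
```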

Cloud_Setup/sshd_enable.sh

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ echo '$ ssh -i "~/.ssh/your_key_pair.pem" jpluser@private_ip_address'

 # move git repo to ssh user's directory and change ownership (if requested)
 read -p 'Move ECCO-v4-Python-Tutorial repo to different user? (Y/[N]) ' move_opt
-if [ $move_opt == "Y"] || [ $move_opt == "y" ]; then
+if [ $move_opt == "Y" ] || [ $move_opt == "y" ]; then
 read -p 'User name of new owner [jpluser for JPL]: ' ssh_user
 cd /home
 mv ./ssm-user/ECCO-v4-Python-Tutorial ./${ssh_user}/
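
The one-character fix above matters because `[` is an ordinary command and the closing `]` must be passed as its own final argument: written as `"Y"]`, the bracket is glued to the string, so the test aborts with a "missing `]`" error instead of evaluating the comparison. A standalone illustration (not code from the repo; quoting `$move_opt` is an extra safeguard used only in this sketch):

```
#!/bin/bash
move_opt="N"
# broken: no space before ']', so the test command errors out and returns non-zero
[ "$move_opt" == "Y"] && echo "this branch is never reached"
# fixed: ']' is a separate word, so the comparison is actually evaluated
[ "$move_opt" == "Y" ] || echo "comparison evaluated (false for N)"
```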

ECCO-ACCESS/Cloud_access_to_ECCO_datasets/Tutorial_AWS_Cloud_getting_started.ipynb

Lines changed: 3143 additions & 388 deletions
Large diffs are not rendered by default.

ECCO-ACCESS/ecco_download.py

Lines changed: 9 additions & 2 deletions
@@ -20,6 +20,10 @@ def ecco_podaac_download(ShortName,StartDate,EndDate,download_root_dir=None,n_wo
 ECCOv4r4 date range is '1992-01-01' to '2017-12-31'.
 For 'SNAPSHOT' datasets, an additional day is added to EndDate to enable closed budgets
 within the specified date range.
+
+download_root_dir: str, defines parent directory to download files to.
+    Files will be downloaded to directory download_root_dir/ShortName/.
+    If not specified, parent directory defaults to '~/Downloads/ECCO_V4r4_PODAAC/'.

 n_workers: int, number of workers to use in concurrent downloads. Benefits typically taper off above 5-6.
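
As a usage illustration of the newly documented parameter (the `ShortName`, dates, and path below are placeholders, not values taken from this commit, and the call assumes `ecco_download.py` is importable):

```
from ecco_download import ecco_podaac_download

# granules land in ~/ECCO_data/<ShortName>/ instead of the default
# ~/Downloads/ECCO_V4r4_PODAAC/<ShortName>/
ecco_podaac_download(ShortName='ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4',
                     StartDate='1994-01-01',
                     EndDate='1994-12-31',
                     download_root_dir='~/ECCO_data',
                     n_workers=4)
```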
@@ -168,6 +172,7 @@ def download_files_concurrently(dls, download_dir, n_workers, force=False):
 print(f'total downloaded: {np.round(total_download_size_in_bytes/1e6,2)} Mb')
 print(f'avg download speed: {np.round(total_download_size_in_bytes/1e6/total_time_download,2)} Mb/s')
 print('Time spent = ' + str(total_time_download) + ' seconds')
+print('\n')

 # return list of downloaded files
 downloaded_files = []
@@ -309,6 +314,7 @@ def download_files_concurrently(dls, download_dir, n_workers, force=False):
 print(f'total downloaded: {np.round(total_download_size_in_bytes/1e6,2)} Mb')
 print(f'avg download speed: {np.round(total_download_size_in_bytes/1e6/total_time_download,2)} Mb/s')
 print('Time spent = ' + str(total_time_download) + ' seconds')
+print('\n')

 if return_downloaded_files == True:
 if len(downloaded_files) == 1:
@@ -694,7 +700,7 @@ def download_file(url: str, output_file: str, force: bool=False):
 # if the file has already been downloaded, skip
 if isfile(output_file) and force is False:
 print(output_filename + ' already exists, and force=False, not re-downloading')
-return 0
+return output_file,0

 with requests.get(url) as r:
 if not r.status_code // 100 == 2:
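
The changed early return above appears to bring the "already downloaded" path in line with the normal download path, so callers can unpack the result the same way on both paths. A sketch of the calling pattern this enables (URL and filename are placeholders; the meaning of the second element on the success path is assumed, not confirmed by this diff):

```
from ecco_download import download_file

url = 'https://archive.podaac.earthdata.nasa.gov/path/to/granule.nc'  # placeholder
fname, status = download_file(url, output_file='granule.nc', force=False)
if status == 0:
    print(fname + ' was already present; download skipped')
```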
@@ -1007,7 +1013,8 @@ def download_wrapper(url: str, url_append: str, download_dir: str, subset_file_i
 print('\n=====================================')
 print(f'total downloaded: {np.round(total_download_size_in_bytes/1e6,2)} Mb')
 print(f'avg download speed: {np.round(total_download_size_in_bytes/1e6/total_time_download,2)} Mb/s')
-print('Time spent = ' + str(total_time_download) + ' seconds')
+print('Time spent = ' + str(total_time_download) + ' seconds')
+print('\n')

 # Display dates of granules that were not downloaded successfully
 status_codes_bad = (status_codes < 0).nonzero()[0]

0 commit comments
