
Commit fd46c28

Merge pull request #71 from andrewdelman/cloud_s3_tutorial
Cloud S3 tutorial
2 parents 868fe50 + bd9d31c

14 files changed

Lines changed: 5173 additions & 1192 deletions
Lines changed: 67 additions & 0 deletions
# JPL setup for AWS EC2 instances
If you are based at JPL and setting up an AWS EC2 instance, there are some extra steps you need to take to comply with JPL's security requirements and to enable `ssh` access on your instance (which is no longer enabled by default). Please follow these steps in place of Steps 2 and 3 of the [AWS Cloud: getting started](https://ecco-v4-python-tutorial.readthedocs.io/AWS_Cloud_getting_started.html) tutorial for general users.
## Step 2: Start a JPL EC2 instance
As a JPL user, more than likely you will be added as a user to an existing project AWS account rather than creating a new account. The account owner will need to [add you](https://wiki.jpl.nasa.gov/display/cloudcomputing/Granting+access+to+the+AWS+console+to+other+JPL+users) to the account as a `power_user`. Make sure the account owner/manager is OK with your use of it, as you *will* incur costs running your EC2 instance (free tier accounts do not have sufficient memory for JPL EC2 instances).
Once you are a user on a JPL AWS account, make sure you are connected to the JPL network (with VPN if not at the lab), and go to the JPL AWS console [sign-in page](https://sso3.jpl.nasa.gov/awsconsole). Bookmark the sign-in page, as you will be using it again and it is not the easiest to find. Once you have signed in, you should be at a screen titled Console Home. First, let's make sure you are in the optimal AWS "region" for accessing PO.DAAC datasets, which are hosted in region *us-west-2 (Oregon)*. In the upper-right corner of the page, just to the left of your username, there is a drop-down menu with a place name on it. Select the **US West (Oregon) us-west-2** region.
Now let's start a new EC2 instance. We will need to do this using an Amazon Machine Image (AMI) generated by the [JPL Cloud Computing Team](https://wiki.jpl.nasa.gov/display/cloudcomputing/OS+Pipeline). Click on **Services** in the upper-left corner next to the AWS logo, then **Compute** --> **EC2**, then from the menu on the left **Images** --> **AMIs**. A list of JPL-specific AMIs should appear on the screen (if not, make sure **Private images** is selected as a filter on the top left). It is recommended to use a recently generated JPL AMI, as these AMIs are automatically deprecated after 2 years. Use the arrows next to **AMI name** or **Creation date** to sort the newest AMIs first. Select an AMI and click **Launch instance from AMI** in the upper-right corner. There are some settings on this screen to configure before launching the new instance:
*Name and tags*: Whatever you want (e.g., ECCO tutorials).
*Application and OS images (Amazon Machine Image)*: Leave unchanged.
*Instance type*: **t2.medium**/**t3.medium** or larger is recommended, and probably necessary to run a JPL-based EC2 instance successfully. (**t3** is a newer generation, with costs similar to or slightly lower than **t2**.)
*Key pair (login)*: Click on **Create new key pair**. In the pop-up window, name it whatever you want (e.g., aws_ec2_jupyter), select *Key pair type*: **RSA** and *Private key file format*: **.pem**, then click **Create key pair**. This downloads the key file to your Downloads folder; you should move it to your `.ssh` folder and change its permissions to read-only for the file owner, as shown below.
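For example, assuming the key file was saved as `aws_ec2_jupyter.pem`:

```
mv ~/Downloads/aws_ec2_jupyter.pem ~/.ssh/
chmod 400 ~/.ssh/aws_ec2_jupyter.pem
```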
*Network settings*: Look at **Select existing security group** to see if you can use a security group that has VPC: vpc-0161fa19cefbd9635. If not, you can try **Create security group** and make sure that the boxes for allowing SSH, HTTPS, and HTTP traffic are checked. If you have issues launching or accessing your instance, you may need to consult with another JPL user or submit a ticket to [CloudHelp](https://goto.jpl.nasa.gov/cloudhelp).
*Configure storage*: Specify a storage volume with at least **15 GiB gp3** as your root volume. This is important, since the Python/conda installation with the packages we need will occupy ~7.5 GB, and we need some workspace as a buffer. If you are in the free tier, you can request up to 30 GiB across all your instances, so you can use the full amount in a single instance or split it across two instances with 15 GiB each.
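Later on, if you want to confirm how much of the volume is in use, standard tools on the instance will show it (a quick sanity check, nothing JPL-specific):

```
df -h /            # free space remaining on the root volume
du -sh /tmp/conda  # size of the conda installation from Step 3b
```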
*Advanced details*: You need to include an IAM profile with your instance. Check the *IAM instance profile* drop-down menu to see if there is one associated with your security group (it might have a title like **SRV-standard-instance-profile**). If you cannot select an IAM profile, check with other account users or [CloudHelp](https://goto.jpl.nasa.gov/cloudhelp).
Finally, at the bottom-right of the page click the yellow **Launch instance** button. Wait a minute or two for the instance to initialize; you can check the **Instances** screen accessed from the menu on the left side to see that your Instance state is **Running**.
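If you also have the AWS CLI configured on your local machine, you can check the instance state from a terminal as well (the instance ID below is a hypothetical placeholder; yours is shown on the Instances screen):

```
aws ec2 describe-instance-status --instance-ids i-0123456789abcdef0 --region us-west-2
```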
### Step 3a: Enable ssh access
JPL does not enable `ssh` access to AWS instances by default, instead preferring [SSM Agent](https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html). However, most users will be much more familiar with `ssh`, and it is easier to transfer files to/from your instance with `ssh`. You can enable `ssh` using these steps:
- *Connect using SSM in the browser*: Go to **EC2** --> **Instances** and click on the instance ID of the new instance. Then click **Connect** in the upper-right part of the page. There are a few options for connecting; select **Session Manager** and then click the yellow **Connect** button. If you cannot connect and/or see an error message, you might need to wait several minutes for the session to be fully established. A tab or window should open in your browser with a terminal window on the instance.
- *Initial set up and download GitHub repository*: Copy the following commands and paste them into your SSM window (using shift-insert, or right-click then **Paste**):
```
cd ~ && sudo dnf update -y && sudo dnf install git -y && git clone https://github.com/ECCO-GROUP/ECCO-v4-Python-Tutorial.git
```
- *Enable ssh access*: The GitHub repository includes a script, `sshd_enable.sh`, to enable ssh access. You want to run it as the *root* user; otherwise you will not have the necessary permissions. Again, copy and paste the following into your SSM window:
```
sudo ~/ECCO-v4-Python-Tutorial/Cloud_Setup/sshd_enable.sh
```
The script will ask if you want to move the git repo and change its ownership. Answer **Y** and enter **jpluser** for the user name.
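The interaction looks like this (the prompts come from `sshd_enable.sh`, included below in this commit):

```
Move ECCO-v4-Python-Tutorial repo to different user? (Y/[N]) Y
User name of new owner [jpluser for JPL]: jpluser
```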
Once the script has completed, you should be able to ssh into your new instance. Close the SSM window, and from your local machine's terminal connect to the instance's *private* IPv4 address (given on the AWS instance summary page) with user name **jpluser**. For example, if the private IPv4 address is 100.104.70.37, then:
```
ssh -i "~/.ssh/aws_ec2_jupyter.pem" jpluser@100.104.70.37 -L 9889:localhost:9889
```
The `-L` option sets up a tunnel from the local machine's port 9889 to the instance's port 9889; this will be used later to open JupyterLab in your local machine's web browser.
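Once JupyterLab is running on the instance (e.g., started with the `jupyter_lab_start.sh` helper script included in this commit), it can be reached through the tunnel by opening this address in your local browser:

```
http://127.0.0.1:9889/
```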
### Step 3b: Set up conda environment
Now you need to install software (conda/miniconda/miniforge) to run Python, and then install the Python packages and the Jupyter interface needed to run these tutorial notebooks. A shell script to expedite this process is provided in the tutorial GitHub repository, and here we will walk through using it.
The shell script sets up a conda environment called `jupyter`, and prompts the user for their NASA Earthdata username and password (which are written to the `~/.netrc` file on the instance). Copy, paste, and execute the following two commands (joined by `&&`) on your instance:
```
sudo chmod 755 ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh && ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh
```
The script takes a few minutes to run, but it should set up our environment with the packages we need. Now you can return to Step 4 of the [AWS Cloud: getting started](https://ecco-v4-python-tutorial.readthedocs.io/AWS_Cloud_getting_started.html) tutorial.

Cloud_Setup/jupyter_env_setup.sh

Lines changed: 130 additions & 0 deletions
#!/bin/bash

# Shell script for setting up conda, jupyter, essential Python packages on an AWS EC2 instance.
# Assumes that the ECCO-v4-Python-Tutorial GitHub repository has already been downloaded using:
#
# $ sudo dnf update -y
# $ sudo dnf install git -y
# $ cd ~
# $ git clone https://github.com/ECCO-GROUP/ECCO-v4-Python-Tutorial.git

# Then run this script:
#
# $ sudo chmod 755 ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh
# $ ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_env_setup.sh


# Start body of script

red_start='\033[0;31m'
blue_start='\033[0;34m'
nocolor_start='\033[0m'

# install wget
sudo dnf install wget -y
echo -e "${red_start}Installed wget${nocolor_start}"

# install tmux
sudo dnf install tmux -y
echo -e "${red_start}Installed tmux${nocolor_start}"

# retrieve and install miniforge in /tmp/
# assuming EBS volume is already attached to instance
echo -e "${red_start}Starting Miniforge3 installation${nocolor_start}"
mkdir -p /tmp
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh" -O /tmp/Miniforge3.sh
bash /tmp/Miniforge3.sh -b -p /tmp/conda
rm -f /tmp/Miniforge3.sh
source "/tmp/conda/etc/profile.d/conda.sh"
source "/tmp/conda/etc/profile.d/mamba.sh"

echo -e "${red_start}Completed Miniforge3 installation${nocolor_start}"

# add conda and mamba to path
mamba init

# set paths to environment and package directories
printf '\n# set conda environment and package directories' >> ~/.bashrc
printf '\nexport CONDA_ENVS_PATH=/tmp/conda/envs' >> ~/.bashrc
printf '\nexport CONDA_PKGS_DIRS=/tmp/conda/pkgs' >> ~/.bashrc
source ~/.bashrc

# create jupyter environment under /tmp/conda/envs/
# (in EBS storage to save space in home directory)
mamba create --name jupyter python=3.11 -y
echo -e "${red_start}Created jupyter environment${nocolor_start}"

# install python packages (using mamba) in jupyter environment
mamba activate jupyter
echo -e "${red_start}Installing Python packages in jupyter environment${nocolor_start}"
# library dependencies are installed in small batches
# (large install transactions can fail on small instances)
mamba install requests tqdm numpy pandas -y
mamba install xorg-libice libexpat libevent -y
mamba install nspr alsa-lib libogg libpq -y
mamba install xorg-renderproto xorg-xf86vidmodeproto graphite2 expat -y
mamba install libgpg-error dbus -y
mamba install libflac gettext -y
mamba install xcb-util-wm xorg-libx11 xcb-util-image -y
mamba install xkeyboard-config -y
mamba install libxkbcommon fonts-conda-forge font-ttf-ubuntu gstreamer zlib -y
mamba install xorg-xextproto libpng attr mpg123 -y
mamba install pixman libvorbis glib-tools -y
mamba install libsystemd0 xcb-util-keysyms xorg-libxrender libllvm15 -y
mamba install font-ttf-dejavu-sans-mono pcre2 font-ttf-inconsolata font-ttf-source-code-pro -y
mamba install lame nss xorg-xproto pthread-stubs xorg-libxdmcp -y
mamba install libgcrypt xorg-libsm xorg-libxext fonts-conda-ecosystem xorg-kbproto mysql-libs -y
mamba install fontconfig libjpeg-turbo xcb-util-renderutil -y
mamba install glib -y
mamba install freetype libcap libcups libopus -y
mamba install gst-plugins-base mysql-common xcb-util -y
mamba install cairo -y
mamba install libsndfile harfbuzz xorg-libxau -y
mamba install libglib libxcb -y
mamba install qt-main -y
mamba install pyqt -y
mamba install matplotlib -y
mamba install netcdf4 -y
mamba install h5netcdf -y
mamba install boto3 lxml -y
mamba install scipy -y
mamba install geos -y
mamba install proj pyproj -y
mamba install cartopy -y
mamba install notebook -y
mamba install progressbar -y
mamba install gsw -y
mamba install nco -y

# install remaining packages using pip
# (mamba installs tend to get killed on t2.micro)
pip install dask
pip install "xarray[complete]"
pip install jupyterlab
pip install dask_labextension
pip install s3fs
pip install ecco_v4_py

echo -e "${red_start}Completed Python package installations${nocolor_start}"

echo -e "${red_start}Setting up NASA Earthdata authentication${nocolor_start}"
# NASA Earthdata authentication
# check if credentials are already archived in ~/.netrc, and if not then prompt the user for them
earthdata_cred_stored=0
if [ -f ~/.netrc ]; then
    if grep -q "machine urs.earthdata.nasa.gov" ~/.netrc; then
        earthdata_cred_stored=1
        echo -e "${red_start}Earthdata credentials already archived${nocolor_start}"
    fi
fi
if [ $earthdata_cred_stored -eq 0 ]; then
    # make an existing ~/.netrc writable before appending credentials
    if [ -f ~/.netrc ]; then sudo chmod 600 ~/.netrc; fi
    read -p 'NASA Earthdata username: ' uservar
    read -sp 'NASA Earthdata password: ' passvar
    echo -e "machine urs.earthdata.nasa.gov\n login ${uservar}\n password ${passvar}\n" >> ~/.netrc

    echo -e "\n${red_start}NASA Earthdata authentication info archived in ~/.netrc${nocolor_start}"
fi
sudo chmod 400 ~/.netrc

# create symlink to jupyter_lab_start.sh from the user's home directory
ln -s ~/ECCO-v4-Python-Tutorial/Cloud_Setup/jupyter_lab_start.sh ~/jupyter_lab_start.sh

Cloud_Setup/jupyter_lab_start.sh

Lines changed: 36 additions & 0 deletions
#!/bin/bash

# Shell script to start a Jupyter lab session
# in a tmux window (so it persists even if ssh tunnel is disconnected)

red_start='\033[0;31m'
blue_start='\033[0;34m'
nocolor_start='\033[0m'

source /tmp/conda/bin/activate
conda activate jupyter

# Start configuration for Jupyter lab
echo "Enter password to access Jupyter lab from browser,"
echo "or leave blank to not require a password."
PW="$(python3 -c 'from jupyter_server.auth import passwd; import getpass; print(passwd(getpass.getpass(), algorithm="sha256"))')"
# In the command string below, the literal word "Space" is a tmux key name:
# each one is sent as a space character by tmux send-keys further down.
jlab_start="jupyter Space lab Space --no-browser Space --autoreload Space --port=9889 Space --ip='127.0.0.1' Space --NotebookApp.token='' Space --NotebookApp.password=\"$PW\" Space --notebook-dir=\"~/ECCO-v4-Python-Tutorial\""

# Start new tmux session
tmux new -d -s jupyterlab

# Execute commands in tmux window using send-keys
tmux send-keys -t jupyterlab source Space /tmp/conda/bin/activate Enter
tmux send-keys -t jupyterlab conda Space activate Space jupyter Enter
# ${jlab_start} is deliberately unquoted so that word-splitting passes each
# word (including the "Space" key names) as a separate send-keys argument
tmux send-keys -t jupyterlab ${jlab_start} Enter

# Print info about tmux session
echo -e "${red_start}Started Jupyter lab in tmux session jupyterlab"
echo -e "${red_start}Access from your local machine in a browser window at"
echo -e "${blue_start}http://127.0.0.1:9889/"
echo -e "${red_start}tmux session can be accessed with"
echo -e "${blue_start}tmux a -t jupyterlab"
echo -e "${red_start}and detached from current window by pressing keys"
echo -e "${blue_start}Ctrl-b d"
echo -e "${red_start}and terminated with"
echo -e "${blue_start}tmux kill-ses -t jupyterlab${nocolor_start}"

Cloud_Setup/sshd_enable.sh

Lines changed: 74 additions & 0 deletions
#!/bin/bash

# Script to enable ssh connections,
# on EC2 instances where they are disabled by default
# (e.g., with JPL AMIs).
#
# This script must be run as root, otherwise an error is returned:
# $ sudo ./sshd_enable.sh
#
# Once this script runs successfully,
# it should be possible to login to the instance using ssh, e.g.:
#
# $ ssh -i "~/.ssh/key_pair.pem" jpluser@private_ip_address


# Return error if not running as root
if [ "$(whoami)" != "root" ]; then
    echo "Error: this script must be run as root"
    echo "Please re-run using sudo, e.g.:"
    echo "$ sudo ./sshd_enable.sh"
    exit 1
fi


# Try to enable sshd
systemctl enable sshd
if [ $? -eq 0 ]; then
    echo "Enabled sshd successfully"
else
    # sshd.service may be masked (symlinked to /dev/null)
    if [ "$(readlink -f /etc/systemd/system/sshd.service)" = "/dev/null" ]; then
        # Delete this /dev/null symlink
        rm -f /etc/systemd/system/sshd.service
        echo 'Deleted symlink to /dev/null'

        # Re-try enabling sshd
        systemctl enable sshd
        if [ $? -eq 0 ]; then
            echo "Enabled sshd successfully"
        else
            echo "Error: symlink deletion did not allow sshd to be enabled"
            exit 1
        fi
    else
        echo "Error: sshd not enabled successfully"
    fi
fi

# Create symlink to the service (if it does not already exist)
if [ ! -f /usr/lib/systemd/system/sshd.service ]; then
    ln -s /etc/systemd/system/multi-user.target.wants/sshd.service /usr/lib/systemd/system/sshd.service
    echo "Created symlink to sshd.service"
fi

# create new ssh host keys
ssh-keygen -q -N "" -t rsa -b 4096 -f /etc/ssh/ssh_host_rsa_key
echo "Created new ssh keys"

# start sshd service
systemctl start sshd
echo "Started sshd"
echo "Now you can login to your instance using ssh, e.g.:"
echo '$ ssh -i "~/.ssh/your_key_pair.pem" jpluser@private_ip_address'


# move git repo to ssh user's directory and change ownership (if requested)
read -p 'Move ECCO-v4-Python-Tutorial repo to different user? (Y/[N]) ' move_opt
if [ "$move_opt" == "Y" ] || [ "$move_opt" == "y" ]; then
    read -p 'User name of new owner [jpluser for JPL]: ' ssh_user
    cd /home
    mv ./ssm-user/ECCO-v4-Python-Tutorial ./${ssh_user}/
    echo "Moved ECCO-v4-Python-Tutorial repo to /home/${ssh_user}/"
    chown -R ${ssh_user}:${ssh_user} ./${ssh_user}/ECCO-v4-Python-Tutorial
    echo "Changed owner and group of git repo to ${ssh_user}"
fi
