Amarel
Revision as of 21:40, 11 January 2022
Getting Started
This wiki page is a brief "Getting Started" introduction to running the ROMS ocean model, and analyzing model output, on the Rutgers Office of Advanced Research Computing (OARC) cluster computer Amarel. There is a comprehensive user guide at https://sites.google.com/view/cluster-user-guide to which you can refer for more detailed information.
Rutgers VPN
If you are not on campus, you will need to connect to the Rutgers VPN. To use Rutgers VPN, you must first enroll in Duo 2 Factor Authentication (2FA) (NetID+). Most users have probably already done this. If not, navigate to https://netid.rutgers.edu/setupTwoFactorAuthentication.htm and follow the instructions.
Once you are enrolled in NetID+ you will need to activate the VPN service, if you have not already. Navigate to https://soc.rutgers.edu/vpn/ and click the gray button titled Service Activation. If you are not already logged in, you may be asked to log in and/or approve your login with the Duo Mobile app. To activate the VPN service, click the checkbox next to Remote Access VPN, Cisco AnyConnect Access for Rutgers and click the Activate Services button.
You are now ready to download the Cisco VPN client and connect to the Rutgers VPN. Complete instructions can be found here by clicking the red button titled General VPN Instructions to download the PDF. The instructions are geared towards Windows, so if you are using a Mac you might find this page more helpful. In most cases, regardless of your operating system, pointing your browser to https://vpn.rutgers.edu/ and logging in with your NetID will let you download the correct VPN client.
Once it is installed, open the Cisco AnyConnect client, type vpn.rutgers.edu in the box, and click Connect.
In the next window, the Username is your NetID and the Password is your NetID password. For 2FA, you have 4 options to enter in the Second Password/Duo Action field:
- Enter a 6-digit Duo passcode. Passcodes are generated by a hard token, shown in the Duo Mobile app, or sent in response to a previous “sms” request. Simply type in the 6 digits and hit OK.
- Type the word “push”. This will send a push notification to the primary device you have enrolled with Duo through NetID+ with the option to Accept or Deny.
- Type the word “phone”. You will receive a phone call to the primary device you have enrolled with Duo through NetID+ with touch tone options to Accept or Deny.
- Type the word “sms”. You will receive a text message on the primary device you have enrolled with Duo through NetID+ containing passcodes you can use to log on.
Click OK and you should be connected to the Rutgers VPN. A small AnyConnect icon with a lock on it should appear in the task tray (Windows) or in the menu bar (Mac OS).
Connecting to Amarel with SSH
In order to connect to Amarel and compile and run ROMS you will need an SSH/terminal client. If you already use an SSH/terminal application you are comfortable with then stick with that and adapt the instructions below accordingly. For these instructions to work you need to either be on campus or connected to the Rutgers VPN.
Mac OS
If you are on a Mac, open Terminal (found in the Applications -> Utilities folder) or iTerm2 and type:
- ssh fakeuser@amarel.rutgers.edu
replacing fakeuser with your NetID. When asked whether you want to continue connecting, enter yes, then enter your NetID password.
Having to type the above command and your NetID password every time you log in is moderately tedious, and username/password authentication becomes very annoying and time-consuming when you repeatedly use the scp command to copy files to Amarel. You can enable passwordless access and file transfer with SSH keys by following the instructions below.
- If you are still connected to Amarel, disconnect by entering exit
- At the prompt for your local machine, enter nano ~/.ssh/config and paste the following block at the end of the file:
 Host amarel
   Hostname amarel.rutgers.edu
   HostKeyAlias amarel
   User [NetID]
Replace [NetID] with your own NetID, then save and exit (Ctrl+o, then Ctrl+x). This allows you to type ssh amarel to connect.
- Enter ls ~/.ssh. If a file called id_rsa.pub is listed, skip the next step.
- Enter ssh-keygen -t rsa, then press return to accept the default location and enter a passphrase twice (or leave it blank for no passphrase).
- Copy the public portion of the key to your Amarel home directory by entering scp ~/.ssh/id_rsa.pub amarel:.
- SSH to Amarel (ssh amarel) and check whether you already have a .ssh folder:
 [rjdave@amarel1 ~]$ ls -l .ssh
 total 33
 -rw------- 1 rjdave rjdave 1069 Sep 8 2017 authorized_keys
 -rw-r--r-- 1 rjdave rjdave 434 Jun 4 2021 config
 -rw-r--r-- 1 rjdave rjdave 5992 Dec 2 13:47 known_hosts
You probably won't have all the files above, but if you instead get the message ls: cannot access .ssh: No such file or directory, enter mkdir .ssh && chmod 700 .ssh. This creates the .ssh folder and sets the required permissions.
- Once you have a .ssh folder with the proper permissions, enter cat ~/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 640 ~/.ssh/authorized_keys, then exit back to your local machine and run ssh amarel again. It should no longer ask for your password (if you set a key passphrase, you may be asked for that instead).
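The ~/.ssh/config edit described above can also be scripted on your local machine. The sketch below is a hypothetical helper (add_amarel_host is our name, not an OARC tool); it assumes a bash shell and appends the same Host block only if a Host amarel entry is not already present, so running it twice does not create duplicates. It touches only files under $HOME/.ssh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an OARC tool): add the "Host amarel" block from
# this page to ~/.ssh/config unless an entry with that name already exists.
# Usage: add_amarel_host <NetID>
add_amarel_host() {
  local netid="$1"
  local dir="${HOME}/.ssh"
  local cfg="${dir}/config"
  if [ -z "$netid" ]; then
    echo "usage: add_amarel_host <NetID>" >&2
    return 2
  fi
  # ssh expects ~/.ssh and its config not to be writable by other users.
  mkdir -p "$dir" && chmod 700 "$dir"
  touch "$cfg" && chmod 600 "$cfg"
  # Skip if a line reading "Host amarel" is already in the file.
  if grep -qE '^[[:space:]]*Host[[:space:]]+amarel[[:space:]]*$' "$cfg"; then
    echo "Host amarel is already defined in $cfg"
    return 0
  fi
  cat >> "$cfg" <<EOF

Host amarel
  Hostname amarel.rutgers.edu
  HostKeyAlias amarel
  User ${netid}
EOF
  echo "Added Host amarel for user ${netid} to $cfg"
}
```

For example, add_amarel_host fakeuser followed by ssh amarel. The key steps still have to be done as described above.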
Windows
For Windows, we recommend MobaXterm installer edition.
- Once installed open it, choose light or dark theme and click the Session button in the upper left.
- Choose SSH and enter amarel.rutgers.edu for Remote host, check the Specify username box and enter your NetID in the box and click OK.
- You will be asked to type your NetID password and then asked if you want to save the password. If you choose yes you will be asked to set a master password to encrypt all your saved passwords.
- In the future you can click Sessions (not Session) and select amarel.rutgers.edu and not have to enter your password.
Setting up your .bashrc
The default software setup by OARC on Amarel does not include everything needed to compile and run ROMS. Ocean Modeling Group computing specialist David Robertson has configured, and maintains and updates, a repository of what you need for ROMS and other models that will be automatically loaded for every login session once you add a couple lines to the login script (.bashrc, .cshrc, etc.) in your home directory. Log in to Amarel with Terminal/iTerm2 or MobaXterm and edit your login script as described below.
Unless otherwise requested, your default shell will be bash, and the following three lines (the last three lines of the block below) should be added (with nano or your preferred editor) near the top of your .bashrc, after the import of the global definitions:
- # Source global definitions
 if [ -f /etc/bashrc ]; then
 . /etc/bashrc
 fi
 ulimit -s unlimited
 export MODULEPATH=/projects/dmcs_1/sw/modulefiles/Core:${MODULEPATH}
 export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.10M %.10l %.4C %.6D %R"
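The ulimit line removes the shell's stack-size limit so programs like ROMS can take full advantage of computing resources (this may be obsolete now, but it won't hurt anything). The first export line tells Lmod where to find the modules that load the custom software. The second export line sets the columns that the squeue command (explained later) displays by default: job ID, partition, job name, user, state, elapsed time, time limit, CPUs, nodes, and node list (or the reason a job is still pending). The quotes around the format string must be plain straight quotes (") so that bash treats the string as a single value. Once you have added these lines, log out and back in for them to take effect.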
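After logging back in, you can confirm that the three settings are active. The function below is a hypothetical convenience check (check_roms_env is our name); it only reads the current shell's stack limit and environment variables and reports anything that is missing.

```shell
#!/usr/bin/env bash
# Hypothetical check: report whether the three .bashrc settings from this page
# are active in the current shell. Returns 0 if all are present, 1 otherwise.
check_roms_env() {
  local status=0
  local stack
  stack="$(ulimit -s)"   # bash reports the stack limit in KB, or "unlimited"
  if [ "$stack" != "unlimited" ]; then
    echo "stack size limit is ${stack} KB, expected unlimited" >&2
    status=1
  fi
  # Look for the custom module directory as one whole MODULEPATH entry.
  case ":${MODULEPATH:-}:" in
    *:/projects/dmcs_1/sw/modulefiles/Core:*) ;;
    *)
      echo "MODULEPATH does not include /projects/dmcs_1/sw/modulefiles/Core" >&2
      status=1
      ;;
  esac
  if [ -z "${SQUEUE_FORMAT:-}" ]; then
    echo "SQUEUE_FORMAT is not set" >&2
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "All ROMS login settings are active"
  fi
  return "$status"
}
```

Run check_roms_env at the prompt; any message printed names the setting that did not take effect.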
Checking Out the ROMS Source Code
Register as a ROMS user
If you are not already a ROMS user, you will need to fill out the registration form (found here) and wait for approval. Once approved, you will be able to check out the ROMS source code. A git repository is available, but Subversion (SVN) is our recommended way to obtain the ROMS source code. If you prefer git, the initial process is a little more involved, but you can follow the directions found on the WikiROMS git page.
Checking out the ROMS source code
There is nothing special about checking out the ROMS source code on Amarel. The same svn commands you’re used to will work on Amarel. However, the first time you check the code out you will need to use the ‘--username’ flag unless your NetID matches your ROMS username. Navigate to the directory where you want your ROMS source code to reside and execute this svn checkout command:
- svn --username <user> co https://www.myroms.org/svn/src/trunk <my_src_dir>
replacing <user> with your ROMS username and <my_src_dir> with what you want the source code directory to be called. After typing your password it will ask you if you want to store your password. We recommend answering yes but it’s up to you. If you answer no you will have to type your ROMS password every time you do an svn checkout or svn update. After your first checkout you will no longer need the --username flag for svn operations to any of the myroms.org subversion repositories.
Loading and Unloading Modules
Like many clusters, Amarel uses environment modules to load and unload software and configure the environment. Some commands you will find useful are:
- module help             Display help message
 module help <m1>         Show help information for module <m1>
 module available         Show all modules currently available
 module whatis <m1>       Show brief information about module <m1>
 module spider            List all possible modules even if not currently available
 module list              List currently loaded modules
 module load <m1> ...     Load the specified module(s)
 module unload <m1> ...   Unload the specified module(s)
 module swap <m1> <m2>    Unload <m1> and then load <m2> (for switching versions of the same software)
 module purge             Unload all loaded modules
Loading the ROMS Module
Setting up your environment to compile and run ROMS is as simple as loading the roms module.
- module load roms
 module list
 Currently Loaded Modules:
 1) intel/17.0.4 2) mvapich2/2.2 3) mct-roms/2.6.0 4) netcdf/4.6.2
 5) esmf/8.0.0_nc4 6) parpack-roms/2.1 7) hdf5/1.10.4 8) roms/intel_nc4
Notice that loading the roms module will actually load 7 other modules.
This sets up your environment to use the Intel compiler, so remember to set FORT to ifort in your build script.
Configuring and Compiling ROMS
Using the build script is the recommended method for compiling ROMS on Amarel. Some of the modules that load with the roms module also set environment variables that help simplify your roms build script. Starting from the latest build_roms.sh you will likely only need to change ROMS_APPLICATION and MY_ROMS_SRC and make sure FORT is set to ifort.
 It is important to make sure USE_MY_LIBS is set to no or your compilation will fail.
You will find the latest build_roms.sh script in the ROMS/Bin subdirectory of the <my_src_dir> you chose above. Copy that script to the directory you will work in to run ROMS.
Configure build_roms.sh by setting MY_ROMS_SRC to the full path of your <my_src_dir>.
If this is your first time working with ROMS, a good starting place is to compile the default UPWELLING test case that is indicated by the build_roms.sh setting ROMS_APPLICATION=UPWELLING.
Copy the file upwelling.h from the ROMS/Include subdirectory of your source code to your working directory.
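Before compiling, it can help to confirm the four build settings mentioned above in one step. The function below is a hypothetical helper (check_build_script is our name); it assumes that each active setting in build_roms.sh is a line of the form export NAME=value, and it ignores commented-out alternatives such as #export FORT=gfortran.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the active settings this page asks you to check in
# build_roms.sh and warn if FORT or USE_MY_LIBS are not what Amarel needs.
# Usage: check_build_script [path/to/build_roms.sh]
check_build_script() {
  local script="${1:-build_roms.sh}"
  local status=0
  # Only uncommented "export NAME=value" lines are shown; "#export ..." is skipped.
  grep -E '^[[:space:]]*export[[:space:]]+(ROMS_APPLICATION|MY_ROMS_SRC|FORT|USE_MY_LIBS)=' "$script"
  if ! grep -qE '^[[:space:]]*export[[:space:]]+FORT=ifort([[:space:]]|$)' "$script"; then
    echo "WARNING: FORT is not set to ifort" >&2
    status=1
  fi
  if ! grep -qE '^[[:space:]]*export[[:space:]]+USE_MY_LIBS=no([[:space:]]|$)' "$script"; then
    echo "WARNING: USE_MY_LIBS is not set to no" >&2
    status=1
  fi
  return "$status"
}
```

Running check_build_script in your working directory prints the four lines so you can eyeball ROMS_APPLICATION and MY_ROMS_SRC; a WARNING means the build is likely to fail.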
Once your build script is configured and you have upwelling.h in your working directory you can compile ROMS by typing:
- ./build_roms.sh -j 4
where the number after -j is the number of compute cores to use in parallel for the compilation. Up to a point, the greater the number, the faster the build.
However, the login node you will be compiling on is shared by all Amarel users. If you use a number larger than 4, or omit it altogether (which places no limit on the number of parallel compile jobs), your build might be terminated by an administrator. Be a considerate user and keep the number low.
If compilation was successful, there will be a file named romsM, the model executable (the M indicates an MPI parallel build).
Running on the Amarel Compute Nodes
When you ssh to Amarel you are connected to one of the login nodes. These nodes are to be used for file editing, transferring input and output files to and from your local computer, modest code compiling, and analysis, but not for compute-intensive tasks. Here we explain how to use the compute nodes for these larger tasks.
Note: Consult the Amarel status page (https://oarc.rutgers.edu/amarel-system-status/) before scheduling a job. Amarel is down for maintenance monthly.
Using the SLURM batch job scheduler
Amarel uses the SLURM workload manager to schedule compute-intensive tasks. You configure a SLURM job script for each model run and submit it with the sbatch command. The job script declares the resources required, such as the number of CPUs for parallel jobs, the maximum memory, and the run time limit.
We have configured a simple template job script (for the ROMS UPWELLING example) that you can copy from /projects/dmcs_1/courses/job.sh into the directory that you will work from. The contents are shown below:
- #!/bin/bash
 #SBATCH --partition=dmcs_1 # Partition (job queue)
 #SBATCH --job-name=upwelling # Assign a short name to your job
 #SBATCH --nodes=1 # Number of nodes you require
 #SBATCH --ntasks=4 # Total number of tasks you'll launch
 #SBATCH --ntasks-per-node=4 # Number of tasks you'll launch on each node
 #SBATCH --cpus-per-task=1 # Cores per task (>1 if multithread tasks)
 #SBATCH --mem=6400 # Real memory (RAM) required (MB) per node
 #SBATCH --time=00-00:05 # Total run time limit (DD-HH:MM)
 #SBATCH --output=out.%N.%j # STDOUT output file
 #SBATCH --error=err.%N.%j # STDERR output file (optional but recommended)
 #SBATCH --export=ALL # Export your current env to the job env
 ## It is important to have --mpi=pmi2 here or ROMS will not run
 srun --mpi=pmi2 ./romsM roms_upwelling.in
Running the ROMS Upwelling Example
You may have noticed that the srun command above reads a file named roms_upwelling.in. Copy this file and varinfo.dat from the ROMS/External directory of your <my_src_dir> source code to the directory where you compiled ROMS.
After you copy the files, you will need to make a couple of small edits to roms_upwelling.in to get this to work.
- Change the line with VARNAME to read VARNAME = varinfo.dat (i.e. delete the ROMS/External part)
- Set NtileI and NtileJ to 2, so the lines read NtileI == 2 and NtileJ == 2.
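The copy and the two edits can also be done from the shell. This is a minimal sketch under stated assumptions: prepare_upwelling is a hypothetical function name, it must be run from your working directory, it relies on GNU sed's -i option (standard on Linux systems such as Amarel), and it assumes the input file contains VARNAME = ROMS/External/varinfo.dat and lines of the form NtileI == 1 and NtileJ == 1. Open roms_upwelling.in afterwards to confirm the edits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy roms_upwelling.in and varinfo.dat from a ROMS
# source tree into the current directory, then apply the two edits above.
# Usage: prepare_upwelling <my_src_dir> [NtileI] [NtileJ]   (tiles default to 2)
prepare_upwelling() {
  local src="$1" ni="${2:-2}" nj="${3:-2}"
  local ext="${src}/ROMS/External"
  if [ ! -f "${ext}/roms_upwelling.in" ] || [ ! -f "${ext}/varinfo.dat" ]; then
    echo "cannot find roms_upwelling.in and varinfo.dat in ${ext}" >&2
    return 1
  fi
  cp "${ext}/roms_upwelling.in" "${ext}/varinfo.dat" . || return 1
  # 1) Point VARNAME at the local copy of varinfo.dat.
  # 2) Replace the number after "NtileI ==" and "NtileJ ==".
  #    GNU sed -i edits the file in place.
  sed -i \
    -e 's|ROMS/External/varinfo\.dat|varinfo.dat|' \
    -e "s/\(NtileI[[:space:]]*==[[:space:]]*\)[0-9][0-9]*/\1${ni}/" \
    -e "s/\(NtileJ[[:space:]]*==[[:space:]]*\)[0-9][0-9]*/\1${nj}/" \
    roms_upwelling.in
  # Show the edited lines for a quick visual check.
  grep -E '^[[:space:]]*(VARNAME[[:space:]]*=|Ntile[IJ][[:space:]]*==)' roms_upwelling.in
}
```

For example, prepare_upwelling ~/roms_src copies the files from a source tree checked out as roms_src and sets a 2 x 2 tiling.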
The product of NtileI and NtileJ (here 2 x 2 = 4) is the number of cores the model will run on in parallel. For this single-node job, this number must match the values of the SLURM job script options --ntasks and --ntasks-per-node.
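A mismatch between the tile count and the job script is an easy mistake to make when you later change either file. The sketch below is a hypothetical check (check_tiles is our name), assuming a single-grid input file with lines like NtileI == 2 and a job script with an #SBATCH --ntasks=N line and, optionally, an #SBATCH --ntasks-per-node=N line. For a nested application with several values per line it reads only the first value, so treat it as a rough check.

```shell
#!/usr/bin/env bash
# Hypothetical check: does NtileI x NtileJ in a ROMS input file equal the
# --ntasks (and --ntasks-per-node, if present) requested in a SLURM script?
# Usage: check_tiles roms_upwelling.in job.sh
check_tiles() {
  local infile="$1" jobfile="$2"
  local ni nj tiles ntasks per_node
  # First value after "NtileI ==" / "NtileJ =="; comment lines start with "!".
  ni="$(awk '$1 == "NtileI" && $2 == "==" { print $3; exit }' "$infile")"
  nj="$(awk '$1 == "NtileJ" && $2 == "==" { print $3; exit }' "$infile")"
  # "--ntasks=" does not match "--ntasks-per-node=", so each is read separately.
  ntasks="$(sed -n 's/^#SBATCH[[:space:]]*--ntasks=\([0-9][0-9]*\).*/\1/p' "$jobfile" | head -n 1)"
  per_node="$(sed -n 's/^#SBATCH[[:space:]]*--ntasks-per-node=\([0-9][0-9]*\).*/\1/p' "$jobfile" | head -n 1)"
  if [ -z "$ni" ] || [ -z "$nj" ] || [ -z "$ntasks" ]; then
    echo "could not find NtileI, NtileJ, or --ntasks" >&2
    return 2
  fi
  tiles=$((ni * nj))
  if [ "$tiles" -ne "$ntasks" ]; then
    echo "MISMATCH: NtileI x NtileJ = ${tiles}, but --ntasks=${ntasks}" >&2
    return 1
  fi
  if [ -n "$per_node" ] && [ "$tiles" -ne "$per_node" ]; then
    echo "NOTE: --ntasks-per-node=${per_node} differs from ${tiles} (expected only when --nodes is greater than 1)" >&2
  fi
  echo "OK: NtileI x NtileJ = ${ni} x ${nj} = ${tiles} = --ntasks"
}
```

Running check_tiles roms_upwelling.in job.sh just before sbatch catches the most common cause of a ROMS job that starts and then stops immediately with a tiling error.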
Now you can submit your job with the command sbatch job.sh.
Checking Job Status with squeue
Detailed documentation for monitoring your SLURM jobs can be found here. The easiest way to check whether your job is running is with the command squeue -p dmcs_1.
You should see output something like this:
- JOBID PARTITION NAME USER ST TIME TIME_LIMIT CPUS NODES NODELIST(REASON)
 17197219 dmcs_1 watl_psas rjdave R 3-04:07:00 3-20:00:00 4 1 hal0035
 17146609 dmcs_1 upwelling rjdave R 00:01:10 00:05:00 4 1 hal0035
You can see that the upwelling job is running (the ‘R’ under ‘ST’), has been running for 1 minute and 10 seconds, and is running on node hal0035.
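If you would rather not re-run squeue by hand, a small polling loop can do it for you. The sketch below (wait_for_job is a hypothetical name) uses the standard squeue options -h (no header), -j (select a job ID), and -o %t (print only the compact state code), and stops once the job no longer appears in the queue. Keep the polling interval generous (the default here is 60 seconds) so you do not load the scheduler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a job's state periodically until SLURM no longer
# lists it (finished, failed, or canceled jobs drop out of squeue).
# Usage: wait_for_job <jobid> [interval_seconds]
wait_for_job() {
  local jobid="$1" interval="${2:-60}" state
  # Stop if squeue fails (e.g. the job ID is no longer known) or prints nothing.
  while state="$(squeue -h -j "$jobid" -o %t 2>/dev/null)" && [ -n "$state" ]; do
    echo "$(date '+%H:%M:%S') job ${jobid}: ${state}"
    sleep "$interval"
  done
  echo "job ${jobid} is no longer in the queue"
}
```

For the example above, wait_for_job 17146609 prints a line such as "job 17146609: R" each minute until the run ends.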
Checking Progress with tail
You can check ROMS progress by running tail on the output file. For the upwelling job above, the file would be called out.hal0035.17146609, so the command tail out.hal0035.17146609 shows the last 10 lines written to the output log, which will most likely tell you what time step the model is on. With the -f option (tail -f out.hal0035.17146609), tail keeps printing new lines as the file grows; press Ctrl-C to stop.
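Because the output file name combines the node name and the job ID (from --output=out.%N.%j in the job script), you need both to type it. A hypothetical helper like job_log below (our name, not a SLURM command) finds the file from the job ID alone, assuming you run it in the directory where you submitted the job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the STDOUT file for a job ID, assuming
# the job script's --output=out.%N.%j pattern (out.<node>.<jobid>).
# Usage: tail -f "$(job_log 17146609)"
job_log() {
  local jobid="$1" f
  for f in out.*."$jobid"; do
    # If nothing matches, the glob stays literal and this test fails.
    if [ -e "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  echo "no out.*.${jobid} file in ${PWD}" >&2
  return 1
}
```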
Cancelling a Job
The safest way to cancel a queued or running job is by its job ID. A job can be canceled by name, but that is not recommended. To cancel the upwelling job in the example above, you would issue the command scancel 17146609. If the job has not yet started, it will be removed from the queue. If it is running, all child processes will be killed and the job will be removed from the queue. You can only cancel jobs that you own.
Running an interactive session on a compute node
It is possible to conduct your work interactively on one of the compute nodes (instead of the login node). For most work we will be doing in class this is not necessary, but if for some reason you have a job that needs many processors or a large amount of memory, and you want to run it interactively - say, to simply check that everything is in order for it to start correctly - there are instructions in the Cluster User Guide here.
For compute intensive interactive work, such as model analysis using Python or MATLAB, we recommend using the OnDemand interface to launch an interactive session on a set of compute nodes.
Using OnDemand to launch a Personal Jupyter Notebook
To plot model output you can use MATLAB or Python through the Rutgers OnDemand system. Point your browser to https://ondemand.hpc.rutgers.edu/pun/sys/dashboard and log in with your NetID and password. At the top of the page, click My Interactive Sessions.
For MATLAB:
- Select the MATLAB option in the left column, choose your time, cores, memory, and MATLAB version, and enter dmcs_1 for the partition.
- Click Launch and wait. This can take a couple minutes.
- Once the Launch noVNC in New Tab button appears, click it and a MATLAB GUI will open.
For Python:
- Select Personal Jupyter and choose time, cores, memory, and enter dmcs_1 in partition. Leave Reservation and Slurm feature blank, enter /projects/dmcs_1/miniconda3 for conda path, and /projects/dmcs_1/sw/packages/xroms/py38 for conda environment.
- Click Launch and wait. This can take a couple minutes.
- Once the Connect to Jupyter, Anaconda version 5.1.0 button appears, click it and wait again.
- Near the top-right click New -> Python 3 (ipykernel)
- Enter import xroms then hold down the shift key and hit the return/enter key. You will see an asterisk [*] in the square brackets.
- Once that asterisk changes to a 1, the xroms python module has been loaded.


