
# How to improve job scripts for better resource usage

## (cores and memory)

## Radovan Bast ([@__radovan](https://twitter.com/__radovan))

---

## Goal

```bash
#!/bin/bash

#SBATCH --account=MyProject
#SBATCH --job-name=MyJob
#SBATCH --time=1-00:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --ntasks=16

set -o errexit        # exit the script on any error
set -o nounset        # treat any unset variables as an error
module --quiet purge  # clear any inherited modules

./myprogram < input.txt > output.txt
```

- This presentation is about `--ntasks`, `--mem-per-cpu`, and `--time`.
- We will not talk about Slurm partitions.
- Many documentation pages (including [our documentation](https://documentation.sigma2.no))
  show how to specify these, but .emph[how do we know what values to use?]
- Documentation expects me to know whether my job uses MPI or OpenMP, but how can I tell?

---

## Motivation / learning outcomes

- Be able to tell what resources (.emph[memory, cores, time]) your job needs
- Understand why knowing the resource needs can be .emph[good for you] and for all other users

There will be a few slides and a demo in the terminal where I will try some of this out.

---

## How we imagine that job scripts are prepared

- Taking some training
- Reading documentation
- Careful calibration and profiling
- Growing and benchmarking the model to meaningful parameters
- Then running the actual computations
- ... while monitoring resource usage from time to time

---

## How job scripts are often prepared

- Job scripts are often passed from generation to generation
- Tweaking until it does not crash
- Then running the actual computations
- If it crashes, contact support and/or check documentation

### Discuss possible problems

---

## Why it matters

### Memory

- Asking for too little means the job is terminated when it exceeds the limit
- Asking for too much can mean that you block idle CPUs and get charged for them
- Asking for too much can mean a lot longer queuing

### Cores

- Asking for too few can lead to underused nodes or longer run time
- Asking for too many can mean wasted CPU resources
- Asking for too many can mean a lot longer queuing

### Time

- Asking for too little means the job is terminated when it reaches the time limit
- Asking for too much can mean a lot longer queuing and temporarily negative funds

---

## A few points which are often misunderstood

### None of this is expected to be obvious for a beginner

- We request resources from the scheduler (queuing system).
- But the scheduler cannot tell how long the job will run and what resources it
  will really consume.
- Just the fact that I am asking the scheduler for 40 cores
  .emph[does not mean that the code will actually run in parallel] and use all of them.
- .emph[Number of cores and amount of memory are not independent]. If you ask for more memory
  than the cores you request can provide, you will reserve and block additional cores.
- Asking for a lot of memory "just to be on the safe side" can affect your queuing time
  and compute budget.
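
The coupling between memory and cores can be made concrete with a little arithmetic. The sketch below assumes a hypothetical node with 40 cores and 160 GB of memory (so 4 GB per core). These numbers and the function name `blocked_cores` are illustrative only, and the exact accounting depends on how the scheduler is configured on your cluster.

```bash
#!/bin/bash

# Hypothetical node: 40 cores sharing 160 GB of memory (assumed values).
cores_per_node=40
mem_per_node_gb=160

# Print how many cores are effectively occupied by a request for
# a number of cores and a total amount of memory (in GB).
blocked_cores() {
    local requested_cores=$1
    local requested_mem_gb=$2
    # cores whose memory share is needed to cover the request, rounded up
    local cores_for_mem=$(( (requested_mem_gb * cores_per_node + mem_per_node_gb - 1) / mem_per_node_gb ))
    if (( cores_for_mem > requested_cores )); then
        echo "${cores_for_mem}"
    else
        echo "${requested_cores}"
    fi
}

blocked_cores 1 20   # 1 core, 20 GB in total
blocked_cores 4 8    # 4 cores, 8 GB in total
```

Under these assumptions, the first request occupies 5 cores although only 1 was asked for, while the second occupies just the 4 cores requested.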

---

# First time on a new machine?

## Please do not start immediately with a 16-node and 40-hour calculation

---

## How to grow your calculation

- Start with a short 5-minute run on 1 core
- Then try more cores on the same node
- Then try going beyond one node
- Then increase the system size and make your calculation longer
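
The first two steps could be sketched as a series of short test submissions. Options given on the `sbatch` command line override the `#SBATCH` lines in the script, so the same job script can be reused. The script name `job.sh`, the job names, and the function name are hypothetical. The function only prints the commands so that they can be inspected before anything is submitted.

```bash
#!/bin/bash

# Print sbatch commands for short 5-minute test runs on a single node
# with an increasing number of tasks.
print_scaling_submissions() {
    local script=$1
    shift
    local ntasks
    for ntasks in "$@"; do
        echo "sbatch --time=00:05:00 --nodes=1 --ntasks=${ntasks} --job-name=scaling-${ntasks} ${script}"
    done
}

print_scaling_submissions job.sh 1 2 4 8
```

Once the printed commands look right, they can be run one by one and the run times compared.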

### Discussion

- What are the advantages of this approach?
- What are possible challenges of this approach?

---

## How to grow your calculation

- To calibrate the computation parameters, the scientific results do not have to be meaningful
- A run with meaningless results can still be meaningful for the calibration
- Don't be afraid to create fantasy datasets to test the scaling

---

## Create a small example for yourself

- Something that runs in 5 minutes
- Where you know the result and the timing

### Next time you are unsure whether it's the machine or "you"?

- Run the small example
- If it suddenly stopped working or runs slower, you know it's probably the machine
- Now you can send us the example so that we fix the machine
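
A minimal sketch of such a check: run the small example, compare its output with a stored reference, and report the wall time. The function name and all file names are hypothetical. An exact `diff` is assumed to be good enough here, whereas programs that print floating-point numbers may need a tolerance-based comparison instead.

```bash
#!/bin/bash

# Run a command, compare its standard output with a reference file,
# and report the wall time in seconds.
check_small_example() {
    local reference=$1
    shift
    local output start end
    output=$(mktemp)
    start=$(date +%s)
    "$@" > "${output}"
    end=$(date +%s)
    echo "wall time: $(( end - start )) s"
    if diff -q "${reference}" "${output}" > /dev/null; then
        echo "result: matches reference"
    else
        echo "result: DIFFERS from reference"
    fi
    rm -f "${output}"
}

# usage (hypothetical file names):
# check_small_example reference_output.txt ./myprogram < small_input.txt
```

If the result differs or the wall time is far from what you noted earlier, the output of this check is a useful thing to attach to a support request.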

---

# Demo time

---

## Demo (1/2): How much memory do I need?

- Several strategies are outlined in
  [How to choose the right amount of memory](https://documentation.sigma2.no/jobs/choosing-memory-settings.html)
- We will try some of this together
- If this is your own code and it needs excessive memory, perhaps contact us for
  [extended support](https://documentation.sigma2.no/getting_help/extended_support.html)?
- Also consider which queue/partition you submit to
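
One possible way to go from a measurement to a request is sketched below. It assumes that the peak memory is read from the "Maximum resident set size" line of GNU `/usr/bin/time -v` or from the `MaxRSS` column of `sacct`. The helper `suggest_mem_mb` and its default 20% safety margin are assumptions for illustration, not a recommendation from the documentation.

```bash
#!/bin/bash

# Ways to measure peak memory (the job ID is a placeholder):
#   /usr/bin/time -v ./myprogram < input.txt > output.txt
#   sacct -j <jobid> --format=JobID,MaxRSS,ReqMem,Elapsed

# Turn a measured peak memory in kB into a request in MB,
# adding a safety margin in percent (default: 20) and rounding up.
suggest_mem_mb() {
    local peak_kb=$1
    local margin_percent=${2:-20}
    echo $(( (peak_kb * (100 + margin_percent) / 100 + 1023) / 1024 ))
}

# usage: suggest_mem_mb "${peak_kb}"       # default margin
#        suggest_mem_mb "${peak_kb}" 50    # more generous margin
```

The printed number can then be used in `--mem-per-cpu`, keeping in mind that this option is per core and not per job.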

---

## Demo (2/2): How many cores should we ask for?

- Several strategies are outlined in
  [How to choose the number of cores](https://documentation.sigma2.no/jobs/choosing-number-of-cores.html)
- We will try some of this together
- Do not hesitate to contact
  [support](https://documentation.sigma2.no/getting_help/support_line.html) if
  you are unsure about how many cores/tasks/threads to ask for
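
When comparing test runs with different core counts, speedup and parallel efficiency are a convenient summary. The sketch below assumes you have measured the wall time on 1 core and on `n` cores for the same input. The function name and the variable names in the usage comment are hypothetical.

```bash
#!/bin/bash

# Print speedup (t1 / tn) and parallel efficiency (speedup / n)
# from two wall times in seconds: t1 on 1 core, tn on n cores.
parallel_efficiency() {
    local t1=$1 tn=$2 n=$3
    awk -v t1="${t1}" -v tn="${tn}" -v n="${n}" \
        'BEGIN { s = t1 / tn; printf "speedup %.2f, efficiency %.0f%%\n", s, 100 * s / n }'
}

# usage, with wall times taken from your own test runs:
# parallel_efficiency "${time_on_1_core}" "${time_on_8_cores}" 8
```

If the efficiency drops well below 100% as cores are added, the extra cores are mostly waiting, and asking for fewer of them is likely the better choice.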