Metadata-Version: 2.4
Name: sarray
Version: 0.3.1
Summary: Meta scheduler merging Slurm Arrays
Author-email: Nathan Cassereau <nathan.cassereau@polytechnique.edu>
License: Apache-2.0
Project-URL: Source, https://github.com/ncassereau/sarray
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Topic :: System :: Distributed Computing
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: jinja2>=3.1.6
Requires-Dist: rich>=14.3.3
Requires-Dist: simple-parsing>=0.1.8
Dynamic: license-file

# sarray

Merge multiple independent Slurm job arrays into a single `sbatch` submission.

Instead of flooding the scheduler with N separate job arrays, `sarray` combines them into one array job where each task is routed to the right script with the right arguments. This reduces scheduler overhead and makes queue management easier.

## How it works

Given two scripts:

```
# train.slurm  →  --array=0-2  (3 tasks)
# eval.slurm   →  --array=0-4  (5 tasks)
```

`sarray` generates a single array job with `--array=0-7` (8 tasks) and a dispatcher that maps each global task ID back to the right script and local task ID:

```
global 0,1,2     → train.slurm  with SLURM_ARRAY_TASK_ID = 0,1,2
global 3,4,5,6,7 → eval.slurm   with SLURM_ARRAY_TASK_ID = 0,1,2,3,4
```

All `SLURM_ARRAY_TASK_*` environment variables are set correctly in each task.
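
The mapping above can be sketched in a few lines of Python (an illustration of the idea, not sarray's actual dispatcher; the `jobs` list of `(script, task_count)` pairs is hypothetical):

```python
def dispatch(jobs, global_id):
    """Map a global array task ID to (script, local SLURM_ARRAY_TASK_ID)."""
    offset = 0
    for script, count in jobs:
        if global_id < offset + count:
            # This task belongs to the current script; rebase its ID.
            return script, global_id - offset
        offset += count
    raise ValueError(f"task ID {global_id} out of range")

# The example above: train.slurm has 3 tasks, eval.slurm has 5.
jobs = [("train.slurm", 3), ("eval.slurm", 5)]
print(dispatch(jobs, 2))  # ('train.slurm', 2)
print(dispatch(jobs, 3))  # ('eval.slurm', 0)
```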

**Constraint:** all merged jobs must have identical `#SBATCH` options (same resources, partition, etc.) — only `--array` can differ.

---

## Installation

```bash
pip install sarray
# or
uv add sarray
```

---

## Usage

### Interactive mode (recommended)

Start a listen session — this spawns a subshell where `sbatch` is intercepted:

```bash
sarray listen
```

Your prompt changes to **`[sarray]`** (bold yellow) to indicate you're in a session.

Inside the session, call `sbatch` normally. Every call is queued instead of submitted:

```bash
sbatch --array=0-4 train.slurm model_a
sbatch --array=0-4 train.slurm model_b
sbatch eval.slurm
```

When ready, submit everything as one merged array:

```bash
sarray submit
```

Or discard and exit without submitting:

```bash
sarray cancel
```

Both commands exit the subshell automatically.

---

### Standalone mode

Pass a queue file directly — no subshell needed:

```bash
sarray submit jobs.conf
```

Where `jobs.conf` contains one `sbatch` call per line:

```
sbatch --array=0-2 train.slurm lr=0.01
sbatch --array=0-2 train.slurm lr=0.001
sbatch --array=0-2 train.slurm lr=0.0001
```

Read from stdin:

```bash
echo "sbatch job.slurm" | sarray submit -
```
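
A queue file is plain text with one `sbatch` call per line, so parsing it is straightforward. A minimal sketch (illustrative only, not sarray's actual parser; blank-line and comment handling are assumptions):

```python
import shlex

def parse_queue(text):
    """Split a queue file into per-job argument lists (sketch only)."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        argv = shlex.split(line)  # shell-style splitting handles quoting
        if argv[0] != "sbatch":
            raise ValueError(f"expected an sbatch call, got: {line}")
        jobs.append(argv[1:])  # drop the leading 'sbatch'
    return jobs

queue = """\
sbatch --array=0-2 train.slurm lr=0.01
sbatch --array=0-2 train.slurm lr=0.001
"""
print(parse_queue(queue))
```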

---

## Commands

### `sarray listen`

Spawns an interactive subshell with a fake `sbatch` that queues calls into a temporary file. The real `sbatch` is shadowed only inside this subshell — your parent shell is unaffected.

Exit the session with `sarray submit` or `sarray cancel`.

---

### `sarray submit [FILE|-]`

Generate and submit the merged job array.

| Argument / Flag | Description |
|---|---|
| `FILE` | Queue file to read (one `sbatch ...` line per job). Omit to use the active listen session. |
| `-` | Read queue from stdin. |
| `-o`, `--output FILE` | Save the generated script to this file (default: `sarray.slurm` in the current directory). |
| `-n`, `--dry-run` | Print the generated script to stdout (syntax-highlighted) without submitting. |
| `-t`, `--throttle N` | Limit the number of simultaneously running tasks (`%N` appended to `--array`). |
| any `sbatch` flag | Any unknown flag is treated as an sbatch option and applied to the merged script (see below). |

The generated script is written to disk before submission (`sarray.slurm` by default), so you can always inspect exactly what was submitted.
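
Building the merged `--array` value with an optional `%N` throttle suffix can be sketched as follows (a minimal illustration under the semantics described above; `merged_array_spec` is a hypothetical name, not part of sarray's API):

```python
def merged_array_spec(total_tasks, throttle=None):
    """Build the --array value for the merged job (sketch).

    Slurm's %N suffix caps the number of simultaneously running tasks.
    """
    spec = f"0-{total_tasks - 1}"
    if throttle is not None:
        spec += f"%{throttle}"
    return spec

print(merged_array_spec(8))     # 0-7
print(merged_array_spec(8, 4))  # 0-7%4
```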

**Submit-time sbatch overrides.** Any flag not recognized by `sarray submit` is passed through as an `#SBATCH` directive in the generated script, overriding whatever the individual jobs had. Useful for options you can't know ahead of time:

```bash
# Chain two job arrays
sarray submit --dependency=aftercorr:12345

# Use a reservation you just got
sarray submit --reservation=my_nodes

# Delay start
sarray submit --begin=2026-04-04T08:00:00
```

**CLI flags override `#SBATCH` directives.** Flags passed on an `sbatch` command line take precedence over the corresponding directives inside the script. For example:

```bash
sbatch --mem=8GB job.slurm    # overrides #SBATCH --mem in job.slurm
sbatch --array=0-9 job.slurm  # overrides #SBATCH --array in job.slurm
```

`--wrap` is also supported (no script file needed):

```bash
sbatch --wrap "python train.py" --array=0-4 --mem=16GB
```

---

### `sarray cancel`

Discard the current listen session queue and exit the subshell.

---

### `sarray throttle JOBID -n N [--requeue] [--kill]`

Update the concurrent task limit of a running job array without cancelling it.

| Argument / Flag | Description |
|---|---|
| `JOBID` | ID of the running job array. |
| `-n`, `--max`, `--max-tasks N` | New maximum number of simultaneously running tasks. |
| `-r`, `--requeue` | Requeue the most recently started tasks running above the new limit (they will run again later). |
| `-k`, `--kill` | Cancel (via `scancel`) the most recently started tasks running above the new limit. |

Excess tasks are always selected by recency — the most recently started ones are acted on first. `--requeue` and `--kill` are mutually exclusive.

```bash
# Slow down a running array to 2 concurrent tasks
sarray throttle 123456 --max 2

# Slow down and requeue the excess running tasks (they will be rescheduled)
sarray throttle 123456 --max 2 --requeue

# Slow down and permanently cancel the excess running tasks
sarray throttle 123456 --max 2 --kill
```

The command checks that the job exists, belongs to you, and is a job array before updating.
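
The recency-based selection can be sketched as follows (illustrative only, not sarray's implementation; the `(task_id, start_time)` pairs here are hypothetical stand-ins for what Slurm reports about running tasks):

```python
def excess_tasks(running, limit):
    """Return the task IDs running above `limit`, most recently started first.

    `running` is a list of (task_id, start_time) pairs for currently
    running tasks; start_time is any comparable timestamp.
    """
    by_recency = sorted(running, key=lambda t: t[1], reverse=True)
    n_excess = max(0, len(running) - limit)
    return [task_id for task_id, _ in by_recency[:n_excess]]

# Four running tasks; throttling to 2 acts on the two newest (3, then 2).
running = [(0, 100.0), (1, 120.0), (2, 140.0), (3, 160.0)]
print(excess_tasks(running, 2))  # [3, 2]
```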

---

## Example workflow

```bash
$ sarray listen
[sarray] $ sbatch --array=0-9 experiments/baseline.slurm
[sarray] $ sbatch --array=0-9 experiments/ablation.slurm
[sarray] $ sbatch --array=0-9 experiments/ablation2.slurm
[sarray] $ sarray submit --dry-run   # preview the merged script
[sarray] $ sarray submit             # submit and exit the session
Submitted batch job 42137
$
```

Result: one job array with 30 tasks instead of 3 separate submissions.
