Metadata-Version: 2.4
Name: euler-files
Version: 1.3.0
Summary: Manage env-var caches on HPC clusters by syncing from slow persistent storage to fast scratch
License-Expression: MIT
Keywords: hpc,euler,cache,rsync,slurm
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: click>=8.0
Requires-Dist: jsonschema>=4.0
Requires-Dist: rich>=12.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pytest-mock; extra == "dev"
Requires-Dist: ruff; extra == "dev"

# euler-files

Manage environment-variable caches on HPC clusters by syncing from slow
persistent storage to fast scratch directories.

On clusters like ETH Euler, persistent home directories live on slow
network-attached storage while per-job `$SCRATCH` sits on fast local or
parallel filesystems. Tools like HuggingFace, PyTorch Hub, and pip all write
multi-gigabyte caches under `$HOME/.cache/...` by default. Every job that reads
these caches pays the penalty of slow metadata ops on the shared filesystem.

**euler-files** fixes this by:

1. Syncing cache directories to `$SCRATCH` with rsync (parallel, locked,
   smart-skipped).
2. Exporting the relevant environment variables so tools read from scratch.
3. Optionally packaging Python venvs into Apptainer/Singularity `.sif` images
   for reproducible, fast container execution.

## Installation

```bash
pip install .
# or in development mode:
pip install -e '.[dev]'
```

Requires Python 3.8+ and the following system tools:

- `rsync` (available on virtually all HPC systems)
- `tar` (for Apptainer builds)
- `apptainer` (only for container features; usually loaded via `module load apptainer`)
- `uv` (only for the `venv` subcommands, which call `uv venv` and `uv pip`)

## Quick start

```bash
# 1. Run the interactive setup wizard
euler-files init

# 2. Sync caches to scratch and export env vars (use eval!)
eval "$(euler-files sync)"

# 3. Check sync status
euler-files status
```

For daily use, set up the shell helper so you only have to type `ef`:

```bash
# Add to ~/.bashrc or ~/.zshrc:
eval "$(euler-files shell-init)"

# Then in any session:
ef              # syncs + exports
ef status       # shows status table
ef push         # reverse-syncs scratch back to persistent storage
```

## How it works

### The sync cycle

```
  Persistent storage          $SCRATCH
  (slow home/project)         (fast local/parallel FS)
  ┌──────────────┐            ┌───────────────┐
  │ ~/.cache/    │  ──rsync─> │ $SCRATCH/     │
  │   huggingface│            │  .cache/      │
  │   torch      │            │   euler-files/│
  │   pip        │            │     HF_HOME/  │
  └──────────────┘            │     TORCH_HOME│
                              └───────────────┘
                                     │
                              export HF_HOME=$SCRATCH/...
                              export TORCH_HOME=$SCRATCH/...
```

`euler-files sync` runs rsync for each configured variable in parallel (up to
`parallel_jobs` threads), then prints `export VAR=<scratch-path>` lines to
stdout. Wrapping it in `eval "$(...)"` applies those exports to the current
shell.

All progress messages go to stderr so they never pollute the `eval`-captured
output.
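
The split between stdout and stderr can be sketched as follows. This is a minimal illustration, not the actual implementation: `sync_one` is a hypothetical stand-in for the real rsync call, and `export_lines` is a name invented for this example.

```python
import shlex
import sys
from concurrent.futures import ThreadPoolExecutor


def sync_one(var_name, scratch_path):
    # Hypothetical placeholder for the real rsync call.
    # Progress goes to stderr so it never reaches the eval-captured output.
    print(f"[sync] {var_name} -> {scratch_path}", file=sys.stderr)
    return var_name, scratch_path


def export_lines(targets, parallel_jobs=4):
    """Sync every variable in a thread pool and return the export lines."""
    with ThreadPoolExecutor(max_workers=parallel_jobs) as pool:
        # pool.map preserves the input order of the variables.
        results = list(pool.map(lambda item: sync_one(*item), targets.items()))
    # shlex.quote keeps paths with spaces or shell metacharacters safe under eval.
    return [f"export {name}={shlex.quote(path)}" for name, path in results]
```

Printing `"\n".join(export_lines(...))` to stdout then yields exactly the text that `eval "$(...)"` consumes.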

### Smart-skip markers

After a successful sync, a JSON marker file (`.VAR_NAME.synced`) is written
to the scratch cache root. On the next sync, rsync is skipped only if both
conditions hold:

1. The marker is younger than `skip_if_fresh_seconds` (default 3600 = 1 hour).
2. The source directory's top-level mtime hasn't increased (checks the
   directory itself and all immediate children).

Deep changes inside the tree are not detected by the marker, so they are picked
up only once the marker expires or you pass `--force`; rsync then transfers just
the files that changed. The marker only avoids the overhead of *launching* rsync
when nothing has changed at the top level.

Use `--force` to bypass smart-skip.
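
A minimal sketch of the smart-skip decision, assuming a marker that stores the sync time and the newest top-level mtime. The JSON keys `synced_at` and `source_mtime` and the function names are hypothetical; the real marker layout may differ.

```python
import json
import os
import time


def top_level_mtime(source_dir):
    """Newest mtime among the directory itself and its immediate children."""
    newest = os.stat(source_dir).st_mtime
    with os.scandir(source_dir) as entries:
        for entry in entries:
            newest = max(newest, entry.stat(follow_symlinks=False).st_mtime)
    return newest


def write_marker(marker_path, source_dir):
    """Record when the sync finished and what the source top level looked like."""
    payload = {"synced_at": time.time(), "source_mtime": top_level_mtime(source_dir)}
    with open(marker_path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)


def can_skip(marker_path, source_dir, skip_if_fresh_seconds=3600):
    """True only if the marker is fresh AND the source top level is unchanged."""
    if skip_if_fresh_seconds <= 0:
        return False  # smart-skip disabled
    try:
        with open(marker_path, encoding="utf-8") as fh:
            marker = json.load(fh)
        synced_at = marker["synced_at"]
        recorded_mtime = marker["source_mtime"]
    except (OSError, ValueError, KeyError):
        return False  # missing or unreadable marker: always sync
    if time.time() - synced_at >= skip_if_fresh_seconds:
        return False  # marker expired
    return top_level_mtime(source_dir) <= recorded_mtime
```

Because only depth 0 and 1 are inspected, the check costs one `scandir` regardless of how large the cache tree is.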

### Per-variable locking

Each variable gets its own lock file (`.VAR_NAME.lock`) using `flock`. If two
jobs try to sync the same variable simultaneously, one waits (up to
`lock_timeout_seconds`). This prevents partial or corrupted syncs on shared
scratch directories.
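
The polling approach can be sketched with the standard `fcntl` module. This is an illustration under assumptions: the function names, the initial delay, and the backoff cap are invented here, and only the timeout corresponds to `lock_timeout_seconds`.

```python
import fcntl
import os
import time


def acquire_lock(lock_path, timeout_seconds=300.0, first_delay=0.05, max_delay=5.0):
    """Poll for an exclusive flock; return the open fd or raise TimeoutError."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    deadline = time.monotonic() + timeout_seconds
    delay = first_delay
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking the thread.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                os.close(fd)
                raise TimeoutError(f"could not lock {lock_path}")
            time.sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay)  # exponential backoff, capped


def release_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A non-blocking attempt plus `time.sleep` works from any thread, which is why it suits a thread pool better than a signal-based timeout.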

## Commands

### `euler-files init`

Interactive wizard that:

- Detects `$SCRATCH` automatically
- Presents preset env vars (HF_HOME, TORCH_HOME, PIP_CACHE_DIR, etc.) with
  descriptions
- Lets you pick which ones to manage and configure their source paths
- Warns about overlapping paths (e.g. `XDG_CACHE_HOME` contains `HF_HOME`)
- Offers advanced settings: parallel jobs, lock timeout, smart-skip threshold
- Saves config to `~/.euler-files.json`

### `euler-files sync [OPTIONS]`

Syncs caches from persistent storage to scratch.

| Option      | Description                                    |
|-------------|------------------------------------------------|
| `--dry-run` | Show what would be synced without doing it     |
| `--force`   | Ignore smart-skip markers and force rsync      |
| `--var VAR` | Sync only specific variable(s); repeatable     |
| `--verbose` | Show rsync details on stderr                   |

**Usage in scripts/jobs:**

```bash
eval "$(euler-files sync)"
```

**Usage in SLURM job scripts:**

```bash
#!/bin/bash
#SBATCH --job-name=training
#SBATCH ...

eval "$(euler-files sync)"

python train.py
```

### `euler-files status`

Displays a rich table showing:

- Source and scratch sizes (via `du -sh`)
- Last sync time and age
- Status: `fresh` (green), `stale` (yellow), `not synced` (red), or
  `source missing` (yellow)
- Total scratch usage
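
One plausible way to derive the status label is sketched below. It assumes that freshness reuses the `skip_if_fresh_seconds` threshold and that a missing source takes precedence over everything else; both are assumptions for illustration, and `classify_status` is a hypothetical name.

```python
def classify_status(source_exists, marker_age_seconds, skip_if_fresh_seconds=3600):
    """Return the status label for one managed variable.

    marker_age_seconds is None when no sync marker exists yet.
    """
    if not source_exists:
        return "source missing"
    if marker_age_seconds is None:
        return "not synced"
    if marker_age_seconds < skip_if_fresh_seconds:
        return "fresh"
    return "stale"
```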

### `euler-files push [OPTIONS]`

Reverse sync: copies modified files from scratch back to persistent storage.
Useful after a job downloads new models or data to scratch.

| Option      | Description                              |
|-------------|------------------------------------------|
| `--var VAR` | Push only specific variable(s)           |
| `--dry-run` | Show what would be pushed                |

Also updates sync markers so subsequent syncs know the data is fresh.

### `euler-files migrate [WHAT] [OPTIONS]`

Moves a cache directory (or apptainer directory) to a new location. This is
useful when reorganizing storage or moving between project directories.

| Option        | Description                                  |
|---------------|----------------------------------------------|
| `WHAT`        | Variable name or field (`venv_base`, `sif_store`); interactive if omitted |
| `--to PATH`   | New location; prompted if omitted            |
| `--dry-run`   | Show what would be done                      |
| `--no-delete` | Keep old directory after migration           |
| `--yes`       | Skip confirmation prompt                     |

Migration steps:

1. rsync with `--delete` to the new location
2. Fix venv internal paths if migrating `venv_base` (rewrites shebangs and
   activate scripts)
3. Update config
4. Record migration in config history
5. Optionally delete old directory

### `euler-files shell-init [--shell bash|zsh|fish]`

Outputs a shell function `ef` for convenient use. Add to your shell rc file:

```bash
# Bash/Zsh:
eval "$(euler-files shell-init)"

# Fish:
eval (euler-files shell-init --shell fish)
```

The `ef` function:

- `ef` or `ef sync` — runs sync with eval (exports env vars into current shell)
- `ef status` — shows status
- `ef push` — reverse sync
- Any other subcommand is passed through to `euler-files`

### `euler-files venv install [ENV_NAME] [PACKAGES...]`

Creates or reuses a named uv environment under the configured venv base, then
installs packages into it.

| Option          | Description                                           |
|-----------------|-------------------------------------------------------|
| `ENV_NAME`      | Name of the environment directory under `venv_base`   |
| `PACKAGES...`   | Package requirements to install                       |
| `--python PY`   | Python version/interpreter to pass to `uv venv`       |
| `--venv-base`   | Override `apptainer.venv_base` / `$VENV_DIR`          |
| `--uv-cache-dir`| Override `uv_cache_dir` / `$UV_CACHE_DIR`             |
| `--index-url`   | Primary package index URL                             |
| `--extra-index-url` | Additional package index URL(s)                 |
| `--dry-run`     | Show the uv commands without running them             |

If a requirement includes a PyTorch CUDA wheel suffix such as
`torch==2.4.0+cu121`, euler-files automatically adds the matching PyTorch wheel
index, for example:

```bash
euler-files venv install my-ml-env torch==2.4.0+cu121 torchvision==0.19.0+cu121
# internally adds:
#   --extra-index-url https://download.pytorch.org/whl/cu121
```
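
The suffix detection could look roughly like this. The regular expression and the function name `pytorch_index_args` are assumptions made for the sketch; only the index URL pattern comes from the example above.

```python
import re

# Matches a local version label such as "+cu121" or "+cu118" in a requirement.
_CUDA_SUFFIX = re.compile(r"\+(cu\d+)\b")


def pytorch_index_args(requirements):
    """Return --extra-index-url arguments, one pair per distinct CUDA suffix."""
    args = []
    seen = set()
    for requirement in requirements:
        match = _CUDA_SUFFIX.search(requirement)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            args += [
                "--extra-index-url",
                f"https://download.pytorch.org/whl/{match.group(1)}",
            ]
    return args
```

Requirements without a `+cuNNN` label contribute nothing, so plain packages such as `transformers` are installed from the default index.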

### `euler-files venv migrate [SOURCE] [TARGET]`

Rebuilds one existing venv at a new target location by:

1. freezing the source environment with `uv pip freeze`
2. creating a fresh target venv with the same detected Python version
3. reinstalling the frozen requirements into the target
4. optionally verifying that source and target freeze outputs match

| Option              | Description                                         |
|---------------------|-----------------------------------------------------|
| `SOURCE`            | Existing venv path                                  |
| `TARGET`            | New venv path                                       |
| `--python PY`       | Override the Python version/interpreter             |
| `--uv-cache-dir`    | UV cache directory to use during rebuild            |
| `--index-url`       | Primary package index URL                           |
| `--extra-index-url` | Additional package index URL(s)                     |
| `--dry-run`         | Show commands without rebuilding                    |
| `--verify/--no-verify` | Enable/disable freeze comparison after rebuild   |
| `--overwrite-target`| Replace an existing target venv                     |
| `--delete-source`   | Delete the source venv after successful verification|
| `--allow-copy`      | Allow cache and target on different filesystems     |
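
The verification step amounts to comparing two freeze listings. A minimal sketch, assuming both listings are plain `name==version` lines as printed by `uv pip freeze`; the helper names are hypothetical.

```python
def normalize_freeze(text):
    """Turn freeze output into a set of requirement lines, ignoring blanks and comments."""
    lines = set()
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            lines.add(line)
    return lines


def freeze_diff(source_text, target_text):
    """Return (missing_in_target, unexpected_in_target), both sorted."""
    source = normalize_freeze(source_text)
    target = normalize_freeze(target_text)
    return sorted(source - target), sorted(target - source)
```

Two empty lists mean the rebuilt venv matches the source; anything else names the exact requirements that differ.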

### `euler-files venv migrate-store`

Migrates `UV_CACHE_DIR` together with all venvs under a venv base. This is the
bulk "move my uv installation store" workflow:

1. copy the old uv cache to the new location
2. rebuild each venv into the new venv base while pointing uv at the new cache
3. optionally update config and emit new `VENV_DIR` / `UV_CACHE_DIR` exports

| Option                | Description                                        |
|-----------------------|----------------------------------------------------|
| `--old-venv-base`     | Current venv base directory                        |
| `--new-venv-base`     | New venv base directory                            |
| `--old-uv-cache-dir`  | Current uv cache directory                         |
| `--new-uv-cache-dir`  | New uv cache directory                             |
| `--env NAME`          | Only migrate selected venv(s)                      |
| `--exclude-env NAME`  | Skip selected venv(s)                              |
| `--index-url`         | Primary package index URL                          |
| `--extra-index-url`   | Additional package index URL(s)                    |
| `--verify/--no-verify`| Enable/disable per-venv verification               |
| `--continue-on-error` | Continue with later venvs after a failure          |
| `--overwrite-targets` | Replace existing target venv directories           |
| `--delete-old`        | Delete old cache and venv base after success       |
| `--update-config`     | Update stored config paths after success           |
| `--allow-copy`        | Allow cache and target on different filesystems    |
| `--skip-cache-copy`   | Skip copying the old uv cache contents             |

## JSON mode

Most operational commands also support:

- `--input-json <path>` (or `-` for stdin) to read arguments/config from JSON
- `--json` to emit machine-readable JSON instead of shell/text output

This makes the CLI usable in pipelines without scraping terminal text.

### `euler-files schema`

Emits machine-readable schemas for:

- the Click command surface (arguments, options, types, defaults)
- the supported `--input-json` payloads

Examples:

```bash
# Full schema bundle
euler-files schema

# Only the Click invocation schema for one command
euler-files schema --kind cli --command apptainer.build

# Only the --input-json schema for one command
euler-files schema --kind input --command venv.install

# Schema for the new venv migration payload
euler-files schema --kind input --command venv.migrate-store
```

### Examples

```bash
# Save base config non-interactively
cat <<'JSON' | euler-files init --input-json - --json
{
  "scratch_base": "$SCRATCH",
  "uv_cache_dir": "$SCRATCH/.cache/uv",
  "vars": {
    "HF_HOME": {"source": "/cluster/home/jdoe/.cache/huggingface"},
    "TORCH_HOME": {"source": "/cluster/home/jdoe/.cache/torch"}
  }
}
JSON

# Sync and emit the exported paths as JSON
euler-files sync --json

# Install packages from a JSON request
cat <<'JSON' | euler-files venv install --input-json - --json
{
  "env_name": "train-cu121",
  "packages": [
    "torch==2.4.0+cu121",
    "transformers",
    "datasets"
  ],
  "python": "3.11",
  "uv_cache_dir": "/cluster/home/jdoe/.cache/uv"
}
JSON

# Bulk-migrate VENV_DIR + UV_CACHE_DIR
cat <<'JSON' | euler-files venv migrate-store --input-json - --json
{
  "old_venv_base": "/cluster/home/jdoe/venvs",
  "new_venv_base": "/cluster/project/ml/venvs",
  "old_uv_cache_dir": "/cluster/home/jdoe/.cache/uv",
  "new_uv_cache_dir": "/cluster/project/ml/.cache/uv",
  "verify": true,
  "update_config": true
}
JSON
```
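
The same requests can be generated from Python instead of a heredoc. The sketch below reuses the keys from the `venv install` example above; it assumes that optional keys may simply be omitted and that `--json` prints a single JSON document on stdout, neither of which is confirmed here. `install_request` and `run_install` are hypothetical helper names.

```python
import json
import subprocess


def install_request(env_name, packages, python=None, uv_cache_dir=None):
    """Build the JSON text for `euler-files venv install --input-json -`."""
    payload = {"env_name": env_name, "packages": list(packages)}
    if python is not None:
        payload["python"] = python
    if uv_cache_dir is not None:
        payload["uv_cache_dir"] = uv_cache_dir
    return json.dumps(payload, indent=2)


def run_install(request_text):
    """Pipe the request to the CLI and return its parsed --json output."""
    proc = subprocess.run(
        ["euler-files", "venv", "install", "--input-json", "-", "--json"],
        input=request_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```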

## Apptainer support

euler-files can also build and manage Apptainer (Singularity) container images
from Python virtual environments. This is useful for packaging ML environments
into portable `.sif` files.

### `euler-files apptainer init`

Interactive wizard that configures:

- **venv_base**: directory containing your venvs (supports `$ENV_VAR` syntax)
- **sif_store**: persistent storage for built `.sif` files
- **scratch_sif_dir**: scratch location for synced `.sif` files
- **base_image**: Docker image template (default: `python:{version}-slim`)
- **container_venv_path**: path inside the container (default: `/opt/venv`)
- **build_args**: extra flags for `apptainer build` (default: `["--fakeroot"]`)

### `euler-files apptainer build [VENV_NAME] [OPTIONS]`

Builds a `.sif` image from a Python venv.

| Option      | Description                              |
|-------------|------------------------------------------|
| `VENV_NAME` | Venv to build; interactive picker if omitted |
| `--force`   | Rebuild even if `.sif` already exists    |
| `--dry-run` | Show definition file and commands without building |

**Build pipeline:**

1. **Tar** — Pre-packs the venv into a tarball. This is a critical optimization
   for shared HPC filesystems (GPFS/Lustre): Apptainer's `%files` directive
   does a per-file `stat`+`open`+`read` for every file in the venv (often tens
   of thousands), while tar reads sequentially in a single stream. The
   difference can be orders of magnitude.
2. **Generate** — Creates an Apptainer definition file that extracts the
   tarball, fixes shebangs, rewires `pyvenv.cfg`, and sets up the environment.
3. **Build** — Runs `apptainer build` to produce the `.sif` file.
4. **Cleanup** — Removes the tarball (in a `finally` block, so it's cleaned up
   even on failure).
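
The tar-then-cleanup pattern from steps 1 and 4 can be sketched with the standard library. `build_with_tarball` and its `build` callback are hypothetical; the real pipeline generates a definition file and runs `apptainer build` where the callback is invoked.

```python
import os
import tarfile
import tempfile


def build_with_tarball(venv_dir, build):
    """Pack venv_dir into a temporary tarball, call build(tar_path), then clean up."""
    fd, tar_path = tempfile.mkstemp(suffix=".tar")
    os.close(fd)
    try:
        with tarfile.open(tar_path, "w") as tar:
            # arcname="." stores member paths relative to the venv root.
            tar.add(venv_dir, arcname=".")
        return build(tar_path)
    finally:
        # Runs on success and on failure, so the large tarball never lingers.
        if os.path.exists(tar_path):
            os.remove(tar_path)
```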

### `euler-files apptainer sync [OPTIONS]`

Syncs `.sif` images from persistent storage to scratch.

| Option         | Description                                |
|----------------|---------------------------------------------|
| `--dry-run`    | Show what would be synced                   |
| `--force`      | Ignore freshness checks                     |
| `--image NAME` | Sync only specific image(s); repeatable     |

Uses single-file rsync (no trailing-slash semantics) optimized for large files
with resumable partial transfers.

### `euler-files apptainer prune [IMAGE_NAME] [OPTIONS]`

Removes venvs, `.sif` images, or both.

| Option      | Description                                         |
|-------------|-----------------------------------------------------|
| `IMAGE_NAME`| Image to prune; interactive picker if omitted       |
| `--mode`    | `both` (default), `venv`, or `sif`                  |
| `--dry-run` | Show what would be deleted                          |
| `--yes`     | Skip confirmation prompt                            |

### `euler-files apptainer fixup [VENV_NAME] [OPTIONS]`

Fixes venv internal paths after manually moving the venv base directory.
Rewrites `bin/activate` VIRTUAL_ENV and all shebangs in `bin/*` scripts.
Fixes a single venv if `VENV_NAME` is given, otherwise fixes all venvs.

| Option      | Description                                |
|-------------|--------------------------------------------|
| `VENV_NAME` | Single venv to fix; all if omitted         |
| `--dry-run` | Show what would be fixed                   |
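
The shebang part of the fixup can be sketched as below. This is an illustration only: `rewrite_shebangs` is a hypothetical name, and the sketch leaves out the `bin/activate` rewrite of `VIRTUAL_ENV`.

```python
import os


def rewrite_shebangs(bin_dir, old_prefix, new_prefix):
    """Rewrite '#!<old_prefix>/...' first lines in bin_dir; return changed names."""
    old = b"#!" + old_prefix.rstrip("/").encode() + b"/"
    new = b"#!" + new_prefix.rstrip("/").encode() + b"/"
    changed = []
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if os.path.islink(path) or not os.path.isfile(path):
            continue  # skip interpreter symlinks and subdirectories
        with open(path, "rb") as fh:
            data = fh.read()
        if not data.startswith(old):
            continue  # binary file, or a shebang that points elsewhere
        # Rewriting in place keeps the file's executable bit.
        with open(path, "wb") as fh:
            fh.write(new + data[len(old):])
        changed.append(name)
    return changed
```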

## Configuration

Config lives at `~/.euler-files.json`. It is created by `euler-files init` and
updated by other commands. You can also edit it by hand.

### Fully annotated example

```jsonc
{
  // Config format version. Must be 1. euler-files will refuse to load
  // configs with a different version number.
  "version": 1,

  // Root of the fast scratch filesystem. Supports $ENV_VAR syntax —
  // expanded at runtime, not stored literally. This is typically "$SCRATCH"
  // on ETH Euler or similar HPC systems.
  "scratch_base": "$SCRATCH",

  // Subdirectory under scratch_base where euler-files stores its synced
  // caches, marker files, and lock files. Each managed variable gets its
  // own subdirectory here (e.g. $SCRATCH/.cache/euler-files/HF_HOME/).
  "cache_root": ".cache/euler-files",

  // Shared uv cache directory. Used by `euler-files venv install`,
  // `euler-files venv migrate`, and `euler-files venv migrate-store`
  // when no explicit --uv-cache-dir is provided.
  "uv_cache_dir": "/cluster/home/jdoe/.cache/uv",

  // ── Managed environment variables ──────────────────────────────────
  // Each key is an env var name. When synced, euler-files will:
  //   1. rsync "source" -> $SCRATCH/.cache/euler-files/<VAR_NAME>/
  //   2. Print: export <VAR_NAME>=$SCRATCH/.cache/euler-files/<VAR_NAME>
  "vars": {
    "HF_HOME": {
      // Absolute path to the persistent source directory. This is where
      // HuggingFace stores models, datasets, and tokenizers by default.
      "source": "/cluster/home/jdoe/.cache/huggingface",

      // Set to false to temporarily skip this variable during sync
      // without removing it from the config.
      "enabled": true
    },
    "TORCH_HOME": {
      "source": "/cluster/home/jdoe/.cache/torch",
      "enabled": true
    },
    "PIP_CACHE_DIR": {
      "source": "/cluster/home/jdoe/.cache/pip",

      // Disabled: won't be synced unless you flip this to true or
      // explicitly pass --var PIP_CACHE_DIR.
      "enabled": false
    }
  },

  // ── rsync options ──────────────────────────────────────────────────
  // Extra arguments appended to every rsync invocation. Useful for
  // SSH tunneling, bandwidth limits, or exclude patterns.
  // Example: ["--bwlimit=50000", "--exclude", "*.tmp"]
  "rsync_extra_args": [],

  // ── Concurrency ────────────────────────────────────────────────────
  // Maximum number of variables synced in parallel. Each variable gets
  // its own thread + rsync process. Set to 1 for serial execution.
  "parallel_jobs": 4,

  // ── Locking ────────────────────────────────────────────────────────
  // Maximum time (seconds) to wait for a per-variable flock before
  // giving up. Prevents jobs from waiting indefinitely when several of
  // them sync the same variable at once.
  // The lock uses polling with exponential backoff (not signals), so it
  // works safely inside the thread pool.
  "lock_timeout_seconds": 300,

  // ── Smart-skip ─────────────────────────────────────────────────────
  // If a sync marker is younger than this many seconds AND the source
  // directory's top-level mtime hasn't changed, rsync is skipped
  // entirely. Set to 0 to disable smart-skip (always run rsync).
  //
  // Note: only top-level changes (directory mtime + immediate children
  // mtime) are detected. Deep changes inside subdirectories won't
  // invalidate the marker, so they are synced only after the marker
  // expires or when you pass --force.
  "skip_if_fresh_seconds": 3600,

  // ── Apptainer container management (optional) ──────────────────────
  // This entire section is optional. Omit it if you don't use Apptainer.
  // Run 'euler-files apptainer init' to set it up interactively.
  "apptainer": {
    // Directory containing your Python virtual environments.
    // Supports $ENV_VAR syntax (expanded at runtime).
    "venv_base": "/cluster/home/jdoe/venvs",

    // Persistent storage for built .sif files. These survive job
    // termination and scratch cleanup.
    "sif_store": "/cluster/home/jdoe/.apptainer/sif",

    // Scratch location where .sif files are synced for fast access
    // during jobs. Same idea as the cache sync above.
    "scratch_sif_dir": "$SCRATCH/.cache/euler-files/sif",

    // Docker base image template for Apptainer builds. The placeholder
    // {version} is replaced with the Python major.minor from the venv
    // (e.g. "3.11"). You can use any Docker image here.
    "base_image": "python:{version}-slim",

    // Canonical path where the venv lives inside the container.
    // The definition file extracts the tarball here and fixes all paths
    // to match.
    "container_venv_path": "/opt/venv",

    // Extra arguments for 'apptainer build'. Common values:
    //   --fakeroot  — build without root (default)
    //   --nv        — pass through NVIDIA GPU drivers
    //   --force     — overwrite existing .sif
    "build_args": ["--fakeroot"],

    // ── Built images ─────────────────────────────────────────────────
    // Populated automatically by 'euler-files apptainer build'.
    // You generally don't edit this by hand.
    "images": {
      "my-ml-env": {
        // Name of the source venv directory under venv_base.
        "venv_name": "my-ml-env",

        // Python version detected from pyvenv.cfg at build time.
        "python_version": "3.11.5",

        // Filename of the .sif in sif_store.
        "sif_filename": "my-ml-env.sif",

        // Unix timestamp of when this image was last built.
        "built_at": 1700000000.0,

        // Set to false to skip this image during 'apptainer sync'.
        "enabled": true
      }
    }
  },

  // ── Migration history (optional) ───────────────────────────────────
  // Recorded automatically by 'euler-files migrate'. Each entry tracks
  // a directory move. Useful for auditing and debugging.
  "migrations": [
    {
      "old_path": "/cluster/home/jdoe/.cache/huggingface",
      "new_path": "/cluster/project/ml-data/huggingface",
      "migrated_at": 1700000000.0,

      // Which config field was updated: "source" for env var migrations,
      // "venv_base" or "sif_store" for apptainer field migrations.
      "field_name": "source",

      // For env var migrations: the variable name. Empty string for
      // apptainer field migrations.
      "var_name": "HF_HOME"
    }
  ]
}
```

> **Note:** The config file is plain JSON (not JSONC). The comments above are
> for documentation only. Do not copy them into your actual config file.

### Config path resolution

- Config is always at `~/.euler-files.json`
- `scratch_base`, `uv_cache_dir`, and `apptainer.venv_base` support `$ENV_VAR` syntax and `~`
  expansion (resolved at runtime)
- All other paths are stored as absolute literals
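
Runtime expansion of this kind is typically a two-step call into `os.path`. A minimal sketch; `resolve_path` is a hypothetical name and not necessarily how euler-files implements it.

```python
import os


def resolve_path(raw):
    """Expand $ENV_VAR references first, then a leading ~."""
    return os.path.expanduser(os.path.expandvars(raw))
```

Note that `os.path.expandvars` leaves references to unset variables untouched, so a literal `$SCRATCH` in the result usually means the variable is not defined in the current environment.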

## Congruency checks

On every `sync`, `push`, `status`, and `build`, euler-files checks that your
current environment variables match the config. If `$HF_HOME` is already set to
a different path than what the config has as its source, you'll see a warning
like:

```
[WARN] HF_HOME is set to /some/other/path but config source is
       /cluster/home/jdoe/.cache/huggingface
```

This catches stale `.bashrc` exports that conflict with euler-files management.
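
The check can be sketched as a comparison of each managed variable against the paths euler-files knows about. The sketch assumes that a value equal to the scratch target (as set by a previous `eval`) also counts as congruent; that rule and the name `congruency_warnings` are assumptions for illustration.

```python
import os


def congruency_warnings(managed, environ):
    """managed maps VAR -> (source_path, scratch_path); returns warning strings."""
    warnings = []
    for name, (source, scratch) in sorted(managed.items()):
        current = environ.get(name)
        if current is None:
            continue  # an unset variable cannot conflict
        # normpath makes "/a/b" and "/a/b/" compare equal.
        expected = {os.path.normpath(source), os.path.normpath(scratch)}
        if os.path.normpath(current) not in expected:
            warnings.append(
                f"[WARN] {name} is set to {current} but config source is {source}"
            )
    return warnings
```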

## Exit codes

| Code | Meaning                                              |
|------|------------------------------------------------------|
| 0    | Success                                              |
| 1    | One or more syncs/pushes failed (partial failure)    |
| 2    | Configuration error (file not found, version mismatch, etc.) |

## Quirks and things to know

- **stdout vs stderr**: `euler-files sync` prints *only* `export` statements to
  stdout. Everything else (progress, warnings, errors) goes to stderr. This is
  by design — `eval "$(euler-files sync)"` must not accidentally eval status
  messages.

- **rsync trailing slash**: Internally, source paths always get a trailing `/`
  appended so rsync copies *contents* into the target, not the source directory
  itself as a subdirectory.

- **Marker mtime depth**: Smart-skip only checks mtime of the directory itself
  and its immediate children (depth 0 + 1). If you add a file three levels
  deep, the marker won't notice — but rsync will still transfer it correctly
  when the marker eventually expires or you use `--force`.

- **Lock polling**: Locking uses a polling loop with exponential backoff rather
  than `signal.alarm`, because the sync runs inside a `ThreadPoolExecutor` where
  signal-based timeouts don't work.

- **rsync exit codes 23 and 24** (partial transfer / vanished files) are treated
  as warnings, not errors. This is common on shared filesystems where files may
  appear or disappear during a sync.

- **Tarball cleanup**: When building Apptainer images, the intermediate tarball
  (which can be multi-GB) is always cleaned up in a `finally` block, even if the
  build fails.

- **Disabled variables** are kept in config but skipped during sync. They are
  *not* exported. Use `--var VAR_NAME` to force-sync a disabled variable.

- **Config version**: The config has a `version` field (currently `1`).
  euler-files refuses to load configs with a mismatched version — you'll need to
  re-run `euler-files init`.

- **Fish shell**: The generated `ef` function has slightly different syntax for
  Fish. Use `euler-files shell-init --shell fish`.
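
Two of the rsync quirks above, the trailing slash and the tolerated exit codes, can be sketched together. The `-a` flag and both function names are assumptions for this example; the actual rsync flags used by euler-files are not listed in this document.

```python
def rsync_command(source, target, extra_args=()):
    """Build the argv; the trailing slash on source copies its *contents*."""
    return [
        "rsync",
        "-a",  # assumed archive mode, for illustration only
        *extra_args,
        source.rstrip("/") + "/",
        target.rstrip("/") + "/",
    ]


def classify_rsync_exit(code):
    """Map an rsync exit status to 'ok', 'warning', or 'error'."""
    if code == 0:
        return "ok"
    if code in (23, 24):
        return "warning"  # partial transfer / vanished source files
    return "error"
```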

## Typical HPC workflow

### Cache syncing

```bash
# One-time setup (interactive)
euler-files init

# Optional: set up shell integration
echo 'eval "$(euler-files shell-init)"' >> ~/.bashrc

# In your SLURM job script:
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --gpus=1
#SBATCH --mem=64G
#SBATCH --time=24:00:00

# Sync caches to scratch (fast local disk)
eval "$(euler-files sync)"

# Now HF_HOME, TORCH_HOME, etc. point to scratch
python train.py --model bert-large ...

# After training, push any newly downloaded artifacts back
euler-files push
```

### Apptainer images

Apptainer (formerly Singularity) images let you freeze a Python environment
into a single portable `.sif` file. This is useful when you want reproducible
runs without re-syncing thousands of venv files, or when you need to run on
nodes where installing packages is impractical.

```bash
# ── One-time setup ────────────────────────────────────────────────────

# 1. Create a venv with uv (or python -m venv / virtualenv)
euler-files venv install my-ml-env torch transformers datasets

# 2. Configure apptainer support (interactive wizard)
euler-files apptainer init
# → asks for venv_base (e.g. ~/venvs), sif_store, scratch_sif_dir, etc.

# 3. Build a .sif image from the venv
#    This tars the venv first (fast sequential I/O), then runs apptainer build.
euler-files apptainer build my-ml-env

# 4. Sync the .sif to scratch for fast job-time access
euler-files apptainer sync


# ── In your SLURM job script ─────────────────────────────────────────

#!/bin/bash
#SBATCH --job-name=train
#SBATCH --gpus=1
#SBATCH --mem=64G
#SBATCH --time=24:00:00

# Load apptainer (cluster-specific)
module load apptainer

# Sync caches AND .sif images to scratch
eval "$(euler-files sync)"
euler-files apptainer sync

# Run your script inside the container
# The .sif's runscript does: exec python "$@"
SIF="$SCRATCH/.cache/euler-files/sif/my-ml-env.sif"
apptainer run --nv "$SIF" train.py --model bert-large ...

# Or get an interactive shell inside the container
apptainer shell --nv "$SIF"


# ── Updating the environment ─────────────────────────────────────────

# Install new packages into the venv
euler-files venv install my-ml-env accelerate

# Rebuild one venv into a new location while preserving installed packages
euler-files venv migrate ~/venvs/my-ml-env /project/ml/venvs/my-ml-env

# Move the whole uv store: cache + all venvs
euler-files venv migrate-store \
  --old-venv-base ~/venvs \
  --new-venv-base /project/ml/venvs \
  --old-uv-cache-dir ~/.cache/uv \
  --new-uv-cache-dir /project/ml/.cache/uv

# Rebuild the image (--force overwrites the existing .sif)
euler-files apptainer build my-ml-env --force

# Re-sync to scratch
euler-files apptainer sync --force


# ── Housekeeping ──────────────────────────────────────────────────────

# Remove an old venv + its .sif
euler-files apptainer prune my-old-env

# Remove only the .sif (keep the venv)
euler-files apptainer prune my-old-env --mode sif

# Move your venv directory to a new location
euler-files migrate venv_base --to /new/path/to/venvs
# migrate handles rsync + shebang/activate fixup automatically

# Or, if you already moved the directory by hand, only fix the internal paths
euler-files apptainer fixup
```

**Why the tarball step matters:** On shared HPC filesystems (GPFS, Lustre),
metadata operations are expensive. A typical ML venv contains 30,000+ files.
Apptainer's `%files` directive copies each file individually, which means at
least 30,000 rounds of `stat` + `open` + `read` on the shared FS. Pre-packing
into a tarball turns this into a single sequential read, which can be 10-100x
faster.

## License

MIT
