Metadata-Version: 2.4
Name: slide2vec
Version: 4.4.0
Summary: Embedding of whole slide images with Foundation Models
Author-email: Clément Grisi <clement.grisi@radboudumc.nl>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/clemsgrs/slide2vec
Project-URL: Bug Tracker, https://github.com/clemsgrs/slide2vec/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: hs2p[asap,cucim,openslide,sam2,vips]>=4.0.1
Requires-Dist: omegaconf
Requires-Dist: matplotlib
Requires-Dist: numpy<2
Requires-Dist: pandas
Requires-Dist: pillow
Requires-Dist: rich
Requires-Dist: torch
Requires-Dist: torchvision
Requires-Dist: transformers
Requires-Dist: wandb
Requires-Dist: einops
Requires-Dist: timm
Requires-Dist: huggingface_hub
Provides-Extra: hoptimus
Requires-Dist: torch>=2.0; extra == "hoptimus"
Requires-Dist: torchvision>=0.15.0; extra == "hoptimus"
Requires-Dist: xformers>=0.0.18; extra == "hoptimus"
Provides-Extra: virchow
Requires-Dist: timm>=0.9.11; extra == "virchow"
Requires-Dist: torch>=2.0; extra == "virchow"
Provides-Extra: uni
Requires-Dist: torch>=2.0; extra == "uni"
Requires-Dist: timm>=0.9.8; extra == "uni"
Requires-Dist: xformers>=0.0.18; extra == "uni"
Provides-Extra: prism
Requires-Dist: transformers~=4.53.0; extra == "prism"
Requires-Dist: torch<2.8,>=2.3; extra == "prism"
Requires-Dist: einops==0.8.0; extra == "prism"
Requires-Dist: environs==11.0.0; extra == "prism"
Requires-Dist: sacremoses==0.1.1; extra == "prism"
Requires-Dist: xformers==0.0.31; extra == "prism"
Provides-Extra: hibou
Requires-Dist: scipy~=1.8.1; extra == "hibou"
Requires-Dist: scikit-image~=0.19.3; extra == "hibou"
Provides-Extra: moozy
Requires-Dist: huggingface_hub<1.0,>=0.30.0; extra == "moozy"
Provides-Extra: titan
Requires-Dist: torch==2.0.1; extra == "titan"
Requires-Dist: timm==1.0.3; extra == "titan"
Requires-Dist: einops==0.6.1; extra == "titan"
Requires-Dist: einops-exts==0.0.4; extra == "titan"
Requires-Dist: transformers==4.46.0; extra == "titan"
Provides-Extra: fm
Requires-Dist: omegaconf>=2.3.0; extra == "fm"
Requires-Dist: matplotlib; extra == "fm"
Requires-Dist: numpy<2; extra == "fm"
Requires-Dist: pandas; extra == "fm"
Requires-Dist: pillow; extra == "fm"
Requires-Dist: rich; extra == "fm"
Requires-Dist: hs2p[asap,cucim,openslide,sam2,vips]>=4.0.1; extra == "fm"
Requires-Dist: wandb; extra == "fm"
Requires-Dist: torch<2.8,>=2.3; extra == "fm"
Requires-Dist: torchvision>=0.18.0; extra == "fm"
Requires-Dist: einops>=0.8.0; extra == "fm"
Requires-Dist: timm>=1.0.3; extra == "fm"
Requires-Dist: huggingface_hub<1.0,>=0.30.0; extra == "fm"
Requires-Dist: environs; extra == "fm"
Requires-Dist: einops-exts>=0.0.4; extra == "fm"
Requires-Dist: transformers>=4.53; extra == "fm"
Requires-Dist: sacremoses; extra == "fm"
Requires-Dist: xformers>=0.0.31; extra == "fm"
Requires-Dist: scipy>=1.8.1; extra == "fm"
Requires-Dist: scikit-image>=0.19.3; extra == "fm"
Requires-Dist: torchmetrics>=0.10.3; extra == "fm"
Requires-Dist: fvcore; extra == "fm"
Requires-Dist: iopath; extra == "fm"
Requires-Dist: webdataset; extra == "fm"
Requires-Dist: scikit-survival; extra == "fm"
Requires-Dist: scikit-learn; extra == "fm"
Requires-Dist: fairscale; extra == "fm"
Requires-Dist: packaging==23.2; extra == "fm"
Requires-Dist: ninja==1.11.1.1; extra == "fm"
Requires-Dist: psutil<6; extra == "fm"
Provides-Extra: docs
Requires-Dist: sphinx>=8.1; extra == "docs"
Requires-Dist: furo; extra == "docs"
Requires-Dist: myst-parser; extra == "docs"
Requires-Dist: sphinx-copybutton; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
Provides-Extra: testing
Requires-Dist: pytest>=6.0; extra == "testing"
Requires-Dist: pytest-cov>=2.0; extra == "testing"
Requires-Dist: mypy>=0.910; extra == "testing"
Requires-Dist: flake8>=3.9; extra == "testing"
Requires-Dist: flake8-pyproject>=1.2.3; extra == "testing"
Requires-Dist: tox>=3.24; extra == "testing"
Dynamic: license-file

# slide2vec

[![PyPI version](https://img.shields.io/pypi/v/slide2vec?label=pypi&logo=pypi&color=3776AB)](https://pypi.org/project/slide2vec/)
[![Docs](https://img.shields.io/badge/docs-website-blue)](https://clemsgrs.github.io/slide2vec/)

`slide2vec` is a Python package for efficient encoding of whole-slide images using publicly available foundation models. It builds on [`hs2p`](https://pypi.org/project/hs2p/) for fast preprocessing and exposes a focused surface around `Model`, `Pipeline`, and `ExecutionOptions`.

Documentation site: [https://clemsgrs.github.io/slide2vec/](https://clemsgrs.github.io/slide2vec/)

## Installation

```shell
# Core package
pip install slide2vec
# Core package plus the PyPI-hosted foundation-model dependencies
pip install "slide2vec[fm]"
```

The base install stays focused on the core package. Install `slide2vec[fm]` when you also want the foundation-model (FM) dependencies that are available on PyPI.

Some model backends still depend on upstream Git repositories, which PyPI does not accept as direct dependencies in package metadata. Install those separately when needed:

```shell
pip install git+https://github.com/lilab-stanford/MUSK.git
pip install git+https://github.com/Mahmoodlab/CONCH.git
pip install git+https://github.com/prov-gigapath/prov-gigapath.git
```

AtlasPatch-backed tissue segmentation is available through the `sam2` extra of `hs2p`, which the `slide2vec` install already pulls in.

## Python API

```python
from slide2vec import Model
from slide2vec.utils.config import hf_login

hf_login()

model = Model.from_preset("virchow2")
embedded = model.embed_slide("/path/to/slide.svs")

tile_embeddings = embedded.tile_embeddings
x = embedded.x
y = embedded.y
```

Use `list_models()` when you want to inspect the shipped presets programmatically:

```python
from slide2vec import list_models

all_models = list_models()
tile_models = list_models("tile")
slide_models = list_models("slide")
patient_models = list_models("patient")
```

Use `Pipeline(...)` for manifest-driven batch processing when you want artifacts written to disk instead of only in-memory outputs:

```python
from slide2vec import ExecutionOptions, Pipeline, PreprocessingConfig

pipeline = Pipeline(
    model=model,
    preprocessing=PreprocessingConfig(
        requested_spacing_um=0.5,
        requested_tile_size_px=224,
        tissue_threshold=0.1,
    ),
    execution=ExecutionOptions(output_dir="outputs/demo"),
)
result = pipeline.run(manifest_path="/path/to/slides.csv")
```

By default, `ExecutionOptions()` uses all available GPUs. Set `ExecutionOptions(num_gpus=4)` when you want to cap the sharding explicitly.

### Hierarchical Feature Extraction

Tile embeddings can be spatially grouped into regions for downstream models that consume region-level structure. Enable it by setting `region_tile_multiple` on `PreprocessingConfig`:

```python
preprocessing = PreprocessingConfig(
    requested_spacing_um=0.5,
    requested_tile_size_px=224,
    region_tile_multiple=6,  # 6x6 tiles per region
)
embedded = model.embed_slide("/path/to/slide.svs", preprocessing=preprocessing)
```

Hierarchical outputs have shape `(num_regions, tiles_per_region, feature_dim)` and are written to `hierarchical_embeddings/` when persisted.
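
As a rough illustration of that shape, the sketch below computes the expected dimensions in plain Python. It assumes each region is a square grid with `region_tile_multiple` tiles per side, as the `6x6` comment above suggests. `hierarchical_shape` is a hypothetical helper, not part of `slide2vec`, and the feature dimension of 1024 is only a placeholder because the real value depends on the chosen model.

```python
def hierarchical_shape(
    num_regions: int, region_tile_multiple: int, feature_dim: int
) -> tuple[int, int, int]:
    """Return the expected (num_regions, tiles_per_region, feature_dim) shape.

    Hypothetical helper for illustration only; assumes square regions.
    """
    if region_tile_multiple < 1:
        raise ValueError("region_tile_multiple must be a positive integer")
    # A region of N x N tiles holds N * N tile embeddings.
    tiles_per_region = region_tile_multiple * region_tile_multiple
    return (num_regions, tiles_per_region, feature_dim)


# 10 regions of 6x6 tiles, with a placeholder feature dimension of 1024
shape = hierarchical_shape(num_regions=10, region_tile_multiple=6, feature_dim=1024)
print(shape)
```

With `region_tile_multiple=6`, the middle dimension is 36 regardless of how many regions the slide yields.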

See [`docs/python-api.md`](docs/python-api.md) for details.

### Input Manifest

Manifest-driven runs use the schema below. `mask_path` and `spacing_at_level_0` are optional.

```csv
sample_id,image_path,mask_path,spacing_at_level_0
slide-1,/path/to/slide-1.svs,/path/to/mask-1.png,0.25
slide-2,/path/to/slide-2.svs,,
...
```

Use `spacing_at_level_0` when the slide file reports a missing or incorrect level-0 spacing and you want to override it.
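
If you assemble manifests programmatically, a minimal sketch using only the standard library might look like the following. The column names come from the schema above. `write_manifest` is a hypothetical helper, not part of `slide2vec`, and treating `sample_id` and `image_path` as required is an inference from the other two columns being described as optional.

```python
import csv
import tempfile
from pathlib import Path

# Column names taken from the manifest schema above.
FIELDNAMES = ["sample_id", "image_path", "mask_path", "spacing_at_level_0"]


def write_manifest(rows, manifest_path):
    """Write manifest rows to CSV, leaving optional fields empty when absent."""
    rows = list(rows)
    for row in rows:
        # Assumed required fields: everything not marked optional above.
        if not row.get("sample_id") or not row.get("image_path"):
            raise ValueError("sample_id and image_path are required")
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", newline="") as handle:
        # restval="" writes an empty cell for any missing optional column.
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return manifest_path


rows = [
    {
        "sample_id": "slide-1",
        "image_path": "/path/to/slide-1.svs",
        "mask_path": "/path/to/mask-1.png",
        "spacing_at_level_0": 0.25,
    },
    # No mask and no spacing override: both cells stay empty.
    {"sample_id": "slide-2", "image_path": "/path/to/slide-2.svs"},
]

# Written to a temporary directory here so the example has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    path = write_manifest(rows, Path(tmp) / "slides.csv")
    print(path.read_text())
```

The resulting file can then be passed as `manifest_path` to `pipeline.run(...)`.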


### Outputs

The package writes explicit artifact directories:

- `tile_embeddings/<sample_id>.pt` or `.npz`
- `tile_embeddings/<sample_id>.meta.json`
- `hierarchical_embeddings/<sample_id>.pt` or `.npz` (when `region_tile_multiple` is set)
- `hierarchical_embeddings/<sample_id>.meta.json`
- `slide_embeddings/<sample_id>.pt` or `.npz`
- `slide_embeddings/<sample_id>.meta.json`
- optional `slide_latents/<sample_id>.pt` or `.npz`

`.pt` remains the default format. `.npz` is available through `ExecutionOptions(output_format="npz")`.
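
To locate these artifacts after a run, a small path helper along the following lines can be handy. It is a sketch built only from the directory layout listed above: `expected_artifacts` is a hypothetical helper, not part of `slide2vec`, it skips the optional `slide_latents/` directory, and which of the returned paths actually exist depends on the model and options used.

```python
from pathlib import Path


def expected_artifacts(output_dir, sample_id, output_format="pt", hierarchical=False):
    """List the artifact paths described above for one sample (hypothetical helper)."""
    if output_format not in {"pt", "npz"}:
        raise ValueError("output_format must be 'pt' or 'npz'")
    root = Path(output_dir)
    kinds = ["tile_embeddings"]
    if hierarchical:
        # Only listed when region_tile_multiple is set.
        kinds.append("hierarchical_embeddings")
    kinds.append("slide_embeddings")
    paths = []
    for kind in kinds:
        # Each embedding file is paired with a JSON metadata file.
        paths.append(root / kind / f"{sample_id}.{output_format}")
        paths.append(root / kind / f"{sample_id}.meta.json")
    return paths


for path in expected_artifacts("outputs/demo", "slide-1", output_format="npz"):
    status = "found" if path.exists() else "missing"
    print(f"{path}: {status}")
```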

### Supported Models

`slide2vec` currently ships preset configs for 17 tile-level models and 3 slide-level models.  
For the full catalog and preset names, see [`docs/models.md`](docs/models.md).

## CLI

The CLI is a thin wrapper over the package API.  
Bundled configs live under `slide2vec/configs/preprocessing/` and `slide2vec/configs/models/`.

```shell
slide2vec /path/to/config.yaml
```

By default, manifest-driven CLI runs use all available GPUs. Set `speed.num_gpus=4` when you want to cap the sharding explicitly.

New to the CLI or doing batch runs to disk? Start with [`docs/cli.md`](docs/cli.md) for the config-driven workflow, overrides, and common run patterns.

## Docker

[![Docker Version](https://img.shields.io/docker/v/waticlems/slide2vec?sort=semver&label=docker&logo=docker&color=2496ED)](https://hub.docker.com/r/waticlems/slide2vec)

Docker remains available when you prefer a containerized runtime:

```shell
docker pull waticlems/slide2vec:latest
docker run --rm -it \
    -v /path/to/your/data:/data \
    -e HF_TOKEN=<your-huggingface-api-token> \
    waticlems/slide2vec:latest
```

## Documentation

- [Documentation website](https://clemsgrs.github.io/slide2vec/) for the rendered documentation
- [`docs/python-api.md`](docs/python-api.md) for the detailed API reference
- [`docs/cli.md`](docs/cli.md) for the config-driven CLI guide
- [`docs/models.md`](docs/models.md) for the full supported-model catalog
- [`tutorials/api_walkthrough.ipynb`](tutorials/api_walkthrough.ipynb) for a notebook walkthrough of the API
