Metadata-Version: 2.4
Name: vcfcache
Version: 0.4.1
Summary: Cache-based VCF annotation accelerator
Project-URL: Homepage, https://github.com/julius-muller/vcfcache
Project-URL: Source, https://github.com/julius-muller/vcfcache
Project-URL: Issues, https://github.com/julius-muller/vcfcache/issues
Author-email: Julius Müller <julius.mueller@dkfz-heidelberg.de>
License: GPL-3.0-only
License-File: LICENSE
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.11
Requires-Dist: click>=8.0.0
Requires-Dist: pysam>=0.22.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: requests>=2.20.0
Provides-Extra: dev
Requires-Dist: black>=25.12.0; extra == 'dev'
Requires-Dist: mypy>=1.19.0; extra == 'dev'
Requires-Dist: pandas>=2.2.0; extra == 'dev'
Requires-Dist: pyarrow>=20.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
Requires-Dist: pytest>=9.0.2; extra == 'dev'
Requires-Dist: ruff>=0.14.8; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Provides-Extra: gnomad
Requires-Dist: hail>=0.2.0; extra == 'gnomad'
Provides-Extra: parquet
Requires-Dist: pandas>=2.2.0; extra == 'parquet'
Requires-Dist: pyarrow>=20.0.0; extra == 'parquet'
Description-Content-Type: text/markdown

[![DOI](https://zenodo.org/badge/947952659.svg)](https://zenodo.org/badge/latestdoi/947952659)
[![CI](https://github.com/julius-muller/vcfcache/actions/workflows/ci.yml/badge.svg)](https://github.com/julius-muller/vcfcache/actions/workflows/ci.yml)
[![License](https://img.shields.io/github/license/julius-muller/vcfcache)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/vcfcache)](https://pypi.org/project/vcfcache/)
[![Cite](https://img.shields.io/badge/Cite-CITATION.cff-blue)](CITATION.cff)
[![codecov](https://codecov.io/github/julius-muller/vcfcache/graph/badge.svg?token=ELV3PZ6PNL)](https://codecov.io/github/julius-muller/vcfcache)


# VCFcache – Cache once, annotate fast

Cache common variants once, reuse them for every sample. VCFcache builds a normalized blueprint, annotates it once, and reuses those results so only rare/novel variants are annotated at runtime.
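
The idea can be illustrated with a small conceptual sketch. This is not how VCFcache is implemented (the tool operates on VCF/BCF files through bcftools); the `split_by_cache` helper, the dictionary-based cache, and the example annotation are all hypothetical and only show the hit/miss split:

```python
# Conceptual sketch only: a dict stands in for the annotated cache.
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def split_by_cache(variants: list[Variant], cache: dict[Variant, str]):
    """Return (hits, misses): cached annotations and variants still to annotate."""
    hits = {v: cache[v] for v in variants if v in cache}
    misses = [v for v in variants if v not in cache]
    return hits, misses


# Made-up cache content and sample variants for illustration.
cache = {("chr1", 12345, "A", "G"): "missense_variant"}
sample = [("chr1", 12345, "A", "G"), ("chr2", 67890, "C", "T")]

hits, misses = split_by_cache(sample, cache)
print(f"{len(hits)} cached, {len(misses)} to annotate")
```

Only the variants in `misses` would need to go through the annotation pipeline at runtime.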

**Performance**: With cache hit rates of 60-90% on typical samples, VCFcache runs 2-10× faster than standard annotation pipelines. Cache lookups are constant-time regardless of cache size, so larger caches do not slow down lookups. See [WIKI.md](WIKI.md#performance-model) for the detailed runtime efficiency model.
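
As a rough back-of-the-envelope check, the relationship between hit rate and speed-up can be sketched with a simplified Amdahl-style estimate. This is an assumption for illustration, not the model from WIKI.md: it treats annotation time as proportional to the number of uncached variants, and the `overhead` parameter (fixed costs as a fraction of the full runtime) is hypothetical:

```python
def estimated_speedup(hit_rate: float, overhead: float = 0.0) -> float:
    """Estimate speed-up over annotating every variant.

    hit_rate: fraction of variants served from the cache (0..1).
    overhead: assumed fixed cost (lookup, I/O) as a fraction of full runtime.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    remaining = (1.0 - hit_rate) + overhead  # fraction of work still done
    if remaining <= 0.0:
        raise ValueError("remaining work must be positive")
    return 1.0 / remaining


for h in (0.6, 0.75, 0.9):
    print(f"hit rate {h:.0%}: ~{estimated_speedup(h):.1f}x")
```

With zero overhead this gives about 2.5× at a 60% hit rate and about 10× at 90%; any nonzero overhead lowers those figures.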

Works with any genome/build (human, mouse, plants, model organisms) as long as your inputs and annotation pipeline use the same reference/contig naming.
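
A quick way to spot a contig naming mismatch (for example `chr1` versus `1`) is to compare the `##contig` lines of two VCF headers. The sketch below is a standalone helper, not part of VCFcache; the function names are hypothetical, and the header text is assumed to come from something like `bcftools view -h`:

```python
import re

# Matches VCF header lines such as "##contig=<ID=chr1,length=248956422>".
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_names(header_text: str) -> set[str]:
    """Collect contig IDs declared in a VCF header."""
    return {
        m.group(1)
        for line in header_text.splitlines()
        if (m := CONTIG_RE.match(line))
    }


def naming_mismatch(sample_header: str, cache_header: str) -> set[str]:
    """Contigs declared in the sample header but absent from the cache header."""
    return contig_names(sample_header) - contig_names(cache_header)


# Illustrative headers: the sample uses "chr" prefixes, the cache does not.
sample_header = "##fileformat=VCFv4.2\n##contig=<ID=chr1,length=248956422>\n##contig=<ID=chr2>\n"
cache_header = "##fileformat=VCFv4.2\n##contig=<ID=1,length=248956422>\n##contig=<ID=2>\n"

print(sorted(naming_mismatch(sample_header, cache_header)))  # ['chr1', 'chr2']
```

An empty result means every contig declared by the sample is also declared by the cache; it does not prove that both use the same reference build.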

---

## Quick Start - pip install

Requires: Python >= 3.11 (earlier versions untested), bcftools >= 1.20

```bash
pip install vcfcache
vcfcache demo --smoke-test  # Run comprehensive demo
vcfcache --help
```

Install bcftools separately:
- Ubuntu/Debian: `sudo apt-get install bcftools`
- macOS: `brew install bcftools`
- Conda: `conda install -c bioconda bcftools`
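
To confirm that the installed bcftools meets the minimum version, a small check along these lines may help. It is a sketch that assumes the first line of `bcftools --version` has the form `bcftools 1.20`; the helper names are hypothetical and not part of VCFcache:

```python
import re
import shutil
import subprocess

MIN_VERSION = (1, 20)  # minimum stated in this README


def parse_bcftools_version(version_output: str) -> tuple[int, int]:
    """Parse (major, minor) from the first line of `bcftools --version`."""
    lines = version_output.splitlines()
    match = re.match(r"bcftools (\d+)\.(\d+)", lines[0]) if lines else None
    if match is None:
        raise ValueError("unrecognized bcftools version output")
    return int(match.group(1)), int(match.group(2))


def installed_bcftools_version(executable: str = "bcftools"):
    """Return (major, minor) of bcftools on PATH, or None if it is not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return parse_bcftools_version(result.stdout)


version = installed_bcftools_version()
if version is None:
    print("bcftools not found on PATH")
elif version < MIN_VERSION:
    print(f"bcftools {version[0]}.{version[1]} is older than 1.20")
else:
    print(f"bcftools {version[0]}.{version[1]} is OK")
```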

---

## Quick Start - Docker

**The Docker image includes bcftools**, so no separate installation is needed.

```bash
docker pull ghcr.io/julius-muller/vcfcache:latest

# List available public caches
docker run --rm ghcr.io/julius-muller/vcfcache:latest list caches

# Use a public cache from Zenodo
docker run --rm -v "$(pwd)":/work ghcr.io/julius-muller/vcfcache:latest \
  annotate \
    -a cache-hg38-gnomad-4.1joint-AF0100-vep-115.2-basic \
    --vcf /work/sample.vcf.gz \
    --output /work/out
```

---

## Quick Start - from source

Requires: uv (used below to create the virtual environment and install the package)

```bash
git clone https://github.com/julius-muller/vcfcache.git
cd vcfcache
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"
vcfcache --help
```

---

## Build Your Own Cache

1. **Create blueprint** (normalize/deduplicate variants):
```bash
vcfcache blueprint-init --vcf gnomad.bcf --output ./cache -y params.yaml
```

2. **Annotate blueprint** (create cache):
```bash
vcfcache cache-build --name vep_cache --db ./cache -a annotation.yaml -y params.yaml
```

3. **Use cache** on samples:
```bash
vcfcache annotate -a ./cache/cache/vep_cache --vcf sample.vcf.gz --output ./results
```

---

## Configuration

Override system bcftools (if needed):
```bash
export VCFCACHE_BCFTOOLS=/path/to/bcftools-1.22
```

Change where downloaded caches/blueprints are stored (default: `~/.cache/vcfcache`):
```bash
export VCFCACHE_DIR=/path/to/vcfcache_cache_dir
```

The bcftools path can also be set in `params.yaml`:
```yaml
bcftools_cmd: "/path/to/bcftools"
```

See [WIKI.md](WIKI.md) for detailed configuration, cache distribution via Zenodo, and troubleshooting.

---

## Links

- **Documentation**: [WIKI.md](WIKI.md)
- **Source**: https://github.com/julius-muller/vcfcache
- **Issues**: https://github.com/julius-muller/vcfcache/issues
- **Docker**: ghcr.io/julius-muller/vcfcache
