Metadata-Version: 2.4
Name: svphaser
Version: 2.1.6
Summary: Structural-variant phasing from HP-tagged long-read BAMs
Project-URL: Homepage, https://github.com/SFGLab/SvPhaser
Project-URL: Issues, https://github.com/SFGLab/SvPhaser/issues
Project-URL: Source, https://github.com/SFGLab/SvPhaser
Author-email: SvPhaser Team <you@lab.org>
License: MIT
License-File: LICENSE
Keywords: BAM,ONT,VCF,genomics,long-reads,phasing,structural-variants
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Requires-Dist: cyvcf2>=0.30
Requires-Dist: pandas>=2.1
Requires-Dist: pysam>=0.23
Requires-Dist: typer>=0.14
Provides-Extra: bench
Requires-Dist: py-spy>=0.3; extra == 'bench'
Requires-Dist: pytest-benchmark>=4.0; extra == 'bench'
Provides-Extra: dev
Requires-Dist: black>=24.3; extra == 'dev'
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: hypothesis>=6.90; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pandas-stubs>=2.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6; extra == 'dev'
Requires-Dist: pytest-cov>=5; extra == 'dev'
Requires-Dist: pytest-xdist>=3.5; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Requires-Dist: tox>=4.10; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: plots
Requires-Dist: matplotlib>=3.7; extra == 'plots'
Description-Content-Type: text/markdown

# SvPhaser

> **Haplotype-aware structural-variant (SV) phasing and genotyping from long-read data**

[![PyPI version](https://img.shields.io/pypi/v/svphaser.svg?logo=pypi)](https://pypi.org/project/svphaser/)
[![Python](https://img.shields.io/pypi/pyversions/svphaser.svg)](https://pypi.org/project/svphaser/)
[![License](https://img.shields.io/github/license/SFGLab/SvPhaser.svg)](LICENSE)

---

**SvPhaser** assigns **haplotype-aware genotypes** to **pre-called structural variants (SVs)** using **HP-tagged long-read alignments** (PacBio HiFi, ONT Q20+, etc.).

It fills a critical gap in long-read SV analysis:

* SV callers (e.g. Sniffles2) **discover variants**
* SvPhaser **phases and genotypes them** (`1|0`, `0|1`, `1|1`, or `./.`), reporting explicit **read-level evidence** and a quantitative **genotype quality (GQ)** for each call

SvPhaser is **caller-agnostic**, **deterministic**, and designed for **large-scale benchmarking and biological interpretation**.

---

## Key features

* **Post-hoc SV phasing** from HP-tagged BAM/CRAM (no re-calling required)
* **Per-chromosome parallelization** (efficient on HPC and multi-core systems)
* **SV-type-aware evidence detection** (DEL / INS / INV / BND / DUP)
* **Deterministic Δ-based decision logic** (no HMMs, no sampling)
* **Explicit confidence modeling** via GQ and reason codes
* **CSV-first design** for transparent benchmarking and debugging
* **VCF-compliant output** with optional `SVP_*` INFO annotations (enabled with `--svp-info`)

---

## Installation

### From PyPI (recommended)

```bash
# Requires Python >= 3.9
pip install svphaser
```

Optional extras:

```bash
pip install "svphaser[plots]"   # plotting utilities
pip install "svphaser[bench]"   # benchmarking helpers
pip install "svphaser[dev]"     # development, testing, and linting tools
```

### From source

```bash
git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .
```

---

## Inputs & requirements

SvPhaser requires only **two inputs**:

1. **Unphased SV VCF** (`.vcf` / `.vcf.gz`)

   * Produced by an SV caller (e.g. Sniffles2)
   * May optionally contain `RNAMES` INFO for precise read support

2. **HP-tagged BAM/CRAM**

   * Long-read alignments carrying haplotype tags (`HP:i:1` or `HP:i:2`)
   * Generated by an upstream phasing pipeline (e.g. WhatsHap `haplotag`)

> ⚠️ If the BAM/CRAM does not contain HP tags, SvPhaser cannot assign haplotypes.
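
Before running SvPhaser it can be worth confirming that the alignments really carry HP tags. The following is a minimal sketch, not part of SvPhaser itself: `count_hp_tags` and `count_hp_tags_in_file` are hypothetical helper names, and the 10,000-read sample size is an arbitrary assumption. It relies only on pysam's `AlignmentFile`, `has_tag`, and `get_tag`.

```python
from collections import Counter
from itertools import islice


def count_hp_tags(reads, limit=10_000):
    """Tally HP tag values over the first `limit` reads.

    `reads` is any iterable of objects exposing has_tag()/get_tag(),
    such as pysam.AlignedSegment records.
    """
    counts = Counter()
    for read in islice(reads, limit):
        key = read.get_tag("HP") if read.has_tag("HP") else "untagged"
        counts[key] += 1
    return counts


def count_hp_tags_in_file(path, limit=10_000):
    """Hypothetical convenience wrapper: sample the start of a BAM file."""
    import pysam  # imported lazily so count_hp_tags works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        # until_eof=True streams reads in file order and needs no index
        return count_hp_tags(bam.fetch(until_eof=True), limit)
```

If the returned counter contains only `"untagged"`, the file has most likely not been haplotagged. Sampling from the start of the file is a quick heuristic, so a region with few phased reads may understate the tagged fraction.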

---

## Quick start (CLI)

```bash
svphaser phase \
  sample_unphased.vcf.gz \
  sample.sorted_phased.bam \
  --out-dir results/ \
  --min-support 10 \
  --min-tagged-support 3 \
  --major-delta 0.60 \
  --equal-delta 0.10 \
  --support-mode hybrid \
  --dynamic-window \
  --tie-to-hom-alt \
  --gq-bins "30:High,10:Moderate" \
  --threads 32
```
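
The `--gq-bins` value is a comma-separated list of `threshold:label` pairs. The sketch below shows one way such a string could be parsed and applied. It is an illustration only: `parse_gq_bins` and `label_gq` are hypothetical names, and both the "GQ at or above the threshold gets the label" rule and the `"Low"` fallback are assumptions rather than documented SvPhaser behaviour.

```python
def parse_gq_bins(spec):
    """Parse "30:High,10:Moderate" into [(30, "High"), (10, "Moderate")]."""
    bins = []
    for item in spec.split(","):
        threshold, label = item.split(":", 1)
        bins.append((int(threshold), label.strip()))
    # Highest threshold first, so the first match is the best bin
    return sorted(bins, key=lambda b: b[0], reverse=True)


def label_gq(gq, bins, fallback="Low"):
    """Return the label of the first bin whose threshold `gq` reaches.

    The fallback label for values below every threshold is an assumption.
    """
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return fallback


bins = parse_gq_bins("30:High,10:Moderate")
print(label_gq(35, bins), label_gq(12, bins), label_gq(3, bins))
```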

---

## Outputs

For an input `sample.vcf.gz`, SvPhaser produces:

* **`sample_phased.csv`** — *primary analysis artifact*

  * Per-SV read support (`hp1`, `hp2`, `nohp`)
  * Derived metrics (`tagged_total`, `support_total`, Δ)
  * Final decisions (`gt`, `gq`, `reason`)

* **`sample_phased.vcf(.gz)`** — interoperability output

  * `FORMAT/GT`, `FORMAT/GQ`
  * Optional `SVP_*` INFO annotations when `--svp-info` is enabled

The CSV is intended for **benchmarking, visualization, and interpretation**;
the VCF is a downstream-consumable representation.
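
As a starting point for working with the CSV, the sketch below tallies genotypes and computes the fraction of haplotype-tagged reads per SV using only the standard library. It assumes the columns `hp1`, `hp2`, `nohp`, and `gt` listed above hold integer counts and a genotype string; the exact CSV layout may include further columns. `summarize_phased_csv` is a hypothetical helper, and the demo rows are synthetic.

```python
import csv
import io
from collections import Counter


def summarize_phased_csv(handle):
    """Count genotypes and compute the tagged-read fraction for each row."""
    gt_counts = Counter()
    tagged_fractions = []
    for row in csv.DictReader(handle):
        gt_counts[row["gt"]] += 1
        hp1, hp2, nohp = int(row["hp1"]), int(row["hp2"]), int(row["nohp"])
        total = hp1 + hp2 + nohp
        tagged_fractions.append((hp1 + hp2) / total if total else 0.0)
    return gt_counts, tagged_fractions


# Synthetic rows for illustration only; not real SvPhaser output
demo = io.StringIO(
    "hp1,hp2,nohp,gt,gq,reason\n"
    "12,0,3,1|0,40,demo\n"
    "0,8,2,0|1,25,demo\n"
    "5,5,0,1|1,15,demo\n"
)
gt_counts, fractions = summarize_phased_csv(demo)
print(dict(gt_counts), fractions)
```

For a real run, pass an open file handle instead, for example `open("results/sample_phased.csv", newline="")`.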

---

## Algorithm & methodology

A full, implementation-faithful description of the algorithm is provided in **`docs/Methodology.md`**. It covers:

* evidence collection
* haplotype decision logic
* pseudocode
* workflow diagram

This document is the authoritative reference for reviewers and users seeking algorithmic clarity.
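
To give a feel for what a deterministic Δ-based rule can look like, here is a deliberately simplified sketch. It is one plausible reading of the CLI option names, not SvPhaser's implementation: Δ is assumed to be the normalised difference `|hp1 - hp2| / (hp1 + hp2)`, the threshold comparisons are guesses, and `sketch_genotype` is a hypothetical function. The actual rules are those in `docs/Methodology.md`.

```python
def sketch_genotype(hp1, hp2, *, min_tagged_support=3, major_delta=0.60,
                    equal_delta=0.10, tie_to_hom_alt=True):
    """Illustrative Δ-threshold rule (an assumption, not SvPhaser's logic)."""
    tagged = hp1 + hp2
    if tagged < min_tagged_support:
        return "./."  # too few haplotype-tagged reads to decide
    delta = abs(hp1 - hp2) / tagged  # assumed definition of Δ
    if delta >= major_delta:
        # Strong imbalance: assign the SV to the dominant haplotype
        return "1|0" if hp1 > hp2 else "0|1"
    if delta <= equal_delta and tie_to_hom_alt:
        return "1|1"  # near-equal support on both haplotypes
    return "./."  # ambiguous middle ground


for hp1, hp2 in [(12, 1), (1, 12), (10, 10), (7, 4), (1, 1)]:
    print(hp1, hp2, sketch_genotype(hp1, hp2))
```

Because the rule is a pure function of the read counts and thresholds, the same inputs always yield the same genotype, which is the property the "no HMMs, no sampling" design aims for.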

---

## Python API

```python
from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.sorted_phased.bam"),
    out_dir=Path("results"),
    min_support=10,
    min_tagged_support=3,
    major_delta=0.60,
    equal_delta=0.10,
    support_mode="hybrid",
    dynamic_window=True,
    tie_to_hom_alt=True,
    gq_bins="30:High,10:Moderate",
    threads=8,
)
```

---

## Repository structure

```
SvPhaser/
├─ src/svphaser/        # core package
├─ docs/                # methodology & design notes
├─ tests/               # unit + regression tests
├─ notebooks/           # benchmarking & analysis
├─ pyproject.toml
├─ README.md
└─ CHANGELOG.md
```

---

## Citing SvPhaser

If SvPhaser contributes to your research, please cite:

```bibtex
@software{svphaser2026,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware phasing of structural variants from long-read data},
  version = {2.1.x},
  year    = {2026},
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}
```

For maximum reproducibility, include the exact git commit hash used.
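
Alongside the commit hash, it may help to log the installed package version with each analysis. A minimal sketch using the standard library's `importlib.metadata` follows; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="svphaser"):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("svphaser version:", installed_version())
```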

---

## License

SvPhaser is released under the **MIT License** — see [LICENSE](LICENSE).

---

## Contact

Developed at **SFG Lab (BioAI)**.

* **Pranjul Mishra** — [pranjul.mishra@proton.me](mailto:pranjul.mishra@proton.me)
* **Sachin Gadakh** — [s.gadakh@cent.uw.edu.pl](mailto:s.gadakh@cent.uw.edu.pl)

Bug reports and feature requests: please open a GitHub issue.
