Metadata-Version: 2.4
Name: svhet
Version: 0.1.0
Summary: An accurate NGS-based structural variant detection tool for human genomes
Home-page: https://github.com/snakesch/svhet
Author: Louis She
Author-email: Louis She <snakesch@connect.hku.hk>
License: MIT License
        
        Copyright (c) 2025 Louis SHE
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Repository, https://github.com/snakesch/SVhet
Project-URL: Issues, https://github.com/snakesch/SVhet/issues
Keywords: genomics,structural variants,variant detection
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cyvcf2>=0.31.0
Requires-Dist: pysam>=0.23.0
Requires-Dist: pybedtools>=0.12.0
Requires-Dist: tqdm>=4.67.1
Requires-Dist: numpy<2,>=1.26.4
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# SVhet

SVhet is an accurate NGS-based tool for filtering structural variants (SVs), based on heterozygous sites on reads mapped to deleted regions and their flanks. It has been tested on calls from common SV callers, including Manta, DELLY and Lumpy (with both PE150 and PE250 reads). While SVhet works for all regions in theory, it has only been tested on GIAB v0.6 Tier 1 SV regions, on SVs with one or both breakpoints located within those regions. For the best performance with minimal loss in recall, we recommend including only SVs within Tier 1 regions and not specifying `--fully-within`.

GIAB v0.6 Tier 1 SV benchmark regions can be downloaded from the [GIAB FTP site](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NIST_SV_v0.6/HG002_SVs_Tier1_v0.6.bed).
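
As a rough illustration of the inclusion rule described above (at least one breakpoint inside a Tier 1 region), the following sketch checks a deletion's breakpoints against BED intervals using only the Python standard library. The function names and example coordinates are hypothetical, and this is not SVhet's own implementation. Note that BED intervals are 0-based and half-open, whereas VCF positions are 1-based.

```python
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def position_in_regions(regions, chrom, pos):
    """True if the 1-based position `pos` lies inside any interval on `chrom`."""
    # A 1-based position p is covered by a BED interval [start, end) when start < p <= end.
    return any(start < pos <= end for start, end in regions.get(chrom, []))


def has_breakpoint_in_regions(regions, chrom, pos, end):
    """True if either breakpoint (POS or END) falls inside the regions."""
    return position_in_regions(regions, chrom, pos) or position_in_regions(regions, chrom, end)


# Made-up intervals and coordinates, for illustration only.
regions = load_bed(["1\t1000\t2000\n", "2\t500\t800\n"])
print(has_breakpoint_in_regions(regions, "1", 900, 1500))  # True: END is inside
print(has_breakpoint_in_regions(regions, "1", 100, 900))   # False: neither breakpoint is inside
```

The linear scan keeps the sketch short. For a full benchmark BED file, an interval index or a tool such as bedtools would be more appropriate.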

# Installation

SVhet can be downloaded from GitHub and installed using conda or mamba. Creating the environment automatically installs most of SVhet's dependencies.

```bash
git clone git@github.com:snakesch/SVhet.git
cd SVhet
conda env create -f environment.yml
conda activate svhet
```

Next, make sure Singularity (v3.0 or later) is available on the system, then pull the DeepVariant image:

```bash
singularity pull docker://google/deepvariant:1.9.0
```

# Running SVhet

SVhet runs as a single command from the CLI. It takes as input candidate SVs proposed by state-of-the-art SV callers (both VCF and BAM) and a reference genome. Depending on the SV caller used, different confidence-interval tags may need to be supplied via `--cipos-tag` and `--ciend-tag` (e.g. CIPOS95 for Lumpy).

```bash
svhet \
--input VCF \
--bam BAM \
--ref REFERENCE \
--output VCF_OUT \
--image DEEPVARIANT_IMAGE \
--outdir OUTPUT_DIR \
--threads INT \
--cipos-tag CIPOS \
--ciend-tag CIEND
```

A list of all available arguments is available via `svhet --help`.
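
To illustrate what the confidence-interval tags mentioned above encode, the sketch below reads a chosen tag from a VCF INFO string and converts its two offsets into a breakpoint interval. The INFO string is a made-up example, the helper names are hypothetical, and this is only an assumption about how such tags are typically interpreted, not a description of SVhet's internals.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries map to True."""
    fields = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def breakpoint_interval(position, info, tag):
    """Return (low, high) around `position` using the offsets stored in `tag`.

    Falls back to the exact position when the tag is absent from the record.
    """
    fields = parse_info(info)
    if tag not in fields:
        return position, position
    left, right = (int(offset) for offset in fields[tag].split(","))
    return position + left, position + right


# Made-up record carrying both the standard and the 95% interval tags.
info = "SVTYPE=DEL;END=5200;CIPOS=-30,30;CIEND=-25,25;CIPOS95=-4,4;CIEND95=-3,3;IMPRECISE"
pos = 5000
end = int(parse_info(info)["END"])
print(breakpoint_interval(pos, info, "CIPOS"))    # (4970, 5030)
print(breakpoint_interval(end, info, "CIEND95"))  # (5197, 5203)
```

In this example the choice of tag changes the width of the interval considered around each breakpoint, which is why the tag names are configurable per caller.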

# Test run

A minimal test set is available in `test/`. To run the test case, download the hs37d5 reference and its index file from the [GIAB FTP](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/GRCh37/).

Test command:
```bash
svhet \
--input test.vcf.gz \
--bam test.bam \
--ref REFERENCE \
--output test.svhet.vcf.gz \
--image DEEPVARIANT_IMAGE \
--outdir test_output/ \
--threads 4 \
--cipos-tag CIPOS \
--ciend-tag CIEND
```

Upon successful execution, output files should be similar to those in `test/test_output/`.
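
One possible way to check that an output VCF matches a reference copy is to compare their records while ignoring header lines, which often contain dates and command lines that differ between runs. The sketch below assumes plain or gzip-compressed VCFs. The function names and the placeholder paths are hypothetical and not part of SVhet.

```python
import gzip


def read_records(path):
    """Return the non-header lines of a plain or gzip-compressed VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle if not line.startswith("#")]


def compare_vcfs(observed_path, expected_path):
    """Return (only_in_observed, only_in_expected) as sorted lists of records."""
    observed = set(read_records(observed_path))
    expected = set(read_records(expected_path))
    return sorted(observed - expected), sorted(expected - observed)


# Example usage (replace the placeholders with real paths):
# extra, missing = compare_vcfs("OBSERVED_VCF", "EXPECTED_VCF")
# print(f"{len(extra)} unexpected and {len(missing)} missing records")
```

Two empty lists indicate identical record sets. Small differences may still be acceptable if tool versions differ.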

# License

SVhet is available under an [MIT license](LICENSE).

# Issues and correspondence

Please direct issues and correspondence to Louis SHE (snakesch@connect.hku.hk).

# Citation

Preprint pending.
