Metadata-Version: 2.4
Name: snpio
Version: 1.5.0
Summary: SNPio is a Python API for population genetic file processing, filtering, and analysis. It is designed to be a user-friendly tool for the manipulation of population genetic data in a variety of formats. SNPio can be used to filter data based on missingness, MAF and MAC, singletons, biallelic, and monomorphic sites. It can also generate summary statistics for population genetic analyses.
Author-email: "Drs. Bradley T. Martin and Tyler K. Chafin" <evobio721@gmail.com>
License: GPL-3.0-or-later
Project-URL: Source Code, https://github.com/btmartin721/SNPio
Project-URL: Bug Tracker, https://github.com/btmartin721/SNPio/issues
Project-URL: Documentation, https://snpio.readthedocs.io/en/latest/
Project-URL: Changelog, https://snpio.readthedocs.io/en/latest/changelog.html
Keywords: genomics,bioinformatics,population genetics,SNP,VCF,PHYLIP,STRUCTURE,missing data,filtering,filter,MAF,minor allele frequency,MAC,minor allele count,biallelic,monomorphic,singleton,population structure,d-statistics,Fst,multiqc,encoding
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Natural Language :: English
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX
Classifier: Operating System :: MacOS
Requires-Python: <3.13,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: bokeh
Requires-Dist: h5py
Requires-Dist: holoviews
Requires-Dist: kaleido
Requires-Dist: kneed
Requires-Dist: matplotlib
Requires-Dist: multiqc>=1.29
Requires-Dist: numba
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: panel
Requires-Dist: plotly
Requires-Dist: pysam
Requires-Dist: requests
Requires-Dist: scikit-learn
Requires-Dist: scipy
Requires-Dist: statsmodels
Requires-Dist: seaborn
Requires-Dist: toytree
Requires-Dist: tqdm
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: sphinx-rtd-theme; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
Requires-Dist: sphinxcontrib-bibtex; extra == "docs"
Provides-Extra: dev
Requires-Dist: memory-profiler; extra == "dev"
Requires-Dist: psutil; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: sphinx-rtd-theme; extra == "dev"
Requires-Dist: sphinx-autodoc-typehints; extra == "dev"
Requires-Dist: sphinxcontrib-bibtex; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: pyyaml; extra == "dev"
Dynamic: license-file

# SNPio: A Python API for Population Genomic Data I/O, Filtering, and Analysis

![SNPio Logo](snpio/img/snpio_logo.png)

**SNPio** is a Python package for reading, filtering, encoding, and analyzing genotype data. It supports the VCF, PHYLIP, STRUCTURE, and GENEPOP file formats and provides high-level tools for visualization, for preparing data for downstream machine learning analyses, and for population genetic inference.

---

## 🔧 Installation

You can install SNPio using one of the following methods. SNPio requires Python 3.11 or 3.12.

### ✅ Pip Installation (Recommended)

```bash
python3 -m venv snpio-env
source snpio-env/bin/activate
pip install snpio
```

### ✅ Conda Installation

```bash
conda create -n snpio-env python=3.12
conda activate snpio-env
conda install -c btmartin721 snpio
```

### 🐳 Docker

```bash
docker pull btmartin721/snpio:latest
docker run -it btmartin721/snpio:latest
```

> **Note:** SNPio supports Unix-based systems (Linux and macOS). Windows users should install it under the Windows Subsystem for Linux (WSL).

---

## 🚀 Getting Started

### Import Modules

```python
from snpio import (
    NRemover2, VCFReader, PhylipReader, StructureReader,
    GenePopReader, Plotting, GenotypeEncoder, PopGenStatistics
)
```

### Load Genotype Data (VCF Example)

```python
vcf = "snpio/example_data/vcf_files/phylogen_subset14K_sorted.vcf.gz"
popmap = "snpio/example_data/popmaps/phylogen_nomx.popmap"

gd = VCFReader(
    filename=vcf,
    popmapfile=popmap,
    force_popmap=True,
    verbose=True,
    plot_format="png",
    prefix="snpio_example"
)
```

You can also pass `include_pops` and `exclude_pops` to keep or drop specific populations defined in the popmap file.

---

## 📖 Full Documentation

Detailed API usage, tutorials, and examples are available at:

🔗 [https://snpio.readthedocs.io/en/latest/](https://snpio.readthedocs.io/en/latest/)

The documentation covers:

- File readers (VCF, PHYLIP, STRUCTURE, GENEPOP)
- Genotype filtering (NRemover2)
- PCA and missingness plots (Plotting)
- Genotype encoding (GenotypeEncoder)
- Population statistics (PopGenStatistics)
- Experimental: Tree parsing (TreeParser)
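Several of the filters and encodings listed above rest on a few per-locus quantities. The sketch below is not NRemover2 or GenotypeEncoder; it is a standalone NumPy illustration of how missingness and minor allele frequency (MAF) can be computed from genotypes and used to drop loci. It assumes biallelic VCF `GT` strings, a 0/1/2 alternate-allele-count encoding with `-9` for missing data (conventions chosen for this example), and illustrative thresholds; the helpers `encode_012` and `locus_stats` are hypothetical.

```python
import numpy as np

MISSING = -9  # assumed missing-data code for this sketch


def encode_012(gt: str) -> int:
    """Encode a biallelic VCF GT string as its alternate-allele count (0, 1, or 2)."""
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if "." in alleles:
        return MISSING
    return sum(allele != "0" for allele in alleles)


def locus_stats(G: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-locus missing rate and MAF for a (samples x loci) 012 matrix."""
    called = G != MISSING
    miss_rate = (~called).mean(axis=0)
    n_called = called.sum(axis=0)
    alt_count = np.where(called, G, 0).sum(axis=0)  # ignore missing cells
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_count / (2 * n_called)  # NaN where a locus has no calls
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return miss_rate, maf


# Toy data: 3 samples (rows) x 4 loci (columns)
gts = [
    ["0/0", "0/1", "1/1", "./."],
    ["0/1", "0/0", "1/1", "0/0"],
    ["0/0", "0/0", "1|1", "./."],
]
G = np.array([[encode_012(g) for g in row] for row in gts])
miss_rate, maf = locus_stats(G)

# Keep loci with <= 50% missing data and MAF >= 0.05 (illustrative thresholds)
keep = (miss_rate <= 0.5) & (maf >= 0.05)
print(keep)  # [ True  True False False]
```

In this toy matrix the third locus is monomorphic (MAF of 0) and the fourth is missing in two of three samples, so both are dropped. Consult the documentation for NRemover2's actual filtering methods and threshold arguments.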

---

## 🧪 Development Notes

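The test suite lives in the source repository. To run it, clone the repository and install SNPio with the `dev` extras:

```bash
git clone https://github.com/btmartin721/SNPio.git
cd SNPio
pip install -e ".[dev]"
pytest tests/
```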

---

## 🧾 License and Citation

SNPio is licensed under the **GPL-3.0-or-later** license. If you use SNPio in your research, please cite the associated publication(s).

---

## 🤝 Contributing

We welcome community contributions!

- Report bugs or request features on [GitHub Issues](https://github.com/btmartin721/SNPio/issues)
- Submit a pull request
- Visit the [documentation](https://snpio.readthedocs.io/en/latest/) for contributing guidelines

---

## 🙏 Acknowledgments

Thanks for using SNPio. We hope it facilitates your population genomic research. Reach out with questions or feedback via GitHub!
