Metadata-Version: 2.4
Name: seashell-cli
Version: 0.1.10
Summary: Seashell — Genomic data, compressed and queryable
License: Proprietary
Project-URL: Homepage, https://seashell.bio
Project-URL: Documentation, https://seashell.bio/docs
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28

# Seashell CLI

Command-line tool for querying and managing genomic data on Seashell.

## Install

```bash
pip install seashell-cli
```

## Quick Start

```bash
seashell
```

You'll be prompted for your API key (from your institution admin), username, and password. After login you land in an interactive shell. Queries such as the ones below, and every command in the Commands tables, are typed directly at the `seashell>` prompt; copy any line and paste it.

```
LIST PATIENTS
FIND VARIANTS WHERE patient=NA12878 AND gene=BRCA1
EXPORT PATIENT NA12878 FORMAT CRAM
```

## Single Query Mode

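For one-off queries (for example, from a shell script), pass the whole query as a single quoted argument. The quotes also stop the shell from treating `<` or `>` in filters such as `gnomad_af<0.001` as redirections: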

```bash
seashell "FIND PATIENTS WHERE gene=BRCA1 AND significance=pathogenic"
seashell "COUNT VARIANTS WHERE patient=NA12878"
seashell --format json "LIST PATIENTS"
```
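
If you drive single-query mode from Python instead of a shell script, a minimal sketch might look like this. It relies only on the invocation shown above (`seashell [--format json] "QUERY"`). It additionally assumes that the CLI can authenticate without an interactive prompt in this mode and prints a single JSON document on stdout, neither of which this README states. `build_argv` and `run_json_query` are hypothetical helper names.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_argv(query: str, fmt: Optional[str] = None) -> List[str]:
    """Build the argument list for a one-off `seashell` query.

    The query stays one list element, so characters such as `<` in
    `gnomad_af<0.001` never pass through a shell.
    """
    argv = ["seashell"]
    if fmt is not None:
        argv += ["--format", fmt]
    argv.append(query)
    return argv


def run_json_query(query: str) -> Any:
    """Run a query with `--format json` and parse stdout.

    Assumption: the CLI prints one JSON document and exits non-zero on
    failure (check=True then raises CalledProcessError). The shape of the
    parsed value is not documented in this README.
    """
    result = subprocess.run(
        build_argv(query, fmt="json"),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Example (requires the CLI to be installed and authenticated):
# patients = run_json_query("LIST PATIENTS")
```

Passing an argument list rather than a shell string means no extra quoting is needed for `<` or `>` in filters.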

## Commands

### Variant queries
| Command | Description |
|---|---|
| `LIST PATIENTS` | List all patients in your institution |
| `LIST GENES` | List all gene symbols |
| `LIST ANNOTATIONS` | Loaded reference DB versions (gnomAD / dbSNP / constraint) |
| `COUNT VARIANTS WHERE patient=NA12878` | Count a patient's variants (sub-millisecond) |
| `COUNT PATIENTS WHERE gene=BRCA1` | Count patients matching criteria |
| `FIND VARIANTS WHERE patient=NA12878 AND gene=BRCA1` | Variants in a gene for one patient |
| `FIND PATIENTS WHERE gene=BRCA1 AND significance=pathogenic` | Patients carrying a matching variant (here, a pathogenic BRCA1 variant) |
| `FIND SIMILAR TO patient=NA12878` | Genetically similar patients |
| `COMPARE PATIENTS NA12878 AND HG00096` | Jaccard similarity between two patients |
| `DIFF PATIENTS NA12878 AND HG00096` | Exact variant differences |
| `PCA PATIENTS` | Principal component analysis |
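
`COMPARE PATIENTS` reports a Jaccard similarity and `DIFF PATIENTS` reports exact differences. To make those two results concrete, here is a minimal client-side sketch over two variant sets. It assumes a variant is identified by `(chrom, pos, ref, alt)` and defines the similarity of two empty sets as 1.0. Seashell's own definition of variant identity (for example, whether zygosity is compared) is not documented here and may differ. The toy sets are made up for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed variant identity: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def jaccard(a: Iterable[Variant], b: Iterable[Variant]) -> float:
    """|A & B| / |A | B|; two empty sets are treated as identical (1.0)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


def diff(a: Iterable[Variant], b: Iterable[Variant]) -> Tuple[Set[Variant], Set[Variant]]:
    """Return (variants only in A, variants only in B)."""
    set_a, set_b = set(a), set(b)
    return set_a - set_b, set_b - set_a


# Toy data, not real patient calls.
toy_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
toy_b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")}
print(jaccard(toy_a, toy_b))  # 0.5 (2 shared out of 4 distinct variants)
print(diff(toy_a, toy_b))
```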

### Annotation queries (new in 0.1.7)
Filter variants by population frequency, predicted consequence, dbSNP membership, and gene constraint. At query time, every variant is automatically joined against gnomAD v4.1, dbSNP, and gnomAD constraint metrics. Counts at the canonical thresholds marked "fast path" below return in under a millisecond.

| Command | Description |
|---|---|
| `COUNT VARIANTS WHERE patient=NA12878 AND gnomad_af<0.001` | Rare variants (sub-millisecond fast path) |
| `COUNT VARIANTS WHERE patient=NA12878 AND gnomad_af<0.0001` | Ultra-rare variants (sub-millisecond fast path) |
| `COUNT VARIANTS WHERE patient=NA12878 AND lof=true` | Loss-of-function variants (sub-millisecond fast path) |
| `COUNT VARIANTS WHERE patient=NA12878 AND consequence=missense_variant` | Missense variants (sub-millisecond fast path) |
| `COUNT VARIANTS WHERE patient=NA12878 AND novel=true` | Variants not in dbSNP |
| `FIND VARIANTS WHERE patient=NA12878 AND gnomad_af<0.001 AND consequence=missense_variant` | Rare missense variants, the canonical rare-disease query |
| `FIND VARIANTS WHERE patient=NA12878 AND lof=true AND loeuf<0.35` | High-impact LoF in constrained genes |
| `FIND VARIANTS WHERE patient=NA12878 AND rsid=rs334` | Lookup by dbSNP rsID |

**Filter keys:** `gnomad_af`, `gnomad_popmax`, `consequence`, `lof`, `impact`, `rsid`, `novel`, `pli`, `loeuf`. All numeric filters support `< > <= >= = !=`. Result rows include `gene`, `consequence`, `hgvs_c`, `hgvs_p`, `transcript`, `gnomad_af`, `gnomad_popmax`, `rsid`, `pli`, and `loeuf` when annotations are loaded. See https://seashell.bio/docs (Developer → Annotation queries) for the full reference.
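
The filter keys combine with `AND`, and numeric keys accept the comparison operators listed above. The sketch below mimics those semantics client side, for example to post-filter rows you already fetched with `--format json`. It is not Seashell's parser: the grammar (one `key OP value` clause per `AND`), case-insensitive text matching, and the rule that a row missing the filtered field never matches are all assumptions. In particular, how the service treats variants absent from gnomAD under a `gnomad_af<...` filter is not stated in this README. The row dictionaries are toy data.

```python
import operator
import re
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Two-character operators are listed first so "<=" is not read as "<".
_CLAUSE = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(\S+)\s*$")
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt, ">": operator.gt, "<=": operator.le,
    ">=": operator.ge, "=": operator.eq, "!=": operator.ne,
}
# Keys from the README that hold numbers; all others are compared as text.
NUMERIC_KEYS = {"gnomad_af", "gnomad_popmax", "pli", "loeuf"}


def _make_predicate(key: str, op: Callable[[Any, Any], bool], value: Any) -> Predicate:
    def pred(row: Row) -> bool:
        cell = row.get(key)
        if cell is None:  # field missing or unannotated: assumed not to match
            return False
        if key in NUMERIC_KEYS:
            return op(float(cell), value)
        return op(str(cell).lower(), value)
    return pred


def parse_where(where: str) -> List[Predicate]:
    """Turn 'gnomad_af<0.001 AND lof=true' into one predicate per clause."""
    predicates: List[Predicate] = []
    for clause in re.split(r"\s+AND\s+", where.strip()):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"cannot parse clause: {clause!r}")
        key, op, raw = match.groups()
        value = float(raw) if key in NUMERIC_KEYS else raw.lower()
        predicates.append(_make_predicate(key, _OPS[op], value))
    return predicates


toy_rows: List[Row] = [
    {"gene": "BRCA1", "consequence": "missense_variant", "gnomad_af": 0.0002, "lof": False},
    {"gene": "PALB2", "consequence": "synonymous_variant", "gnomad_af": 0.03, "lof": False},
    {"gene": "TP53", "consequence": "stop_gained", "gnomad_af": None, "lof": True},
]
preds = parse_where("gnomad_af<0.001 AND consequence=missense_variant")
hits = [row for row in toy_rows if all(p(row) for p in preds)]
print([row["gene"] for row in hits])  # ['BRCA1']
```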

### Sequencing QC
| Command | Description |
|---|---|
| `COVERAGE PATIENT id REGION chr:start-end` | Per-base read depth for a region: mean, min/max, and % above 10x/20x/30x |
| `QC PATIENT id` | Combined read-stats summary (mapped/unmapped/duplicates/properly paired, mean MAPQ, mean insert size) |
| `FLAGSTAT PATIENT id` | Read-flag summary report; output format matches `samtools flagstat` |
| `INSERT_SIZE PATIENT id` | Insert-size distribution: pair count, mean, std, median, mode, MAD, percentiles |
| `CYCLE_QUALITY PATIENT id` | Per-cycle base-quality decay across the read length |
| `PILEUP PATIENT id POSITION chr:pos` | Per-base pileup at one position |
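
To sanity-check `COVERAGE` and `INSERT_SIZE` output against your own numbers, the following sketch computes the same kinds of summaries from raw values: per-base depths for a region, and SAM `TLEN` values for read pairs. It is an illustration, not Seashell's implementation. It counts a base as covered at 10x when depth is at least 10 (the table says "above", so inclusivity is an assumption), uses the population standard deviation, and uses nearest-rank percentiles; any of these choices may differ from what the service reports.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Iterable, Sequence


def coverage_summary(depths: Sequence[int], thresholds: Sequence[int] = (10, 20, 30)) -> Dict[str, float]:
    """Mean/min/max depth and % of bases with depth >= each threshold."""
    if not depths:
        raise ValueError("empty region")
    n = len(depths)
    summary: Dict[str, float] = {
        "mean": sum(depths) / n,
        "min": min(depths),
        "max": max(depths),
    }
    for t in thresholds:
        summary[f"pct_ge_{t}x"] = 100.0 * sum(1 for d in depths if d >= t) / n
    return summary


def insert_size_stats(tlens: Iterable[int], pcts: Sequence[int] = (10, 50, 90)) -> Dict[str, float]:
    """Insert-size summary from SAM TLEN values.

    SAM gives the leftmost mate a positive TLEN and its partner a negative
    one, so keeping only positive values counts each pair once.
    """
    sizes = sorted(t for t in tlens if t > 0)
    if not sizes:
        raise ValueError("no positive TLEN values")
    n = len(sizes)
    median = statistics.median(sizes)
    stats: Dict[str, float] = {
        "pairs": n,
        "mean": statistics.mean(sizes),
        "std": statistics.pstdev(sizes),  # population SD (an assumption)
        "median": median,
        "mode": Counter(sizes).most_common(1)[0][0],
        "mad": statistics.median(abs(s - median) for s in sizes),
    }
    for p in pcts:
        rank = max(1, math.ceil(p * n / 100))  # nearest-rank percentile
        stats[f"p{p}"] = sizes[rank - 1]
    return stats


# Toy inputs, not real sequencing data.
cov = coverage_summary([0, 5, 10, 20, 30, 35])
ins = insert_size_stats([290, -290, 300, -300, 300, -300, 310, -310, 500, -500, 0])
print(cov["pct_ge_20x"], ins["median"], ins["mad"])  # 50.0 300 10
```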

### Family genetics & cohort QC
| Command | Description |
|---|---|
| `SEXCHECK PATIENT id` | Infer biological sex (XX / XY / XXY / X0 / XYY) from chrX and chrY normalized coverage |
| `KINSHIP COHORT [UNEXPECTED] [LIMIT N]` | Pairwise relatedness pre-screen across the cohort |
| `KINSHIP TRIO mom dad child` | Validate a declared family trio with sample-swap warnings |
| `CONTAMINATION PATIENT id` | Sample-swap and contamination pre-screen (practical limit of detection ~3-5% contamination) |
| `ANCESTRY PATIENT id` | Predict super-population (EUR/EAS/SAS/AFR/AMR) against the bundled 1000 Genomes panel |
| `MENDELIAN TRIO mom dad child` | De novo + inheritance partition for a declared trio |
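
`MENDELIAN TRIO` partitions a child's variants into de novo and inherited. The idea can be sketched for one biallelic autosomal site with genotypes coded as alternate-allele counts (0 = hom-ref, 1 = het, 2 = hom-alt): a het parent can transmit either allele, while hom-ref and hom-alt parents can transmit only ref or only alt. This is a simplified illustration under those assumptions. It ignores genotype quality, sex chromosomes, and multi-allelic sites, it is not Seashell's algorithm, and the category names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Alt-allele counts a parent with genotype g can pass on (biallelic, autosomal).
_TRANSMITS: Dict[int, Tuple[int, ...]] = {0: (0,), 1: (0, 1), 2: (1,)}


def classify_site(mom: int, dad: int, child: int) -> str:
    """Assign one trio site to a (hypothetical) inheritance category."""
    for g in (mom, dad, child):
        if g not in _TRANSMITS:
            raise ValueError(f"genotype must be 0, 1 or 2, got {g!r}")
    if child > 0 and mom == 0 and dad == 0:
        return "de_novo"  # alt allele(s) present in neither parent
    possible = {m + d for m in _TRANSMITS[mom] for d in _TRANSMITS[dad]}
    if child not in possible:
        return "mendelian_error"  # e.g. hom-alt parent with hom-ref child
    if child == 0:
        return "reference"
    if child == 2:
        return "inherited_both"  # one alt allele from each parent
    # child == 1: which parent could have supplied the alt allele?
    from_mom = 1 in _TRANSMITS[mom] and 0 in _TRANSMITS[dad]
    from_dad = 1 in _TRANSMITS[dad] and 0 in _TRANSMITS[mom]
    if from_mom and from_dad:
        return "inherited_ambiguous"
    return "inherited_maternal" if from_mom else "inherited_paternal"


def partition(sites: Iterable[Tuple[int, int, int]]) -> Counter:
    """Count (mom, dad, child) genotype triples per category."""
    return Counter(classify_site(m, d, c) for m, d, c in sites)


# Toy (mom, dad, child) genotypes, one per category.
toy_sites = [(0, 0, 1), (1, 0, 1), (0, 2, 1), (1, 1, 1), (1, 1, 2), (2, 0, 0), (0, 0, 0)]
print(partition(toy_sites))
```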

### Data management
| Command | Description |
|---|---|
| `UPLOAD PATIENT id CRAM s3://path.cram VCF s3://path.vcf.gz` | Upload pre-aligned CRAM/BAM reads together with the matching VCF |
| `UPLOAD PATIENT id FASTQ s3://R1.fastq.gz s3://R2.fastq.gz` | Upload raw FASTQ |
| `UPLOAD BATCH s3://manifest.json` | Batch upload from manifest |
| `EXPORT PATIENT id FORMAT CRAM` | Export as CRAM (or BAM) |
| `DELETE PATIENT id` | Remove a patient (admin only) |

Type `help` at the `seashell>` prompt to list all commands.

## Requirements

- Python 3.8+
- A Seashell API key (contact your institution admin)

## Documentation

https://seashell.bio/docs
