Metadata-Version: 2.4
Name: pyPINTS
Version: 1.2.0
Summary: Peak Identifier for Nascent Transcript Starts (PINTS)
Home-page: https://pints.yulab.org
Author: Li Yao
Author-email: regulatorygenome@gmail.com
License: GPL
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.2
Requires-Dist: pandas>=1.1.5
Requires-Dist: scipy>=1.5.2
Requires-Dist: pysam>=0.16.0.1
Requires-Dist: requests
Requires-Dist: pybedtools>=0.8.1
Requires-Dist: statsmodels>=0.12.1
Requires-Dist: pyBigWig
Requires-Dist: biopython
Requires-Dist: matplotlib
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: summary

# PINTS: Peak Identifier for Nascent Transcript Starts

![Supported platforms](https://img.shields.io/badge/platform-linux%20%7C%20osx-lightgrey.svg)
![Supported Python versions](https://img.shields.io/badge/python-3.x-blue.svg)
[![PyPI](https://github.com/liyao001/PINTS/actions/workflows/python-publish.yml/badge.svg)](https://github.com/liyao001/PINTS/actions/workflows/python-publish.yml)
[![PINTS web portal](https://img.shields.io/website?label=PINTS%20web%20portal&url=https%3A%2F%2Fpints.yulab.org)](//pints.yulab.org)

## Installation

PINTS is available on PyPI and Bioconda, so you can install it with:

```shell
pip install pyPINTS
```

or 

```shell
conda install bioconda::pypints
```

Alternatively, you can clone this repo to a local directory, then run the following command in that directory:

```shell
python setup.py install
```

## Get started

PINTS can call peaks from either bigWig or BAM files. If you have signals for the forward and reverse strands in
two separate bigWig files (`path_to_pl.bw` and `path_to_mn.bw`), you can use a command like the following to call peaks:

```shell
pints_caller --save-to output_dir \
  --file-prefix output_prefix \
  --bw-pl path_to_pl.bw \
  --bw-mn path_to_mn.bw \
  --thread 16
```

To call peaks from BAM files, you need to give PINTS the path to the BAM file and the type of experiment it came from.
If it's from a standard protocol, like [PROcap](https://doi.org/10.1038/nprot.2016.086), then you can set `--exp-type PROcap`.
Other supported experiments include [GROcap](https://doi.org/10.7554/eLife.00808)/
[CoPRO](https://doi.org/10.1038/s41588-018-0234-5)/
[csRNAseq](https://doi.org/10.1101/gr.253492.119)/
[NETCAGE](https://doi.org/10.1038/s41588-019-0485-9)/
[CAGE](https://doi.org/10.1038/nmeth0306-211)/
[RAMPAGE](https://doi.org/10.1101/gr.139618.112)/
[STRIPEseq](https://doi.org/10.1101/gr.261545.120). For a comprehensive list of directly supported assays, please run

```shell
pints_caller --help
```

If the data was generated by another method, you need to tell PINTS where to find the RNA ends you are interested in.
For example, `--exp-type R_5` tells the tool that:

1. this alignment is from a single-end library;
2. the tool should look at the 5' end of reads.

Other supported values are `R_3`, `R1_5`, `R1_3`, `R2_5`, `R2_3`.

If reads represent the reverse complement of original RNAs, like PROseq, then you need to use `--reverse-complement`
(not necessary for standard protocols).
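The custom `--exp-type` values follow a regular naming pattern, so they can be decoded mechanically. The sketch below is only an illustration of that pattern: `describe_exp_type` is a hypothetical helper, not part of PINTS.

```python
import re

# Matches the six custom values listed above: R_5, R_3, R1_5, R1_3, R2_5, R2_3.
_EXP_TYPE = re.compile(r"^R(?P<mate>[12]?)_(?P<end>[35])$")


def describe_exp_type(value):
    """Decode a custom --exp-type value (hypothetical helper, not a PINTS API)."""
    match = _EXP_TYPE.match(value)
    if match is None:
        raise ValueError(f"not a custom exp-type value: {value!r}")
    mate = match.group("mate")  # "" for single-end, "1" or "2" for paired-end
    return {
        "layout": "paired-end" if mate else "single-end",
        "read": f"read{mate}" if mate else "read",
        "end": match.group("end") + "'",
    }


print(describe_exp_type("R_5"))
print(describe_exp_type("R2_3"))
```

Named protocols such as `PROcap` are not matched by this pattern; for those, PINTS already knows where the RNA ends are.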

An example of calling peaks from a BAM file:

```shell
pints_caller --bam-file input.bam \
  --save-to output_dir \
  --file-prefix output_prefix \
  --thread 16 \
  --exp-type PROcap
```

> We have prepared several [case studies](https://pints.yulab.org/tre_calling) that walk through the steps
> from processing raw FASTQ files to calling peaks/TREs.

## Outputs

* prefix+`_{SID}_divergent_peaks.bed`: Divergent TREs;
* prefix+`_{SID}_bidirectional_peaks.bed`: Bidirectional TREs (divergent + convergent);
* prefix+`_{SID}_unidirectional_peaks.bed`: Unidirectional TREs, possibly lncRNAs transcribed from enhancers (e-lncRNAs), as suggested [here](http://www.nature.com/articles/s41576-019-0184-5).

`{SID}` is replaced with the number (index) of the sample that the peaks are called from.
If you provide PINTS with only one sample, `{SID}` is replaced with **1**.
If you run PINTS with three replicates (`--bam-file A.bam B.bam C.bam`), `{SID}` for peaks identified from `A.bam` is replaced with **1**.
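The naming scheme above can be written down as a small helper for downstream scripts. This is a sketch under one assumption: that the files are written directly inside the `--save-to` folder. `expected_outputs` is a hypothetical name, not a PINTS function.

```python
from pathlib import Path

# The three output types listed above.
OUTPUT_KINDS = ("divergent", "bidirectional", "unidirectional")


def expected_outputs(save_to, file_prefix, sid=1):
    """Return the expected output paths for one sample ID (assumed layout)."""
    return {
        kind: Path(save_to) / f"{file_prefix}_{sid}_{kind}_peaks.bed"
        for kind in OUTPUT_KINDS
    }


for kind, path in expected_outputs("output_dir", "output_prefix").items():
    print(kind, path)
```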

For divergent or bidirectional TREs, there will be 6 columns in the outputs:

1. Chromosome
2. Start site: 0-based
3. End site: 0-based
4. Confidence about the peak pair. Can be:
   * `Stringent(qval)`, which means the peaks on both the forward and reverse strands are significant based on their *q*-values;
   * `Stringent(pval)`, which means one peak is significant according to *q*-value while the other one is significant according to *p*-value;
   * `Relaxed`, which means only one peak is significant in the pair;
   * A combination of the three types above, when nearby elements overlap.
   * If epigenomic annotation is enabled by `--epig-annotation <biosample>`, then peaks that are less significant (`--relaxed-fdr-target`, default is 2*`fdr_target`), but overlap with epigenomic annotations from PINTS web server, will be listed with the confidence level: `Marginal`.
5. Major TSSs on the forward strand; multiple major TSSs are separated by commas (`,`)
6. Major TSSs on the reverse strand; multiple major TSSs are separated by commas (`,`)
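A record in this layout can be read with a few lines of Python. The sketch below assumes the file is tab-delimited (as BED files usually are) and keeps the TSS fields as strings, since this README does not specify how an empty field is written. The example line is made up for illustration and is not real PINTS output.

```python
from typing import NamedTuple, Tuple


class BidirectionalTRE(NamedTuple):
    chrom: str
    start: int
    end: int
    confidence: str
    forward_tss: Tuple[str, ...]
    reverse_tss: Tuple[str, ...]
    extra: Tuple[str, ...]  # any further columns, e.g. epigenomic annotation


def parse_bidirectional_line(line):
    """Parse one divergent/bidirectional record (assumes tab-delimited text)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, start, end, confidence, fwd, rev = fields[:6]
    return BidirectionalTRE(
        chrom, int(start), int(end), confidence,
        tuple(fwd.split(",")), tuple(rev.split(",")), tuple(fields[6:]),
    )


# Made-up record for illustration only.
example = "chr1\t1000\t1400\tStringent(qval)\t1250,1260\t1100"
record = parse_bidirectional_line(example)
print(record.chrom, record.end - record.start, record.forward_tss)
```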

For unidirectional TREs, there will be 9 columns in the output:

1. Chromosome
2. Start
3. End
4. Peak ID
5. Q-value
6. Strand
7. Read counts
8. Position of the summit TSS
9. Height of the summit

For all three types of TREs, if a valid biosample name for `--epig-annotation` is provided, then an additional column with epigenomic annotation for each TRE will show up in the final output.

## Parameters

### Input & Output

* If you want to use BAM files as inputs:
  * `--bam-file`: Input BAM file(s);
  * `--exp-type`: Type of experiment. If the experiment is not listed as a choice, or you know the position of RNA ends on the reads and you want to override the defaults, you can specify:
    * `R_5` (5' of the read for single-end lib),
    * `R_3` (3' of the read for single-end lib),
    * `R1_5` (5' of the read1 for paired-end lib),
    * `R1_3` (3' of the read1 for paired-end lib),
    * `R2_5` (5' of the read2 for paired-end lib),
    * or `R2_3` (3' of the read2 for paired-end lib)
  * `--reverse-complement`: Set this switch if 1) `--exp-type` is `Rx_x` and 2) reads in this library represent the reverse complement of RNAs, like PROseq;
  * `--ct-bam`: BAM file for input/control (optional);
* If you want to use bigWig files as inputs:
  * `--bw-pl`: bigWig for signals on the forward strand;
  * `--bw-mn`: bigWig for signals on the reverse strand;
  * `--ct-bw-pl`: bigWig for input/control signals on the forward strand (optional);
  * `--ct-bw-mn`: bigWig for input/control signals on the reverse strand (optional);
* `--save-to`: Save peaks to this folder; by default, the current folder;
* `--file-prefix`: Prefix for all output files.

### Optional parameters

* `--dont-merge-reps`: Starting with PINTS 1.2.x, the software automatically merges multiple replicates for a joint peak calling process. To call peaks individually for each sample, as in previous versions, use this option.
* `--epig-annotation <biosample>`: Use this option together with the name of the biosample that the library was derived from, for example K562; then epigenomic annotations will be downloaded from the PINTS web server and used for annotating and augmenting TREs identified by PINTS **(for hg38 only)**;
* `--relaxed-fdr-target <relaxed fdr>`: In the presence of `--epig-annotation`, peaks that do not pass the original FDR cutoff but pass this relaxed cutoff and have support from DNase-seq and H3K27ac ChIP-seq will also be included in final outputs. By default, 2*fdr;
* `--mapq-threshold <min mapq>`: Minimum mapping quality, by default: 30 or `None`;
* `--close-threshold <close distance>`: Distance threshold for two peaks (on opposite strands) to be merged, by default: 300;
* `--fdr-target <fdr>`: FDR target for multiple testing, by default: 0.1;
* `--chromosome-start-with <chromosome prefix>`: Only keep reads mapped to chromosomes with this prefix. By default, all reads will be analyzed;
* `--thread <n thread>`: Max number of threads the tool can create;
* `--borrow-info-reps`: Borrow information from replicates to refine calling of divergent elements;
* `--sensitive`: Call peaks in a more sensitive mode (LRT+FC).
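The relationship between `--fdr-target`, `--relaxed-fdr-target`, and `--epig-annotation` can be summarized as a simple decision rule. The sketch below is a simplified reading of the descriptions above, not the actual PINTS implementation: the function name, the returned labels, and the strict `<` comparisons are all assumptions.

```python
def classify_peak(qvalue, fdr_target=0.1, relaxed_fdr_target=None,
                  has_epigenomic_support=False):
    """Simplified decision rule (illustration only, not PINTS internals)."""
    if relaxed_fdr_target is None:
        # Documented default: twice the FDR target.
        relaxed_fdr_target = 2 * fdr_target
    if qvalue < fdr_target:
        return "significant"
    if qvalue < relaxed_fdr_target and has_epigenomic_support:
        # Rescued by epigenomic annotation; reported as `Marginal`.
        return "marginal"
    return "excluded"


print(classify_peak(0.05))
print(classify_peak(0.15, has_epigenomic_support=True))
print(classify_peak(0.15))
```

With the default `--fdr-target` of 0.1, the relaxed cutoff is 0.2, so a peak with a q-value of 0.15 is only kept when it has epigenomic support.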

More parameters can be seen by running `pints_caller -h`.

## Case Study: Identify Differentially Expressed TREs
In this section, we identify TREs (promoters and enhancers) that are differentially expressed between two conditions.

First, call peaks for each condition with `pints_caller`:
```shell
# control samples
pints_caller --bw-pl DMSO_r1_pl.bw DMSO_r2_pl.bw \
  --bw-mn DMSO_r1_mn.bw DMSO_r2_mn.bw \
  --thread 16 --file-prefix DMSO
# and treatment samples
pints_caller --bw-pl E2_r1_pl.bw E2_r2_pl.bw \
  --bw-mn E2_r1_mn.bw E2_r2_mn.bw \
  --thread 16 --file-prefix E2
```

Second, build the counts table with `pints_counter`:
```shell
pints_counter -b DMSO_1_bidirectional_peaks.bed E2_1_bidirectional_peaks.bed \
  -u DMSO_1_unidirectional_peaks.bed E2_1_unidirectional_peaks.bed \
  -p DMSO_r1_pl.bw DMSO_r2_pl.bw E2_r1_pl.bw E2_r2_pl.bw \
  -m DMSO_r1_mn.bw DMSO_r2_mn.bw E2_r1_mn.bw E2_r2_mn.bw \
  -c DMSO DMSO E2 E2 \
  -r 1 2 1 2 \
  -s counts.csv
```

The counts table looks like the following:
```
,DMSO_1,DMSO_2,E2_1,E2_2
chr1:10609-10620,17,22,44,43
chr1:629905-629938,169,13,224,82
chr1:633956-634096,218,12,271,102
chr1:778554-778929,1180,195,1327,495
chr1:779719-779721,0,0,12,2
chr1:779846-780119,6,0,30,6
chr1:827199-827316,48,22,101,46
chr1:827326-827736,634,88,752,318
chr1:827742-827773,19,0,32,8
```
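Before handing the table to a differential expression tool, it can be useful to load it and check per-sample totals. The sketch below uses only the Python standard library and the first three rows of the table above; the function name `load_counts` and the region parsing are illustrative assumptions, not part of `pints_counter`.

```python
import csv
import io
import re

# First three rows of the counts table shown above.
COUNTS_CSV = """\
,DMSO_1,DMSO_2,E2_1,E2_2
chr1:10609-10620,17,22,44,43
chr1:629905-629938,169,13,224,82
chr1:633956-634096,218,12,271,102
"""

_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def load_counts(handle):
    """Return (sample names, {(chrom, start, end): {sample: count}})."""
    reader = csv.reader(handle)
    samples = next(reader)[1:]  # the first header cell is empty
    table = {}
    for row in reader:
        if not row:
            continue
        match = _REGION.match(row[0])
        if match is None:
            raise ValueError(f"unexpected region ID: {row[0]!r}")
        region = (match.group("chrom"), int(match.group("start")),
                  int(match.group("end")))
        table[region] = dict(zip(samples, map(int, row[1:])))
    return samples, table


samples, table = load_counts(io.StringIO(COUNTS_CSV))
totals = {s: sum(counts[s] for counts in table.values()) for s in samples}
print(totals)
```

Large differences in totals between replicates are expected to be handled by the normalization step of the downstream tool.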

Third, pass the counts table to DESeq2 or edgeR for differential expression analysis.

## Additional Tools

* `pints_visualizer`: Generate bigWig files for the inputs.
* `pints_counter`: Generate a count matrix for downstream use (e.g. differential expression analysis).
* `pints_normalizer`: Normalize inputs.
* `pints_boundary_extender`: Extend peaks from summits.

You can use `tool_name --help` to see parameters for each tool.

## Links
* Citation: If you use PINTS in your work, please cite: [https://www.nature.com/articles/s41587-022-01211-7](https://www.nature.com/articles/s41587-022-01211-7).
* Support: Please submit an issue with any questions or if you experience any issues/bugs.
