Metadata-Version: 2.1
Name: cmsip
Version: 0.0.1.7
Summary: UNKNOWN
Home-page: https://github.com/lijinbio/cmsip
Author: Jin Li
Author-email: lijin.abc@gmail.com
License: License :: OSI Approved :: MIT License
Description: # CMSIP
        
        Detecting differential 5hmC regions from CMS-IP sequencing data.
        
        Source URL: [https://github.com/lijinbio/cmsip](https://github.com/lijinbio/cmsip)
        
        ![Workflow of CMSIP.](cmsip_flowchart.png)
        
        ## Installation
        
        ### Dependencies
        
        - bsmap
        
        `bsmap` is a component of the MOABS package. See MOABS ([https://github.com/sunnyisgalaxy/moabs](https://github.com/sunnyisgalaxy/moabs)) for details.
        
        - samtools: [http://samtools.sourceforge.net](http://samtools.sourceforge.net)
        
        - bedtools: [https://bedtools.readthedocs.io](https://bedtools.readthedocs.io)
        
        - kentUtils: [https://github.com/ENCODE-DCC/kentUtils](https://github.com/ENCODE-DCC/kentUtils)
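        
        Before running the pipeline, it can help to confirm that these command-line tools are on `PATH`. The sketch below uses Python's `shutil.which`. The tool list is an assumption: kentUtils ships many programs, and `fetchChromSizes` is used here only because it appears in the window-file example. Adjust the list to the binaries your installation actually needs.
        
        ```python
        import shutil
        
        # Hypothetical list of executables to check. fetchChromSizes stands in for
        # kentUtils; the exact kentUtils binaries cmsip calls may differ.
        REQUIRED_TOOLS = ["bsmap", "samtools", "bedtools", "fetchChromSizes"]
        
        
        def missing_tools(tools=REQUIRED_TOOLS):
            """Return the tools that cannot be found on PATH."""
            return [tool for tool in tools if shutil.which(tool) is None]
        
        
        if __name__ == "__main__":
            missing = missing_tools()
            if missing:
                print("Missing dependencies: " + ", ".join(missing))
            else:
                print("All dependencies found.")
        ```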
        
        ## Example configuration file and description
        
        ```
        sampleinfo:
          - sampleid: TKO2PE1b2
            group: tko
            filenames:
              - TKO2PE1b2_R1.fastq.gz
          - sampleid: TKO2PE2m
            group: tko
            filenames:
              - TKO2PE2b1_R1.fastq.gz
              - TKO2PE2b1_R2.fastq.gz
          - sampleid: WTPE1b2
            group: wt
            filenames:
              - WTPE1b2_R1.fastq.gz
          - sampleid: WTPE2b2
            group: wt
            filenames:
              - WTPE2b2_R1.fastq.gz
        groupinfo:
          group1: tko
          group2: wt
        resultdir: result
        aligninfo:
          reference: /data/jin/resource/genome/fasta/hg38/hg38.fa.gz
          spikein: /data/jin/resource/genome/fasta/mm10/mm10.fa.gz
          fastqdir: test_data
          statfile: qcstats.txt
          barplotinfo:
            outfile: qcstats_twsn_barplot.pdf
            height: 5
            width: 5
          numthreads: 20
          verbose: True
        genomescaninfo:
          readextension: True
          fragsize: 100
          windowfile: result/hg38_w200.bed
          referencename: hg38
          windowsize: 200
          readscount: False
          counttablefile: counttable.txt.gz
          verbose: True
        dhmrinfo:
          method: 4
          mindepth: 5
          testfile: test.txt.gz
          qthr: 1.05
          maxdistance: 0
          dhmrfile: dhmr.txt.gz
          numthreads: 20
          verbose: True
        ```
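        
        A YAML file like the one above is typically read into nested dictionaries and lists, for example with PyYAML's `yaml.safe_load`. The sketch below assumes that structure and checks a few things that are easy to get wrong: every `sampleid` is unique, every sample lists at least one FASTQ file, and both groups named in `groupinfo` have at least one sample. The `validate_config` function is hypothetical and not part of cmsip.
        
        ```python
        from collections import defaultdict
        
        
        def validate_config(config):
            """Hypothetical checks on a parsed CMSIP-style configuration dictionary.
        
            Returns a mapping from group name to the sample IDs in that group.
            Raises ValueError on duplicate IDs, missing FASTQ files, or empty groups.
            """
            samples_by_group = defaultdict(list)
            seen_ids = set()
            for sample in config["sampleinfo"]:
                sample_id = sample["sampleid"]
                if sample_id in seen_ids:
                    raise ValueError("duplicate sampleid: " + sample_id)
                seen_ids.add(sample_id)
                if not sample.get("filenames"):
                    raise ValueError("no FASTQ filenames for sample: " + sample_id)
                samples_by_group[sample["group"]].append(sample_id)
        
            # Both compared groups must be populated by at least one sample.
            for key in ("group1", "group2"):
                group = config["groupinfo"][key]
                if not samples_by_group.get(group):
                    raise ValueError(key + " (" + group + ") has no samples in sampleinfo")
            return dict(samples_by_group)
        
        
        # The sampleinfo and groupinfo blocks of the example, as yaml.safe_load would return them.
        example = {
            "sampleinfo": [
                {"sampleid": "TKO2PE1b2", "group": "tko", "filenames": ["TKO2PE1b2_R1.fastq.gz"]},
                {"sampleid": "TKO2PE2m", "group": "tko",
                 "filenames": ["TKO2PE2b1_R1.fastq.gz", "TKO2PE2b1_R2.fastq.gz"]},
                {"sampleid": "WTPE1b2", "group": "wt", "filenames": ["WTPE1b2_R1.fastq.gz"]},
                {"sampleid": "WTPE2b2", "group": "wt", "filenames": ["WTPE2b2_R1.fastq.gz"]},
            ],
            "groupinfo": {"group1": "tko", "group2": "wt"},
        }
        
        print(validate_config(example))
        # {'tko': ['TKO2PE1b2', 'TKO2PE2m'], 'wt': ['WTPE1b2', 'WTPE2b2']}
        ```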
        
        ### `sampleinfo`
        
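        This block stores per-sample metadata. Each entry gives a unique `sampleid`, the `group` the sample belongs to (matching a group name used in `groupinfo`), and a list of one or more FASTQ `filenames`.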
        
        ### `groupinfo`
        
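        This block specifies the comparison of interest. The alternative hypothesis is that the true difference in means between `group1` and `group2` (mean of `group1` minus mean of `group2`) is less than 0, i.e., `group1` has a lower mean than `group2`.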
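        
        To make the direction of the test concrete: under this alternative, a strongly negative t statistic supports it. The sketch below computes Welch's two-sample t statistic and its Welch-Satterthwaite degrees of freedom for one window using only the standard library. Whether cmsip uses Welch's or Student's t-test, and how it normalizes coverage, is not stated here, so treat this as an illustration of the hypothesis only. A one-sided p-value can then be obtained from a t distribution, for example with `scipy.stats.ttest_ind(a, b, equal_var=False, alternative="less")` in SciPy 1.6 or later.
        
        ```python
        from math import sqrt
        from statistics import mean, variance
        
        
        def welch_t(group1, group2):
            """Welch's t statistic for mean(group1) - mean(group2) and its degrees of freedom."""
            m1, m2 = mean(group1), mean(group2)
            # Sample variances (n - 1 denominator) divided by sample size.
            v1 = variance(group1) / len(group1)
            v2 = variance(group2) / len(group2)
            t = (m1 - m2) / sqrt(v1 + v2)
            # Welch-Satterthwaite approximation of the degrees of freedom.
            df = (v1 + v2) ** 2 / (v1 ** 2 / (len(group1) - 1) + v2 ** 2 / (len(group2) - 1))
            return t, df
        
        
        # Hypothetical normalized coverages of one window for two tko and two wt samples.
        tko = [2.0, 4.0]
        wt = [10.0, 12.0]
        t, df = welch_t(tko, wt)
        print(round(t, 3), round(df, 3))  # prints: -5.657 2.0
        ```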
        
        ### `aligninfo`
        
        Options and input data required for the alignment step.
        
        - reference
        
        The FASTA file for the reference genome, such as hg38.fa.gz.
        
        - spikein
        
        The FASTA file for the spike-in genome, such as mm10.fa.gz.
        
        - windowfile: hg38_w100.bed
        
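        A BED file that divides the genome into fixed-size window bins; in the example configuration, this option is set under `genomescaninfo`. The file can be generated with bedtools and `fetchChromSizes` from kentUtils (the `<(...)` process substitution requires bash), e.g.: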
        
        ```
        bedtools makewindows -g <(fetchChromSizes hg38) -w 100 > hg38_w100.bed
        ```
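        
        To show what the window file contains, the sketch below mimics `bedtools makewindows -w` in plain Python: each chromosome is tiled from position 0 with non-overlapping windows of the given size, and the last window is truncated at the chromosome end. It is an illustration only, with made-up chromosome sizes; the bedtools command above remains the practical way to build the file.
        
        ```python
        def make_windows(chrom_sizes, window_size):
            """Yield (chrom, start, end) BED intervals tiling each chromosome.
        
            Coordinates are 0-based and half-open, as in BED. The final window of a
            chromosome is shortened so that it ends at the chromosome length.
            """
            for chrom, size in chrom_sizes.items():
                for start in range(0, size, window_size):
                    yield chrom, start, min(start + window_size, size)
        
        
        def write_bed(intervals, path):
            """Write intervals as tab-separated BED lines."""
            with open(path, "w") as handle:
                for chrom, start, end in intervals:
                    handle.write(f"{chrom}\t{start}\t{end}\n")
        
        
        # Hypothetical toy genome: a 250 bp chromosome and a 100 bp chromosome.
        toy_sizes = {"chrA": 250, "chrB": 100}
        windows = list(make_windows(toy_sizes, 100))
        print(windows)
        # [('chrA', 0, 100), ('chrA', 100, 200), ('chrA', 200, 250), ('chrB', 0, 100)]
        ```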
        
        - windowsize: 100
        
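        Window size, in bp, used for creating bins; in the example configuration, it is set under `genomescaninfo`. It should equal the `-w` value used to generate `windowfile`.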
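        
        Since the bins tile each chromosome from position 0 in steps of the window size, the window containing a 0-based position follows from integer division. The hypothetical helper below (not part of cmsip) makes this concrete, using the 200 bp window size from the example configuration:
        
        ```python
        def window_of(position, window_size):
            """Return the (start, end) of the window containing a 0-based position.
        
            Assumes windows start at 0 and step by window_size, as produced by
            bedtools makewindows -w; the truncated last window of a chromosome
            is not handled here.
            """
            start = (position // window_size) * window_size
            return start, start + window_size
        
        
        print(window_of(12345, 200))  # (12200, 12400)
        ```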
        
        - fastqdir: test_data
        
        Root directory with raw FASTQ files.
        
        - resultdir
        
        Root output directory for temporary and final result files. In the example configuration, `resultdir` is a top-level key rather than part of `aligninfo`.
        
        - statfile
        
        QC statistics file. Default is `resultdir/qcstats.txt`. If this file exists, the QC step is skipped and size factors are parsed from the existing QC statistics file. Otherwise, the QC step runs to generate the statistics file.
        
        - counttablefile
        
        Region count table file, set under `genomescaninfo` in the example configuration. Default is `resultdir/meancovtable.txt.gz`. If this file exists, the counting step is skipped and the existing count table is used for downstream statistical testing. Otherwise, the counting step runs to generate the count table file.
        
        - testfile
        
        Statistical testing result file, set under `dhmrinfo` in the example configuration. Default is `resultdir/t.test.txt`. If this file exists, no further steps run. Otherwise, statistical testing is run on the count table using a t-test.
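        
        The QC statistics file, the count table file, and the statistical testing result file all follow the same rule: if the output already exists it is reused, otherwise the step that produces it runs. A minimal sketch of that pattern, using a hypothetical `run_step` helper and a placeholder step function (not cmsip's actual code), looks like this:
        
        ```python
        import os
        import tempfile
        
        
        def run_step(output_path, step):
            """Run `step(output_path)` only if `output_path` does not exist yet.
        
            Returns True if the step ran and False if the existing file was reused.
            """
            if os.path.exists(output_path):
                return False
            step(output_path)
            return True
        
        
        def placeholder_qc_step(path):
            # Hypothetical stand-in for a real pipeline step; writes a dummy file.
            with open(path, "w") as handle:
                handle.write("placeholder\n")
        
        
        if __name__ == "__main__":
            with tempfile.TemporaryDirectory() as resultdir:
                statfile = os.path.join(resultdir, "qcstats.txt")
                print(run_step(statfile, placeholder_qc_step))  # True: the step ran
                print(run_step(statfile, placeholder_qc_step))  # False: the file was reused
        ```
        
        Because the check is based only on file existence, a partial file left by an interrupted run would also be reused; deleting it forces the corresponding step to run again.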
        
        
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.6
Description-Content-Type: text/markdown
