Metadata-Version: 2.0
Name: cooler
Version: 0.7.9
Summary: Sparse binary format for genomic interaction matrices
Home-page: https://github.com/mirnylab/cooler
Author: Nezar Abdennur
Author-email: nezar@mit.edu
License: BSD3
Keywords: genomics,bioinformatics,Hi-C,contact,matrix,format,hdf5
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: biopython
Requires-Dist: click (>=6.6)
Requires-Dist: cytoolz
Requires-Dist: h5py (>=2.5)
Requires-Dist: multiprocess
Requires-Dist: numpy (>=1.9)
Requires-Dist: pandas (>=0.17)
Requires-Dist: pyfaidx
Requires-Dist: pypairix
Requires-Dist: pysam (>0.8)
Requires-Dist: scipy (>=0.16)
Requires-Dist: six
Provides-Extra: docs
Requires-Dist: Sphinx (>=1.1); extra == 'docs'
Requires-Dist: numpydoc (>=0.5); extra == 'docs'

Cooler
======

|Build Status| |Documentation Status| |install with bioconda| |Binder|
|Join the chat at https://gitter.im/mirnylab/cooler| |DOI|

A cool place to store your Hi-C
-------------------------------

Cooler is a support library for a **sparse, compressed, binary**
persistent storage format, called *cool*, used to store genomic
interaction data, such as Hi-C contact matrices.

The *cool* file format is a reference implementation of a genomic matrix
data model using
`HDF5 <https://en.wikipedia.org/wiki/Hierarchical_Data_Format>`__ as the
container format.

The ``cooler`` package aims to provide the following functionality:

-  Build contact matrices at any resolution from a `list of
   contacts <https://github.com/4dn-dcic/pairix>`__.
-  Query a contact matrix.
-  Export and visualize the data.
-  Perform efficient out-of-core operations, such as aggregation and
   contact matrix normalization (a.k.a. balancing).
-  Provide a clean and well-documented Python API to facilitate working
   with potentially larger-than-memory data.
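
To build a contact matrix from a list of contacts, you map each contact's two positions onto genome-wide bin IDs and count the resulting pairs. The plain-Python sketch below illustrates only that idea. It is not how ``cooler`` itself ingests contacts. The helper name ``bin_contacts``, the ``(chrom1, pos1, chrom2, pos2)`` input tuples and the ordered ``(chrom, length)`` chromosome list are assumptions made for illustration.

.. code:: python

    from collections import Counter


    def bin_contacts(contacts, chromsizes, binsize):
        """Count contacts per pixel (bin1_id, bin2_id), upper triangle only.

        Hypothetical helper: ``contacts`` yields (chrom1, pos1, chrom2, pos2)
        tuples and ``chromsizes`` is an ordered list of (chrom, length) pairs.
        """
        # Genome-wide ID of the first bin of each chromosome.
        offsets = {}
        n_bins = 0
        for chrom, length in chromsizes:
            offsets[chrom] = n_bins
            n_bins += -(-length // binsize)  # ceiling division
        pixels = Counter()
        for chrom1, pos1, chrom2, pos2 in contacts:
            i = offsets[chrom1] + pos1 // binsize
            j = offsets[chrom2] + pos2 // binsize
            # Store each pair once, in the upper triangle (i <= j).
            if i > j:
                i, j = j, i
            pixels[(i, j)] += 1
        return pixels


    chromsizes = [("chr1", 25000), ("chr2", 15000)]
    contacts = [
        ("chr1", 5000, "chr1", 15000),
        ("chr1", 15500, "chr1", 4000),
        ("chr2", 12000, "chr1", 21000),
    ]
    pixels = bin_contacts(contacts, chromsizes, binsize=10000)
    print(sorted(pixels.items()))

To build the same contacts at a different resolution, you only need to change ``binsize``.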

To get started:

-  Read the
   `documentation <http://cooler.readthedocs.org/en/latest/>`__.
-  See the Jupyter Notebook
   `walkthrough <https://github.com/mirnylab/cooler-binder>`__.
-  *cool* files from published Hi-C data sets are available at
   ``ftp://cooler.csail.mit.edu/coolers``.

Related projects:

-  Process Hi-C data with
   `distiller <https://github.com/mirnylab/distiller>`__.
-  Downstream analysis with
   `cooltools <https://github.com/mirnylab/cooltools>`__ (WIP).
-  Visualize your Cooler data with `HiGlass <http://higlass.io>`__!

Installation
~~~~~~~~~~~~

Requirements:

-  Python 2.7/3.4+
-  libhdf5 and Python packages ``numpy``, ``scipy``, ``pandas``,
   ``h5py``. We highly recommend using the ``conda`` package manager to
   install scientific packages like these. To get it, you can either
   install the full `Anaconda <https://www.continuum.io/downloads>`__
   Python distribution or just the standalone
   `conda <http://conda.pydata.org/miniconda.html>`__ package manager.

Install from PyPI using pip.

.. code:: sh

    $ pip install cooler

If you are using ``conda``, you can alternatively install ``cooler``
from the `bioconda <https://bioconda.github.io/index.html>`__ channel.

.. code:: sh

    $ conda install -c conda-forge -c bioconda cooler

See the `docs <http://cooler.readthedocs.org/en/latest/>`__ for more
information.

Command line interface
~~~~~~~~~~~~~~~~~~~~~~

The ``cooler`` package includes command line tools for creating,
querying and manipulating *cool* files.

.. code:: bash

    $ cooler makebins $CHROMSIZES_FILE $BINSIZE > bins.10kb.bed
    $ cooler cload bins.10kb.bed $CONTACTS_FILE out.cool
    $ cooler balance -p 10 out.cool
    $ cooler dump -b -t pixels --header --join -r chr3:10,000,000-12,000,000 -r2 chr17 out.cool | head

::

    chrom1  start1  end1    chrom2  start2  end2    count   balanced
    chr3    10000000        10010000        chr17   0       10000   1       0.810766
    chr3    10000000        10010000        chr17   520000  530000  1       1.2055
    chr3    10000000        10010000        chr17   640000  650000  1       0.587372
    chr3    10000000        10010000        chr17   900000  910000  1       1.02558
    chr3    10000000        10010000        chr17   1030000 1040000 1       0.718195
    chr3    10000000        10010000        chr17   1320000 1330000 1       0.803212
    chr3    10000000        10010000        chr17   1500000 1510000 1       0.925146
    chr3    10000000        10010000        chr17   1750000 1760000 1       0.950326
    chr3    10000000        10010000        chr17   1800000 1810000 1       0.745982
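
The first command above, ``cooler makebins``, splits each chromosome into fixed-width bins. The plain-Python sketch below shows that segmentation. It assumes the last bin of each chromosome is clipped to the chromosome length. The function name ``make_bins`` is hypothetical, and the real command's output format may differ in details.

.. code:: python

    def make_bins(chromsizes, binsize):
        """Yield (chrom, start, end) for fixed-width bins tiling each chromosome.

        Hypothetical helper: ``chromsizes`` is an ordered list of
        (chrom, length) pairs.
        """
        for chrom, length in chromsizes:
            for start in range(0, length, binsize):
                # The last bin is clipped to the chromosome length.
                yield chrom, start, min(start + binsize, length)


    bins = list(make_bins([("chr1", 25000), ("chr2", 15000)], 10000))
    for chrom, start, end in bins:
        print(chrom, start, end, sep="\t")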

See also:

-  `CLI Reference <http://cooler.readthedocs.io/en/latest/cli.html>`__.
-  Jupyter Notebook
   `walkthrough <https://github.com/mirnylab/cooler-binder/blob/master/cooler_cli.ipynb>`__.

Python API
~~~~~~~~~~

The ``cooler`` library provides a thin wrapper over the excellent
`h5py <http://docs.h5py.org/en/latest/>`__ Python interface to HDF5. It
supports creating *cool* files and running **range queries** on the
data:

-  Tabular selections are retrieved as Pandas DataFrames and Series.
-  Matrix selections are retrieved as NumPy arrays or SciPy sparse
   matrices.
-  Metadata is retrieved as a JSON-serializable Python dictionary.
-  Range queries can be supplied using either integer bin indexes or
   genomic coordinate intervals.
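
With fixed-width bins, a genomic interval maps onto a contiguous range of bin indexes. ``cooler`` accepts region strings directly, so the sketch below only illustrates the arithmetic. The helpers ``parse_region`` and ``region_to_bin_range`` are hypothetical, as is the ``chrom_offsets`` mapping, which gives the genome-wide ID of each chromosome's first bin. The sketch handles only the ``chrom:start-end`` form, with optional thousands separators.

.. code:: python

    import re


    def parse_region(region):
        """Parse 'chrom:start-end' (commas allowed) into (chrom, start, end)."""
        match = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", region)
        if match is None:
            raise ValueError("expected 'chrom:start-end', got %r" % region)
        chrom, start, end = match.groups()
        return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


    def region_to_bin_range(region, chrom_offsets, binsize):
        """Return the half-open range [lo, hi) of bin IDs overlapping a region."""
        chrom, start, end = parse_region(region)
        offset = chrom_offsets[chrom]
        lo = offset + start // binsize
        # Ceiling division keeps a partially covered last bin.
        hi = offset + -(-end // binsize)
        return lo, hi


    # Hypothetical genome-wide bin offsets for a 10 kb bin table.
    offsets = {"chr1": 0, "chr2": 3}
    print(region_to_bin_range("chr1:5,000-25,000", offsets, 10000))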

.. code:: python

    >>> import cooler
    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> c = cooler.Cooler('bigDataset.cool')
    >>> resolution = c.info['bin-size']
    >>> # Fetch a balanced 5 Mb region as a dense NumPy array.
    >>> mat = c.matrix(balance=True).fetch('chr5:10,000,000-15,000,000')
    >>> plt.matshow(np.log10(mat), cmap='YlOrRd')

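.. code:: python

    >>> import multiprocessing as mp
    >>> import h5py
    >>> import cooler
    >>> pool = mp.Pool(8)
    >>> f = h5py.File('bigDataset.cool', 'r')
    >>> # Compute balancing weights with 8 worker processes, ignoring the main
    >>> # diagonal plus the two next to it and bins with < 10 nonzero pixels.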
    >>> weights, stats = cooler.ice.iterative_correction(f, map=pool.map, ignore_diags=3, min_nnz=10)
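
The idea behind iterative correction (ICE) is simple. You repeatedly divide each bin's weight by its normalized marginal until every row of the balanced matrix has the same sum. The toy version below balances a small dense in-memory matrix to show that loop. It is a sketch under stated assumptions, not the library routine. Its name ``toy_iterative_correction`` and its parameters ``ignore_diags``, ``tol`` and ``max_iters`` are hypothetical. They are modeled loosely on the call above and do not reproduce its signature.

.. code:: python

    def toy_iterative_correction(matrix, ignore_diags=0, tol=1e-12, max_iters=1000):
        """Balance a small dense symmetric matrix (list of lists) in memory.

        Hypothetical toy sketch: returns per-bin weights such that
        matrix[i][j] * w[i] * w[j] has approximately equal row sums.
        """
        n = len(matrix)
        # Zero out the diagonals with |i - j| < ignore_diags.
        a = [[0.0 if abs(i - j) < ignore_diags else float(matrix[i][j])
              for j in range(n)] for i in range(n)]
        weights = [1.0] * n
        for _ in range(max_iters):
            # Row sums (marginals) of the currently balanced matrix.
            marg = [sum(a[i][j] * weights[i] * weights[j] for j in range(n))
                    for i in range(n)]
            nonzero = [m for m in marg if m > 0]
            if not nonzero:
                break
            mean = sum(nonzero) / len(nonzero)
            # Normalize marginals to mean 1; empty rows keep their weight.
            marg = [m / mean if m > 0 else 1.0 for m in marg]
            weights = [w / m for w, m in zip(weights, marg)]
            # Stop once the normalized marginals are (nearly) all equal to 1.
            variance = sum((m - 1.0) ** 2 for m in marg) / n
            if variance < tol:
                break
        return weights


    counts = [[4, 2, 1], [2, 3, 2], [1, 2, 5]]
    weights = toy_iterative_correction(counts)
    # Balanced value of a pixel: count * weight[bin1] * weight[bin2].
    balanced = [[counts[i][j] * weights[i] * weights[j] for j in range(3)]
                for i in range(3)]
    print([round(sum(row), 6) for row in balanced])

The balanced value of each pixel is ``count * weight[bin1] * weight[bin2]``, which is how the ``balanced`` column in the ``cooler dump -b`` output above is obtained. The library routine also filters bins (for example with ``min_nnz``). It works on the sparse pixel table rather than a dense matrix, and the ``map`` argument lets that work run in parallel.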

See also:

-  `API Reference <http://cooler.readthedocs.io/en/latest/api.html>`__.
-  Jupyter Notebook
   `walkthrough <https://github.com/mirnylab/cooler-binder/blob/master/cooler_api.ipynb>`__.

Schema
~~~~~~

The *cool* format implements a simple `data
model <http://cooler.readthedocs.io/en/latest/datamodel.html>`__ that
stores a genomic matrix in a sparse representation. This is crucial for
developing robust tools, including streaming and
`out-of-core <https://en.wikipedia.org/wiki/Out-of-core_algorithm>`__
algorithms, for increasingly high-resolution Hi-C data sets.

The data tables in a *cool* file are stored in a **columnar**
representation as HDF5 groups of 1D array datasets of equal length. The
contact matrix itself is stored as a single table containing only the
**nonzero upper triangle** pixels.
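
The plain-Python sketch below makes this layout concrete. It turns a small dense symmetric matrix into columnar, upper-triangle pixels: one list per column, all of equal length, using the ``bin1_id``, ``bin2_id`` and ``count`` column names of the *cool* pixel table. It then builds a per-row offset index analogous to the ``bin1_offset`` index of a *cool* file, which works like the row pointer of a CSR sparse matrix. The helper names are hypothetical. In a real file, each column is a separate HDF5 dataset rather than a Python list.

.. code:: python

    def to_pixel_columns(dense):
        """Convert a dense symmetric matrix into columnar upper-triangle pixels.

        Keeps only nonzero pixels with bin1_id <= bin2_id, sorted by
        (bin1_id, bin2_id); returns a dict of equal-length lists.
        """
        columns = {"bin1_id": [], "bin2_id": [], "count": []}
        n = len(dense)
        for i in range(n):
            for j in range(i, n):
                if dense[i][j] != 0:
                    columns["bin1_id"].append(i)
                    columns["bin2_id"].append(j)
                    columns["count"].append(dense[i][j])
        return columns


    def bin1_offsets(bin1_ids, n_bins):
        """Offsets such that row i spans pixels offsets[i]:offsets[i + 1]."""
        offsets = [0] * (n_bins + 1)
        for b in bin1_ids:
            offsets[b + 1] += 1
        # Cumulative sum turns per-row counts into start offsets.
        for i in range(n_bins):
            offsets[i + 1] += offsets[i]
        return offsets


    dense = [
        [2, 0, 1],
        [0, 0, 3],
        [1, 3, 4],
    ]
    pixels = to_pixel_columns(dense)
    index = bin1_offsets(pixels["bin1_id"], len(dense))
    print(pixels)
    print(index)

Only pixels with ``bin1_id <= bin2_id`` are stored, so you recover the lower triangle by mirroring each pixel. The slice ``index[i]:index[i + 1]`` returns the pixels of row ``i`` without scanning the whole table.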

Contributing
~~~~~~~~~~~~

`Pull
requests <https://akrabat.com/the-beginners-guide-to-contributing-to-a-github-project/>`__
are welcome. The current requirements for testing are ``nose`` and
``mock``.

For development, clone the repository and install it in "editable"
(i.e. development) mode with pip's ``-e`` option. Changes you pull then
take effect without reinstalling.

.. code:: sh

    $ git clone https://github.com/mirnylab/cooler.git
    $ cd cooler
    $ pip install -e .

License
~~~~~~~

BSD (New)

.. |Build Status| image:: https://travis-ci.org/mirnylab/cooler.svg?branch=master
   :target: https://travis-ci.org/mirnylab/cooler
.. |Documentation Status| image:: https://readthedocs.org/projects/cooler/badge/?version=latest
   :target: http://cooler.readthedocs.org/en/latest/
.. |install with bioconda| image:: https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square
   :target: http://bioconda.github.io/recipes/cooler/README.html
.. |Binder| image:: http://mybinder.org/badge.svg
   :target: https://github.com/mirnylab/cooler-binder
.. |Join the chat at https://gitter.im/mirnylab/cooler| image:: https://badges.gitter.im/mirnylab/cooler.svg
   :target: https://gitter.im/mirnylab/cooler?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
.. |DOI| image:: https://zenodo.org/badge/49553222.svg
   :target: https://zenodo.org/badge/latestdoi/49553222


