Metadata-Version: 1.1
Name: pyensembl
Version: 1.5.2
Summary: Python interface to ensembl reference genome metadata
Home-page: https://github.com/openvax/pyensembl
Author: Alex Rubinsteyn
Author-email: alex.rubinsteyn@mssm.edu
License: http://www.apache.org/licenses/LICENSE-2.0.html
Description: PyEnsembl
        =========
        
        PyEnsembl is a Python interface to `Ensembl <http://www.ensembl.org>`__
        reference genome metadata such as exons and transcripts. PyEnsembl
        downloads `GTF <https://en.wikipedia.org/wiki/Gene_transfer_format>`__
        and `FASTA <https://en.wikipedia.org/wiki/FASTA_format>`__ files from
        the `Ensembl FTP server <ftp://ftp.ensembl.org>`__ and loads them into a
        local database. PyEnsembl can also work with custom reference data
        specified using user-supplied GTF and FASTA files.
        
        Example Usage
        =============
        
        .. code:: python
        
           from pyensembl import EnsemblRelease
        
           # release 77 uses human reference genome GRCh38
           data = EnsemblRelease(77)
        
           # will return ['HLA-A']
           gene_names = data.gene_names_at_locus(contig=6, position=29945884)
        
           # get all exons associated with HLA-A
           exon_ids = data.exon_ids_of_gene_name('HLA-A')
        
        Installation
        ============
        
        You can install PyEnsembl using
        `pip <https://pip.pypa.io/en/latest/quickstart.html>`__:
        
        .. code:: sh
        
           pip install pyensembl
        
        This should also install any required packages, such as
        `datacache <https://github.com/openvax/datacache>`__ and
        `BioPython <http://biopython.org/>`__.
        
        Before using PyEnsembl, run the following command to download and
        install Ensembl data:
        
        ::
        
           pyensembl install --release <list of Ensembl release numbers> --species <species-name>
        
        For example, ``pyensembl install --release 75 76 --species human`` will
        download and install all human reference data from Ensembl releases 75
        and 76.
        
        Alternatively, you can create the ``EnsemblRelease`` object from inside
        a Python process and call ``ensembl_object.download()`` followed by
        ``ensembl_object.index()``.
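        
        A minimal sketch of that in-process route follows. The helper name
        ``ensure_installed`` is hypothetical; it relies only on the
        ``download()`` and ``index()`` methods named above, so it accepts any
        object that provides them.
        
        .. code:: python
        
           def ensure_installed(genome):
               """Download reference files for `genome`, then index them.
        
               `genome` is any object exposing download() and index(), such as
               an EnsemblRelease instance. The helper name is illustrative only.
               """
               genome.download()  # fetch the GTF and FASTA files
               genome.index()     # load them into the local database
               return genome
        
           # Usage (requires pyensembl and network access):
           #   from pyensembl import EnsemblRelease
           #   data = ensure_installed(EnsemblRelease(77))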
        
        Cache Location
        --------------
        
        By default, PyEnsembl stores downloaded files in a ``pyensembl``
        sub-directory of the platform-specific ``Cache`` folder. You can
        override this default by setting the environment variable
        ``PYENSEMBL_CACHE_DIR`` to your preferred cache location:
        
        .. code:: sh
        
           export PYENSEMBL_CACHE_DIR=/custom/cache/dir
        
        or
        
        .. code:: python
        
           import os
        
           os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
           # ... PyEnsembl API usage
        
        Non-Ensembl Data
        ================
        
        PyEnsembl also supports arbitrary genomes: specify local file paths or
        remote URLs for Ensembl or non-Ensembl GTF and FASTA files. (Warning:
        GTF formats can vary, and handling of non-Ensembl data is still very
        much in development.)
        
        For example:
        
        .. code:: python
        
           from pyensembl import Genome
        
           data = Genome(
               reference_name='GRCh38',
               annotation_name='my_genome_features',
               gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
           # parse GTF and construct database of genomic features
           data.index()
           gene_names = data.gene_names_at_locus(contig=6, position=29945884)
        
        API
        ===
        
        The ``EnsemblRelease`` object has methods to let you access all possible
        combinations of the annotation features *gene_name*, *gene_id*,
        *transcript_name*, *transcript_id*, *exon_id* as well as the location of
        these genomic elements (contig, start position, end position, strand).
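        
        As a hedged illustration of combining these lookups, the sketch below
        chains two of the methods documented in this section to go from a locus
        to gene names to gene IDs. The helper name ``gene_ids_at_locus`` is an
        assumption made for this example and is not part of PyEnsembl.
        
        .. code:: python
        
           def gene_ids_at_locus(genome, contig, position):
               """Map each gene name overlapping a locus to its gene IDs.
        
               `genome` is any object providing gene_names_at_locus() and
               gene_ids_of_gene_name(), such as an EnsemblRelease instance.
               """
               names = genome.gene_names_at_locus(contig=contig, position=position)
               return {
                   name: list(genome.gene_ids_of_gene_name(name))
                   for name in names
               }
        
           # Usage (requires installed Ensembl data):
           #   from pyensembl import EnsemblRelease
           #   gene_ids_at_locus(EnsemblRelease(77), 6, 29945884)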
        
        Genes
        -----
        
        .. raw:: html
        
           <dl>
        
        .. raw:: html
        
           <dt>
        
        genes(contig=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of Gene objects, optionally restricted to a particular
        contig or strand.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        genes_at_locus(contig, position, end=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of Gene objects overlapping a particular position on a
        contig. Optionally, extend the query into a range with the ``end``
        parameter and restrict it to the forward or reverse strand by passing
        ``strand='+'`` or ``strand='-'``.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_by_id(gene_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a Gene object for the given Ensembl gene ID (e.g.
        “ENSG00000068793”).
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_names(contig=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns all gene names in the annotation database, optionally restricted
        to a particular contig or strand.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        genes_by_name(gene_name)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Gets all the unique genes with the given name (there may be several
        because of copies in the genome) and returns a list containing a Gene
        object for each distinct ID.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_by_protein_id(protein_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Finds the Gene associated with the given Ensembl protein ID (e.g.
        “ENSP00000350283”).
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_names_at_locus(contig, position, end=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns the names of genes overlapping the given locus, optionally
        restricted by strand. The result is a list because several genes can
        overlap the same locus.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_name_of_gene_id(gene_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns the name of the gene with the given gene ID.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_name_of_transcript_id(transcript_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns name of gene associated with given transcript ID.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_name_of_transcript_name(transcript_name)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns name of gene associated with given transcript name.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_name_of_exon_id(exon_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns name of gene associated with given exon ID.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_ids(contig=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Return all gene IDs in the annotation database, optionally restricted by
        chromosome name or strand.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        gene_ids_of_gene_name(gene_name)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns all Ensembl gene IDs with the given name.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           </dl>
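        
        The locus methods above speak of genes "overlapping" a position or a
        range. The sketch below spells out one common reading of that test,
        assuming inclusive start and end coordinates. It is an illustration
        only, not PyEnsembl's implementation, and the function name
        ``overlaps`` is hypothetical.
        
        .. code:: python
        
           def overlaps(feature_start, feature_end, position, end=None):
               """Return True if an inclusive feature interval meets the query.
        
               The query is the single base `position`, or the inclusive range
               [position, end] when `end` is given.
               """
               query_end = position if end is None else end
               return feature_start <= query_end and feature_end >= position
        
           # A feature spanning 100..200 contains base 150 but not base 201.
           inside = overlaps(100, 200, 150)
           outside = overlaps(100, 200, 201)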
        
        Transcripts
        -----------
        
        .. raw:: html
        
           <dl>
        
        .. raw:: html
        
           <dt>
        
        transcripts(contig=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of Transcript objects for all transcript entries in the
        Ensembl database, optionally restricted to a particular contig or
        strand.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        transcript_by_id(transcript_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Constructs a Transcript object for the given Ensembl transcript ID
        (e.g. “ENST00000369985”).
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        transcripts_by_name(transcript_name)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of Transcript objects for every transcript matching the
        given name.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        transcript_names(contig=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns all transcript names in the annotation database, optionally
        restricted to a particular contig or strand.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        transcript_ids(contig=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns all transcript IDs in the annotation database, optionally
        restricted to a particular contig or strand.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        transcript_ids_of_gene_id(gene_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Return IDs of all transcripts associated with given gene ID.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        transcript_ids_of_gene_name(gene_name)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Return IDs of all transcripts associated with given gene name.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        transcript_ids_of_transcript_name(transcript_name)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Find all Ensembl transcript IDs with the given name.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        transcript_ids_of_exon_id(exon_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Return IDs of all transcripts associated with given exon ID.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           </dl>
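        
        Because one gene name can correspond to several gene IDs, it can help
        to group transcripts by gene ID. The sketch below does so using only
        ``gene_ids_of_gene_name`` and ``transcript_ids_of_gene_id`` from the
        lists above. The helper name ``transcript_ids_by_gene_id`` is an
        assumption made for this example.
        
        .. code:: python
        
           def transcript_ids_by_gene_id(genome, gene_name):
               """Group transcript IDs under each gene ID named `gene_name`.
        
               `genome` is any object providing gene_ids_of_gene_name() and
               transcript_ids_of_gene_id(), such as an EnsemblRelease instance.
               """
               return {
                   gene_id: list(genome.transcript_ids_of_gene_id(gene_id))
                   for gene_id in genome.gene_ids_of_gene_name(gene_name)
               }
        
           # Usage (requires installed Ensembl data):
           #   from pyensembl import EnsemblRelease
           #   transcript_ids_by_gene_id(EnsemblRelease(77), 'HLA-A')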
        
        Exons
        -----
        
        .. raw:: html
        
           <dl>
        
        .. raw:: html
        
           <dt>
        
        exon_ids(contig=None, strand=None)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of exon IDs in the annotation database, optionally
        restricted by the given chromosome and strand.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        exon_ids_of_gene_id(gene_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of exon IDs associated with a given gene ID.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        exon_ids_of_gene_name(gene_name)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of exon IDs associated with a given gene name.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        exon_ids_of_transcript_id(transcript_id)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of exon IDs associated with a given transcript ID.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           <dt>
        
        exon_ids_of_transcript_name(transcript_name)
        
        .. raw:: html
        
           </dt>
        
        .. raw:: html
        
           <dd>
        
        Returns a list of exon IDs associated with a given transcript name.
        
        .. raw:: html
        
           </dd>
        
        .. raw:: html
        
           </dl>
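        
        The exon lookups can be combined with the transcript lookups. As a
        sketch, the helper below finds the exon IDs shared by every transcript
        of a gene by intersecting the per-transcript results. The name
        ``shared_exon_ids`` is hypothetical, and the helper assumes only the
        ``transcript_ids_of_gene_name`` and ``exon_ids_of_transcript_id``
        methods listed above.
        
        .. code:: python
        
           def shared_exon_ids(genome, gene_name):
               """Return the set of exon IDs present in every transcript.
        
               `genome` is any object providing transcript_ids_of_gene_name()
               and exon_ids_of_transcript_id().
               """
               exon_sets = [
                   set(genome.exon_ids_of_transcript_id(transcript_id))
                   for transcript_id in genome.transcript_ids_of_gene_name(gene_name)
               ]
               if not exon_sets:
                   # No transcripts were found, so there is nothing to share.
                   return set()
               return set.intersection(*exon_sets)
        
           # Usage (requires installed Ensembl data):
           #   from pyensembl import EnsemblRelease
           #   shared_exon_ids(EnsemblRelease(77), 'HLA-A')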
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
