Metadata-Version: 1.1
Name: refextract
Version: 0.2.0
Summary: Small library for extracting references used in scholarly communication.
Home-page: https://github.com/inspirehep/refextract
Author: CERN
Author-email: admin@inspirehep.net
License: GPLv2
Description: ..
           This file is part of refextract
           Copyright (C) 2015, 2016 CERN.
        
           refextract is free software; you can redistribute it and/or
           modify it under the terms of the GNU General Public License as
           published by the Free Software Foundation; either version 2 of the
           License, or (at your option) any later version.
        
           refextract is distributed in the hope that it will be useful, but
           WITHOUT ANY WARRANTY; without even the implied warranty of
           MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
           General Public License for more details.
        
           You should have received a copy of the GNU General Public License
           along with refextract; if not, write to the Free Software Foundation, Inc.,
           59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
        
           In applying this license, CERN does not waive the privileges and immunities
           granted to it by virtue of its status as an Intergovernmental Organization
           or submit itself to any jurisdiction.
        
        
        ============
        refextract
        ============
        
        
        Small library for extracting references used in scholarly communication.
        
        * Free software: GPLv2
        * Documentation: http://pythonhosted.org/refextract/
        * Issues and pull requests: https://github.com/inspirehep/refextract
        
        *Originally exported from Invenio https://github.com/inveniosoftware/invenio.*
        
        
        Dependencies
        ============
        * `file <http://linux.die.net/man/1/file>`_
        * `pdftotext <http://linux.die.net/man/1/pdftotext>`_
        
        Installation
        ============
        
        .. code-block:: shell
        
            pip install refextract
        
        Usage
        =====
        
        To get structured information from a journal reference string:
        
        .. code-block:: python
        
            from refextract import extract_journal_reference
            reference = extract_journal_reference("J.Phys.,A39,13445")
            print(reference)
            # Example output:
            {
                'extra_ibids': [],
                'is_ibid': False,
                'misc_txt': u'',
                'page': u'13445',
                'title': u'J. Phys.',
                'type': 'JOURNAL',
                'volume': u'A39',
                'year': ''
            }
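        
        The returned dictionary can be turned into a compact citation string. The
        helper below is a minimal sketch: ``format_journal_reference`` is a
        hypothetical name, not part of ``refextract``, and it assumes only the keys
        shown in the example output above.
        
        .. code-block:: python
        
            def format_journal_reference(ref):
                """Build a short citation string from a parsed reference dict."""
                # Join the journal title and volume, skipping empty fields.
                parts = [ref.get('title', ''), ref.get('volume', '')]
                citation = ' '.join(part for part in parts if part)
                # Append the page and the year only when they are present.
                if ref.get('page'):
                    citation = '{0}, {1}'.format(citation, ref['page'])
                if ref.get('year'):
                    citation = '{0} ({1})'.format(citation, ref['year'])
                return citation
        
            # Sample data copied from the example output above.
            sample = {
                'extra_ibids': [],
                'is_ibid': False,
                'misc_txt': '',
                'page': '13445',
                'title': 'J. Phys.',
                'type': 'JOURNAL',
                'volume': 'A39',
                'year': '',
            }
            print(format_journal_reference(sample))  # J. Phys. A39, 13445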
        
        
        To extract references from a publication full-text PDF:
        
        .. code-block:: python
        
            from refextract import extract_references_from_file
            references = extract_references_from_file("some/fulltext/1503.07589v1.pdf")
            print(references)
            # Example output:
            [
                {'author': [u'F. Englert and R. Brout'],
                 'doi': [u'10.1103/PhysRevLett.13.321'],
                 'journal_page': [u'321'],
                 'journal_reference': ['Phys.Rev.Lett.,13,1964'],
                 'journal_title': [u'Phys.Rev.Lett.'],
                 'journal_volume': [u'13'],
                 'journal_year': [u'1964'],
                 'linemarker': [u'1'],
                 'title': [u'Broken symmetry and the mass of gauge vector mesons'],
                 'year': [u'1964']}, ...
            ]
        
        You can also extract directly from a URL:
        
        .. code-block:: python
        
            from refextract import extract_references_from_url
            references = extract_references_from_url("http://arxiv.org/pdf/1503.07589v1.pdf")
            print(references)
            # Example output:
            [
                {'author': [u'F. Englert and R. Brout'],
                 'doi': [u'10.1103/PhysRevLett.13.321'],
                 'journal_page': [u'321'],
                 'journal_reference': ['Phys.Rev.Lett.,13,1964'],
                 'journal_title': [u'Phys.Rev.Lett.'],
                 'journal_volume': [u'13'],
                 'journal_year': [u'1964'],
                 'linemarker': [u'1'],
                 'title': [u'Broken symmetry and the mass of gauge vector mesons'],
                 'year': [u'1964']}, ...
            ]
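        
        In the output shown above, every field of an extracted reference holds a
        list of values. The sketch below shows one way to post-process such a list,
        here by collecting the DOIs. ``collect_dois`` is a hypothetical helper, not
        part of ``refextract``, and the second sample entry is a made-up
        placeholder that illustrates a reference without a DOI.
        
        .. code-block:: python
        
            def collect_dois(references):
                """Return the unique DOIs found in a list of extracted references."""
                dois = []
                for ref in references:
                    # Each field is a list of values; 'doi' may be missing entirely.
                    for doi in ref.get('doi', []):
                        if doi not in dois:
                            dois.append(doi)
                return dois
        
            sample_references = [
                # Trimmed copy of the first reference from the example output above.
                {'author': ['F. Englert and R. Brout'],
                 'doi': ['10.1103/PhysRevLett.13.321'],
                 'linemarker': ['1'],
                 'year': ['1964']},
                # Hypothetical placeholder: a reference for which no DOI was found.
                {'linemarker': ['2']},
            ]
            print(collect_dois(sample_references))  # ['10.1103/PhysRevLett.13.321']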
        
        
        ..
           This file is part of refextract
           Copyright (C) 2015, 2016, 2017 CERN.
        
           refextract is free software; you can redistribute it and/or
           modify it under the terms of the GNU General Public License as
           published by the Free Software Foundation; either version 2 of the
           License, or (at your option) any later version.
        
           refextract is distributed in the hope that it will be useful, but
           WITHOUT ANY WARRANTY; without even the implied warranty of
           MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
           General Public License for more details.
        
           You should have received a copy of the GNU General Public License
           along with refextract; if not, write to the Free Software Foundation, Inc.,
           59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
        
           In applying this license, CERN does not waive the privileges and immunities
           granted to it by virtue of its status as an Intergovernmental Organization
           or submit itself to any jurisdiction.
        
        
        Changes
        =======
        
        Version 0.2.0 (2017-06-26)
        
        - Substantial rewrite of the API. In particular:
        
          * ``extract_references_from_file``, ``extract_references_from_string``, and
            ``extract_references_from_url`` now return a list of the references,
            instead of an object with keys ``stats`` and ``references``.
        
          * If the number of TeXkeys that were extracted from the PDF metadata matches
            the number of references parsed by RefExtract, an extra ``texkey`` field is
            added to each returned reference.
        
          * The API now raises exceptions when it encounters an unrecoverable error.
        
          * Finally, the API now returns the list of raw references on which
            ``refextract`` worked.
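        
        Calling code that has to cope with both return shapes, and with the
        exceptions now raised, could use a small shim like the sketch below. It
        assumes that the pre-0.2.0 return value was a dictionary with a
        ``references`` key; ``as_reference_list`` and ``extract_or_empty`` are
        hypothetical helpers, and catching ``Exception`` is a deliberately broad
        choice because the specific exception types are not listed here.
        
        .. code-block:: python
        
            def as_reference_list(result):
                """Return the list of references from either return shape."""
                # Assumed pre-0.2.0 shape: {'stats': ..., 'references': [...]}.
                if isinstance(result, dict) and 'references' in result:
                    return result['references']
                # 0.2.0 shape: the result is already a list of references.
                return list(result)
        
        
            def extract_or_empty(extractor, source):
                """Call an extractor and fall back to an empty list on failure."""
                try:
                    return as_reference_list(extractor(source))
                except Exception as error:  # broad on purpose, see the note above
                    print('Extraction failed for {0}: {1}'.format(source, error))
                    return []
        
        Here ``extractor`` would be one of the ``extract_references_from_*``
        functions, for example
        ``extract_or_empty(extract_references_from_file, "some/fulltext/1503.07589v1.pdf")``.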
        
        Version 0.1.0 (2016-01-12)
        
        - Initial export from Invenio Software <https://github.com/inveniosoftware/invenio>
        - Restructured into a stripped-down, standalone version
        
Keywords: bibliographic references extraction text-mining
Platform: any
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
