Metadata-Version: 1.0
Name: breadability
Version: 0.1.12
Summary: Redone port of Readability API in Python
Home-page: http://docs.bmark.us
Author: Rick Harding
Author-email: rharding@mitechie.com
License: BSD
Description: breadability - another readability Python port
        ===============================================
        I've tried to work with the various forks of an ancient codebase that ported
        `readability`_ to Python. The lack of tests, the unused regexes, and the
        commented-out sections of code in the other Python ports just drove me nuts.
        
        I made an effort to merge several of the better forks into one codebase, but
        they've diverged so much that I just can't work with them.
        
        So what's any sane person to do? Re-port it in my own repo, add some tests
        and infrastructure, and try to make this port better. OSS FTW (and yea, NIH
        FML, but oh well, I did try).
        
        This is a pretty straight port of the JS here:
        
        - http://code.google.com/p/arc90labs-readability/source/browse/trunk/js/readability.js#82
        
        
        Installation
        -------------
        This depends on lxml, so you'll need the libxml2 and libxslt C headers
        installed for pip to be able to compile it.
        
        ::
        
            sudo apt-get install libxml2-dev libxslt-dev
            pip install breadability
        
        
        Usage
        ------
        
        cmd line
        ~~~~~~~~~
        
        ::
        
            $ breadability http://wiki.python.org/moin/BeginnersGuide
        
        Options
        ``````````
        
          - ``-b`` will write out the parsed content to a temp file and open it in a
            browser for viewing.
          - ``-d`` will write out debug scoring statements to help track why a node
            was chosen as the document and why some nodes were removed from the
            final product.
          - ``-f`` will override the default behaviour of getting an html fragment
            (<div>) and give you back a full <html> document.
          - ``-v`` will output in verbose debug mode to help you see why it parsed
            the way it did.
        
        
        Using from Python
        ~~~~~~~~~~~~~~~~~~
        
        ::
        
            from breadability.readable import Article
            doc = Article(html_text, url=url_came_from)
            print(doc.readable)
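        
        The call above assumes ``html_text`` is already in hand. As a rough sketch of
        one way to get it, the snippet below fetches a page with the standard library
        and passes the result to ``Article``. The ``fetch_html`` and ``readable_html``
        helpers are hypothetical names, not part of breadability, and the code
        assumes a Python 3 environment where breadability imports cleanly.
        
        ::
        
            from urllib.request import urlopen
        
        
            def fetch_html(url, timeout=10):
                """Download a page and decode it to text (hypothetical helper)."""
                with urlopen(url, timeout=timeout) as response:
                    # Fall back to UTF-8 when the response declares no charset.
                    charset = response.headers.get_content_charset() or "utf-8"
                    return response.read().decode(charset, errors="replace")
        
        
            def readable_html(url):
                """Return the readable content for a URL (hypothetical helper)."""
                # Imported lazily so fetch_html works without breadability installed.
                from breadability.readable import Article
        
                html_text = fetch_html(url)
                # Article(...) and .readable are the calls shown in the usage above.
                return Article(html_text, url=url).readable
        
        With that in place, ``readable_html(url)`` should give the same content as
        ``doc.readable`` in the example above.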
        
        
        Work to be done
        ---------------
        Yep, I've got some catching up to do. I don't do pagination, I've got a lot of
        custom tweaks I need to get going, and there are some articles that fail to
        parse. I also have more tests to write for a lot of the cleaning helpers, but
        hopefully things are set up in a way that those can/will be added.
        
        Fortunately, I need this library for my tools:
        
        - https://bmark.us
        - http://readable.bmark.us
        
        so I really need this to be an active and improving project.
        
        
        Off the top of my head, the todo list:
        
          - Support metadata from parsed article [url, confidence scores, all
            candidates we thought about?]
          - More tests, more thorough tests
          - More sample articles we need to test against in the test_articles
          - Tests that run through and check for regressions of the test_articles
          - Tidying the HTML that comes out, which might help with regression tests ^^
          - Multiple page articles
          - Performance tuning, we do a lot of looping and re-drop some nodes that
            should be skipped. We should have a set of regression tests for this so
            that if we implement a change that blows up performance we know it right
            away.
          - More docs for things, both sphinx docs and in-code comments, to help
            understand wtf we're doing and why. That's the biggest hurdle to some of
            this stuff.
        
        Helping out
        ------------
        If you want to help, shoot me a pull request, an issue report with broken
        urls, etc.
        
        You can ping me on IRC; I'm always in the `#bookie` channel on freenode.
        
        
        Important Links
        ----------------
        
        - `Builds`_ are done on `TravisCI`_
        
        
        Inspiration
        ~~~~~~~~~~~~
        
        - `python-readability`_
        - `decruft`_
        - `readability`_
        
        
        
        .. _readability: http://code.google.com/p/arc90labs-readability/
        .. _Builds: http://travis-ci.org/#!/mitechie/breadability
        .. _TravisCI: http://travis-ci.org/
        .. _decruft: https://github.com/dcramer/decruft
        .. _python-readability: https://github.com/buriy/python-readability
        
        
        
        News
        ====
        
        0.1.11
        -------
        
        *Release date: Dec 12th 2012*
        
        * Add argparse to the install requires for python < 2.7
        
        
        
        0.1.10
        -------
        
        *Release date: Sept 13th 2012*
        
        * Updated the scoring bonus and penalty to account for ``,`` and ``"``
          characters.
        
        
        0.1.9
        ------
        
        *Release date: Aug 27th 2012*
        
        * In case of an issue dealing with candidates we need to act like we didn't
          find any candidates for the article content. #10
        
        
        0.1.8
        ------
        
        *Release date: Aug 27th 2012*
        
        * Add code/tests for an empty document.
        * Fixes #9 to handle xml parsing issues.
        
        
        
        0.1.7
        ------
        
        *Release date: July 21st 2012*
        
        * Change the encode 'replace' kwarg into a normal arg for older python
          versions.
        
        
        
        0.1.6
        ------
        
        *Release date: June 17th 2012*
        
        * Fix the link removal, add tests and a place to process other bad links.
        
        
        
        0.1.5
        ------
        
        *Release date: June 16th 2012*
        
        * Start to look at removing bad links from content in the conditional cleaning
          state. This was really used for the scripting.com site's garbage.
        
        
        
        0.1.4
        ------
        
        *Release date: June 16th 2012*
        
        * Add a test generation helper breadability_newtest script.
        * Add tests and fixes for the scripting news parse failure.
        
        
        0.1.3
        ------
        
        *Release date: June 15th 2012*
        
        * Add actual testing of full articles for regression tests.
        * Update parser to properly clean after winner doc node is chosen.
        
        
        0.1.2
        ------
        
        *Release date: May 28th 2012*
        
        * Bugfix: #4 issue with logic of the 100char bonus points in scoring
        * Garden with PyLint/PEP8
        * Add a bunch of tests to readable/scoring code.
        
        
        0.1.1
        ------
        
        *Release date: May 11th 2012*
        
        * Fix bugs in scoring to help in getting right content
        * Add concept of -d which shows scoring/decisions on nodes
        * Update command line client to be able to pipe output to other tools
        
        
        0.1.0
        ------
        
        *Release date: May 6th 2012*
        
        * Initial release and upload to PyPI
        
Keywords: readable parsing html content bookie
Platform: UNKNOWN
