Metadata-Version: 1.1
Name: alertscraper
Version: 0.1.6
Summary: Flexible tool for scraping certain DOM elements from a page and emailing when new ones are added.
Home-page: https://github.com/michaelpb/alertscraper
Author: michaelb
Author-email: michaelpb@gmail.com
License: GPL3
Description: alertscraper
        ============
        
        .. image:: https://badge.fury.io/py/alertscraper.png
           :alt: alertscraper badge
        
        .. image:: https://travis-ci.org/michaelpb/alertscraper.png?branch=master
           :alt: travis badge
        
        A general-purpose, flexible tool for scraping a given URL for a certain
        type of item and sending an email when new items are added. It is
        useful for monitoring ad or auction websites, and could also be used to
        set up email alerts on your own site.
        
        WARNING
        =======
        
        -  Check the Terms of Service of a site before you use this tool on it!
           For some sites, using this tool may violate their terms of service,
           in which case it should not be used.
        
        Limitations
        ===========
        
        -  This code ONLY scrapes the HTML returned by the initial HTTP request.
           Websites that function as single-page apps (rendering their content
           with JavaScript) will not work. This could be supported in the future
           by working with JSON data, or by integrating with something
           heavier-weight like Selenium.
        
        Usage
        =====
        
        Installation
        ------------
        
        Assuming Python's ``pip`` is installed (for Debian-based systems, this
        can be installed with ``sudo apt-get install python-pip``), alertscraper
        can be installed directly from PyPI:
        
        ::
        
            pip install alertscraper
        
        Python 3.3+ (as well as 2.6+) is supported and tested.
        
        Quick start
        -----------
        
        ``alertscraper`` works on a per-URL basis, keeping a history file for
        each URL you scrape so that it knows when something is new.
        
        Start by navigating in your web browser to the page you want to scrape,
        and copy its URL. Then inspect the page source and work out the DOM
        path to the relevant elements. In this example, each item is an ``li``
        element with the class name ``result``, so the combined selector is
        ``li.result``.
        
        ::
        
            alertscraper 'https://some-site.org/?query=guitar&maxprice=550' li.result
        
        This downloads the given URL and prints the text content of each
        matching item, which lets you check that your query is correct.
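        
        For illustration only, here is a minimal sketch of what matching
        ``li.result`` and listing each item's text involves. It uses Python 3's
        standard-library ``html.parser`` and is not alertscraper's own code.
        ``select_text`` and ``SelectorTextCollector`` are hypothetical names.
        The sketch handles only the simple ``tag.class`` form and assumes the
        markup closes its ``li`` tags explicitly. Fetching the page (for
        example with ``urllib.request.urlopen``) is left out.
        
        ::
        
            from html.parser import HTMLParser
        
        
            class SelectorTextCollector(HTMLParser):
                """Hypothetical helper: collect the text of each <tag class="cls"> element.
        
                Handles only the simple "tag.class" selector form (e.g. "li.result")
                and assumes the markup closes its tags explicitly.
                """
        
                def __init__(self, tag, cls):
                    super().__init__()
                    self.tag, self.cls = tag, cls
                    self.depth = 0   # > 0 while inside a matching element
                    self.items = []  # whitespace-normalized text of each match
                    self._buf = []
        
                def handle_starttag(self, tag, attrs):
                    if self.depth:
                        # Track nested elements of the same tag so we close correctly.
                        if tag == self.tag:
                            self.depth += 1
                        return
                    classes = (dict(attrs).get("class") or "").split()
                    if tag == self.tag and self.cls in classes:
                        self.depth = 1
                        self._buf = []
        
                def handle_endtag(self, tag):
                    if self.depth and tag == self.tag:
                        self.depth -= 1
                        if self.depth == 0:
                            self.items.append(" ".join("".join(self._buf).split()))
        
                def handle_data(self, data):
                    if self.depth:
                        self._buf.append(data)
        
        
            def select_text(html, selector):
                """Return the text of every element matching a "tag.class" selector."""
                tag, cls = selector.split(".", 1)
                parser = SelectorTextCollector(tag, cls)
                parser.feed(html)
                parser.close()
                return parser.items
        
        
            page = """
            <ul>
              <li class="result"><a href="/1">Fender guitar</a> $400</li>
              <li class="ad">Sponsored</li>
              <li class="result featured">Gibson guitar $550</li>
            </ul>
            """
            print(select_text(page, "li.result"))
            # ['Fender guitar $400', 'Gibson guitar $550']
        
        Matching on the ``class`` attribute split into words means an element
        with several classes, such as ``class="result featured"``, still matches.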
        
        Next, save these items to a history file. This is a way of saying "I've
        seen everything currently posted and am only interested in new stuff
        from now on".
        
        ::
        
            alertscraper 'https://some-site.org/?query=guitar&maxprice=550' li.result --file=guitars.txt
        
        Notice that it again prints all the items it found. If you were to run
        the command again, it would not print them, since it has now stored
        them as "already seen".
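        
        The "already seen" bookkeeping can be sketched as follows. This assumes
        a plain-text history file with one item per line; alertscraper's actual
        file format may differ. ``filter_new`` is a hypothetical helper that
        returns only unseen items and appends them to the file, so a second run
        with the same items returns nothing.
        
        ::
        
            import os
        
        
            def filter_new(items, history_path):
                """Hypothetical helper: return items not yet in history_path, then record them.
        
                Assumes a plain-text history file with one item per line, so items
                must not contain newlines.
                """
                seen = set()
                if os.path.exists(history_path):
                    with open(history_path, encoding="utf-8") as f:
                        seen = {line.rstrip("\n") for line in f}
        
                new_items = []
                for item in items:
                    if item not in seen:
                        new_items.append(item)
                        seen.add(item)  # also drops duplicates within this batch
        
                # Append only the new items, so the next run treats them as "already seen".
                with open(history_path, "a", encoding="utf-8") as f:
                    for item in new_items:
                        f.write(item + "\n")
                return new_items
        
        
            if __name__ == "__main__":
                import tempfile
        
                path = os.path.join(tempfile.mkdtemp(), "guitars.txt")
                items = ["Fender guitar $400", "Gibson guitar $550"]
                print(filter_new(items, path))  # first run: both items are new
                print(filter_new(items, path))  # second run: []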
        
        Finally, let's run the command to email us everything that has not yet
        been seen:
        
        ::
        
            alertscraper 'https://some-site.org/?query=guitar&maxprice=550' li.result --file=guitars.txt --email=myemail@gmail.com
        
        This runs only once. If you want it to run continually, I'd recommend
        putting it in a cron job. Eventually I may add a daemon mode, but this
        is good for now.
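        
        The following is a rough sketch of the emailing step. It builds a
        plain-text alert with Python's standard ``email.message.EmailMessage``.
        The ``build_alert_email`` helper, the subject format, and the sender
        address are all hypothetical, since how alertscraper actually composes
        and sends its mail is not described here. Sending is shown commented
        out and assumes an SMTP server on ``localhost``.
        
        ::
        
            from email.message import EmailMessage
        
        
            def build_alert_email(new_items, url, to_addr, from_addr):
                """Hypothetical helper: build (not send) an alert listing new items.
        
                Returns None when there is nothing new, so no email goes out.
                """
                if not new_items:
                    return None
                msg = EmailMessage()
                msg["Subject"] = "%d new item(s) at %s" % (len(new_items), url)
                msg["From"] = from_addr
                msg["To"] = to_addr
                lines = ["New items found at %s:" % url, ""]
                lines += ["- " + item for item in new_items]
                msg.set_content("\n".join(lines))
                return msg
        
        
            msg = build_alert_email(
                ["Ibanez guitar $300"],
                "https://some-site.org/?query=guitar&maxprice=550",
                to_addr="myemail@gmail.com",
                from_addr="alerts@example.com",  # hypothetical sender address
            )
            print(msg)
        
            # Sending, assuming an SMTP server is listening on localhost:
            # import smtplib
            # with smtplib.SMTP("localhost") as smtp:
            #     smtp.send_message(msg)
        
        Returning ``None`` for an empty list keeps a scheduled run quiet
        when nothing new has appeared.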
        
        Happy scraping!
        
        Contributing
        ============
        
        -  `CONDUCT.md <CONDUCT.md>`__
        
        New features, tests, and bug fixes are welcome!
        
        
        
        
        
Keywords: alertscraper
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Environment :: Console
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
