Metadata-Version: 1.1
Name: Twper
Version: 0.1.4
Summary: An asynchronous twitter scraper
Home-page: https://github.com/jungerm2/Twper
Author: Sacha Jungerman
Author-email: jungerm2@illinois.edu
License: UNKNOWN
Description: Twper - an asynchronous twitter scraper
        =======================================
        
        Twitter provides a powerful REST and Streaming API. With the REST API,
        you can only send about 720 requests per hour and only get tweets that
        are less than a week old. This project tries to overcome these
        limitations by scraping the website instead. This directly translates
        to:

        -  No rate limits: you can easily scrape 50 thousand tweets in an
           hour (without even using threads!)
        -  No authentication is needed (the request header is randomized, so
           you shouldn't get blacklisted)
        -  And most importantly, you are free to query tweets that are more
           than 7 days old
        
        This project is similar in nature to `taspinar's
        twitterscraper <https://github.com/taspinar/twitterscraper>`__ and was
        heavily inspired by it. In fact, a small portion of the code was
        shamelessly borrowed.
        
        The main difference between our two libraries is that this one is fully
        asynchronous so requests are non-blocking. This allows multiple requests
        to be processed in a shorter period of time, making the scraper much
        faster.
        
        Getting Started
        ---------------
        
        Unfortunately, at the moment this package is only for Python 3.6+, as
        it relies on newer language features.
        
        Installing
        ~~~~~~~~~~
        
        To install this package simply run
        
        ::
        
            (sudo) pip install Twper
        
        Or you can clone the repository and in the folder containing setup.py
        run
        
        ::
        
            python setup.py install
        
        Examples
        ~~~~~~~~
        
        Each tweet is represented as a Tweet object and contains the following
        attributes:

        -  user - the sender's username
        -  fullname - the sender's full name
        -  tweet\_id - a unique id (provided by twitter)
        -  url - a url to that specific tweet
        -  timestamp - a datetime object of when the tweet was sent
        -  text - the tweet's message
        -  replies - number of replies the tweet got
        -  retweets - number of retweets it got
        -  likes - number of likes the tweet received
        -  hashtags - what hashtags are mentioned in the tweet
        
        *Note:* Tweet has a from\_id constructor that returns a tweet object
        from a tweet\_id.
        
        *Warning:* This feature uses requests and is blocking.
        
        To get additional info about a specific user you can use the
        TwitterAccount class. Specifically, TwitterAccount.from\_username(user)
        can be used. A TwitterAccount has the following attributes:

        -  user - the user's username
        -  fullname - the user's full name
        -  tweets - number of tweets the user tweeted
        -  following - number of people the user is following
        -  followers - number of people following the user
        -  likes - number of likes issued by the user
        -  lists - number of lists issued by the user
        -  moments - number of moments the user has
        -  bio - the user's biography (short description)
        -  location - the user's geographical location
        -  location\_id - the corresponding location id twitter uses
        -  website - the user's website
        -  birthday - datetime of the user's birthday if publicly available
        -  joined - datetime of when the user joined
        
        *Note:* Some of the above info might be missing or not publicly
        available. In that case the default value is None for dates, an empty
        string for strings, and zero for ints.
        
        *Warning:* The from\_username feature uses requests and is blocking.
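
        To make that default-value convention concrete, here is a minimal
        sketch. Note that ``field_or_default`` and ``DEFAULTS`` are
        hypothetical names for illustration only, not part of Twper's API:

        ```python
        from datetime import datetime

        # Hypothetical helper (not part of Twper's API) illustrating the
        # convention above: None for dates, '' for strings, 0 for ints.
        DEFAULTS = {datetime: None, str: '', int: 0}

        def field_or_default(raw_value, field_type):
            """Return raw_value if present, otherwise the type's default."""
            if raw_value is None:
                return DEFAULTS[field_type]
            return raw_value

        assert field_or_default(None, str) == ''        # strings default to ''
        assert field_or_default(None, int) == 0         # ints default to 0
        assert field_or_default(None, datetime) is None  # dates default to None
        assert field_or_default(42, int) == 42          # present values pass through
        ```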
        
        A search is described by a query string (q\_str), which has the
        following properties:
        
        +----------------------+---------------------------------------------+
        | q\_str               | What it will query for                      |
        +======================+=============================================+
        | A B C                | tweets containing A and B and C             |
        +----------------------+---------------------------------------------+
        | "ABC"                | tweets containing the exact match ABC       |
        +----------------------+---------------------------------------------+
        | A OR C               | tweets containing either A or C             |
        +----------------------+---------------------------------------------+
        | -A -B                | tweets NOT containing A and NOT B           |
        +----------------------+---------------------------------------------+
        | #ABC                 | tweets containing the hashtag #ABC          |
        +----------------------+---------------------------------------------+
        | from:A               | tweets that are from account A              |
        +----------------------+---------------------------------------------+
        | to:B                 | tweets that are to account B                |
        +----------------------+---------------------------------------------+
        | @C                   | tweets that mention account C               |
        +----------------------+---------------------------------------------+
        | since:2017-12-01     | tweets after date                           |
        +----------------------+---------------------------------------------+
        | until:2017-12-02     | tweets before date                          |
        +----------------------+---------------------------------------------+
        | place:LOCATION\_ID   | tweets from location with id LOCATION\_ID   |
        +----------------------+---------------------------------------------+
        
        *Note:* Ordering does not matter, and a search is case-insensitive
        except for the keywords OR, from:, to:, since:, until: and place:,
        which should be written exactly as above. Also, there should NOT be a
        space between the colon and the search value (e.g. 'from: A' is wrong
        and will search for tweets containing 'from:' and 'A' instead of the
        intended behavior).
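
        As a concrete illustration, the operators above can be combined
        programmatically. The helper below is a hypothetical sketch
        (``build_q_str`` is not part of Twper's API) showing how a q\_str
        might be assembled while respecting the no-space-after-colon rule:

        ```python
        # Hypothetical helper (not part of Twper) that assembles a q_str
        # from the operators in the table above.
        def build_q_str(words=(), exact=None, exclude=(), hashtag=None,
                        from_user=None, since=None, until=None):
            parts = list(words)
            if exact:
                parts.append('"%s"' % exact)
            parts.extend('-' + w for w in exclude)
            if hashtag:
                parts.append('#' + hashtag)
            if from_user:
                parts.append('from:' + from_user)  # no space after the colon
            if since:
                parts.append('since:' + since)
            if until:
                parts.append('until:' + until)
            return ' '.join(parts)

        print(build_q_str(words=['python'], hashtag='asyncio',
                          from_user='someaccount', since='2017-12-01'))
        # python #asyncio from:someaccount since:2017-12-01
        ```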
        
        In this package there are two classes used to search tweets (Query and
        Queries). Both classes have a get\_tweets method that returns an async
        generator, and they therefore need to be run in an event loop. Do not
        worry if you haven't used these before; let's jump right into it!
        
        ::
        
            import asyncio
            from Twper import Query  # adjust the import to match your install

            async def main():
                # Example 1: A simple search using Query
                q = Query('Some Query Goes Here', limit=20)
                async for tw in q.get_tweets():
                    # Process data
                    print(tw)


            # This actually runs the main function
            loop = asyncio.get_event_loop()
            try:
                loop.run_until_complete(main())
                loop.run_until_complete(loop.shutdown_asyncgens())
            finally:
                loop.close()
        
        This will print the most recent 20 tweets (from newest to oldest)
        containing the words 'some' and 'query' and 'goes' and 'here'.
        
        Sometimes, the q\_str you want to query for is too long and needs to
        be broken up into smaller pieces and ORed together. For this you can
        use the Queries class, which executes many different queries together
        and then joins them. In the following example four separate queries
        are executed at once and tweets are printed in reverse chronological
        order (newest first). Please note that the following Queries example
        is not faster than running those four queries sequentially; rather, it
        merges their results chronologically.
        
        ::
        
            import asyncio
            from Twper import Queries  # adjust the import to match your install

            async def main():
                # Example 2: Multiple searches using Queries.
                qs = Queries(['Some', 'Query', 'Goes', 'Here'], limit=5)
                async for tw in qs.get_tweets():
                    # Process data
                    print(tw)


            # This actually runs the main function
            loop = asyncio.get_event_loop()
            try:
                loop.run_until_complete(main())
                loop.run_until_complete(loop.shutdown_asyncgens())
            finally:
                loop.close()
        
        The limit keyword argument simply limits the maximum number of results
        any generator can yield. In the second example the limit is applied to
        each query individually, so the maximum number of tweets it can yield
        is 5 x 4 = 20.
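
        The chronological merging described above can be sketched with plain
        asyncio, independent of Twper. The snippet below is an illustrative
        sketch (not Twper's actual implementation): it merges several async
        generators that each yield timestamped items newest-first, so the
        combined stream is also newest-first:

        ```python
        import asyncio
        import heapq

        # Illustrative sketch (not Twper's actual implementation): merge
        # async generators that each yield (timestamp, item) pairs in
        # descending timestamp order.
        async def merge_newest_first(*gens):
            heap = []
            # Prime the heap with the first item of each generator.
            for i, gen in enumerate(gens):
                try:
                    item = await gen.__anext__()
                except StopAsyncIteration:
                    continue
                # Negate the timestamp so the largest (newest) pops first.
                heapq.heappush(heap, (-item[0], i, item, gen))
            while heap:
                _, i, item, gen = heapq.heappop(heap)
                yield item
                try:
                    nxt = await gen.__anext__()
                except StopAsyncIteration:
                    continue
                heapq.heappush(heap, (-nxt[0], i, nxt, gen))

        async def fake_query(timestamps):
            for ts in timestamps:  # already newest-first
                yield (ts, 'tweet@%d' % ts)

        async def main():
            merged = merge_newest_first(fake_query([9, 5, 1]),
                                        fake_query([8, 6, 2]))
            return [ts async for ts, _ in merged]

        print(asyncio.run(main()))  # [9, 8, 6, 5, 2, 1]
        ```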
        
        Before asking further questions, I encourage you to look at the source
        code, as it is short and well commented.
        
        Contributing
        ------------
        
        This is my first open source project, so please feel free to
        contribute in any way and/or point out what I should improve (as well
        as any bugs, of course). Pull requests and issues are welcome.
        
        Todo
        ----
        
        If you are looking to contribute or just curious about what I plan to
        add/fix here is the todo list:
        
        -  Remove the requests dependency. This is a blocking library that
           should be replaced by aiohttp. It is only used in Tweet.from\_id
           and TwitterAccount.from\_username, and therefore doesn't affect the
           performance of querying.
        
        -  Improve the TwitterAccount class. Hopefully it's possible to scrape
           what accounts a user is following and what accounts are following the
           user if we add authentication. Currently we can only retrieve stats
           about a user account.
        
        -  Possibly add support for other languages. Currently, only English
           is fully supported, even though you can set the language to
           something other than 'en' in the Query constructor. Setting it to
           None searches everything regardless of language.
        
        Authors
        -------
        
        -  Sacha Jungerman - Initial Work -
           `Twper <https://github.com/jungerm2/Twper>`__
        
        License
        -------
        
        This project is licensed under the MIT License - see the
        `LICENSE <LICENSE>`__ for details
        
        Acknowledgments
        ---------------
        
        -  Credit to `Taspinar <http://www.ataspinar.com>`__ for his great
           library, which inspired the creation of this one.
        
Keywords: twitter scraper async asynchronous fast non-blocking asyncio
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.6
