Metadata-Version: 2.1
Name: bors
Version: 0.3.0
Summary: A highly flexible and extensible service integration framework for scraping the web or consuming APIs
Home-page: https://github.com/RobotStudio/bors
Author: Bobby
Author-email: bobby@robot.studio
License: GPL
Keywords: web-scraper api-integrator scraping scraper data-integration data-ingestion bors
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: dataclasses
Requires-Dist: marshmallow
Requires-Dist: requests
Requires-Dist: urllib3

Bors
====

A highly flexible and extensible service integration framework for
scraping the web or consuming APIs.

Usage
=====

1. Create your model based on the data you expect to incorporate.
2. Decide on what you want to do with your data, and add it.
3. Create or use an existing API integration library.
4. Create your root application to tie it all together.

Object Model
------------

We use `marshmallow <https://marshmallow.readthedocs.io/en/latest/>`__
for the underlying object schema definitions. Here's an example model:

.. code:: python

    from marshmallow import Schema, fields

    class NewsItemSchema(Schema):
        """News item"""
        id = f.Str(required=True)
        url = f.Str(required=True)
        title = f.Str(required=True)
        pubDate = f.Str(required=True)
        timestamp = f.Str(required=True)
        feed_id = f.Int(required=True)
        published_date = f.Str(required=True)
        feed_name = f.Str(required=True)
        feed_url = f.Str(required=True)
        feed_enabled = f.Int(required=True)
        feed_description = f.Str(required=True)
        url_field = f.Str(required=True)
        title_field = f.Str(required=True)
        date_field = f.Str(required=True)
        feed_image = f.Str(required=True)

See the ``marshmallow`` docs for more information.

Middleware Strategies
---------------------

Middleware API is implemented in the form of strategies and follows this
basic layout:

.. code:: python

    """
    Simple context display strategy
    """

    from bors.app.strategy import IStrategy


    class Print(IStrategy):
        """Print Strategy implementation"""
        def bind(self, context):
            """
            Bind the strategy to the middleware pipeline,
            returning the context
            """
            print(f"""PrintStrategy: {context}""")

            # just a pass-through
            return context

The important things to note here: \* We're inheriting from
``IStrategy``. \* We're implementing a ``bind`` method. \* The bind
method receives, potentially augments, and then returns the ``context``.

API Integration
---------------

Request Schema
~~~~~~~~~~~~~~

Because our API is simple, we're going to use this as-is.

.. code:: python

    from bors.generics.request import RequestSchema

Response Schema
~~~~~~~~~~~~~~~

Our API sends us data in the following format:

.. code:: json

    {
        "data": ...,
        "status": "OK"
    }

For this, we'll need to supplement a bit, removing the root fields and
returning the ``data`` value:

.. code:: python

    from marshmallow import fields
    from bors.generics.request import ResponseSchema


    class MyAPIResponseSchema(ResponseSchema):
        """Schema defining how the API will respond"""
        status = fields.Str()
        def get_result(self, data):
            """Return the actual result data"""
            return data.get("data", "")

        class Meta:
            """Add 'data' field"""
            strict = True
            additional = ("data",)

API Class
~~~~~~~~~

.. code:: python

    from bors.api.requestor import Req


    class MyAPI(LoggerMixin):
        name = "my_api"
        def __init__(self, context):
            self.create_logger()

            self.request_schema = RequestSchema
            self.result_schema = MyAPIResponseSchema
            self.context = context

            self.req = Req("http://some.api.endpoint/v1", payload, self.log)

            # We don't need to deal directly with requests, so we pass them through
            self.call = self.req.call

        def shutdown(self):
            """Perform last-minute stuff"""
            pass

Here we use the built-in ``Req`` class to issue requests to the API, we
assign the ``request_schema`` and ``result_schema`` to classes in our
object, and we set the ``name``, ``context``, and ``call`` attributes.
The results passed through on the API are referencable from within the
middleware context under the key ``my_api``.

Pulling it all together
~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    from bors.app.builder import AppBuilder
    from bors.app.strategy import Strategy


    def main():
        strat = Strategy(Print())
        app = AppBuilder([MyAPI], strat)
        app.run()

    if __name__ == "__main__":
        main()

Here, we set as many strategies and API's as we want, then create and
run the ``app``.

Architecture
============

::

      +------------+
    +-+ MIDDLEWARE +------> out
    | +------------+
    |                       API/WEB
    | +------------+
    +-+ PREPROCESS +<------ in
      +------------+

At its most basic level, a ``bors`` integrator engages with an
integration library (API) passing incoming data through a prepocessor to
generate and validate incoming objects, then passes that data through
middlewares. Outgoing interactions are initiated from within a
middleware and passed directly to an API, allowing easily for
request/response type behavior in addition to observe and react.

Ingesting Data
~~~~~~~~~~~~~~

::

          ^
          |
    +-----+------+
    | MIDDLEWARE |
    +-----+------+
          ^
    +-----+------+
    | PREPROCESS |
    +-----+------+
          ^
          |
          +
         API/
         WEB

Ingested data provokes calls along the pipeline.

Outgoing Data
~~~~~~~~~~~~~

::

         API/
         WEB
          ^
          |
    +-----+------+
    | MIDDLEWARE |
    +------------+

Enacted events stimulate API or web actions.

Preprocessing
-------------

Preprocessing is nothing more than an object-ization of the incoming
data. This provides two benefits: 1. Data can be generalized across API
interfaces. 2. Data structure can be validated and enforced.

Middlewares
-----------

Middlewares allow for a data processing pipeline to pass data through.

::

      +-+  +-+  +-+
      |M|  |M|  |M|
      |I|  |I|  |I|
      |D|  |D|  |D|
      |D|  |D|  |D|
    ->+L+->+L+->+L+->
      |E|  |E|  |E|
      |W|  |W|  |W|
      |A|  |A|  |A|
      |R|  |R|  |R|
      |E|  |E|  |E|
      +-+  +-+  +-+

With this model, we gain a lot of flexibility in the behavior of our
integration. Middleware is up to the developer to create, and can be any
of the following:

-  Data post-processing, filtering, aggregation, or augmentation
-  External integrations and interfaces
-  Stimulate an API/web transaction from external actors or time-based
   criteria
-  Hooks and callbacks




History
-------

0.2.0 (2018-05-17)
---------------------

* Initialize the repository, breaking it out from nombot.

0.3.0 (2018-06-25)
---------------------

* Reboot packaging using cookiecutter-pypackage


