Metadata-Version: 2.0
Name: wildpath
Version: 0.2.0
Summary: easy data structure access utility
Home-page: UNKNOWN
Author: Lars van Gemerden
Author-email: gemerden@gmail.com
License: MIT License
Keywords: access data structure getter setter deleter iterator utility tool
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: boolean.py

.. figure:: https://travis-ci.org/gemerden/wildpath.svg?branch=master
   :alt: Travis

   Travis

WildPath
========

A path abstraction to access items in composite (e.g. JSON) objects in
python.

Introduction
------------

This module is intended primarily as a practical tool to access data in
complex data structures. Accessing multiple items in particular usually
requires for-loops or other constructs, and there is no straightforward
way to pass nested locations as single parameters. This module solves
this problem by introducing two classes:

-  ``Path`` lets you get, set and delete single items in the data
   structure; it is optimized for speed,
-  ``WildPath`` does the same and allows wildcards and boolean logic
   (and, or, not) in paths to get, set and delete multiple items in
   one call,
-  Both have iterators (in the common base class) to run through all
   paths and values in a data structure.

As a typical example we take the JSON response of a call to
``maps.googleapis.com`` for the route between 2 addresses. The response
is over 390 lines of JSON if nicely formatted. However, we are only
interested in the geo\_locations of the individual steps (turn-by-turn
instructions) of the route.

In normal code this would look something like (with ``json_route`` the
result from the call to the google API):

.. code:: python

    def get_geo_locations(json_route):
        geo_locs = []
        for json_step in json_route["routes"][0]["legs"][0]["steps"]:  #  there is only 1 route and 1 leg in the response
            geo_locs.append({"start_location": json_step["start_location"],
                             "end_location": json_step["end_location"]})
        return geo_locs

    geo_locations = get_geo_locations(json_route)

Using ``WildPath`` the same result is obtained by:

.. code:: python

    location_path = WildPath("routes.0.legs.0.steps.*.*_location")

    geo_locations = location_path.get_in(json_route)  # note the path.get_in(obj) syntax versus the usual obj.get(key)

Both produce the same list of items:

.. code:: python

    [
        {
            "start_location": {
                "lat": 52.0800134,
                "lng": 4.3271703
            },
            "end_location": {
                "lat": 52.0805958,
                "lng": 4.3286669
            }
        }, 
        ...
    ]

Essentially the function definition is replaced by a string, using
``WildPath.get_in`` for the correct lookup logic. This has some
advantages:

-  Fewer lines of code means lower likelihood of bugs,
-  Better readability and maintainability (once you get used to the
   path-notation),
-  A ``Path`` or ``WildPath`` is easily serializable
   (``str(Path("a.b.c")) == "a.b.c"``), where a function definition is
   not.

Prerequisites
-------------

The module can be installed with ``pip install wildpath``. It is tested
on both Python 2.7 and Python 3.3 to 3.6.

Functionality
-------------

The ``Path`` class supports, with e.g. ``path = Path("a.0.b")`` and
``obj = {"a": [{"b": 1}]}``:

-  ``get_in``: getting items from data structures: ``path.get_in(obj)``,
-  ``set_in``: setting values in data structures:
   ``path.set_in(obj, value)``,
-  ``del_in``: deleting items from data structures:
   ``path.del_in(obj)``,
-  ``has_in``: checking whether a value exists at path:
   ``path.has_in(obj)``,
-  ``pop_in``: deleting and returning items from data structures:
   ``path.pop_in(obj)``.
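
As a rough illustration of what a dotted path such as ``"a.0.b"`` stands
for, the sketch below walks a nested structure in plain Python. It is not
wildpath's implementation: the helper name ``get_by_dotted_path`` is
hypothetical, and the sketch assumes that keys applied to a sequence are
integer positions:

.. code:: python

    def get_by_dotted_path(obj, dotted):
        """Follow each key of a dotted path string through nested dicts and lists."""
        for key in dotted.split("."):
            if isinstance(obj, (list, tuple)):
                obj = obj[int(key)]  # sequences are indexed by integer position
            else:
                obj = obj[key]  # mappings are indexed by the string key
        return obj


    obj = {"a": [{"b": 1}]}
    assert get_by_dotted_path(obj, "a.0.b") == 1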

It also has some iterators that run through all paths and values in a
data structure:

-  ``Path.items(obj)``: iterator over all ``(path, value)`` tuples in
   the object,
-  ``Path.paths(obj)``: iterator over all paths in the object,
-  ``Path.values(obj)``: iterator over all values in the object.
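
To give an idea of what iterating over all ``(path, value)`` pairs
involves, here is a minimal recursive generator in plain Python. It is a
sketch only, not the library's code: ``iter_leaf_items`` is a hypothetical
name, it handles only dicts, lists and tuples, and sorting the dictionary
keys is an assumption made to get a stable order:

.. code:: python

    def iter_leaf_items(obj, prefix=()):
        """Yield (path_tuple, value) for every leaf in nested dicts and lists."""
        if isinstance(obj, dict):
            children = sorted(obj.items())  # sorted keys give a stable order
        elif isinstance(obj, (list, tuple)):
            children = enumerate(obj)
        else:
            yield prefix, obj  # neither mapping nor sequence: a leaf value
            return
        for key, child in children:
            for item in iter_leaf_items(child, prefix + (str(key),)):
                yield item


    data = {"b": [1, 2], "a": {"c": 3}}
    flat = {".".join(path): value for path, value in iter_leaf_items(data)}
    assert flat == {"a.c": 3, "b.0": 1, "b.1": 2}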

The ``WildPath`` class supports the same functionality as ``Path``,
but with the following additions:

-  Keys referring to mappings (e.g. ``dict``) or python class objects
   can contain wildcards: ``WildPath("*.a*.b?")``, with ``*`` for any
   string and ``?`` for any single character. Wildcards use the standard
   python ``fnmatch.fnmatchcase``,
-  Keys referring to sequences (e.g. ``list``, ``tuple``) can contain
   slices: ``WildPath("1:3.::2")``, with ``:`` from standard python
   slice notation ``some_list[start:stop:step]``,
-  All keys can contain boolean logic, using ``&`` for AND, ``|`` for OR
   and ``!`` for NOT: ``WildPath("a*&!*b")``: keys starting with ``'a'``
   and not ending with ``'b'``.
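
The matching rules above can be tried out with the standard library alone.
The sketch below uses ``fnmatch.fnmatchcase`` directly and writes the
boolean combination out by hand; it does not use wildpath's own parser, and
the key names are made up for the example:

.. code:: python

    from fnmatch import fnmatchcase

    keys = ["alpha", "ab", "beta", "a1"]

    # "a*": keys starting with "a"
    starts_with_a = [k for k in keys if fnmatchcase(k, "a*")]
    assert starts_with_a == ["alpha", "ab", "a1"]

    # "a?": "a" followed by exactly one character
    a_plus_one = [k for k in keys if fnmatchcase(k, "a?")]
    assert a_plus_one == ["ab", "a1"]

    # "a*&!*b" spelled out by hand: starts with "a" AND does NOT end with "b"
    selected = [k for k in keys if fnmatchcase(k, "a*") and not fnmatchcase(k, "*b")]
    assert selected == ["alpha", "a1"]

    # "1:3" corresponds to ordinary slice notation on a sequence
    assert ["w", "x", "y", "z"][1:3] == ["x", "y"]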

Note that:

-  The iterator methods of ``WildPath`` return paths of type
   ``WildPath``, instead of ``Path``,
-  If a key or index is not found in the data, a ``KeyError`` or
   ``IndexError`` will be raised,
-  ``get_in`` can take a ``default`` parameter, that is returned if no
   value exists at the path location: ``path.get_in(obj, None)``,
-  Using wildpaths will return instances of the classes in the original
   object for mappings and sequences. For (other) python objects it will
   return a ``dict``. For example ``WildPath(":2").get_in((1, 2, 3))``
   will return ``(1, 2)``.

Examples
--------

Starting with this example structure of an agenda item in some tool:

.. code:: python

    agenda = {
        "meeting": "progress on project X",
        "date": "2017-8-14",
        "start_time": "10:00",
        "end_time": "11:00",
        "invited": ["Joe", "Ann", "Boo"],
        "items": [
            {
                "name": "opening",
                "duration": "5 minutes",
                "subjects": ["purpose of the meeting"],
            },
            {
                "name": "progress",
                "duration": "25 minutes",
                "subjects": ["milestones", "project delays", "actions"],
            },
            {
                "name": "closing",
                "duration": "5 minutes",
                "subjects": ["questions", "roundup"],
            },
        ]
    }

class ``Path``
~~~~~~~~~~~~~~

The ``Path`` class lets you get, set or delete items at a specific
location:

.. code:: python

    from wildpath.paths import Path

    path = Path("items.0.duration")
    assert str(path) == "items.0.duration"  # str(..) returns the original path string

    duration = path.get_in(agenda)  # retrieves value at path location
    assert duration == "5 minutes"

    path.set_in(agenda, "10 minutes")  # sets value at path location
    assert path.get_in(agenda) == "10 minutes"

    path.del_in(agenda)  # deletes key-value at path location
    assert path.has_in(agenda) == False  # has_in checks the presence of a value at the path location

class ``WildPath``
~~~~~~~~~~~~~~~~~~

``WildPath`` supports the same API as ``Path``, but additionally lets
you use wildcards and slicing in the path definition to access multiple
items in the structure (the ``Path`` class is there because for single
lookups it is substantially faster):

.. code:: python

    from wildpath.paths import WildPath

    wildpath = WildPath("items.*.duration")  # basic 'star' notation

    durations = wildpath.get_in(agenda)  # retrieves all the durations of the items on the agenda
    assert durations == ["5 minutes", "25 minutes", "5 minutes"]

    wildpath.set_in(agenda, ["10 minutes", "50 minutes", "10 minutes"])  # setting all the values, 
    assert wildpath.get_in(agenda) == ["10 minutes", "50 minutes", "10 minutes"]

    wildpath.set_in(agenda, "30 minutes")  #  or replacing all with a single value, 
    assert wildpath.get_in(agenda) == ["30 minutes", "30 minutes", "30 minutes"]

    wildpath.del_in(agenda)  # delete all the items at wildpath from the structure
    assert wildpath.has_in(agenda) == False  # `has_in` checks if all the items at wildpath are there

To get the start and end time of the meeting:

.. code:: python

    wildpath = WildPath("*_time")
    assert wildpath.get_in(agenda) == {"start_time": "10:00", "end_time": "11:00"}

Similarly, it supports slices as wildcard-like path elements:

.. code:: python

    wildpath = WildPath("items.0:2.name")
    assert wildpath.get_in(agenda) == ["opening", "progress"]

    wildpath = WildPath("items.!0:2.name")  # slices can be negated
    assert wildpath.get_in(agenda) == ["closing"]

    wildpath = WildPath("items.-1::-1.name")  # extended slicing also works, but orders are not reversed for a negative step parameter
    assert wildpath.get_in(agenda) == ["opening", "progress", "closing"]
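
One way to picture the slice behaviour shown above (negation, and the
original order being kept for a negative step) is to treat a slice as a set
of selected indices. The sketch below is only a mental model under that
assumption, not wildpath's code; ``select`` is a hypothetical helper:

.. code:: python

    names = ["opening", "progress", "closing"]


    def select(seq, slc, negate=False):
        """Pick items whose index is (or, when negated, is not) covered by the slice."""
        chosen = set(range(len(seq))[slc])  # indices covered by the slice
        return [item for i, item in enumerate(seq) if (i in chosen) != negate]


    assert select(names, slice(0, 2)) == ["opening", "progress"]
    assert select(names, slice(0, 2), negate=True) == ["closing"]
    # a negative step selects the same indices; the original order is kept
    assert select(names, slice(-1, None, -1)) == ["opening", "progress", "closing"]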

``WildPath`` also supports boolean logic:

.. code:: python

    # '|' is the OR operator

    assert WildPath("start_time|end_time").get_in(agenda) == {"start_time": "10:00", "end_time": "11:00"}

    # '&' is the AND operator

    assert WildPath("start_*&*_time").get_in(agenda) == {"start_time": "10:00"}


    # '!' is the NOT operator:

    assert WildPath("!item?").get_in({"item1": "chair", "item2": "table", "count": 2}) == {"count": 2}

    # parentheses can be used to indicate precedence:

    assert WildPath("!(a|b)") != WildPath("!a|b")

**Notes**:

-  WildPath also supports attribute lookup in nested objects, list
   attributes in objects, etc.,
-  All the examples of ``WildPath.get_in`` also work for ``set_in``,
   ``del_in``, ``pop_in`` and ``has_in``,
-  In ``wildpath.set_in(obj, value)``, value can either be a single
   value (which will be used to set all target values), or a data
   structure with the same 'shape' as the result of
   ``wildpath.get_in(obj)``.
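
The two ways of passing a value to ``set_in`` can be pictured with a small
plain-Python sketch for the simplest case, a list of dicts. This is an
illustration under simplifying assumptions, not the library's code:
``set_all`` is a hypothetical helper, and it treats any ``list`` value as
"one entry per item":

.. code:: python

    def set_all(items, key, value):
        """Set ``key`` in every dict of ``items``; a list value is spread item by item."""
        if isinstance(value, list):
            if len(value) != len(items):
                raise ValueError("value list must have one entry per item")
            values = value
        else:
            values = [value] * len(items)  # reuse the single value for every item
        for item, val in zip(items, values):
            item[key] = val


    items = [{"duration": "5 minutes"}, {"duration": "25 minutes"}]

    set_all(items, "duration", ["10 minutes", "50 minutes"])  # same 'shape' as the result
    assert [i["duration"] for i in items] == ["10 minutes", "50 minutes"]

    set_all(items, "duration", "30 minutes")  # a single value for all targets
    assert [i["duration"] for i in items] == ["30 minutes", "30 minutes"]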

Iterators
~~~~~~~~~

The Path classes also have some iterator classmethods defined:

.. code:: python

    from wildpath.paths import Path

    for path, value in Path.items(agenda):
        print(" ".join([str(path), ":", value]))

prints

.. code:: text

    date : 2017-8-14
    end_time : 11:00
    invited.0 : Joe
    invited.1 : Ann
    invited.2 : Boo
    items.0.duration : 5 minutes
    items.0.name : opening
    items.0.subjects.0 : purpose of the meeting
    items.1.duration : 25 minutes
    items.1.name : progress

    etc...

To create an alternative representation of the data structure:

.. code:: python

    D = {str(path): value for path, value in Path.items(agenda)}

``Path.items()`` has an optional argument ``all`` that, if set to
``True``, will iterate over all path, value combinations, including
intermediate paths:

.. code:: python

    from wildpath.paths import Path

    for path, value in Path.items(agenda, all=True):
        print(" ".join([str(path), ":", str(value)]))  # intermediate values are lists and dicts, so convert to str

will print:

.. code:: text

    date : 2017-8-14
    end_time : 11:00
    invited : ['Joe', 'Ann', 'Boo']
    invited.0 : Joe
    invited.1 : Ann
    invited.2 : Boo
    items : [{'duration': '5 minutes', 'subjects': ['purpose of the meeting'], ...]
    items.0 : {'duration': '5 minutes', 'subjects': ['purpose of the meeting'], 'name': 'opening'}
    items.0.duration : 5 minutes
    items.0.name : opening
    items.0.subjects : ['purpose of the meeting']
    items.0.subjects.0 : purpose of the meeting

    etc...

Combining ``Path.items(obj, all=True)`` with the order in which the items
are produced makes more manipulations possible, e.g.:

.. code:: python

    from datetime import datetime
    from wildpath.paths import Path

    sample = {
        "name": "sample",
        "times": [datetime(1999,1,2,3), datetime(1999,1,2,4)]
    }

    new_sample = {}
    for path, value in Path.items(sample, all=True):
        if isinstance(value, datetime):
            value = str(value)  # all values of type datetime are converted to strings
        path.set_in(new_sample, value)

    # new_sample is now serializable to JSON

**Notes**:

-  Currently these iterators cannot handle circular relationships; these
   will result in a ``RuntimeError`` (recursion depth),
-  When iterating over attributes of objects, callables and attributes
   starting and ending with "\_\_" are excluded,
-  The iterators return generators, not lists or dicts. To get a list or
   dict, use ``list(Path.items(obj))`` or ``dict(Path.items(obj))``,
-  These iterators can also be useful to get an alternative view on a
   data structure: a starting point to define WildPaths,
-  To turn the items into a ``dict`` with string keys, use
   ``dct = {str(p): v for p, v in Path.items(obj)}``.

Path manipulations
~~~~~~~~~~~~~~~~~~

``Path`` and ``WildPath`` are subclasses of tuple (via BasePath), so
(almost) all tuple methods can be used with both, e.g.:

.. code:: python

    from wildpath.paths import Path

    assert Path("a.b") + Path("c") == Path("a.b.c")
    assert Path("a.b.c")[1:] == Path("b.c")
    assert repr(Path("a.b.c")) == "('a', 'b', 'c')"

    # however, tuple.__str__ is overridden to return the input string for the class constructor for easy (de)serialization:

    assert str(Path("a.b.c")) == "a.b.c"

Note that some methods (like ``__add__`` and ``path[1:]``) are
overridden to return the correct class (``Path`` or ``WildPath``).
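
The general pattern of a tuple subclass that keeps its own class under
addition and slicing can be sketched as follows. This is an illustrative
Python 3 example, not the code of ``BasePath``; the class name
``DottedTuple`` is hypothetical:

.. code:: python

    class DottedTuple(tuple):
        """Hypothetical tuple subclass built from a dotted string."""

        sep = "."

        def __new__(cls, keys=()):
            if isinstance(keys, str):
                keys = keys.split(cls.sep)  # "a.b.c" -> ["a", "b", "c"]
            return super().__new__(cls, keys)

        def __str__(self):
            return self.sep.join(self)  # back to the constructor input

        def __add__(self, other):
            return type(self)(tuple(self) + tuple(other))  # keep the subclass

        def __getitem__(self, index):
            result = super().__getitem__(index)
            if isinstance(index, slice):
                return type(self)(result)  # slices keep the subclass too
            return result


    joined = DottedTuple("a.b") + DottedTuple("c")
    assert joined == DottedTuple("a.b.c") and type(joined) is DottedTuple
    assert str(DottedTuple("a.b.c")[1:]) == "b.c"
    assert repr(DottedTuple("a.b.c")) == "('a', 'b', 'c')"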

Limitations
-----------

Because of the characters used to parse the paths, some keys in the
target data structures will cause the system to fail:

-  for ``Path`` and ``WildPath``: keys in Mappings (e.g. dict,
   OrderedDict) cannot contain a ``.``,
-  for ``WildPath``: keys in Mappings cannot contain the characters
   ``*``, ``?``, ``!``, ``|`` and ``&``, or to be precise, if they are
   present, they cannot be used in paths for lookups,
-  note that the ``.`` separator can easily be replaced in a subclass,
   allowing paths like ``"a/b/3/x"`` instead of ``"a.b.3.x"`` (and
   therefore path ``"a/b.c/3/x"`` with ``b.c`` a dictionary key):

.. code:: python

    from wildpath.paths import Path, WildPath

    class SlashPath(Path):
        sep = '/'

    class WildSlashPath(WildPath):
        sep = '/'

Overriding ``!``, ``|`` and ``&`` will take a little more work: override
class-attribute ``tokens`` in ``WildPath`` and override
``KeyParser.DEFAULT_TOKENS``. Currently there is no way to override the
tokens ``*`` and ``?`` in ``WildPath``.
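
The separator limitation is easy to see with plain string splitting. The
snippet below does not use wildpath; it only shows why a key containing
``.`` cannot be addressed with the default separator, and why switching to
``/`` helps:

.. code:: python

    # Splitting on "." cuts the key "b.c" into two separate path elements ...
    dotted_parts = "a.b.c.3.x".split(".")
    assert dotted_parts == ["a", "b", "c", "3", "x"]

    # ... while splitting on "/" leaves the key "b.c" intact.
    slash_parts = "a/b.c/3/x".split("/")
    assert slash_parts == ["a", "b.c", "3", "x"]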

Testing
-------

The unittests are standard python unittests and can be run as such.

Authors
-------

Lars van Gemerden (rational-it) - initial code and documentation.

License
-------

This project is licensed under the license in LICENSE.txt.

Acknowledgments
---------------

-  A big thanks to Jasper Hartong for convincing me to open-source this
   module,
-  Thanks to the creators of the module ``boolean.py`` for making
   boolean parsing a lot easier.


