Metadata-Version: 2.1
Name: data-patterns
Version: 0.1.14
Summary: Package for generating and evaluating patterns in quantitative reports
Home-page: https://github.com/DeNederlandscheBank/data-patterns
Author: De Nederlandsche Bank
Author-email: ECDB_berichten@dnb.nl
License: MIT/X license
Keywords: data_patterns
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.0, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: xlsxwriter

=============
data-patterns
=============


.. image:: https://img.shields.io/pypi/v/data_patterns.svg
        :target: https://pypi.python.org/pypi/data_patterns
        :alt: Pypi Version
.. image:: https://img.shields.io/travis/DeNederlandscheBank/data-patterns.svg
        :target: https://travis-ci.org/DeNederlandscheBank/data-patterns
        :alt: Build Status
.. image:: https://readthedocs.org/projects/data-patterns/badge/?version=latest
        :target: https://data-patterns.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status
.. image:: https://img.shields.io/badge/License-MIT/X-blue.svg
        :target: https://github.com/DeNederlandscheBank/data-patterns/blob/master/LICENSE
        :alt: License

Package for generating and evaluating data-patterns in quantitative reports

* Free software: MIT/X license
* Documentation: https://data-patterns.readthedocs.io.


Features
--------

Here is what the package does:

- Generating and evaluating patterns in structured datasets and exporting to Excel and JSON
- Transforming generated patterns into XBRL validation rules and Pandas code
- Evaluating reporting data with data quality rules published by De Nederlandsche Bank (to be provided)

Quick overview
--------------

To install the package

::

    pip install data_patterns


To introduce the features of the this package define the following Pandas DataFrame::

    df = pd.DataFrame(columns = ['Name',       'Type',             'Assets', 'TV-life', 'TV-nonlife' , 'Own funds', 'Excess'],
                      data   = [['Insurer  1', 'life insurer',     1000,     800,       0,             200,         200],
                                ['Insurer  2', 'non-life insurer', 4000,     0,         3200,          800,         800],
                                ['Insurer  3', 'non-life insurer', 800,      0,         700,           100,         100],
                                ['Insurer  4', 'life insurer',     2500,     1800,      0,             700,         700],
                                ['Insurer  5', 'non-life insurer', 2100,     0,         2200,          200,         200],
                                ['Insurer  6', 'life insurer',     9000,     8800,      0,             200,         200],
                                ['Insurer  7', 'life insurer',     9000,     0,         8800,          200,         200],
                                ['Insurer  8', 'life insurer',     9000,     8800,      0,             200,         200],
                                ['Insurer  9', 'non-life insurer', 9000,     0,         8800,          200,         200],
                                ['Insurer 10', 'non-life insurer', 9000,     0,         8800,          200,         199.99]])
    df.set_index('Name', inplace = True)

Start by defining a PatternMiner::

    miner = data_patterns.PatternMiner(df)

To generate patterns use the find-function of this object::

    df_patterns = miner.find({'name'      : 'equal values',
                              'pattern'   : '=',
                              'parameters': {"min_confidence": 0.5,
                                             "min_support"   : 2}})

The result is a DataFrame with the patterns that were found. The first part of the DataFrame now contains

+----+--------------+---------------------------+----------+-----------+----------+
| id |pattern_id    |pattern_def                |support   |exceptions |confidence|
+====+==============+===========================+==========+===========+==========+
|  0 |equal values  | {Own funds} = {Excess}    |9         |1          |0.9       |
+----+--------------+---------------------------+----------+-----------+----------+


The miner finds one patterns; it states that the 'Own funds'-column is identical to the 'Excess'-column in 9 of the 10 cases (with a confidence of 90 %, there is one case where the equal-pattern does not hold).


To analyze data with the generated set of data-patterns use the analyze function with the dataframe with the data as input::

    df_results = miner.analyze(df)

The result is a DataFrame with the results. If we select ``result_type = False`` then the first part of the output contains

+-----------+--------------+-------------+---------------------------+----------+-----------+----------+---------+---------+
|index      |result_type   |pattern_id   |pattern_def                |support   |exceptions |confidence|P values |Q values |
+-----------+--------------+-------------+---------------------------+----------+-----------+----------+---------+---------+
|Insurer 10 |False         |equal values | {Own funds} = {Excess}    |9         |1          |0.9       |200      |199.99   |
+-----------+--------------+-------------+---------------------------+----------+-----------+----------+---------+---------+

Other patterns you can use are '>', '<', '<=', '>=', '!=', 'sum', and '-->'.

Read the documentation for more features.


=======
History
=======

0.1.0 (2019-10-27)
------------------

* Development release.

0.1.11 (2019-11-6)
------------------

* First release on PyPI.


