Metadata-Version: 2.1
Name: take-resolution
Version: 0.8.0
Summary: This project builds pipelines for a resolution score for Take BLiP
Home-page: UNKNOWN
Author: Gabriel Salgado and Moises Mendes
Author-email: anaytics.ped@take.net
Maintainer: Take - D&A
Maintainer-email: anaytics.ped@take.net
License: UNKNOWN
Description: # TakeResolution
        _Gabriel Salgado and Moises Mendes_
        
        ## Overview
        
        Contents:
        
        * [Intro](#intro)
        * [Configure](#configure)
        * [Install](#install)
        * [Test](#test)
        * [Package](#package)
        * [Upload](#upload)
        * [Pipelines](#pipelines)
        * [Run](#run)
        * [Notebooks](#notebooks)
        * [Tips](#tips)
        
        ## Intro
        
        This project tries to answer the question: _how much resolution does this chatbot achieve?_
        
        To find the answer, the analysed data includes the bot structure and interaction events.
        These data are obtained from a Spark database on a Databricks cluster.
        A single run of this project analyses the data of a single bot on this database.
        
        There are two pipelines so far: bot flow and bot events.
        
        ### Bot Flow Pipeline
        
        The first step of the bot flow pipeline is to collect bot data from the Spark database.
        Bot data is, essentially, a table with the bot identity, the flow described as JSON, and other information.
        The flow defined for the bot is then selected in this step.
        
        The second step is to extract the bot flow as a graph.
        The tool used here is `networkx`, which represents the bot flow as a directed graph.
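
        As a sketch of the idea, using a plain edge list here: the project itself builds a `networkx` directed graph, and the flow JSON below is a simplified, hypothetical shape, not the real BLiP flow format.
        ```python
        import json
        
        # Hypothetical, simplified bot-flow JSON: each state lists the states
        # it can transition to. The real flow JSON stored with the bot is richer.
        flow_json = """
        {
            "states": [
                {"id": "onboarding", "outputs": [{"stateId": "menu"}]},
                {"id": "menu", "outputs": [{"stateId": "faq"}, {"stateId": "human"}]},
                {"id": "faq", "outputs": [{"stateId": "menu"}]},
                {"id": "human", "outputs": []}
            ]
        }
        """
        flow = json.loads(flow_json)
        
        # Each (state, next state) pair is a directed edge; with networkx this
        # edge list could be loaded directly into nx.DiGraph(edges).
        edges = [
            (state["id"], output["stateId"])
            for state in flow["states"]
            for output in state["outputs"]
        ]
        print(edges)
        ```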
        
        ### Bot Events Pipeline
        
        We begin this pipeline by extracting bot events from a Spark database.
        From the events database, we select the following columns for a specific bot identity and time period:
        
        - **Category**: name given to a tracked point in the bot flow.
        - **Action**: subgroups within **Category**.
        - **Extras**: extra information saved with the event.
        - **ContactIdentity**: user identity.
        - **OwnerIdentity**: bot identity.
        - **StorageDateBR**: datetime when the event was saved.
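
        As a plain-Python illustration of that selection (the project runs it as a Spark query on Databricks; the sample rows and identities below are hypothetical):
        ```python
        from datetime import datetime
        
        # Hypothetical in-memory sample of the events table.
        events = [
            {"Category": "menu", "Action": "open", "Extras": "{}",
             "ContactIdentity": "user1@0mn.io", "OwnerIdentity": "mybot@msging.net",
             "StorageDateBR": datetime(2020, 5, 10, 14, 0)},
            {"Category": "faq", "Action": "answered", "Extras": "{}",
             "ContactIdentity": "user2@0mn.io", "OwnerIdentity": "otherbot@msging.net",
             "StorageDateBR": datetime(2020, 5, 11, 9, 30)},
        ]
        
        # Keep only the events of one bot identity within a time period.
        start, end = datetime(2020, 5, 1), datetime(2020, 6, 1)
        selected = [
            event for event in events
            if event["OwnerIdentity"] == "mybot@msging.net"
            and start <= event["StorageDateBR"] < end
        ]
        print(len(selected))
        ```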
        
        As the project progresses, this description will include more details.
        
        ## Configure
        
        This section shows recommended practices to configure the project locally.
        
        ### Virtual environment
        
        This step can be done from the command line or in PyCharm.
        
        #### On commands
        
        It is recommended to use a virtual environment.
        The `venv` module is part of the Python 3 standard library, so it does not need to be installed.
        
        Create a virtual environment:
        ```
        python -m venv venv
        ```
        
        Activate the virtual environment (Windows):
        ```
        ./venv/Scripts/activate
        ```
        
        Activate the virtual environment (Linux):
        ```
        source ./venv/bin/activate
        ```
        
        To exit the virtual environment:
        ```
        deactivate
        ```
        
        #### On PyCharm
        
        Open `File/Settings...` or press `Ctrl+Alt+S`.
        This opens the settings window.
        
        Open `Project: ResolutionAnalysis/Project Interpreter` in the left menu.
        
        Open the `Project Interpreter` combobox and click `Show All...`.
        This opens a window with the Python interpreters.
        
        Click `+` or press `Alt+Insert`.
        This opens a window to create a new Python interpreter.
        
        We will choose the default options, which create a new virtual environment inside the project.
        Click the `OK` button.
        
        Click the `OK` button again in each remaining window.
        
        ### Configuring on PyCharm
        
        If you are using PyCharm, it is better to show PyCharm where the source code is in the project.
        Right-click the `src` folder in the `Project` window on the left side.
        This opens a context menu.
        
        Choose the `Mark Directory as/Sources Root` option.
        This marks `src` as the source root directory.
        It will appear as a blue folder in the `Project` navigator.
        
        ## Install
        
        The `take_resolution` package can be installed from PyPI:
        ```
        pip install take_resolution
        ```
        
        Or from `setup.py`, located in the `src` folder:
        ```
        cd src
        pip install . -U
        cd ..
        ```
        
        Installing `take_resolution` also installs all required libraries.
        But we may want to install only the dependencies, or to update our environment after the requirements change.
        
        All dependencies are declared in `src/requirements.txt`.
        Dependencies can be installed from the command line or in PyCharm.
        
        ### On commands
        
        To install the dependencies in your environment, run:
        ```
        python commands.py install
        ```
        
        ### On PyCharm
        
        After you create the virtual environment, or when you open PyCharm, it will ask whether you want to install the requirements.
        Choose `Install`.
        
        ## Test
        
        You can run the tests from the command line or in PyCharm.
        This feature is still being built.
        
        ### On commands
        
        First, activate the virtual environment.
        Then run the kedro tests:
        ```
        python commands.py test
        ```
        
        When this feature is built:
        see the coverage results at `htmlcov/index.html`.
        
        ### On PyCharm
        
        Click `Edit Configurations...` beside the `Run` icon.
        This opens the Run/Debug Configurations window.
        
        Click `+` or press `Alt+Insert`.
        
        Choose `Python tests/pytest` option.
        
        Fill the `Target` field with the path to the tests folder: `<path to project>/src/tests`.
        
        Click on `Ok` button.
        
        Click the `Run` icon.
        This runs the tests.
        
        Open the `Terminal` window and run the command to generate the HTML report:
        ```
        coverage html
        ```
        
        See coverage results at `htmlcov/index.html`.
        
        ## Package
        
        First, activate the virtual environment.
        To package this project into `.egg` and `.whl` formats:
        ```
        python commands.py package
        ```
        
        The generated packages will be in the folder `src/dist`.
        For each new package, do not forget to increase the version in `src/take_resolution/__init__.py`.
        
        ## Upload
        
        To upload the built package to PyPI:
        ```
        python commands.py upload
        ```
        
        This uploads the latest built version.
        Afterwards, the package can be downloaded and installed by pip anywhere with Python and pip:
        ```
        pip install take_resolution
        ```
        
        ## Pipelines
        
        Pipelines are described in a configuration file, `conf/base/pipelines.json`.
        See an example of its content:
        ```json
        {
            "pipeline_1": {
                "nodes": [
                    {
                        "input": [
                            "input.number",
                            "params.a",
                            "params.b"
                        ],
                        "output": "output_1",
                        "function": "my_module.function_1"
                    },
                    {
                        "input": [
                            "output_1",
                            [
                                "params.x1",
                                "params.x2"
                            ],
                            [
                                "params.y1",
                                "params.y2"
                            ]
                        ],
                        "output": "output_2",
                        "function": "my_module.function_2"
                    },
                    {
                        "input": [
                            "output_2"
                        ],
                        "output": "output_3",
                        "function": "my_module.function_3"
                    }
                ],
                "output": {
                    "raw": [
                        "output_1"
                    ],
                    "intermediate": [
                        "output_2"
                    ],
                    "primary": [
                        "output_3"
                    ]
                }
            },
            "pipeline_2": {
                "nodes": [
                    {
                        "input": [
                            "input.number",
                            "params.q"
                        ],
                        "output": "output_4",
                        "function": "my_module.function_4"
                    }
                ],
                "output": {
                    "raw": [
                        "output_4"
                    ]
                }
            }
        }
        ```
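
        A minimal sketch of how such a configuration could drive execution (a hypothetical interpreter; the actual runner inside `take_resolution` may resolve names and functions differently, and the toy functions below stand in for `my_module.function_*`):
        ```python
        # Hypothetical interpreter for the pipelines.json format above.
        def run_pipeline(pipeline, functions, inputs, params):
            results = {}
        
            def resolve(name):
                # A list of names resolves to a list of values.
                if isinstance(name, list):
                    return [resolve(n) for n in name]
                if name.startswith("input."):
                    return inputs[name[len("input."):]]
                if name.startswith("params."):
                    return params[name[len("params."):]]
                # Otherwise it is the output of a previous node.
                return results[name]
        
            for node in pipeline["nodes"]:
                args = [resolve(name) for name in node["input"]]
                results[node["output"]] = functions[node["function"]](*args)
        
            # Group the requested outputs by data layer.
            return {layer: {name: results[name] for name in names}
                    for layer, names in pipeline["output"].items()}
        
        
        functions = {
            "my_module.function_1": lambda n, a, b: a * n + b,
            "my_module.function_2": lambda v, xs, ys: v + sum(xs) + sum(ys),
            "my_module.function_3": lambda v: v * 2,
        }
        
        pipeline = {
            "nodes": [
                {"input": ["input.number", "params.a", "params.b"],
                 "output": "output_1", "function": "my_module.function_1"},
                {"input": ["output_1", ["params.x1", "params.x2"],
                           ["params.y1", "params.y2"]],
                 "output": "output_2", "function": "my_module.function_2"},
                {"input": ["output_2"],
                 "output": "output_3", "function": "my_module.function_3"},
            ],
            "output": {"raw": ["output_1"], "intermediate": ["output_2"],
                       "primary": ["output_3"]},
        }
        
        out = run_pipeline(pipeline, functions,
                           inputs={"number": 12},
                           params={"a": 2, "b": 1,
                                   "x1": 1, "x2": 2, "y1": 3, "y2": 4})
        print(out["primary"]["output_3"])  # 2*12+1 = 25; 25+3+7 = 35; 35*2 = 70
        ```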
        
        ## Run
        
        To run a given pipeline:
        ```python
        import take_resolution as tr
        input = {'number': 12}
        tr.run('pipeline_1', **input)
        ```
        
        Where `'pipeline_1'` is the name of a pipeline described in `pipelines.json`.
        
        To run all pipelines described in `pipelines.json`:
        ```python
        import take_resolution as tr
        input = {'number': 12}
        tr.run(**input)
        ```
        
        ## Notebooks
        
        This project is packaged to be installed on a specific Databricks cluster.
        This is the cluster where we work with ML experiments using `mlflow`.
        An example experiment is available as notebooks in `shared`, which look like:
        ```python
        import mlflow as ml
        import take_resolution as tr
        
        
        with ml.start_run():
            # experiment code using our pipelines
            input = {}
            output = tr.run('pipeline_1', **input)
            
            # logging our parameters
            params = tr.load_params()
            ml.log_params(params)
            
            # logging some value on output
            output_3 = output['primary']['output_3']
            ml.log_metric('output_3', output_3)
        ```
        
        ## Tips
        
        In order to maintain the project:
         * Do not remove or change any lines of the `.gitignore` unless you know what you are doing.
         * When developing experiments and production code, follow the data standards for the corresponding layers.
         * When developing experiments, put them into notebooks, following the [code policies](https://docs.google.com/document/d/17EeKpq3svANefmZNwWM4uV2FKEnqxCyFYzkI04l4P38).
         * Write notebooks on Databricks and [synchronize](https://docs.databricks.com/notebooks/github-version-control.html) them to this repository, into a particular sub-folder of the `notebooks` folder, and commit them.
         * Do not commit any data.
         * Do not commit any log file.
         * Do not commit any credentials or local configuration.
         * Keep all credentials or local configuration in folder `conf/local/`.
         * Do not commit any generated file on testing or building processes.
         * Run the tests before a pull request to make sure there are no bugs.
         * Follow git flow practices:
           * Create a feature branch for each new feature from the `dev` branch.
           Work on this branch with commits and pushes.
           Send a pull request to the `dev` branch when the work is done.
           * When a set of features is ready to release, merge the `dev` branch into the `test` branch.
           Apply several strict tests to be sure that everything is fine.
           If errors are found, fix them all and apply the tests again.
           When everything is ok, merge from `test` to `master`, increasing the release version and uploading to PyPI.
           * If a bug is found in production (the `master` branch), create a hotfix branch from `master`.
           Correct all errors and apply tests as in the `test` branch.
           When everything is ok, merge the hotfix branch into `master` and then merge `master` into `dev`.
        
Keywords: BLiP,score,resolution
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.7
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
