Metadata-Version: 2.1
Name: hip-data-ml-utils
Version: 1.0.3
Summary: Common Python tools and utilities for Hipages ML work
License: MIT
Author: Hipages Data Team
Author-email: datascience@hipagesgroup.com.au
Requires-Python: >=3.8,<3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: PyYAML (==6.0)
Requires-Dist: aiobotocore (==2.4.2)
Requires-Dist: appdirs (==1.4.4)
Requires-Dist: attrs (==22.1.0)
Requires-Dist: black (>=22.6.0,<23.0.0)
Requires-Dist: boto3 (==1.24.59)
Requires-Dist: botocore (==1.27.59)
Requires-Dist: certifi (==2022.12.7)
Requires-Dist: cfgv (==3.2.0)
Requires-Dist: coverage (==5.4)
Requires-Dist: distlib (==0.3.6)
Requires-Dist: filelock (==3.6.0)
Requires-Dist: flake8 (>=4.0.1,<5.0.0)
Requires-Dist: identify (==1.5.13)
Requires-Dist: iniconfig (==1.1.1)
Requires-Dist: isort (>=5.10.1,<6.0.0)
Requires-Dist: joblib (==1.2.0)
Requires-Dist: mccabe (==0.6.1)
Requires-Dist: mlflow (==2.3.1)
Requires-Dist: mock (>=4.0.3,<5.0.0)
Requires-Dist: moto (>=3.1.5,<4.0.0)
Requires-Dist: mypy-extensions (==0.4.3)
Requires-Dist: nodeenv (==1.5.0)
Requires-Dist: numpy (==1.22.4)
Requires-Dist: packaging (>=23.1,<24.0)
Requires-Dist: pandas (==1.4.4)
Requires-Dist: pluggy (==1.0.0)
Requires-Dist: polling (==0.3.2)
Requires-Dist: pre-commit (==2.10.0)
Requires-Dist: py (==1.10.0)
Requires-Dist: pyarrow (==8.0.0)
Requires-Dist: pyathena (>=2.17.0,<3.0.0)
Requires-Dist: pydantic (>=1.9.0,<2.0.0)
Requires-Dist: pyparsing (==2.4.7)
Requires-Dist: pytest (>=7.1.1,<8.0.0)
Requires-Dist: pytest-cov (>=3.0.0,<4.0.0)
Requires-Dist: pytest-custom-exit-code (==0.3.0)
Requires-Dist: regex (==2022.7.9)
Requires-Dist: requests (==2.28.2)
Requires-Dist: responses (==0.23.1)
Requires-Dist: s3fs (>=2023.3.0,<2024.0.0)
Requires-Dist: six (==1.15.0)
Requires-Dist: toml (==0.10.2)
Requires-Dist: typed-ast (>=1.5.3,<2.0.0)
Requires-Dist: typing-extensions (>=4.2.0,<5.0.0)
Description-Content-Type: text/markdown

# data-ml-utils
A Python utility package that covers the common libraries we use for ML work.

## Installation
This is an open source library hosted on PyPI. Run the following command to install it:
```
pip install hip-data-ml-utils --upgrade
```

## Documentation
Head over to https://hip-data-ml-utils.readthedocs.io/en/latest/index.html# to read our library documentation

## Features
### Pyathena client initialisation
Almost a one-liner:
```python
import os
from hip_data_ml_utils.pyathena_client.client import PyAthenaClient

os.environ["AWS_ACCESS_KEY_ID"] = "xxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxx" # pragma: allowlist secret
os.environ["S3_BUCKET"] = "xxx"

pyathena_client = PyAthenaClient()
```
![Pyathena client initialisation](docs/_static/initialise_pyathena_client.png)
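
The client above relies on environment variables being set before it is created. As a minimal sketch (not part of this library), the hypothetical helper `missing_env_vars` below reports which of the three variables from the snippet above are unset or empty. It assumes those three are the ones that matter; whether `PyAthenaClient` reads any others is not confirmed here.

```python
import os

# Variable names copied from the initialisation snippet above.
# Assumption: these are the variables worth checking; the client may read others.
REQUIRED_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET")


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None):
    """Return the names in `required` that are unset or empty in `environ`.

    `environ` defaults to os.environ; passing a plain dict makes testing easier.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Hypothetical usage: report what is missing before creating the client.
missing = missing_env_vars()
if missing:
    print(f"Set these before creating PyAthenaClient: {', '.join(missing)}")
```

Checking up front like this can give a clearer message than a failure raised later during a query.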

### Pyathena query
Almost a one-liner:
```python
query = """
    SELECT
        *
    FROM
        dev.example_pyathena_client_table
    LIMIT 10
"""

df_raw = pyathena_client.query_as_pandas(final_query=query)
```
![Pyathena query](docs/_static/query_pyathena_client.png)
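
If you preview several tables, the query string can be built by a small function. The sketch below is illustrative only: `build_preview_query` is a hypothetical helper, not part of this library, and its identifier check is a deliberately strict assumption (letters, digits, and underscores, with an optional `schema.` prefix).

```python
import re

# Assumption: table names look like `table` or `schema.table`, using only
# letters, digits, and underscores. Anything else is rejected.
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_preview_query(table: str, limit: int = 10) -> str:
    """Build a `SELECT * ... LIMIT n` query string for a quick table preview."""
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"Unexpected table name: {table!r}")
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT\n    *\nFROM\n    {table}\nLIMIT {limit}"


query = build_preview_query("dev.example_pyathena_client_table", limit=10)
```

The resulting string can then be passed as `final_query` to `pyathena_client.query_as_pandas`, as in the example above.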

### MLflow utils
See the [MLflow utils documentation](https://data-ml-utils.readthedocs.io/en/latest/index.html#mlflow-utils).

### More to Come
* Have a suggestion? Raise a feature request issue and we will review it!

## Tutorials
### Pyathena
There is a Jupyter notebook that shows how to use the package's `pyathena` utilities: [notebook](tutorials/[TUTO]%20pyathena.ipynb)

### MLflow utils
There is a Jupyter notebook that shows how to use the package's `mlflow_databricks` utilities: [notebook](tutorials/[TUTO]%20mlflow_databricks.ipynb)

