Metadata-Version: 2.1
Name: do-data-utils
Version: 1.1.1
Summary: Functionalities to interact with Google and Azure, and clean data
Home-page: https://github.com/anuponwa/do-data-utils
Author: Anupong Wannakrairot
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: databricks-sdk==0.36.0
Requires-Dist: databricks-sql-connector==3.6.0
Requires-Dist: google==3.0.0
Requires-Dist: google-api-core==2.21.0
Requires-Dist: google-auth==2.35.0
Requires-Dist: google-cloud==0.34.0
Requires-Dist: google-cloud-bigquery==3.26.0
Requires-Dist: google-cloud-core==2.4.1
Requires-Dist: google-cloud-secret-manager==2.21.0
Requires-Dist: google-cloud-storage==2.18.2
Requires-Dist: google-crc32c==1.6.0
Requires-Dist: pandas
Requires-Dist: openpyxl==3.1.5
Requires-Dist: XlsxWriter==3.2.0

# datautils

This package provides utilities for connecting to cloud data sources (Google Cloud and Azure) and functions for cleaning data.

## Installation

### Commands

To install the latest version from the `main` branch, run:
```bash
pip install "git+https://github.com/anuponwa/datautils.git"
```
You can install a specific version like so:
```bash
pip install "git+https://github.com/anuponwa/datautils.git@<version>"
```
For example,
```bash
pip install "git+https://github.com/anuponwa/datautils.git@1.1.0"
```

Optional extras are listed under the `extras_require` option in `setup.py`.
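
If the package is already installed, the declared extras can also be read from its metadata instead of opening `setup.py`. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper name `list_extras` is hypothetical, and the distribution name `do-data-utils` is an assumption taken from the package metadata; adjust it if your installed name differs.

```python
from importlib.metadata import PackageNotFoundError, metadata


def list_extras(dist_name):
    """Return the extras declared by an installed distribution, or [] if none."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    # Each declared extra appears as a "Provides-Extra" metadata field;
    # get_all returns None when the field is absent.
    return meta.get_all("Provides-Extra") or []


# Assumed distribution name; prints an empty list if it is not installed
# or declares no extras.
print(list_extras("do-data-utils"))
```

An empty list means either that the distribution is not installed or that it declares no extras, so check `pip show` if the result is unexpected.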

### Install in requirements.txt

You can also list this source in `requirements.txt`:
```text
# requirements.txt
git+https://github.com/anuponwa/datautils.git@1.1.0
```

## Available Subpackages
- `google` – Utilities for Google Cloud Platform.
- `azure` – Utilities for Azure services.

For a full list of functions, see the [overview documentation](docs/overview.md).


## Example Usage

### Google

#### GCS

```python
import json

from datautils.google import get_secret, gcs_to_file


# Load secret key and get the secret to access GCS
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gcs-secret-id-dev')

# Download the content from GCS
gcspath = 'gs://my-ai-bucket/my-path-to-json.json'
f = gcs_to_file(gcspath, secret=secret)
my_dict = json.load(f)
```

#### GBQ

```python
import json

from datautils.google import get_secret, gbq_to_df


# Load secret key and get the secret to access GBQ
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='gbq-secret-id-dev')

# Query (backticks quote the table path, which contains hyphens)
query = 'select * from `my-project.my-dataset.my-table`'
df = gbq_to_df(query, secret, polars=False)
```

### Azure/Databricks

```python
import json

from datautils.azure import databricks_to_df
from datautils.google import get_secret


# Load secret key and get the secret to access Databricks
with open('secrets/secret-manager-key.json', 'r') as f:
    secret_info = json.load(f)

secret = get_secret(secret_info, project_id='my-secret-project-id', secret_id='databricks-secret-id-dev')

# Download from Databricks SQL
query = 'select * from datadev.dsplayground.my_table'
df = databricks_to_df(query, secret, polars=False)
```
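
The examples above all repeat the same step of reading the Secret Manager key file. As a convenience, that step could be wrapped in a small helper. This is only a sketch: `load_secret_info` is a hypothetical name and is not part of `datautils`.

```python
import json
from pathlib import Path


def load_secret_info(path):
    """Read a JSON key file and return its contents as a dict."""
    # Hypothetical helper, not provided by datautils.
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)
```

With it, each example would start with `secret_info = load_secret_info('secrets/secret-manager-key.json')` before calling `get_secret`.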
