Metadata-Version: 2.1
Name: stacks-data
Version: 2.1.1
Summary: A suite of utilities to support data engineering workloads within an Ensono Stacks data platform.
Home-page: https://github.com/Ensono/stacks-data
Author: Ensono Stacks
Author-email: stacks@ensono.com
Requires-Python: >=3.9,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: behave
Provides-Extra: cli
Provides-Extra: data-quality
Requires-Dist: azure-identity (>=1.17.1,<2.0.0)
Requires-Dist: azure-mgmt-datafactory (>=3.1.0,<4.0.0)
Requires-Dist: azure-storage-file-datalake (>=12.16.0,<13.0.0)
Requires-Dist: behave (>=1.2.6,<2.0.0) ; extra == "behave"
Requires-Dist: click (>=8.1.7,<9.0.0) ; extra == "cli"
Requires-Dist: click-loglevel (>=0.4.0.post1,<0.5.0) ; extra == "cli"
Requires-Dist: colorlog (>=6.8.2,<7.0.0)
Requires-Dist: delta-spark (>=2.4.0,<4.0.0)
Requires-Dist: great-expectations (>=0.17.23,<0.18.0) ; extra == "data-quality"
Requires-Dist: jinja2 (>=3.1.4,<4.0.0) ; extra == "cli"
Requires-Dist: jsonschema (>=4.23.0,<5.0.0)
Requires-Dist: polling2 (>=0.5.0,<0.6.0) ; extra == "behave"
Requires-Dist: pydantic (>=1.10.18,<2.0.0)
Requires-Dist: pyspark (>=3.4.3,<4.0.0)
Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
Description-Content-Type: text/markdown

# Stacks Data

**stacks-data** is a Python package built to support various functions within the Ensono Stacks Data Platform solution. The library and its associated Python-based CLI (`datastacks`) are intended to assist developers working within a deployed Stacks Data Platform, supporting common tasks such as generating new data engineering workloads and running Spark jobs.

* [Stacks Azure Data Platform - GitHub](https://github.com/Ensono/stacks-azure-data)
* [Stacks Azure Data Platform - Documentation](https://stacks.ensono.com/docs/workloads/azure/data/intro_data_azure)
* [Datastacks CLI - Documentation](https://stacks.ensono.com/docs/workloads/azure/data/data_engineering/datastacks)

## Installation

stacks-data is modular, so you can install only the features you need and keep the installation lightweight. By default, stacks-data installs only its core functionality, which is focused on PySpark and Azure operations.

The following optional features (extras) require additional dependencies, which you can choose to include in your installation:

* **behave**: Utilities for executing behaviour-driven development (BDD) tests.
* **cli**: The [datastacks](https://stacks.ensono.com/docs/workloads/azure/data/data_engineering/datastacks) command line tool, to support developers generating data workloads.
* **data-quality**: Utilities for running data quality checks using the Great Expectations framework.
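
To see which packages each extra pulls in, you can read the `Requires-Dist` entries from the installed package's metadata. The sketch below uses the standard library's `importlib.metadata.requires`. The `group_by_extra` helper is hypothetical and is not part of stacks-data. Its parser is deliberately simplified; for full PEP 508 handling, the `packaging` library's `Requirement` class is the more robust choice.

```python
import re
from collections import defaultdict
from importlib.metadata import PackageNotFoundError, requires

# Leading project name of a requirement string, e.g. "click" in "click (>=8.1.7,<9.0.0)".
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
# The extra named in an environment marker, e.g. 'extra == "cli"'.
EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requirements: list[str]) -> dict[str, list[str]]:
    """Group requirement names by extra; requirements without an extra go under "core".

    Hypothetical, simplified parser for illustration: it ignores version specifiers
    and any marker other than a single `extra == "..."` clause.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for req in requirements:
        spec, _, marker = req.partition(";")
        name_match = NAME_RE.match(spec)
        if name_match is None:
            continue  # skip strings that do not start with a project name
        extra_match = EXTRA_RE.search(marker)
        key = extra_match.group(1) if extra_match else "core"
        groups[key].append(name_match.group(1))
    return dict(groups)


if __name__ == "__main__":
    try:
        # requires() returns the Requires-Dist strings, or None if none are listed.
        reqs = requires("stacks-data") or []
    except PackageNotFoundError:
        reqs = []
        print("stacks-data is not installed in this environment")
    for extra, names in sorted(group_by_extra(reqs).items()):
        print(f"{extra}: {', '.join(names)}")
```

Applied to the metadata above, this should group `behave` and `polling2` under `behave`; `click`, `click-loglevel` and `jinja2` under `cli`; and `great-expectations` under `data-quality`. The remaining packages fall under `core`.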

You can install the stacks-data package using pip, as shown in the examples below:

```sh
# Example 1: Install only the core stacks-data package
pip install stacks-data

# Example 2: Install the stacks-data package with data quality features included
# (quoting stops shells such as zsh from treating the square brackets as a glob pattern)
pip install "stacks-data[data-quality]"

# Example 3: Install the stacks-data package with all optional features included
pip install "stacks-data[behave,cli,data-quality]"
```
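
After installing, you may want to check which optional features are usable in the current environment. The following is a minimal sketch. It assumes a feature counts as available when every distribution listed for it in the `Requires-Dist` metadata is installed. The `EXTRA_REQUIREMENTS` mapping is copied by hand from that metadata, and `is_installed` and `available_extras` are hypothetical helpers, not part of the stacks-data API. The sketch checks only that packages are present, not that their versions fall within the required ranges.

```python
from collections.abc import Mapping
from importlib.metadata import PackageNotFoundError, version

# Hand-copied from the Requires-Dist entries in the stacks-data metadata (illustrative;
# stacks-data does not expose this mapping itself).
EXTRA_REQUIREMENTS = {
    "behave": ["behave", "polling2"],
    "cli": ["click", "click-loglevel", "jinja2"],
    "data-quality": ["great-expectations"],
}


def is_installed(distribution: str) -> bool:
    """Return True if a distribution with this name is installed (version is not checked)."""
    try:
        version(distribution)
    except PackageNotFoundError:
        return False
    return True


def available_extras(
    requirements: Mapping[str, list[str]] = EXTRA_REQUIREMENTS,
) -> dict[str, bool]:
    """Map each extra to True only if all of its dependencies are installed."""
    return {
        extra: all(is_installed(dist) for dist in dists)
        for extra, dists in requirements.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'available' if ok else 'not installed'}")
```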

