Metadata-Version: 2.2
Name: eazyml-data-quality
Version: 0.0.32
Summary: eazyml-data-quality from EazyML family for comprehensive data quality assessment, including bias detection, outlier identification, and data drift analysis.
Home-page: https://eazyml.com/
Author: EazyML
Author-email: admin@ipsoftlabs.com
Project-URL: Documentation, https://docs.eazyml.com/
Project-URL: Homepage, https://eazyml.com/
Project-URL: Contact Us, https://eazyml.com/trust-in-ai
Project-URL: eazyml-automl, https://pypi.org/project/eazyml-automl/
Project-URL: eazyml-counterfactual, https://pypi.org/project/eazyml-counterfactual/
Project-URL: eazyml-xai, https://pypi.org/project/eazyml-xai/
Project-URL: eazyml-xai-image, https://pypi.org/project/eazyml-xai-image/
Project-URL: eazyml-insight, https://pypi.org/project/eazyml-insight/
Project-URL: eazyml-data-quality, https://pypi.org/project/eazyml-data-quality/
Keywords: data-quality,bias-detection,outlier-detection,data-drift,model-drift,missing-values,correlation-analysis,data-imputation,data-balance,data-quality-tests,ml-api
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: Other/Proprietary License
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: Intended Audience :: Information Technology
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openpyxl
Requires-Dist: flask
Requires-Dist: cryptography
Requires-Dist: PyYAML
Requires-Dist: requests
Requires-Dist: eazyml-insight
Requires-Dist: pandas==1.3.*; python_version <= "3.7"
Requires-Dist: scikit-learn==1.0.*; python_version <= "3.7"
Requires-Dist: numpy==1.21.*; python_version <= "3.7"
Requires-Dist: pandas>=2.0.3; python_version == "3.8"
Requires-Dist: scikit-learn==1.3.*; python_version == "3.8"
Requires-Dist: numpy==1.24.*; python_version == "3.8"
Requires-Dist: pandas>=2.2.3; python_version == "3.9"
Requires-Dist: scikit-learn==1.3.*; python_version == "3.9"
Requires-Dist: numpy==1.24.*; python_version == "3.9"
Requires-Dist: pandas>=2.2.3; python_version == "3.10"
Requires-Dist: scikit-learn==1.3.*; python_version == "3.10"
Requires-Dist: numpy==1.24.*; python_version == "3.10"
Requires-Dist: pandas>=2.2.3; python_version == "3.11"
Requires-Dist: scikit-learn==1.3.*; python_version == "3.11"
Requires-Dist: numpy==1.24.*; python_version == "3.11"
Requires-Dist: pandas>=2.2.3; python_version > "3.11"
Requires-Dist: scikit-learn==1.3.*; python_version > "3.11"
Requires-Dist: numpy; python_version > "3.11"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

## EazyML Responsible-AI: Data Quality Assessment
![Python](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)  ![PyPI package](https://img.shields.io/badge/pypi%20package-0.0.32-brightgreen) ![Code Style](https://img.shields.io/badge/code%20style-black-black)

![EazyML](https://github.com/EazyML/eazyml-docs/raw/refs/heads/master/EazyML_logo.png)

## Overview
`eazyml-data-quality` is a Python utility that evaluates dataset quality by checking data shape, emptiness, outliers, class balance, and correlation. It reports the issues it finds so that you can confirm the data is ready for downstream processing.
Its APIs assess data quality across the dimensions listed under Features.

## Features
- **Missing Value Analysis**: Detects and imputes missing values.
- **Bias Detection**: Uncovers and mitigates bias in datasets.
- **Data Drift and Model Drift Analysis**: Monitors changes in data distributions over time.
- **Data Shape Quality**: Validates dataset dimensions and checks whether the number of rows is sufficient relative to the number of columns.
- **Data Emptiness Check**: Identifies and reports missing values in the dataset.
- **Outlier Detection**: Detects and removes outliers based on statistical analysis.
- **Data Balance Check**: Analyzes the balance of the dataset and computes a balance score.
- **Correlation Analysis**: Identifies multicollinearity and relationships between features, and raises alerts for highly correlated features.
- **Summary Alerts**: Consolidates key quality issues into a single summary for quick review.

With `eazyml-data-quality`, you can ensure that your training data is clean, balanced, and ready for machine learning.
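
To make a few of the checks listed above concrete, here is a minimal standard-library sketch of an emptiness check, an outlier check, and a balance score. It is not the package's implementation: the function names, the choice of Tukey's IQR rule, and the minority/majority ratio are all assumptions made for illustration.

```python
from collections import Counter
from statistics import quantiles


def missing_fraction(values):
    """Share of entries that are None or empty strings (hypothetical helper)."""
    missing = sum(1 for v in values if v is None or v == "")
    return missing / len(values)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], i.e. Tukey's rule."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def balance_ratio(labels):
    """Minority class count divided by majority class count (1.0 = balanced)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


if __name__ == "__main__":
    print(missing_fraction([1, None, "", 4]))
    print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
    print(balance_ratio(["a", "a", "a", "b"]))
```

The package may score these dimensions differently; the sketch only shows the kind of statistic each feature name refers to.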

## Installation
To use `eazyml-data-quality`, ensure that Python is installed on your system.
### User installation
The easiest way to install `eazyml-data-quality` is with pip:
```bash
pip install -U eazyml-data-quality
```
### Dependencies
This package requires:
- pandas
- scikit-learn
- numpy
- openpyxl
- flask
- cryptography
- PyYAML
- requests
- eazyml-insight
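
If you want to confirm that the dependencies were installed, a small sketch using the standard library's `importlib.metadata` (available from Python 3.8) can report the version of each one. The helper names `installed_version` and `report` are hypothetical and not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    names = ["pandas", "scikit-learn", "numpy", "openpyxl", "eazyml-insight"]
    for name, found in report(names).items():
        print(f"{name}: {found or 'not installed'}")
```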

## Usage
Here's an example of how you can use the APIs from this package.
```python
from eazyml_data_quality import ez_init, ez_data_quality

# Initialize: set up bookkeeping (and the access key, if required)
_ = ez_init()

# Perform data quality checks
response = ez_data_quality(
    train_data="train.csv",
    outcome="target",
    options={
        "data_shape": "yes",
        "data_balance": "yes",
        "data_emptiness": "yes",
        "impute": "yes",
        "data_outliers": "yes",
        "remove_outliers": "yes",
        "outcome_correlation": "yes",
        "data_drift": "yes",
        "model_drift": "yes",
        "test_data": "test.csv",
        "data_completeness": "yes",
        "data_correctness": "yes",
    },
)

# `response` is a dictionary with the results of all data quality checks,
# along with the data quality alerts selected by the user.
```
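The example above expects `train.csv` and `test.csv` to exist, each with a `target` column. To try it without real data, the following sketch writes two small synthetic files using only the standard library. The `age` and `income` columns, the `write_toy_csv` helper, and the rule used to derive `target` are all made up for illustration.

```python
import csv
import random


def write_toy_csv(path, n_rows=100, seed=0):
    """Write a small synthetic dataset with a binary 'target' column."""
    rng = random.Random(seed)  # seeded so the files are reproducible
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["age", "income", "target"])  # hypothetical features
        for _ in range(n_rows):
            age = rng.randint(18, 80)
            income = round(rng.gauss(50_000, 15_000), 2)
            target = int(income > 50_000)  # arbitrary labeling rule
            writer.writerow([age, income, target])


if __name__ == "__main__":
    write_toy_csv("train.csv", n_rows=200, seed=0)
    write_toy_csv("test.csv", n_rows=50, seed=1)
```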
You can find more information in the [documentation](https://eazyml.readthedocs.io/en/latest/packages/eazyml_dq.html).


## Useful links and other packages from the EazyML family
- [Documentation](https://docs.eazyml.com)
- [Homepage](https://eazyml.com)
- If you have questions or would like to discuss a use case, please contact us [here](https://eazyml.com/trust-in-ai).
- Here are the other packages from the EazyML suite:

    - [eazyml-automl](https://pypi.org/project/eazyml-automl/): eazyml-automl provides a suite of APIs for training, optimizing and validating machine learning models with built-in AutoML capabilities, hyperparameter tuning, and cross-validation.
    - [eazyml-data-quality](https://pypi.org/project/eazyml-data-quality/): eazyml-data-quality provides APIs for comprehensive data quality assessment, including bias detection, outlier identification, and drift analysis for both data and models.
    - [eazyml-counterfactual](https://pypi.org/project/eazyml-counterfactual/): eazyml-counterfactual provides APIs for optimal prescriptive analytics, counterfactual explanations, and actionable insights to optimize predictive outcomes to align with your objectives.
    - [eazyml-insight](https://pypi.org/project/eazyml-insight/): eazyml-insight provides APIs to discover patterns, generate insights, and mine rules from your datasets.
    - [eazyml-xai](https://pypi.org/project/eazyml-xai/): eazyml-xai provides APIs for explainable AI (XAI), offering human-readable explanations, feature importance, and predictive reasoning.
    - [eazyml-xai-image](https://pypi.org/project/eazyml-xai-image/): eazyml-xai-image provides APIs for image explainable AI (XAI).

## License
This project is licensed under the [Proprietary License](https://github.com/EazyML/eazyml-docs/blob/master/LICENSE).

---

Maintained by [EazyML](https://eazyml.com)  
© 2025 EazyML. All rights reserved.
