Metadata-Version: 2.4
Name: data-quality-tests
Version: 2.1
Summary: Data Quality Check Library
Home-page: https://github.com/beekiran00/Data-Quality
Author: Bhanu Venkata Kiran Velpula
Author-email: beekiran00@gmail.com
License: MIT
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: requires-dist
Dynamic: summary

## DATA QUALITY

A library that acts as a set of test cases for dataframes. Pass in your dataframe after the initial import, or at each stage of your EDA, to check data quality with one line of code.

The test cases currently include:
1. check for null values
2. check for duplicates
3. check for dtype matching
4. check for outliers (deprecated)
5. check for whitespace in column headers (deprecated)

The test cases are Pass/Fail: Passed indicates good data quality, and Failed indicates bad data quality.

Example: 

TEST CASE FOR NULL VALUES: Passed means that the dataframe has no null values. Failed indicates otherwise.
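
To make the Pass/Fail idea concrete, here is a minimal sketch of how such checks could be written directly with pandas. The helper names `null_check` and `duplicate_check` are hypothetical and are not part of this library's API; the library's own implementation may differ.

```python
import pandas as pd


def null_check(df: pd.DataFrame) -> str:
    """Hypothetical helper: "Passed" if df has no null values, else "Failed"."""
    return "Failed" if df.isnull().values.any() else "Passed"


def duplicate_check(df: pd.DataFrame) -> str:
    """Hypothetical helper: "Passed" if df has no fully duplicated rows."""
    return "Failed" if df.duplicated().any() else "Passed"


# A clean frame: no nulls, no repeated rows
clean = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# A dirty frame: one null value, and the last row repeats the first
dirty = pd.DataFrame({"a": [1, None, 1], "b": ["x", "y", "x"]})

print("TEST CASE FOR NULL VALUES:", null_check(clean))       # Passed
print("TEST CASE FOR NULL VALUES:", null_check(dirty))       # Failed
print("TEST CASE FOR DUPLICATES:", duplicate_check(clean))   # Passed
print("TEST CASE FOR DUPLICATES:", duplicate_check(dirty))   # Failed
```

The output format shown here is only illustrative; the exact messages printed by the library are not reproduced.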

## Requirements

* Python 3+
* pandas
* NumPy


## Installation

```bash
pip install data-quality-tests
```

## Updates & Changes

1. The import statement changed from:

```python
from data_quality import DataQuality
```

to the following:

```python
from data_quality_tests import DataQuality
```

2. A new function, `get_row_count()`, has been added in this update. It displays the number of rows in a dataframe.  
*For a use case, refer to the Get Started section.*

3. `data_quality_check` now checks column headers for leading and trailing whitespace.
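
As an illustration of what a header whitespace check looks for, the sketch below finds column names with leading or trailing whitespace using plain pandas. The function `headers_with_whitespace` is a hypothetical name for this example, not the library's API, and the library's internal logic may differ.

```python
import pandas as pd


def headers_with_whitespace(df: pd.DataFrame) -> list:
    """Hypothetical helper: return column names with leading/trailing whitespace."""
    return [c for c in df.columns if isinstance(c, str) and c != c.strip()]


# Two of the three headers carry stray whitespace
df = pd.DataFrame(
    {" sepal_length": [5.1], "species ": ["setosa"], "petal_width": [0.2]}
)

bad_headers = headers_with_whitespace(df)
print(bad_headers)  # [' sepal_length', 'species ']

# One way to fix the headers before re-running the check
df.columns = df.columns.str.strip()
fixed_headers = headers_with_whitespace(df)
print(fixed_headers)  # []
```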


## Get Started

How to use this library:

### Data quality check

The most basic usage of this library. For simplicity,  
let's use the iris dataset from the seaborn library.

You can use any dataset.

```python
from data_quality_tests import DataQuality as dq
import seaborn as sns

# Load any dataframe (here, the iris dataset)
df = sns.load_dataset("iris")

# Run the data quality checks on the dataframe
dq.data_quality_check(df)

# Display the number of rows in the dataframe
dq.get_row_count(df)
```
