Metadata-Version: 2.4
Name: nullpy
Version: 0.0.2
Summary: Automated data cleaning tool
Author-email: Foresty <dsparthsrivastava@email.com>
Project-URL: Homepage, https://github.com/Parth-Srivastava-bithub/nullpy
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scikit-learn
Requires-Dist: scipy
Requires-Dist: rich

# 📦 NullPy

### *An Intelligent, Data-Aware Pandas DataFrame Cleaner* 🚀

`NullPy` is a Python library for **automatic, intelligent, and target-aware cleaning of pandas DataFrames**.
It handles missing values and outliers, and uses **ML models** for predictive imputation when needed.
No more repetitive manual cleaning: NullPy **picks a strategy for each column automatically**.

---

## ✨ Features

* 🧹 **Automatic Missing Value Handling**

  * Detects missing values in each column.
  * Chooses an imputation strategy per column (`mean`, `median`, `mode`, or `predictive`).
  * Can train ML models (`RandomForest`, `LinearRegression`, `LogisticRegression`) for predictive imputation.

* 📊 **Outlier Detection & Handling**

  * Detects outliers using the **IQR method**.
  * Handles them by **clipping, dropping, or predictive imputation**.

* 🎯 **Target-Aware Cleaning**

  * Uses correlation and chi-squared tests against the target column to decide when predictive cleaning is useful.

* ⚡ **Highly Customizable**

  * Parameters for imputation strategy, outlier strategy, verbosity, and difference reporting.

* 🖥️ **Beautiful Console Output** (powered by [Rich](https://github.com/Textualize/rich))

  * Colorful progress bars.
  * Summary cleaning reports.
  * Difference reports (before vs after cleaning).

* 🔮 **Demo Reports Included**

  * One-call demo that shows the cleaning in action.
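
Predictive imputation, as listed above, fills a column's missing values with predictions from a model trained on the rows where that column is present. The sketch below illustrates the general idea with scikit-learn's `RandomForestRegressor` for a numerical column. The helper `predictive_impute`, its arguments, and the toy data are hypothetical; this is not NullPy's internal implementation, and for a categorical column a classifier (for example `LogisticRegression`) would take the regressor's place.

```python
from typing import List

import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def predictive_impute(df: pd.DataFrame, column: str, features: List[str]) -> pd.DataFrame:
    """Hypothetical helper: fill NaNs in a numerical `column` with predictions
    from a RandomForest trained on the rows where `column` is known.

    Assumes the `features` columns are numerical and contain no missing values.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    missing = out[column].isna()
    if not missing.any():
        return out  # nothing to impute

    # Train only on rows where the column to impute is present
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(out.loc[~missing, features], out.loc[~missing, column])

    # Replace only the missing entries with model predictions
    out.loc[missing, column] = model.predict(out.loc[missing, features])
    return out


# Toy data (illustrative): 'Age' has gaps, 'Income' is complete
toy = pd.DataFrame({
    "Age": [25, 30, None, 22, 35, 45, 28, 33, None, 50],
    "Income": [50000, 60000, 58000, 48000, 72000, 68000, 52000, 61000, 59000, 75000],
})
filled = predictive_impute(toy, column="Age", features=["Income"])
print(filled)
```

A forest's regression output is an average of training targets, so the imputed ages stay within the observed range; a linear model would not have that property.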

---

## 🚀 Installation

```bash
pip install numpy pandas scikit-learn scipy rich
```

(Or clone the repo and drop `nullpy.py` into your project.)

---

## ⚡ Quick Usage

### 1️⃣ Basic Cleaning

```python
import pandas as pd
from nullpy import SmartDFCleaner

# Example data: missing values in several columns and one outlier (Age = 150)
data = {
    'Age': [25, 30, None, 22, 35, 45, 28, 33, None, 50, 150],
    'Income': [50000, 60000, 58000, None, 72000, 68000, 52000, 61000, 59000, 75000, 80000],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', None, 'Female', 'Male', 'Male', 'Female', 'Male'],
    'City': ['NY', 'LA', 'NY', 'SF', 'LA', 'NY', 'SF', 'LA', 'NY', 'SF', 'LA'],
    'Purchased': [0, 1, 1, 0, 1, 1, 0, 1, None, 1, 0]
}
df = pd.DataFrame(data)

# Run the cleaner; show_difference=True also prints a before/after report
cleaner = SmartDFCleaner(target_column="Purchased", show_difference=True)
cleaned_df = cleaner.fit_transform(df)

print(cleaned_df)
```

---

### 2️⃣ One-Call Full Demo (with auto + predictive cleaning reports)

```python
from nullpy import SmartDFCleaner

# Reuses `df` from the Basic Cleaning example above
newdf = SmartDFCleaner().clean_it(df, target_column="Purchased")
```

This will:

* Show original data.
* Show **auto-cleaned DataFrame**.
* Show **predictive-cleaned DataFrame**.
* Print **summary reports** + **null counts**.

---

## ⚙️ Parameters

| Parameter          | Type            | Default  | Description                                                        |
| ------------------ | --------------- | -------- | ------------------------------------------------------------------ |
| `target_column`    | `str` or `None` | `None`   | Target column used for correlation tests and predictive imputation |
| `impute_strategy`  | `str`           | `"auto"` | One of `auto`, `mean`, `median`, `mode`, `predictive`              |
| `outlier_strategy` | `str`           | `"auto"` | One of `auto`, `clip`, `drop`, `predictive`                        |
| `verbose`          | `bool`          | `True`   | Print logs and progress                                            |
| `show_difference`  | `bool`          | `False`  | Show a before/after difference report                              |
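
`target_column` is what the target-aware tests from the Features list run against. A rough sketch of such a relevance check follows, assuming Pearson correlation for numerical features and a chi-squared test of independence for categorical ones (both from `scipy.stats`). The function `is_related_to_target`, the 0.05 significance level, the 0.3 correlation cut-off, and the toy data are hypothetical choices, not NullPy's documented behaviour.

```python
import pandas as pd
from scipy.stats import chi2_contingency, pearsonr


def is_related_to_target(df, feature, target, alpha=0.05, min_abs_corr=0.3):
    """Hypothetical check: is `feature` associated with `target`?

    Numerical feature -> Pearson correlation; categorical -> chi-squared test.
    Thresholds are illustrative assumptions.
    """
    data = df[[feature, target]].dropna()  # test only rows where both are known
    if pd.api.types.is_numeric_dtype(data[feature]):
        r, p = pearsonr(data[feature], data[target])
        return bool(abs(r) >= min_abs_corr and p < alpha)
    # Chi-squared test of independence on the feature/target contingency table
    table = pd.crosstab(data[feature], data[target])
    chi2, p, dof, expected = chi2_contingency(table)
    return bool(p < alpha)


# Toy data (illustrative)
toy = pd.DataFrame({
    "Score": list(range(20)),            # rises with the target
    "Segment": ["a"] * 10 + ["b"] * 10,  # perfectly aligned with the target
    "Noise": ["x", "y"] * 10,            # independent of the target
    "Purchased": [0] * 10 + [1] * 10,
})
for col in ["Score", "Segment", "Noise"]:
    print(col, is_related_to_target(toy, col, "Purchased"))
# Expected: Score True, Segment True, Noise False
```

When a feature fails such a check, predicting missing values from the target adds little, which is one plausible reason to fall back to simple imputation.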

---

## 📊 Example Console Output

```
> Identified 2 numerical and 2 categorical features.
> 'Age' has high missing data (18.0%). Using simple imputation.
> Applied Median Imputation to column 'Age'.
> Clipped 1 outliers in column 'Age'.
> Cleaning process completed successfully!
```
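
To show what steps like these compute, the plain-pandas sketch below applies median imputation and then IQR clipping to the `Age` column of the Basic Cleaning example. The conventional 1.5 × IQR fences are an assumption here. This mirrors the logged steps but is not NullPy's code, and the library may compute or round figures such as the missing-data percentage differently.

```python
import pandas as pd

# 'Age' column from the Basic Cleaning example
age = pd.Series([25, 30, None, 22, 35, 45, 28, 33, None, 50, 150], dtype="float64", name="Age")

# Share of missing values (2 of 11 rows)
missing_pct = age.isna().mean() * 100
print(f"'Age' missing: {missing_pct:.1f}%")

# Median imputation: fill gaps with the median of the known values
median = age.median()
age_filled = age.fillna(median)

# IQR fences (1.5 x IQR multiplier assumed)
q1, q3 = age_filled.quantile(0.25), age_filled.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Count values outside the fences, then clip them to the nearest fence
n_outliers = int(((age_filled < lower) | (age_filled > upper)).sum())
age_clean = age_filled.clip(lower=lower, upper=upper)
print(f"Clipped {n_outliers} outlier(s); upper fence = {upper}")
# 1 outlier (150), clipped to the upper fence of 56.5
```

Clipping keeps the row and caps its value; the `drop` strategy would instead remove rows where the mask is true.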

---

## 🛠️ Methods

* `fit_transform(df)` → Returns the cleaned DataFrame.
* `demo_report(df, target_column, ...)` → Runs the auto + predictive cleaning demo.
* `clean_it(df, target_column, ...)` → One-call shortcut: runs the full demo and returns the final cleaned DataFrame.

---

## 📌 Roadmap

* 🔜 Add support for time-series cleaning.
* 🔜 Add advanced outlier detection (Isolation Forest, Z-score).
* 🔜 Export cleaning logs to JSON/CSV.

---

## 👨‍💻 Author

Made with ❤️ and ☕ by **Foresty** (India 🇮🇳)

---
