Metadata-Version: 2.1
Name: dm_utils
Version: 0.1.1
Summary: Data Mining Utils
Home-page: https://pypi.org/project/dm_utils/
Author: Mingze He
Author-email: hemingze126@126.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: catboost~=1.2.5
Requires-Dist: colorama~=0.4.6
Requires-Dist: ipython~=8.26.0
Requires-Dist: joblib~=1.4.2
Requires-Dist: lightgbm~=4.3.0
Requires-Dist: matplotlib~=3.9.0
Requires-Dist: ngboost~=0.5.1
Requires-Dist: numpy~=1.26.4
Requires-Dist: pandas~=2.2.2
Requires-Dist: pytorch-tabnet~=4.1.0
Requires-Dist: scikit-learn~=1.4.0
Requires-Dist: scipy~=1.12.0
Requires-Dist: seaborn~=0.13.2
Requires-Dist: tqdm~=4.66.5
Requires-Dist: xgboost~=2.0.3

# README

`dm_utils` is a collection of data mining utilities.

## Installation

```bash
pip install dm_utils
```

## Usage

- `dm_utils.hom`: hold-out method

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from dm_utils.hom import HOM

x, y = load_iris(return_X_y=True, as_frame=True)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
# Classification task with XGBoost and LightGBM models
hom = HOM(task='cls', model=['xgb', 'lgb'])
hom.fit(xtrain, ytrain, record_time=True)
# Iris has three classes, so take the highest-scoring class for each row
ypred = hom.predict(xtest).argmax(axis=1)
print(accuracy_score(ytest, ypred))
```

- `dm_utils.oof`: out-of-fold prediction

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from dm_utils.oof import OOF

x, y = load_breast_cancer(return_X_y=True, as_frame=True)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
# Classification task: 2 XGBoost, 2 LightGBM, and 1 CatBoost model for 5-fold OOF
oof = OOF(task='cls', model=['xgb', 'xgb', 'lgb', 'lgb', 'cb'])
oof.fit(xtrain, ytrain, record_time=True)
# Binary task, so threshold the predicted score at 0.5
ypred = oof.predict(xtest) > 0.5
print(accuracy_score(ytest, ypred))
```
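
For readers new to the term, the idea behind out-of-fold prediction can be sketched in plain Python. This is a conceptual illustration only and does not show how `dm_utils.oof` works internally. The helpers `kfold_indices` and `oof_predict` are hypothetical names, and the "model" is a toy that predicts the mean of its training targets.

```python
from statistics import mean


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, valid_idx) pairs over contiguous, near-equal folds."""
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        valid = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, valid
        start = stop


def oof_predict(y, n_splits):
    """Return one prediction per row, made by a model that never saw that row."""
    oof = [None] * len(y)
    for train, valid in kfold_indices(len(y), n_splits):
        # "Fit" the toy model on the other folds only (assumption: mean predictor)
        fold_model = mean(y[i] for i in train)
        # Predict the held-out rows with that fold's model
        for i in valid:
            oof[i] = fold_model
    return oof


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
oof = oof_predict(y, n_splits=3)
print(oof)
```

Because each row is predicted by a model trained without it, the resulting vector can be scored against `y` as an honest validation estimate, or used as an input feature for a second-stage model.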

## Features

Supported algorithm libraries: `scikit-learn`, `xgboost`, `lightgbm`, `catboost`, `ngboost`, and `pytorch-tabnet`.
