Metadata-Version: 2.4
Name: anyml
Version: 0.2.3
Summary: Simple AutoML wrapper for tabular data. One-liner API for classification and regression.
Project-URL: Homepage, https://github.com/vietanhdev/anyml
Project-URL: Documentation, https://github.com/vietanhdev/anyml#readme
Project-URL: Repository, https://github.com/vietanhdev/anyml
Project-URL: Issues, https://github.com/vietanhdev/anyml/issues
Author-email: Viet-Anh Nguyen <vietanh.dev@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: automl,classification,machine-learning,regression,scikit-learn
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Requires-Dist: click>=8.0
Requires-Dist: joblib
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scikit-learn
Provides-Extra: dev
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Provides-Extra: full
Requires-Dist: anyllm; extra == 'full'
Requires-Dist: lightgbm; extra == 'full'
Requires-Dist: xgboost; extra == 'full'
Provides-Extra: lightgbm
Requires-Dist: lightgbm; extra == 'lightgbm'
Provides-Extra: llm
Requires-Dist: anyllm; extra == 'llm'
Provides-Extra: progress
Requires-Dist: tqdm; extra == 'progress'
Provides-Extra: xgboost
Requires-Dist: xgboost; extra == 'xgboost'
Description-Content-Type: text/markdown

<h1 align="center">anyml</h1>
<p align="center"><em>AutoML for tabular data — classify, regress, and forecast in 3 lines of code.</em></p>

<p align="center">
<img src="https://img.shields.io/pypi/v/anyml.svg" alt="PyPI">
<img src="https://img.shields.io/pypi/pyversions/anyml.svg" alt="Python">
<img src="https://img.shields.io/pypi/l/anyml.svg" alt="License">
</p>

**anyml** is a dead-simple AutoML library for tabular data. Point it at a CSV or a DataFrame and a target column, and it will auto-detect column types, build a preprocessing pipeline, train and compare several models via cross-validation, and return the best one. It uses scikit-learn under the hood for the core models and optionally integrates XGBoost and LightGBM for stronger gradient-boosted baselines.

Built by [Viet-Anh Nguyen](https://github.com/vietanhdev) at [NRL.ai](https://www.nrl.ai).

## Why anyml?

- **One-liner API** — `anyml.classify(df, target="y")` trains and benchmarks models automatically
- **Plugin architecture** — Register custom estimators or preprocessing steps
- **Local-first** — Runs entirely on CPU, no API calls
- **Minimal core deps** — `scikit-learn`, `pandas`, `numpy`, `joblib`, `click`; XGBoost/LightGBM are optional
- **Production-ready** — Serializable pipelines, prediction intervals, feature importance

## Installation

```bash
pip install anyml
```

For optional boosting libraries:

```bash
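pip install "anyml[xgboost]"     # XGBoost estimator
pip install "anyml[lightgbm]"    # LightGBM estimator
pip install "anyml[progress]"    # tqdm progress bars
pip install "anyml[llm]"         # anyllm integration
pip install "anyml[full]"        # XGBoost, LightGBM, and anyllm together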
```

**Python 3.8+ supported** (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

## Quick Start

```python
import anyml
import pandas as pd

df = pd.read_csv("titanic.csv")

# 1. AutoML classification (auto-preprocesses, cross-validates the available models, returns the best)
model = anyml.classify(df, target="Survived")
print(model.best_name, model.best_score)    # e.g. "RandomForestClassifier" 0.83

# 2. Predict (shown on the training features for brevity; pass unseen rows in practice)
preds = model.predict(df.drop(columns=["Survived"]))

# 3. Regression is the same one-liner
price_model = anyml.regress(pd.read_csv("houses.csv"), target="price")
print(price_model.metrics)                  # rmse, mae, r2

# 4. Time-series forecasting
sales = pd.read_csv("sales.csv", parse_dates=["date"])
forecast = anyml.forecast(sales, time_col="date", target="sales", horizon=30)
```

## Models & Methods

**anyml** trains and compares multiple models, then picks the best via cross-validation.

**Classification models:**
- `LogisticRegression` (sklearn) — fast linear baseline
- `RandomForestClassifier` (sklearn) — robust tree ensemble
- `XGBClassifier` (xgboost, optional via `[xgboost]`) — gradient boosting
- `LGBMClassifier` (optional via `[lightgbm]`) — fast gradient boosting

**Regression models:**
- `LinearRegression` (sklearn)
- `RandomForestRegressor` (sklearn)
- `XGBRegressor` (xgboost, optional)
- `LGBMRegressor` (optional)

**Auto-preprocessing pipeline:**
1. **Type detection** — numeric, categorical, datetime columns auto-identified
2. **Missing value imputation** — median for numeric, mode for categorical
3. **Encoding** — OneHotEncoder for low-cardinality categoricals, OrdinalEncoder for high-cardinality
4. **Scaling** — StandardScaler for numeric features
5. **Datetime features** — extracts year, month, day, dayofweek, hour from datetime columns
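
The five steps above can be approximated with plain pandas and scikit-learn. This is a minimal sketch, not anyml's actual implementation: the helper names `expand_datetimes` and `build_preprocessor` and the cardinality cutoff `MAX_ONEHOT_LEVELS` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical cutoff between low and high cardinality (an assumption, not anyml's value).
MAX_ONEHOT_LEVELS = 10


def expand_datetimes(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each datetime column with year/month/day/dayofweek/hour columns."""
    out = df.copy()
    for col in list(out.columns):
        if is_datetime64_any_dtype(out[col]):
            stamp = out.pop(col)
            for part in ("year", "month", "day", "dayofweek", "hour"):
                out[f"{col}_{part}"] = getattr(stamp.dt, part)
    return out


def build_preprocessor(df: pd.DataFrame) -> ColumnTransformer:
    """Build impute/encode/scale steps from the column dtypes of `df`."""
    numeric = [c for c in df.columns if is_numeric_dtype(df[c])]
    categorical = [c for c in df.columns if c not in numeric]
    low_card = [c for c in categorical if df[c].nunique() <= MAX_ONEHOT_LEVELS]
    high_card = [c for c in categorical if c not in low_card]

    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    onehot_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    ordinal_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ])
    return ColumnTransformer(
        [
            ("num", numeric_steps, numeric),
            ("cat_low", onehot_steps, low_card),
            ("cat_high", ordinal_steps, high_card),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


raw = pd.DataFrame({
    "age": [22, 35, np.nan, 41, 29, 33, 50, 19, 27, 38, 45, 31],
    "city": ["hanoi", "hue", np.nan, "hanoi", "hue", "hanoi",
             "hue", "hanoi", "hue", "hanoi", "hue", "hanoi"],
    "ticket": [f"T{i}" for i in range(12)],  # 12 distinct values, so ordinal-encoded
    "signup": pd.date_range("2024-01-01", periods=12, freq="D"),
})
features = expand_datetimes(raw)
matrix = build_preprocessor(features).fit_transform(features)
```

Datetime expansion runs first here so that the extracted parts are imputed and scaled like any other numeric column; whether anyml orders the steps this way is not stated in this README.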

**Model selection** — 5-fold cross-validation by default (stratified for classification); the best model is picked by F1 (classification) or R² (regression).
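
For intuition, cross-validated model selection can be sketched directly in scikit-learn. The `pick_best` helper below is hypothetical, and it assumes binary F1 on a synthetic dataset; the averaging anyml uses for multi-class F1 is not documented here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def pick_best(candidates, X, y, scoring="f1", folds=5, seed=0):
    """Return the candidate name with the highest mean CV score, plus all scores."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = {
        name: float(cross_val_score(est, X, y, cv=cv, scoring=scoring).mean())
        for name, est in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Synthetic binary classification data, used only to make the sketch runnable.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
}
best_name, scores = pick_best(candidates, X, y)
print(best_name, scores)
```

Every candidate is scored on the same folds, which keeps the comparison fair across models.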

**Feature importance** — Native (tree models), coefficient-based (linear), or permutation importance fallback.
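
The fallback order described above might look like the following sketch. The function name `importance_scores` is an assumption, and averaging absolute coefficients across classes is one possible choice rather than anyml's confirmed behavior.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


def importance_scores(model, X, y, seed=0):
    """Return one importance value per feature for an already fitted model."""
    if hasattr(model, "feature_importances_"):  # tree ensembles
        return np.asarray(model.feature_importances_)
    if hasattr(model, "coef_"):  # linear models
        coef = np.abs(np.asarray(model.coef_))
        return coef if coef.ndim == 1 else coef.mean(axis=0)
    # Model-agnostic fallback: score drop when a feature column is shuffled.
    result = permutation_importance(model, X, y, n_repeats=5, random_state=seed)
    return result.importances_mean


X, y = make_classification(n_samples=150, n_features=6, random_state=0)
fitted = [
    RandomForestClassifier(n_estimators=30, random_state=0).fit(X, y),
    LogisticRegression(max_iter=1000).fit(X, y),
    KNeighborsClassifier().fit(X, y),  # no native importances, uses permutation
]
importances = [importance_scores(m, X, y) for m in fitted]
```

The three kinds of score are on different scales, so they are comparable within a model but not across models.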

**Time series forecasting** — moving average, exponential smoothing, or linear trend (auto-selected by in-sample RMSE).
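
As a rough illustration of selecting a method by in-sample RMSE, the sketch below implements the three methods with NumPy. All function names, the window size, and the smoothing factor are assumptions; anyml's actual defaults are not documented here.

```python
import numpy as np


def rmse(actual, fitted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(fitted)) ** 2)))


def moving_average(y, horizon, window=3):
    # One-step-ahead fit: each point is predicted by the mean of the previous `window` points.
    fitted = [np.mean(y[t - window:t]) for t in range(window, len(y))]
    return rmse(y[window:], fitted), np.full(horizon, np.mean(y[-window:]))


def exponential_smoothing(y, horizon, alpha=0.5):
    # Simple exponential smoothing: the current level is the forecast for the next point.
    level, fitted = y[0], []
    for value in y[1:]:
        fitted.append(level)
        level = alpha * value + (1 - alpha) * level
    return rmse(y[1:], fitted), np.full(horizon, level)


def linear_trend(y, horizon):
    # Least-squares line over the time index, extrapolated past the last observation.
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    future = np.arange(len(y), len(y) + horizon)
    return rmse(y, slope * t + intercept), slope * future + intercept


def auto_forecast(y, horizon):
    """Run all three methods and keep the one with the lowest in-sample RMSE."""
    y = np.asarray(y, dtype=float)
    methods = {
        "moving_average": moving_average,
        "exponential_smoothing": exponential_smoothing,
        "linear_trend": linear_trend,
    }
    results = {name: fn(y, horizon) for name, fn in methods.items()}
    best = min(results, key=lambda name: results[name][0])
    return best, results[best][1]


series = 2.0 * np.arange(20) + 5.0  # a perfectly linear series
best_method, forecast = auto_forecast(series, horizon=3)
```

In-sample RMSE favors methods that fit the history closely, so a held-out tail of the series is a reasonable cross-check before trusting the chosen method.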

## API Reference

| Function | Purpose |
|---|---|
| `anyml.classify(df, target, models=None, cv=5)` | AutoML classification |
| `anyml.regress(df, target, models=None, cv=5)` | AutoML regression |
| `anyml.forecast(df, time_col, target, horizon)` | Time-series forecasting |
| `anyml.profile(df)` | Quick data profile before training |
| `model.predict(df)` | Inference on new data |
| `model.predict_proba(df)` | Probabilities (classification) |
| `model.feature_importance()` | Ranked feature importances |
| `model.save(path)` / `anyml.load(path)` | Persistence |
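
Persistence of a fitted pipeline can be illustrated with `joblib`, which is one of the declared dependencies. Whether `model.save` and `anyml.load` use joblib internally is an assumption; the sketch only shows the round trip on a plain scikit-learn pipeline.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Fit a small pipeline on synthetic data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

# Write the fitted pipeline to disk and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored pipeline should reproduce the original predictions.
same_predictions = bool((restored.predict(X) == pipeline.predict(X)).all())
```

Pickle-based files should only be loaded from trusted sources, and ideally with the same library versions that wrote them.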

## CLI Usage

```bash
anyml classify titanic.csv --target Survived --out titanic.pkl
anyml regress houses.csv --target price
anyml forecast sales.csv --time-col date --target sales --horizon 30
anyml predict titanic.pkl new_passengers.csv --out preds.csv
```

## Examples

### Classification with feature importance

```python
import anyml
import pandas as pd

df = pd.read_csv("churn.csv")
model = anyml.classify(df, target="churn", cv=5)

print(f"Best model: {model.best_name} ({model.best_score:.3f})")
for feat, imp in model.feature_importance().head(10).items():
    print(f"  {feat}: {imp:.3f}")

model.save("churn.pkl")
```

### Constrain which models are tried

```python
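import anyml
import pandas as pd

df = pd.read_csv("data.csv")  # placeholder path; use your own dataset

# Only try XGBoost and LightGBM (requires both extras, e.g. anyml[full])
model = anyml.classify(df, target="y", models=["xgboost", "lightgbm"])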
```

### Forecast next 90 days

```python
import anyml
import pandas as pd

sales = pd.read_csv("sales.csv", parse_dates=["date"])
fc = anyml.forecast(sales, time_col="date", target="revenue", horizon=90)

fc.plot()            # matplotlib plot of history + forecast
fc.to_csv("forecast.csv")
```

## License

MIT (c) Viet-Anh Nguyen
