Metadata-Version: 2.4
Name: mlbench-lite
Version: 3.0.1
Summary: A simple machine learning benchmarking library
Author-email: Your Name <your.email@example.com>
Maintainer-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/Arefin994/mlbench-lite
Project-URL: Documentation, https://github.com/Arefin994/mlbench-lite#readme
Project-URL: Repository, https://github.com/Arefin994/mlbench-lite.git
Project-URL: Bug Tracker, https://github.com/Arefin994/mlbench-lite/issues
Keywords: machine learning,benchmarking,scikit-learn,ml
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: xgboost>=1.5.0
Requires-Dist: lightgbm>=3.2.0
Requires-Dist: catboost>=1.0.0
Requires-Dist: scikit-optimize>=0.9.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Requires-Dist: mypy>=0.800; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=6.0; extra == "test"
Requires-Dist: pytest-cov>=2.0; extra == "test"
Dynamic: license-file

# mlbench-lite

A comprehensive machine learning benchmarking library that provides an easy way to compare multiple ML models on your dataset. Built with scikit-learn, XGBoost, LightGBM, CatBoost, and pandas for seamless integration into your ML workflow. Now with **full support for both classification and regression tasks**.

## 🚀 Features

- **Complete ML Benchmarking**: 40+ ML models for both classification and regression
- **Flexible Model Selection**: Choose specific models, categories, or exclude models
- **Multiple ML Libraries**: scikit-learn, XGBoost, LightGBM, CatBoost
- **Classification & Regression**: Full support for both supervised learning tasks
- **Simple API**: One function call to benchmark multiple models
- **Comprehensive Metrics**: 
  - Classification: Accuracy, Precision, Recall, F1
  - Regression: R², MAE, RMSE, MSE
- **Custom Datasets**: Includes `load_clover` (classification) and `make_regression_dataset` (regression)
- **Easy Integration**: Works seamlessly with scikit-learn datasets
- **Pandas Output**: Results returned as a clean pandas DataFrame
- **Reproducible**: Consistent results with random state control
- **Model Information**: Get detailed info about available models

## 📦 Installation

```bash
pip install mlbench-lite
```

## 🎯 Quick Start

### Classification

```python
from mlbench_lite import benchmark, load_clover

# Load the clover dataset
X, y = load_clover(return_X_y=True)

# Benchmark all available classification models
results = benchmark(X, y)
print(results)
```

**Output:**
```
                 Model           Category  Accuracy  Precision  Recall      F1
0        Random Forest  Tree-based Models    0.9500     0.9565  0.9512  0.9505
1                  SVM         SVM Models    0.9250     0.9337  0.9255  0.9254
2  Logistic Regression      Linear Models    0.9125     0.9131  0.9117  0.9115
3              XGBoost            XGBoost    0.9000     0.9024  0.9000  0.8997
4             LightGBM           LightGBM    0.8875     0.8891  0.8875  0.8873
```

### Regression (NEW!)

```python
from mlbench_lite import benchmark_regression, make_regression_dataset

# Create a regression dataset
X, y = make_regression_dataset(n_samples=500, n_features=10, return_X_y=True)

# Benchmark all available regression models
results = benchmark_regression(X, y)
print(results)
```

**Output:**
```
                         Model               Category      R2     MAE     RMSE       MSE
0      Random Forest Regressor  Tree-based Regression  0.9234  4.5123  11.2345  126.2148
1  Gradient Boosting Regressor  Tree-based Regression  0.9187  4.8234  11.4567  131.2556
2             Ridge Regression      Linear Regression  0.8934  6.1234  13.4567  181.1234
3            XGBoost Regressor                XGBoost  0.9156  5.0234  11.8234  139.8034
4           LightGBM Regressor               LightGBM  0.9112  5.1234  12.0123  144.2956
```

## 📚 API Reference

### Classification Models

#### `benchmark(X, y, test_size=0.2, random_state=42, models=None, model_categories=None, exclude_models=None)`

Benchmark multiple classification models on a dataset.

**Parameters:**
- `X` (array-like): Feature matrix of shape (n_samples, n_features); split into training and test sets according to `test_size`
- `y` (array-like): Target values of shape (n_samples,)
- `test_size` (float, optional): Proportion of dataset for testing (default: 0.2)
- `random_state` (int, optional): Random seed for reproducibility (default: 42)
- `models` (list of str, optional): Specific models to use. If None, uses all available models.
- `model_categories` (list of str, optional): Categories of models to use. If None, uses all categories.
- `exclude_models` (list of str, optional): Models to exclude from benchmarking.

**Returns:**
- `pandas.DataFrame`: Results with columns:
  - `Model`: Name of the model
  - `Category`: Category of the model
  - `Accuracy`: Accuracy score
  - `Precision`: Precision score (macro-averaged)
  - `Recall`: Recall score (macro-averaged)
  - `F1`: F1 score (macro-averaged)
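
To make the "macro-averaged" wording concrete, the sketch below computes accuracy and macro-averaged precision, recall, and F1 by hand for a toy set of labels. It illustrates the standard definitions only and is not the library's own code; the helper name `macro_scores` and the toy labels are made up for this example.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (precision, recall, f1), each averaged with equal weight per class."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels (hypothetical data, three classes)
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"Accuracy: {accuracy:.4f}  Precision: {precision:.4f}  "
      f"Recall: {recall:.4f}  F1: {f1:.4f}")
```

Macro averaging gives every class the same weight, so a small class influences the score as much as a large one.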

### Regression Models (NEW!)

#### `benchmark_regression(X, y, test_size=0.2, random_state=42, models=None, model_categories=None, exclude_models=None)`

Benchmark multiple regression models on a dataset.

**Parameters:**
- `X` (array-like): Feature matrix of shape (n_samples, n_features); split into training and test sets according to `test_size`
- `y` (array-like): Target values (continuous) of shape (n_samples,)
- `test_size` (float, optional): Proportion of dataset for testing (default: 0.2)
- `random_state` (int, optional): Random seed for reproducibility (default: 42)
- `models` (list of str, optional): Specific models to use. If None, uses all available models.
- `model_categories` (list of str, optional): Categories of models to use. If None, uses all categories.
- `exclude_models` (list of str, optional): Models to exclude from benchmarking.

**Returns:**
- `pandas.DataFrame`: Results with columns:
  - `Model`: Name of the model
  - `Category`: Category of the model
  - `R2`: R-squared (coefficient of determination)
  - `MAE`: Mean Absolute Error
  - `RMSE`: Root Mean Squared Error
  - `MSE`: Mean Squared Error
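
The four regression columns are closely related, which the following sketch shows by computing them from scratch on a handful of toy values. It uses the textbook definitions and is not taken from the library's source; `regression_metrics` and the sample numbers are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R2, MAE, RMSE and MSE from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined when y_true is constant (ss_tot == 0)

    return {"R2": r2, "MAE": mae, "RMSE": rmse, "MSE": mse}


# Toy targets and predictions (hypothetical data)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Higher is better for R², while lower is better for MAE, RMSE, and MSE. RMSE and MAE are expressed in the units of the target, which often makes them easier to interpret than MSE.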

### Common Functions

#### `list_available_models()`

List all available classification models and their categories.

**Returns:**
- `dict`: Dictionary with model categories as keys and lists of model names as values
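
Because the return value is a plain dictionary of lists, ordinary Python is enough to reshape it. The sketch below works on a hand-written excerpt instead of calling the library, so the exact keys and names are assumptions (they are copied from the category and model names used elsewhere in this README).

```python
# Hypothetical excerpt of the mapping returned by list_available_models();
# the real dictionary may contain more categories and models.
available = {
    "Linear Models": ["Logistic Regression", "Ridge Classifier"],
    "Tree-based Models": ["Decision Tree", "Random Forest"],
    "SVM Models": ["SVM (RBF)", "SVM (Linear)"],
}

# Flatten the mapping into a single list of model names
all_models = [name for names in available.values() for name in names]

# Reverse lookup: model name -> category
category_of = {
    name: category
    for category, names in available.items()
    for name in names
}

# Collect every model from the categories you want to leave out
skip_categories = {"SVM Models"}
exclude = [
    name
    for category, names in available.items()
    if category in skip_categories
    for name in names
]

print(f"{len(all_models)} models in {len(available)} categories")
print(f"Random Forest belongs to: {category_of['Random Forest']}")
print(f"Models to exclude: {exclude}")
```

A list built this way could then be passed as `exclude_models` to `benchmark()`.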

#### `list_available_regressors()` (NEW!)

List all available regression models and their categories.

**Returns:**
- `dict`: Dictionary with model categories as keys and lists of regression model names as values

#### `get_model_info()`

Get detailed information about available classification models.

**Returns:**
- `pandas.DataFrame`: DataFrame with model information including category, name, and description

#### `get_regressor_info()` (NEW!)

Get detailed information about available regression models.

**Returns:**
- `pandas.DataFrame`: DataFrame with regressor information including category, name, and description

### Data Utilities

#### `load_clover(return_X_y=False)`

Load the custom clover dataset for classification tasks.

**Parameters:**
- `return_X_y` (bool, default=False): If True, returns (data, target) instead of a Bunch object

**Returns:**
- `Bunch` or `tuple`: A `Bunch` with data, target, feature_names, target_names, and DESCR, or a `(data, target)` tuple when `return_X_y=True`

#### `make_regression_dataset(n_samples=500, n_features=10, n_informative=8, noise=10.0, random_state=42, return_X_y=False)` (NEW!)

Create a synthetic regression dataset for benchmarking.

**Parameters:**
- `n_samples` (int, default=500): Number of samples
- `n_features` (int, default=10): Number of features
- `n_informative` (int, default=8): Number of informative features
- `noise` (float, default=10.0): Standard deviation of Gaussian noise
- `random_state` (int, default=42): Random seed
- `return_X_y` (bool, default=False): If True, returns (data, target) instead of a Bunch object

**Returns:**
- `Bunch` or `tuple`: A `Bunch` with data, target, feature_names, and DESCR, or a `(data, target)` tuple when `return_X_y=True`
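
As a rough illustration of what `n_informative` and `noise` mean, here is a small standard-library generator for a linear target. It is a sketch under stated assumptions, not the library's generator: the function name `make_linear_data`, the weight range, and the standard-normal features are all choices made for this example.

```python
import random


def make_linear_data(n_samples=500, n_features=10, n_informative=8,
                     noise=10.0, random_state=42):
    """Build y = X . w + Gaussian noise, where only some weights are non-zero."""
    rng = random.Random(random_state)

    # Only the first n_informative features influence the target
    weights = [
        rng.uniform(10.0, 100.0) if j < n_informative else 0.0
        for j in range(n_features)
    ]

    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        signal = sum(w * x for w, x in zip(weights, row))
        y.append(signal + rng.gauss(0.0, noise))  # noise = standard deviation
        X.append(row)
    return X, y, weights


X, y, weights = make_linear_data(n_samples=5, n_features=4, n_informative=2, noise=1.0)
print(f"Rows: {len(X)}, columns: {len(X[0])}")
print(f"Non-zero weights: {sum(1 for w in weights if w != 0.0)}")
```

Features with a zero weight carry no signal, so a larger gap between `n_features` and `n_informative` means more distractor columns, and a larger `noise` lowers the best achievable R².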

## 💡 Code Examples

### 1. Basic Usage with All Models

```python
from mlbench_lite import benchmark, load_clover

# Load the clover dataset
X, y = load_clover(return_X_y=True)
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")

# Benchmark all available models
results = benchmark(X, y)
print("\nBenchmark Results:")
print(results)

# Get the best model
best_model = results.iloc[0]
print(f"\n🏆 Best Model: {best_model['Model']} (Accuracy: {best_model['Accuracy']:.4f})")
```

### Regression Examples

### 1R. Basic Regression Benchmarking

```python
from mlbench_lite import benchmark_regression, make_regression_dataset

# Create a regression dataset
X, y = make_regression_dataset(n_samples=500, n_features=10, return_X_y=True)
print(f"Dataset shape: {X.shape}")

# Benchmark all available regression models
results = benchmark_regression(X, y)
print("\nRegression Benchmark Results:")
print(results)

# Get the best model
best_model = results.iloc[0]
print(f"\n🏆 Best Model: {best_model['Model']} (R²: {best_model['R2']:.4f})")
```

### 2R. Regression with Specific Models

```python
from mlbench_lite import benchmark_regression, make_regression_dataset

X, y = make_regression_dataset(n_samples=500, n_features=10, return_X_y=True)

# Benchmark only specific regression models
results = benchmark_regression(
    X, y,
    models=['Linear Regression', 'Random Forest Regressor', 'XGBoost Regressor']
)
print("Selected Regression Models Results:")
print(results)
```

### 3R. Regression by Model Categories

```python
from mlbench_lite import benchmark_regression, make_regression_dataset

X, y = make_regression_dataset(n_samples=500, n_features=10, return_X_y=True)

# Benchmark only tree-based regression models
results = benchmark_regression(X, y, model_categories=['Tree-based Regression'])
print("Tree-based Regression Results:")
print(results)

# Benchmark multiple categories
results = benchmark_regression(
    X, y,
    model_categories=['Linear Regression', 'SVM Regression']
)
print("\nLinear and SVM Regression Results:")
print(results)
```

### 4R. Regression with Custom Data

```python
from mlbench_lite import benchmark_regression
from sklearn.datasets import make_regression as sklearn_make_regression

# Create a custom regression dataset
X, y = sklearn_make_regression(n_samples=1000, n_features=20, n_informative=15, noise=5.0, random_state=42)

# Benchmark regression models
results = benchmark_regression(X, y)
print("Custom Dataset Regression Results:")
print(results)
```

### More Classification Examples

### 2. Model Selection - Specific Models

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Benchmark only specific models
results = benchmark(X, y, models=['Random Forest', 'XGBoost', 'LightGBM', 'Logistic Regression'])
print("Selected Models Results:")
print(results)
```

### 3. Model Selection - By Categories

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Benchmark only tree-based models
results = benchmark(X, y, model_categories=['Tree-based Models'])
print("Tree-based Models Results:")
print(results)

# Benchmark multiple categories
results = benchmark(X, y, model_categories=['Linear Models', 'SVM Models'])
print("\nLinear and SVM Models Results:")
print(results)
```

### 4. Exclude Specific Models

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Exclude slow models
results = benchmark(X, y, exclude_models=['Gaussian Process', 'Multi-layer Perceptron'])
print("Results without slow models:")
print(results)
```

### 5. List Available Models

```python
from mlbench_lite import list_available_models, get_model_info

# List all available models by category
models = list_available_models()
print("Available Classification Models by Category:")
for category, model_list in models.items():
    print(f"\n{category}:")
    for model in model_list:
        print(f"  - {model}")

# Get detailed model information
model_info = get_model_info()
print("\nDetailed Classification Model Information:")
print(model_info)
```

### 6. List Available Regressors (NEW!)

```python
from mlbench_lite import list_available_regressors, get_regressor_info

# List all available regression models by category
regressors = list_available_regressors()
print("Available Regression Models by Category:")
for category, model_list in regressors.items():
    print(f"\n{category}:")
    for model in model_list:
        print(f"  - {model}")

# Get detailed regressor information
regressor_info = get_regressor_info()
print("\nDetailed Regression Model Information:")
print(regressor_info)
```

### 7. Advanced Model Selection

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Combine an explicit model list with an exclusion list
results = benchmark(
    X, y,
    models=['Random Forest', 'XGBoost', 'SVM (RBF)', 'Logistic Regression'],
    exclude_models=['SVM (Linear)']
)
print("Custom Selection Results:")
print(results)
```

### 8. Using with Scikit-learn Datasets

```python
from mlbench_lite import benchmark
from sklearn.datasets import load_wine, load_breast_cancer

# Test with Wine dataset
print("=== Wine Dataset ===")
X, y = load_wine(return_X_y=True)
results = benchmark(X, y)
print(results)

# Test with Breast Cancer dataset
print("\n=== Breast Cancer Dataset ===")
X, y = load_breast_cancer(return_X_y=True)
results = benchmark(X, y)
print(results)
```

### 9. Custom Test Size

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Use 30% of data for testing
results = benchmark(X, y, test_size=0.3)
print("Results with 30% test size:")
print(results)

# Use 10% of data for testing
results = benchmark(X, y, test_size=0.1)
print("\nResults with 10% test size:")
print(results)
```

### 10. Reproducible Results

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Set random seed for reproducible results
results1 = benchmark(X, y, random_state=123)
results2 = benchmark(X, y, random_state=123)

print("Results with random_state=123:")
print(results1)
print(f"\nResults are identical: {results1.equals(results2)}")

# A different random state usually changes the train/test split, so scores may differ
results3 = benchmark(X, y, random_state=456)
print(f"\nResults differ with another random state: {not results1.equals(results3)}")
```
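
Why a fixed seed reproduces a benchmark can be seen with a seeded shuffle split. The sketch below uses only the standard library and is not how the library splits data internally (that is not documented here); `holdout_indices` is a hypothetical helper that mirrors the `test_size` and `random_state` parameters.

```python
import random


def holdout_indices(n_samples, test_size=0.2, random_state=42):
    """Shuffle row indices with a seeded generator and cut off a test portion."""
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)  # same seed -> same order
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]     # (train, test)


train_a, test_a = holdout_indices(400, test_size=0.2, random_state=123)
train_b, test_b = holdout_indices(400, test_size=0.2, random_state=123)
train_c, test_c = holdout_indices(400, test_size=0.2, random_state=456)

print(f"Train rows: {len(train_a)}, test rows: {len(test_a)}")
print(f"Same seed, same split: {test_a == test_b}")
print(f"Different seed, same split: {test_a == test_c}")
```

Every model is then trained and scored on the same rows, so any remaining run-to-run variation has to come from the models themselves.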

### 11. Working with Synthetic Data

```python
from mlbench_lite import benchmark
from sklearn.datasets import make_classification

# Create synthetic dataset
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_classes=4,
    random_state=42
)

print(f"Synthetic dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")

results = benchmark(X, y)
print("\nBenchmark Results:")
print(results)
```

### 12. Analyzing Results

```python
from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)
results = benchmark(X, y)

# Display results with better formatting
print("Detailed Results:")
print("=" * 60)
for idx, row in results.iterrows():
    print(f"{row['Model']:20} | Acc: {row['Accuracy']:.4f} | "
          f"Prec: {row['Precision']:.4f} | Rec: {row['Recall']:.4f} | "
          f"F1: {row['F1']:.4f}")

# Find models with accuracy > 0.9
high_accuracy = results[results['Accuracy'] > 0.9]
print(f"\nModels with accuracy > 0.9: {len(high_accuracy)}")

# Calculate average metrics
avg_metrics = results[['Accuracy', 'Precision', 'Recall', 'F1']].mean()
print("\nAverage metrics across all models:")
for metric, value in avg_metrics.items():
    print(f"  {metric}: {value:.4f}")
```

### 13. Comparing Regression Models

```python
from mlbench_lite import benchmark_regression, make_regression_dataset

# Create a regression dataset
X, y = make_regression_dataset(n_samples=300, n_features=15, return_X_y=True)

# Compare linear vs tree-based regression models
linear_results = benchmark_regression(X, y, model_categories=['Linear Regression'])
tree_results = benchmark_regression(X, y, model_categories=['Tree-based Regression'])

print("Linear Regression Models:")
print(linear_results[['Model', 'R2', 'MAE']].to_string(index=False))

print("\n\nTree-based Regression Models:")
print(tree_results[['Model', 'R2', 'MAE']].to_string(index=False))

# Find best models in each category
best_linear = linear_results.iloc[0]
best_tree = tree_results.iloc[0]

print(f"\nBest Linear Model: {best_linear['Model']} (R²: {best_linear['R2']:.4f})")
print(f"Best Tree Model: {best_tree['Model']} (R²: {best_tree['R2']:.4f})")
```

### 14. Comparing Classification and Regression

```python
from mlbench_lite import benchmark, benchmark_regression, load_clover, make_regression_dataset

# Classification benchmarking
X_clf, y_clf = load_clover(return_X_y=True)
clf_results = benchmark(X_clf, y_clf)

# Regression benchmarking
X_reg, y_reg = make_regression_dataset(n_samples=400, return_X_y=True)
reg_results = benchmark_regression(X_reg, y_reg)

print("📊 CLASSIFICATION RESULTS (Top 5):")
print(clf_results[['Model', 'Category', 'Accuracy', 'F1']].head().to_string(index=False))

print("\n📉 REGRESSION RESULTS (Top 5):")
print(reg_results[['Model', 'Category', 'R2', 'MAE']].head().to_string(index=False))
```

### 15. Comparing Different Datasets

```python
from mlbench_lite import benchmark, load_clover
from sklearn.datasets import load_wine, load_breast_cancer

datasets = [
    ("Clover", load_clover(return_X_y=True)),
    ("Wine", load_wine(return_X_y=True)),
    ("Breast Cancer", load_breast_cancer(return_X_y=True))
]

print("Dataset Comparison:")
print("=" * 80)

for name, (X, y) in datasets:
    print(f"\n{name} Dataset:")
    print(f"  Shape: {X.shape}, Classes: {len(set(y))}")
    
    results = benchmark(X, y)
    best_acc = results.iloc[0]['Accuracy']
    best_model = results.iloc[0]['Model']
    
    print(f"  Best Model: {best_model} (Accuracy: {best_acc:.4f})")
    
    # Show top 2 models
    print("  Top 2 Models:")
    for idx, row in results.head(2).iterrows():
        print(f"    {row['Model']}: {row['Accuracy']:.4f}")
```

## 🔬 Models Included

The library includes **40+ machine learning models** from multiple categories:

### **Classification Models** 

#### **Linear Models**
- **Logistic Regression**: Linear model for classification using logistic function
- **Ridge Classifier**: Linear classifier with L2 regularization
- **SGD Classifier**: Linear classifier using Stochastic Gradient Descent
- **Perceptron**: Simple linear classifier
- **Passive Aggressive**: Online learning algorithm for classification

#### **Tree-based Models**
- **Decision Tree**: Non-parametric supervised learning method
- **Random Forest**: Ensemble of decision trees with bagging
- **Extra Trees**: Extremely randomized trees ensemble
- **Gradient Boosting**: Boosting ensemble method using gradient descent
- **AdaBoost**: Adaptive boosting ensemble method
- **Bagging Classifier**: Bootstrap aggregating ensemble method

#### **SVM Models**
- **SVM (RBF)**: Support Vector Machine with RBF kernel
- **SVM (Linear)**: Support Vector Machine with linear kernel

#### **Neighbors**
- **K-Nearest Neighbors**: Instance-based learning algorithm

#### **Naive Bayes**
- **Gaussian Naive Bayes**: Naive Bayes classifier for Gaussian features
- **Multinomial Naive Bayes**: Naive Bayes classifier for multinomial features
- **Bernoulli Naive Bayes**: Naive Bayes classifier for binary features

#### **Discriminant Analysis**
- **Linear Discriminant Analysis**: Linear dimensionality reduction and classification
- **Quadratic Discriminant Analysis**: Quadratic classifier with Gaussian assumptions

#### **Neural Networks**
- **Multi-layer Perceptron**: Feedforward artificial neural network

#### **Gaussian Process**
- **Gaussian Process**: Probabilistic classifier using Gaussian processes

#### **Advanced Gradient Boosting**
- **XGBoost**: Extreme gradient boosting framework (if installed)
- **LightGBM**: Light gradient boosting machine (if installed)
- **CatBoost**: Categorical boosting framework (if installed)

### **Regression Models (NEW!)** ⭐

#### **Linear Regression**
- **Linear Regression**: Simple linear regression
- **Ridge Regression**: Linear regression with L2 regularization
- **Lasso Regression**: Linear regression with L1 regularization
- **ElasticNet Regression**: Linear regression with L1 and L2 regularization
- **Bayesian Ridge**: Bayesian linear regression
- **Huber Regressor**: Linear regression robust to outliers
- **Quantile Regressor**: Linear regression for quantile predictions

#### **Tree-based Regression**
- **Decision Tree Regressor**: Non-parametric tree-based regression
- **Random Forest Regressor**: Ensemble of trees with bagging
- **Extra Trees Regressor**: Extremely randomized trees for regression
- **Gradient Boosting Regressor**: Boosting ensemble for regression
- **AdaBoost Regressor**: Adaptive boosting for regression
- **Bagging Regressor**: Bootstrap aggregating for regression

#### **SVM Regression**
- **SVR (RBF)**: Support Vector Regression with RBF kernel
- **SVR (Linear)**: Support Vector Regression with linear kernel

#### **Neighbors**
- **K-Neighbors Regressor**: Instance-based regression

#### **Neural Networks**
- **MLP Regressor**: Multi-layer perceptron for regression

#### **Gaussian Process**
- **Gaussian Process Regressor**: Probabilistic regression

#### **Advanced Gradient Boosting**
- **XGBoost Regressor**: Extreme gradient boosting for regression (if installed)
- **LightGBM Regressor**: Light gradient boosting for regression (if installed)
- **CatBoost Regressor**: Categorical boosting for regression (if installed)

All models use their default parameters with appropriate random seeds for reproducibility.

## 📊 Classification vs Regression Coverage

### Classification (Legacy - Unchanged)

The library continues to support all previously available classification models. Use `benchmark()` for classification tasks.

**Supported:**
- 20+ classification models across 8+ categories
- Metrics: Accuracy, Precision, Recall, F1
- Dataset: `load_clover()` for testing

### Regression (NEW!)

The library now supports comprehensive regression benchmarking with 20+ regression models. Use `benchmark_regression()` for regression tasks.

**Supported:**
- 20+ regression models across 7+ categories
- Metrics: R², MAE, RMSE, MSE
- Dataset: `make_regression_dataset()` for testing

## 📊 Built-in Datasets

### Clover Dataset (Classification)

The `load_clover()` function provides a custom synthetic dataset:

- **Samples**: 400
- **Features**: 4
- **Classes**: 4

**Features:**
- `leaf_length`: Length of the leaf in cm
- `leaf_width`: Width of the leaf in cm
- `petiole_length`: Length of the petiole in cm
- `leaflet_count`: Number of leaflets per leaf

**Classes:**
- `white_clover`: Trifolium repens
- `red_clover`: Trifolium pratense
- `crimson_clover`: Trifolium incarnatum
- `alsike_clover`: Trifolium hybridum

### Synthetic Dataset (Regression)

The `make_regression_dataset()` function creates customizable synthetic regression datasets:

- **Default Samples**: 500
- **Default Features**: 10
- **Default Informative Features**: 8
- **Default Noise**: 10.0

Fully customizable for different benchmarking scenarios.

## 🛠️ Requirements

### **Core Dependencies**
- Python >= 3.8
- scikit-learn >= 1.0.0
- pandas >= 1.3.0
- numpy >= 1.20.0

### **Optional Dependencies (for additional models)**
- xgboost >= 1.5.0 (for XGBoost models)
- lightgbm >= 3.2.0 (for LightGBM models)
- catboost >= 1.0.0 (for CatBoost models)
- scikit-optimize >= 0.9.0 (for advanced optimization)

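**Note**: These optional packages are declared as install requirements, so `pip install mlbench-lite` installs them automatically. At runtime the library needs only the core dependencies: if an optional package cannot be imported, its models are skipped gracefully and the remaining models are still benchmarked.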
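
If you want to check in advance which of the optional libraries are importable in your environment, a small standard-library probe is enough. This is a sketch of one common detection pattern, not the library's internal mechanism; `detect_backends` and `OPTIONAL_BACKENDS` are names invented for this example.

```python
import importlib.util

# Display name -> importable module name
OPTIONAL_BACKENDS = {
    "XGBoost": "xgboost",
    "LightGBM": "lightgbm",
    "CatBoost": "catboost",
}


def detect_backends(backends=OPTIONAL_BACKENDS):
    """Report which modules can be found, without actually importing them."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in backends.items()
    }


availability = detect_backends()
for label, present in availability.items():
    status = "available" if present else "not installed"
    print(f"{label}: {status}")
```

`importlib.util.find_spec` only locates the module, so the check stays fast even for heavy packages.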

## 🧪 Testing

Run the test suite to verify everything works:

```bash
# Run all tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=mlbench_lite

# Quick functionality test
python -c "from mlbench_lite import benchmark, load_clover; X, y = load_clover(return_X_y=True); results = benchmark(X, y); print(results)"
```

## 🚀 Development

### Setup Development Environment

```bash
git clone https://github.com/Arefin994/mlbench-lite.git
cd mlbench-lite
pip install -e ".[dev]"
```

### Code Quality

```bash
# Format code
black mlbench_lite tests

# Lint code
flake8 mlbench_lite tests

# Type checking
mypy mlbench_lite
```

### Building for Distribution

```bash
# Build package
python -m build

# Upload to PyPI
twine upload dist/*
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📈 Changelog

### 3.0.0 (2024-02-07) ⭐
- **MAJOR MILESTONE**: Full regression model support added!
- **NEW**: 20+ regression models across 7+ categories
- **NEW**: `benchmark_regression()` function for regression benchmarking
- **NEW**: Regression metrics: R², MAE, RMSE, MSE
- **NEW**: `make_regression_dataset()` for synthetic regression data
- **NEW**: `list_available_regressors()` and `get_regressor_info()` functions
- **NEW**: Comprehensive regression model categories (Linear, Tree-based, SVM, Neural Networks, Gaussian Process, Advanced Boosting)
- **NEW**: 40+ regression examples in documentation
- **IMPROVED**: Full test coverage for regression (40+ test cases)
- **IMPROVED**: Regression support for XGBoost, LightGBM, and CatBoost
- **IMPROVED**: Version bumped to 3.0.0 reflecting major feature addition
- **MAINTAINED**: 100% backward compatibility with classification functionality

### 2.0.0 (2024-01-XX)
- **MAJOR UPDATE**: Added 20+ machine learning models
- **NEW**: Flexible model selection (specific models, categories, exclusions)
- **NEW**: Support for XGBoost, LightGBM, and CatBoost
- **NEW**: Model information and listing functions
- **NEW**: Comprehensive model categories (Linear, Tree-based, SVM, etc.)
- **IMPROVED**: Enhanced API with more parameters
- **IMPROVED**: Better error handling and graceful degradation
- **IMPROVED**: Updated documentation with extensive examples

### 0.1.0 (2024-01-XX)
- Initial release
- Basic benchmarking functionality
- Support for Logistic Regression, Random Forest, and SVM
- Comprehensive metrics (Accuracy, Precision, Recall, F1)
- Custom clover dataset
- Full test coverage
- PyPI ready

## 🆘 Support

If you encounter any issues or have questions:

1. Check the [Issues](https://github.com/Arefin994/mlbench-lite/issues) page
2. Create a new issue with detailed information
3. Include code examples and error messages

## 🙏 Acknowledgments

- Built with [scikit-learn](https://scikit-learn.org/)
- Uses [pandas](https://pandas.pydata.org/) for data handling
- Inspired by the need for simple ML benchmarking tools
