Metadata-Version: 2.4
Name: rapidfit
Version: 0.1.1
Summary: Build multi-task classifiers and augment classification datasets with ease
Author-email: Abu Bakr Soliman <bakrianoo@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/bakrianoo/RapidFit
Project-URL: Repository, https://github.com/bakrianoo/RapidFit
Keywords: machine-learning,transformers,multi-task-learning,classification,data-augmentation,nlp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai>=2.0.0
Requires-Dist: json-repair>=0.55.0
Requires-Dist: rich>=14.0.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.57.3
Requires-Dist: datasets>=4.2.0
Requires-Dist: scikit-learn>=1.6.1
Requires-Dist: accelerate>=0.26.0
Dynamic: license-file

# RapidFit

Turn a handful of labeled examples into a production-ready multi-task classifier.

RapidFit handles the two biggest pain points in text classification: **not enough data** and **too many separate models**. Give it a few examples per class, and it will generate more training data using LLMs, then train a single model that handles all your classification tasks at once.

## Installation

```bash
pip install rapidfit
```

## Augment Your Data

Start with just a few examples. RapidFit uses LLMs to expand your dataset while preserving label quality.

```python
from rapidfit import LLMAugmenter

seed_data = {
    "sentiment": [
        {"text": "I love this product!", "label": "positive"},
        {"text": "Terrible experience.", "label": "negative"},
    ],
    "emotion": [
        {"text": "This makes me so happy!", "label": "joy"},
        {"text": "I can't believe they did this.", "label": "anger"},
    ],
}

augmenter = LLMAugmenter(api_key="your-api-key")
augmented = augmenter.augment(seed_data)
```

Configure generation with optional parameters:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `model_id` | `gpt-4.1-mini` | LLM to use for generation |
| `max_samples_per_task` | `128` | Target samples per task |
| `batch_size` | `8` | Samples per generation call |
| `save_path` | `./saved` | Output directory |
| `save_format` | `json` | Format: `json`, `jsonl`, or `csv` |
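
For example, here is a sketch that passes these options as keyword arguments alongside `api_key`. The parameter placement is an assumption, so check the `LLMAugmenter` constructor if it differs:

```python
# Sketch only: assumes the options above are plain keyword arguments to LLMAugmenter.
augmenter = LLMAugmenter(
    api_key="your-api-key",
    model_id="gpt-4.1-mini",      # LLM used to generate new examples
    max_samples_per_task=256,     # grow each task toward 256 samples
    batch_size=8,                 # examples requested per generation call
    save_path="./saved",          # where augmented data is written
    save_format="jsonl",          # json, jsonl, or csv
)
augmented = augmenter.augment(seed_data)
```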

## Train a Classifier

One model, multiple tasks. The multihead architecture shares a single encoder across all your classification tasks, making it efficient and consistent.

```python
from rapidfit import MultiheadClassifier

classifier = MultiheadClassifier()
classifier.train(augmented)
classifier.save("./model")
```

Customize training:

```python
classifier = MultiheadClassifier({
    "model_name": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "epochs": 10,
    "freeze_epochs": 3,
    "learning_rate": 2e-5,
    "patience": 3,
})
```

## Predict

```python
classifier = MultiheadClassifier()
classifier.load("./model")

# Single task
classifier.predict(["Great product!"], task="sentiment")
# [{"label": "positive", "confidence": 0.95}]

# All tasks
classifier.predict_all_tasks(["Great product!"])
```
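
The single-task call returns one `{"label", "confidence"}` dict per input, so you can batch several texts and pair each with its prediction. A small sketch, assuming results come back in input order:

```python
texts = ["Great product!", "Terrible experience.", "It was okay, I guess."]

# One prediction dict per input text, in the same order (assumed).
results = classifier.predict(texts, task="sentiment")
for text, pred in zip(texts, results):
    print(f"{pred['label']:>8}  {pred['confidence']:.2f}  {text}")
```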

## Extend It

Build custom augmenters or classifiers by extending the base classes:

```python
from rapidfit import BaseAugmenter, BaseClassifier
```
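
As a rough illustration, a custom augmenter might only need to return the same `{task: [{"text", "label"}, ...]}` structure used throughout this README. The hook name and signature below are assumptions modelled on `LLMAugmenter.augment` above, not the documented interface:

```python
from rapidfit import BaseAugmenter

class DuplicatingAugmenter(BaseAugmenter):
    """Toy augmenter: adds an upper-cased copy of every seed example."""

    # Assumed hook: mirrors LLMAugmenter.augment(seed_data) from the example above.
    def augment(self, seed_data):
        augmented = {}
        for task, examples in seed_data.items():
            extras = [{"text": ex["text"].upper(), "label": ex["label"]} for ex in examples]
            augmented[task] = examples + extras
        return augmented
```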

## License

MIT
