Metadata-Version: 2.1
Name: generalize
Version: 0.2.1
Summary: Machine learning tools for running repeated nested leave-one-dataset-out validation and more.
Home-page: https://github.com/ludvigolsen/generalize
Author: Ludvig Renbo Olsen
Author-email: mail@ludvigolsen.dk
Requires-Python: >=3.9,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: joblib (==1.2.0)
Requires-Dist: nattrs (>=0.2.2,<0.3.0)
Requires-Dist: numpy (==1.26.4)
Requires-Dist: pandas (==1.5.3)
Requires-Dist: scikit-learn (==1.0.2)
Requires-Dist: scipy (>=1.7.3,<2.0.0)
Requires-Dist: statsmodels (==0.14.1)
Requires-Dist: utipy (>=1.0.3,<2.0.0)
Project-URL: Repository, https://github.com/ludvigolsen/generalize
Project-URL: issues, https://github.com/ludvigolsen/generalize/issues
Description-Content-Type: text/markdown

# Generalize <a href='https://github.com/LudvigOlsen/generalize'><img src='https://raw.githubusercontent.com/LudvigOlsen/generalize/master/generalize_242x280_250dpi.png' align="right" height="140" /></a>

**Author:** [Ludvig R. Olsen](https://www.ludvigolsen.dk/) ( <r-pkgs@ludvigolsen.dk> )

The ultimate goal of training machine learning models is to generalize to new, unseen data. This package contains tools for measuring model performance across multiple datasets via cross-dataset validation (also known as leave-one-dataset-out validation).
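To make the idea of leave-one-dataset-out validation concrete, here is a minimal sketch of the splitting step using only the standard library. It is not this package's API: the function name `leave_one_dataset_out` and the example dataset labels are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_dataset_out(
    dataset_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_dataset, train_indices, test_indices) once per dataset.

    Hypothetical helper for illustration; not part of the package.
    """
    # Group sample indices by the dataset they come from
    indices_by_dataset: Dict[Hashable, List[int]] = defaultdict(list)
    for index, dataset in enumerate(dataset_ids):
        indices_by_dataset[dataset].append(index)

    if len(indices_by_dataset) < 2:
        raise ValueError("Need at least two datasets to hold one out.")

    # Hold out each dataset once and train on all remaining datasets
    for held_out, test_indices in indices_by_dataset.items():
        train_indices = [
            index
            for dataset, indices in indices_by_dataset.items()
            if dataset != held_out
            for index in indices
        ]
        yield held_out, train_indices, test_indices


# Hypothetical example: six samples drawn from three datasets
dataset_ids = ["A", "A", "B", "B", "B", "C"]
splits = list(leave_one_dataset_out(dataset_ids))
for held_out, train, test in splits:
    print(f"test on {held_out}: train={train}, test={test}")
```

Each dataset serves as the test set exactly once, so every performance estimate comes from data the model never saw during training. scikit-learn's `LeaveOneGroupOut` splitter produces the same kind of split when the dataset label is passed as `groups`.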

Under development!

 - Not generalized enough for general usage (ironic, I know)
 - Relies on an old version of scikit-learn (pinned to 1.0.2) that needs updating
 - Linear regression is not currently implemented
 - Help strings are likely not up to date

### Main functions and classes

| Function / class               | Description                                                                        |
|:-------------------------------|:-----------------------------------------------------------------------------------|
| `nested_cross_validate()`      | Run (repeated) nested cross-validation.                                            |
| `train_full_model()`           | Train model on all data and save to disk.                                          |
| `evaluate_univariate_models()` | Evaluate prediction potential of every predictor separately.                       |
| `PipelineDesigner`             | Design a scikit-learn pipeline for use in cross-validation.                        |
| `ROCCurve`, `ROCCurves`        | ROC curve containers with various utility methods.                                 |
| `select_samples()`             | Utility for selecting samples based on (collapsed) labels.                         |
