Metadata-Version: 2.4
Name: assumpcheck
Version: 0.1.0
Summary: Simple statistical assumption checks for ANOVA, linear regression, and logistic regression.
License-Expression: MIT
Project-URL: Homepage, https://github.com/Josiah-DeValois/assumpcheck
Project-URL: Repository, https://github.com/Josiah-DeValois/assumpcheck
Project-URL: Issues, https://github.com/Josiah-DeValois/assumpcheck/issues
Keywords: anova,assumption-checking,linear-regression,logistic-regression,statistics,statsmodels
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23
Requires-Dist: pandas>=1.5
Requires-Dist: scipy>=1.10
Requires-Dist: statsmodels>=0.14
Requires-Dist: matplotlib>=3.7
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Dynamic: license-file

# assumpcheck

`assumpcheck` is a small Python package for checking the core assumptions of:

- ANOVA
- Linear regression
- Logistic regression

The package is designed to stay simple:

- concise terminal output by default
- plots only when they matter
- mitigation suggestions when something fails
- optional structured output for programmatic use

## Installation

```bash
pip install "git+https://github.com/Josiah-DeValois/assumpcheck.git"
```

Once the package is published on PyPI, it will be installable with:

```bash
pip install assumpcheck
```

The package currently depends on:

- `numpy`
- `pandas`
- `scipy`
- `statsmodels`
- `matplotlib`

## Public API

```python
from assumpcheck import (
    check_anova,
    check_linear_regression,
    check_logistic_regression,
)
```

### ANOVA

```python
report = check_anova(y=y, groups=groups)
```
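The expected shapes of `y` and `groups` are not spelled out here; as a working assumption, both are 1-D arrays of equal length, with `groups` holding one group label per observation:

```python
import numpy as np

rng = np.random.default_rng(5)
groups = np.repeat(["a", "b", "c"], 20)  # one group label per observation
y = rng.normal(size=groups.size) + (groups == "c") * 0.5  # shift group "c" upward

# With the package installed:
# report = check_anova(y=y, groups=groups)
```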

### Linear regression

```python
report = check_linear_regression(model=fitted_ols_model)
```

### Logistic regression

```python
report = check_logistic_regression(model=fitted_logit_model)
```

## Quickstart

```python
import numpy as np
import pandas as pd

from assumpcheck import check_linear_regression

rng = np.random.default_rng(27)
X = pd.DataFrame(
    {
        "x1": rng.normal(size=35),
        "x2": rng.normal(size=35),
    }
)
y = 1.0 + 1.8 * X["x1"] - 0.6 * X["x2"] + rng.normal(scale=0.35, size=35)

report = check_linear_regression(
    X=X,
    y=y,
    design_independent=True,
    plots_on_fail=False,
)
```

Typical output looks like:

```text
LINEAR REGRESSION ASSUMPTION CHECKS
[PASS] Linearity
[PASS] Independence
[PASS] Normality of residuals
[PASS] Homoscedasticity
[PASS] Multicollinearity
[WARN] Extreme influential points

Summary: 5 pass, 1 warn
```

## Example output

```text
ANOVA ASSUMPTION CHECKS
[INFO] Independence
[PASS] Normality of residuals
[PASS] Equal variance across groups
[FAIL] Extreme outliers

Summary: 2 pass, 1 fail, 1 info

Details:
- Extreme outliers [FAIL]
  Metric: Max |standardized residual| = 3.420; flagged points > 3: 1
  Threshold: Values above 2 deserve review and values above 3 are concerning.
  Interpretation: At least one observation has a standardized residual above the common concern threshold.
  Possible mitigation:
    - Verify data entry for flagged cases.
    - Check whether the observation is legitimate but unusual.
    - Consider a transformation, robust method, or nonparametric alternative if outliers remain influential.
```
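The standardized-residual metric in the detail block above can be sketched with plain `numpy` (a simplified stand-in; the package may use studentized residuals or a different flagging rule):

```python
import numpy as np

rng = np.random.default_rng(2)
residuals = rng.normal(size=40)
residuals[0] = 6.0  # plant one extreme observation

# Standardize by the sample standard deviation, then count values past the threshold.
z = residuals / residuals.std(ddof=1)
max_abs_z = float(np.abs(z).max())
flagged = int((np.abs(z) > 3).sum())
```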

## Options

All three public functions support these core options:

- `alpha=0.05` — significance level for the underlying statistical tests
- `show_all=False` — print all details and show every available plot
- `plots_on_fail=True` — show plots for failed or warning-level checks
- `verbose=False` — print detail for every check
- `return_dict=False` — return a serializable dictionary instead of an `AssumptionReport`
- `design_independent=None` — declare whether the study design guarantees independence

Additional model-specific option:

- `check_linear_regression(..., ordered=False)` — set `ordered=True` when observations have a meaningful order, so autocorrelation diagnostics apply

### Output behavior

Default behavior:

- prints a concise summary
- shows plots for failed or warning-level checks if `plots_on_fail=True`
- returns an `AssumptionReport` object

Optional behavior:

- `verbose=True` prints detail for every check
- `show_all=True` prints all details and shows all available plots
- `return_dict=True` returns a serializable dictionary

### Independence handling

By default, independence is treated as a design question:

- `design_independent=None` gives an `INFO` result
- `design_independent=True` gives a `PASS` unless an ordered linear model also shows autocorrelation warnings
- `design_independent=False` gives a `FAIL`

## Current checks

### ANOVA

- Independence
- Normality of residuals
- Equal variance across groups
- Extreme outliers
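These ANOVA checks line up with classical tests; a rough `scipy` sketch of the normality and equal-variance checks (the specific tests used internally are an assumption here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = [rng.normal(loc=m, size=30) for m in (0.0, 0.2, 0.4)]

# Equal variance across groups: Levene's test.
levene_stat, levene_p = stats.levene(*groups)

# Normality of residuals: Shapiro-Wilk on group-mean-centered values.
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = stats.shapiro(residuals)
```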

### Linear regression

- Linearity
- Independence
- Normality of residuals
- Homoscedasticity
- Multicollinearity
- Extreme influential points

### Logistic regression

- Linearity in the log-odds
- Independence
- Multicollinearity
- Extreme influential points
- Adequate sample / no separation
- Model fit summary via ROC / AUC
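The ROC / AUC diagnostic can be reproduced without extra dependencies via the rank identity: AUC is the probability that a randomly chosen positive case scores above a randomly chosen negative one. A sketch (not the package's implementation):

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC via the Mann-Whitney U relationship (ties get half credit)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score against every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

A perfectly separating score yields 1.0; an uninformative score hovers near 0.5.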

## Notes on the MVP

- The package targets `statsmodels` models first.
- Thresholds are intentionally presented as heuristics.
- Logistic ROC / AUC is treated as a fit diagnostic, not a strict assumption.
- Some diagnostics need access to original data or design metadata to be fully informative.
- The current influence heuristics are intentionally conservative, so clean linear or logistic examples may still emit a `WARN`.

## Examples

- Script: `examples/basic_usage.py`
- Notebook: `examples/assumpcheck_examples.ipynb`

To rebuild the executed notebook and example plot assets:

```bash
python examples/build_workflow_artifacts.py
```

## Tests

Run the test suite with:

```bash
python -m pytest -q
```

## Release Process

For the first public release to TestPyPI and PyPI using GitHub Trusted Publishing,
see `RELEASING.md`.
