Metadata-Version: 2.4
Name: microimpute
Version: 1.14.3
Summary: Benchmarking imputation methods for microdata
Author-email: María Juaristi <juaristi@uni.minerva.edu>, Nikhil Woodruff <nikhil.woodruff@outlook.com>
Requires-Python: <3.14,>=3.12
Description-Content-Type: text/markdown
Requires-Dist: numpy<3.0.0,>=2.0.0
Requires-Dist: pandas<3.0.0,>=2.2.0
Requires-Dist: plotly<6.0.0,>=5.24.0
Requires-Dist: scikit-learn<2.0.0,>=1.7.0
Requires-Dist: scipy<1.17.0,>=1.16.0
Requires-Dist: requests<3.0.0,>=2.32.0
Requires-Dist: tqdm<5.0.0,>=4.65.0
Requires-Dist: statsmodels<0.16.0,>=0.14.5
Requires-Dist: quantile-forest<1.5.0,>=1.4.1
Requires-Dist: pydantic<3.0.0,>=2.8.0
Requires-Dist: optuna<5.0.0,>=4.3.0
Requires-Dist: joblib<2.0.0,>=1.5.0
Requires-Dist: psutil
Provides-Extra: dev
Requires-Dist: pytest<9.0.0,>=8.0.0; extra == "dev"
Requires-Dist: pytest-cov<7.0.0,>=6.0.0; extra == "dev"
Requires-Dist: flake8<8.0.0,>=7.0.0; extra == "dev"
Requires-Dist: black>=24.0.0; extra == "dev"
Requires-Dist: isort<6.0.0,>=5.13.0; extra == "dev"
Requires-Dist: mypy<2.0.0,>=1.2.3; extra == "dev"
Requires-Dist: build<2.0.0,>=1.2.0; extra == "dev"
Requires-Dist: linecheck<0.3.0,>=0.1.0; extra == "dev"
Requires-Dist: towncrier>=24.8.0; extra == "dev"
Provides-Extra: matching
Requires-Dist: rpy2<4.0.0,>=3.5.0; extra == "matching"
Provides-Extra: mdn
Requires-Dist: pytorch-tabular>=1.1.0; extra == "mdn"
Requires-Dist: torch>=2.0.0; extra == "mdn"
Provides-Extra: docs
Requires-Dist: jupyter-book; extra == "docs"
Requires-Dist: furo>=2024.0.0; extra == "docs"
Requires-Dist: ipywidgets<9.0.0,>=8.0.0; extra == "docs"
Requires-Dist: plotly<6.0.0,>=5.24.0; extra == "docs"
Requires-Dist: h5py<4.0.0,>=3.1.0; extra == "docs"
Provides-Extra: images
Requires-Dist: kaleido<0.3.0,>=0.2.1; extra == "images"

# Microimpute

Microimpute is a Python package for imputing variables from one survey dataset onto another. It wraps five imputation methods behind a common interface so you can benchmark them on your data and pick the one that works best, rather than defaulting to a single approach.

## Methods

- **Statistical Matching**: distance-based matching to find similar donor observations
- **Ordinary Least Squares (OLS)**: linear regression imputation
- **Quantile Regression**: models conditional quantiles instead of the conditional mean
- **Quantile Random Forests (QRF)**: non-parametric, tree-based quantile estimation
- **Mixture Density Networks (MDN)**: neural network with a Gaussian mixture output

## Autoimpute

The `autoimpute` function tunes hyperparameters, runs cross-validation across all five methods, and selects the best performer based on quantile loss (for numerical targets) or log loss (for categorical targets). It handles numerical, categorical, and boolean variables.

## API

All models follow a `fit()` / `predict()` interface. The package supports sample weights to account for survey design and validates inputs automatically. Adding a custom imputation method is straightforward: a new model only needs to implement the same interface.
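As a schematic example of the shape a custom method takes, here is a toy weighted-mean imputer. This is a hypothetical standalone class, not the package's actual base class, and it omits the validation hooks microimpute provides.

```python
import numpy as np
import pandas as pd

class WeightedMeanImputer:
    """Toy model following the fit()/predict() shape.

    Hypothetical sketch: the real package's base class and
    input validation are not reproduced here.
    """

    def fit(self, X: pd.DataFrame, y: pd.Series, sample_weight=None):
        # Survey-weighted mean of the target; unweighted if no weights given.
        w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight)
        self.mean_ = float(np.average(y, weights=w))
        return self

    def predict(self, X: pd.DataFrame) -> pd.Series:
        # Impute the fitted weighted mean for every receiver row.
        return pd.Series(self.mean_, index=X.index)

donor = pd.DataFrame({"age": [30, 40, 50], "net_worth": [10.0, 20.0, 60.0]})
receiver = pd.DataFrame({"age": [35, 45]})

model = WeightedMeanImputer().fit(
    donor[["age"]], donor["net_worth"], sample_weight=[1, 1, 2]
)
imputed = model.predict(receiver)  # weighted mean: (10 + 20 + 2*60) / 4 = 37.5
```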

## Documentation and paper

- [Documentation](https://policyengine.github.io/microimpute/) with examples and interactive notebooks
- [Paper](https://github.com/PolicyEngine/microimpute/blob/main/paper/main.pdf) presenting microimpute and demonstrating it for SCF-to-CPS net worth imputation

## Dashboard

An interactive dashboard for exploring imputation results is available at https://microimpute-dashboard.vercel.app/. It supports file upload, URL loading, direct GitHub artifact integration, and sample data.

## Installation

```bash
pip install microimpute
```

For image export (PNG/JPG):

```bash
pip install microimpute[images]
```
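Other optional extras are declared in the package metadata and can be installed the same way, for example:

```bash
# R-based statistical matching (requires an R installation for rpy2)
pip install microimpute[matching]

# Mixture Density Network support (installs torch and pytorch-tabular)
pip install microimpute[mdn]
```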

## Contributing

Pull requests are welcome. If you find a bug or have a feature idea, open an issue or submit a PR.
