Metadata-Version: 2.4
Name: CGMissingData
Version: 0.1.3
Summary: MICE + Random Forest + KNN to handle missing values of CGM device
Author: HS Shad, Shubh Saraswat, Dr. Xiaohua Douglas Zhang
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.23
Requires-Dist: pandas>=1.5
Requires-Dist: scikit-learn>=1.2

# CGMissingData

CGMissingData is a simple missing-data benchmarking package that runs:

- MICE imputation (`IterativeImputer`)
- Random Forest regression
- KNN regression

It helps you test model performance under different missing-value rates.
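To make the idea of a missing-value rate concrete, here is a minimal sketch of one benchmark step: hide a random fraction of known glucose values, impute them with scikit-learn's `IterativeImputer`, and score the imputed values against the hidden truth. This is not the package's implementation. The function name `mask_and_impute_rmse`, the synthetic data, and the choice of RMSE as the score are assumptions made for illustration, and the Random Forest and KNN steps are left out.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and needs this import first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def mask_and_impute_rmse(X, target_col, mask_rate, seed=0):
    """Hide a fraction of one column, impute it, and return RMSE on the hidden cells.

    Hypothetical helper for illustration; not part of CGMissingData.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_rows = X.shape[0]

    # Choose which rows lose their target value at this missing-value rate.
    n_masked = max(1, int(round(mask_rate * n_rows)))
    masked_rows = rng.choice(n_rows, size=n_masked, replace=False)

    X_masked = X.copy()
    X_masked[masked_rows, target_col] = np.nan

    # MICE-style imputation: the target is modelled from the other columns.
    imputer = IterativeImputer(max_iter=10, random_state=seed)
    X_imputed = imputer.fit_transform(X_masked)

    errors = X_imputed[masked_rows, target_col] - X[masked_rows, target_col]
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic stand-in data (made up): a previous reading and a noisy current reading.
rng = np.random.default_rng(42)
previous = 120 + np.cumsum(rng.normal(0, 3, 200))
current = previous + rng.normal(0, 2, 200)
X = np.column_stack([current, previous])

rmse_by_rate = {
    rate: mask_and_impute_rmse(X, target_col=0, mask_rate=rate)
    for rate in [0.05, 0.10, 0.20, 0.30, 0.40]
}
for rate, rmse in rmse_by_rate.items():
    print(f"mask rate {rate:.2f}: RMSE {rmse:.2f}")
```

Because the true values are only hidden, not lost, the error at each rate can be measured directly.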


Your CSV must include at least these columns:

- `LBORRES`: glucose value (the target)
- `TimeSeries`: time series data
- `TimeDifferenceMinutes`: time difference in minutes
- `USUBJID`: subject ID
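A quick way to check a file against the required columns before running the benchmark is to read only its header row. The sketch below uses Python's standard `csv` module. The helper name `missing_required_columns` and the sample rows are made up for illustration, and the package may perform its own validation.

```python
import csv
import io

# Column names taken from the list above.
REQUIRED_COLUMNS = ("LBORRES", "TimeSeries", "TimeDifferenceMinutes", "USUBJID")


def missing_required_columns(csv_file):
    """Return the required column names that are absent from the CSV header row.

    Hypothetical helper for illustration; not part of CGMissingData.
    """
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# In-memory stand-ins with made-up values.
# For a file on disk, pass open(path, newline="") instead.
complete = io.StringIO("USUBJID,TimeSeries,TimeDifferenceMinutes,LBORRES\nS01,1,5,110\n")
incomplete = io.StringIO("USUBJID,LBORRES\nS01,110\n")

print(missing_required_columns(complete))    # []
print(missing_required_columns(incomplete))  # ['TimeSeries', 'TimeDifferenceMinutes']
```

Extra columns are ignored, which matches the "at least these columns" requirement.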

## How to Run

1. Set up the environment (Windows commands shown). Change to your project folder, then create and activate a virtual environment:

       cd "C:\Path\To\Your\Project"

       # Create a virtual environment
       python -m venv .venv

       # Activate the environment
       .\.venv\Scripts\activate

2. Upgrade pip and install the package from the project folder:

       python -m pip install --upgrade pip
       pip install -e .

3. Make sure your dataset (e.g., MyData.csv) is in the project folder, then run the benchmark from the command line to generate a results.csv file:

       .\.venv\Scripts\python.exe -c "from CGMissingData import run_missingness_benchmark; r=run_missingness_benchmark('MyData.csv', mask_rates=[0.05, 0.10, 0.20, 0.30, 0.40]); print(r); r.to_csv('results.csv', index=False)"



## Using Google Colab

Install the package first. The pinned version is an example: change it to the release you want, or drop the pin and use `!pip -q install CGMissingData`.

    !pip -q install CGMissingData==0.1.3

Then run the benchmark:

    from CGMissingData import run_missingness_benchmark

    csv_path = "/content/drive/MyDrive/CGMExampleData.csv"  # your dataset path
    results = run_missingness_benchmark(
        csv_path,  # or a relative path such as "CGMExampleData.csv"
        mask_rates=[0.05, 0.10, 0.20, 0.30, 0.40],
    )

    print(results)
    results.to_csv("results.csv", index=False)

