Metadata-Version: 2.4
Name: qpatch
Version: 0.2.0
Summary: Automatic monkey-patches for quantized model fine-tuning — fixes QLoRA issues with custom architectures
Author-email: "Andrew H. Bond" <agi.hpc@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/ahb-sjsu/qpatch
Project-URL: Repository, https://github.com/ahb-sjsu/qpatch
Project-URL: Issues, https://github.com/ahb-sjsu/qpatch/issues
Keywords: qlora,lora,quantization,4-bit,fine-tuning,llm,transformers,peft
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: transformers>=4.30
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: peft>=0.6; extra == "dev"
Requires-Dist: safetensors>=0.3; extra == "dev"
Dynamic: license-file

# qpatch

[![CI](https://github.com/ahb-sjsu/qpatch/actions/workflows/ci.yml/badge.svg)](https://github.com/ahb-sjsu/qpatch/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/qpatch)](https://pypi.org/project/qpatch/)
[![Python](https://img.shields.io/pypi/pyversions/qpatch)](https://pypi.org/project/qpatch/)
[![License](https://img.shields.io/pypi/l/qpatch)](https://github.com/ahb-sjsu/qpatch/blob/main/LICENSE)

**Automatic fixes for quantized model fine-tuning.**

`qpatch` monkey-patches known incompatibilities when training 4-bit/8-bit quantized models with LoRA adapters — especially on models with custom CUDA kernels (Mamba, MoE, RWKV, xLSTM).

## The Problem

QLoRA training fails on many non-standard architectures with cryptic errors:

```
AttributeError: 'NoneType' object has no attribute 'get'          # safetensors metadata
RuntimeError: expected mat1 and mat2 to have the same dtype        # uint8 vs float16
RuntimeError: mat1 and mat2 shapes cannot be multiplied            # fused kernels + 4-bit
RuntimeError: index_add_(): self (BFloat16) and source (Float)     # MoE dtype mismatch
RuntimeError: "fused_dropout" not implemented for 'Byte'           # dropout on uint8
```

## The Fix

```python
import qpatch
qpatch.patch_all()

# That's it. Now train normally.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-3-Nano-30B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model = get_peft_model(model, LoraConfig(r=32, target_modules=["up_proj", "down_proj"]))
# ... set up your Trainer as usual, then:
trainer.train()  # Just works
```
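Note the ordering: `patch_all()` runs before the model is loaded, since the safetensors patch wraps the loading path itself (see *How It Works* below).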

## What It Fixes

| Issue | Error | Affected Models |
|-------|-------|-----------------|
| **Safetensors metadata** | `'NoneType' has no attribute 'get'` | Any model with metadata-less safetensors |
| **LoRA dtype cast** | `unsigned char != c10::Half` | Any 4-bit model with LoRA |
| **MoE dtype mismatch** | `index_add_(): BFloat16 and Float` | Nemotron, Mixtral, any MoE |
| **Fused kernel bypass** | `shapes cannot be multiplied` | Mamba, Nemotron-H, hybrid architectures |

## Install

```bash
pip install qpatch
```
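For development, the `dev` extra pulls in pytest, pytest-cov, ruff, black, peft, and safetensors:

```bash
pip install "qpatch[dev]"
```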

## Individual Patches

Apply only what you need:

```python
import qpatch

qpatch.patch_safetensors_metadata()   # Fix None metadata in safetensors
qpatch.patch_lora_dtype_cast()        # Cast uint8 inputs to float16
qpatch.patch_moe_dtype_mismatch()     # Auto-cast MoE index_add_ dtypes
qpatch.patch_fused_kernel_bypass()    # Skip fused kernels for quantized models
```
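`patch_all()` applies the full set above, so the individual functions are only needed when you want to scope the patches narrowly.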

## GPU Architecture Notes

- **Pascal/Volta (P100, V100, GV100):** Use `qpatch.patch_all(compute_dtype=torch.float16)`; these architectures do not support bf16
- **Ampere and newer (A100, H100):** Use `qpatch.patch_all(compute_dtype=torch.bfloat16)`, or detect the right dtype at runtime as sketched below
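On mixed fleets you can pick the compute dtype at runtime. A minimal sketch (bf16 requires compute capability 8.0, i.e. Ampere or newer; `compute_dtype` is the same keyword shown above):

```python
import torch
import qpatch

# bf16 requires compute capability >= 8.0 (Ampere or newer);
# Pascal and Volta GPUs fall back to fp16.
major, _ = torch.cuda.get_device_capability()
dtype = torch.bfloat16 if major >= 8 else torch.float16

qpatch.patch_all(compute_dtype=dtype)
```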

## How It Works

`qpatch` applies targeted monkey-patches when you call `patch_all()` or one of the individual `patch_*` functions:

1. **`patch_safetensors_metadata`** — Wraps `transformers.modeling_utils.load_state_dict` to detect None metadata and fall back to `safetensors.torch.load_file`
2. **`patch_lora_dtype_cast`** — Wraps `peft.tuners.lora.bnb.Linear4bit.forward` to cast uint8 inputs to the compute dtype
3. **`patch_moe_dtype_mismatch`** — Wraps `torch.Tensor.index_add_` to auto-cast source tensors when dtypes mismatch
4. **`patch_fused_kernel_bypass`** — Scans HuggingFace's cached model code and replaces fused-path conditions with `if False:` to force the quantization-safe slow path

All patches are **idempotent** — calling `patch_all()` multiple times is safe.
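As an illustration of the wrapper pattern, here is roughly how patch 3 and its idempotence guard might look (a hypothetical sketch, not qpatch's actual code):

```python
import torch

# Keep a module-level handle on the original so the wrapper never recurses.
_ORIG_INDEX_ADD_ = torch.Tensor.index_add_

def patch_moe_dtype_mismatch_sketch():
    # Idempotence guard: a second call sees the flag and returns immediately.
    if getattr(torch.Tensor.index_add_, "_qpatch_wrapped", False):
        return

    def index_add_(self, dim, index, source, *, alpha=1):
        # Auto-cast the source tensor when dtypes disagree, e.g. a bf16
        # accumulator receiving fp32 expert outputs during MoE token routing.
        if source.dtype != self.dtype:
            source = source.to(self.dtype)
        return _ORIG_INDEX_ADD_(self, dim, index, source, alpha=alpha)

    index_add_._qpatch_wrapped = True
    torch.Tensor.index_add_ = index_add_
```

The flag attribute on the replacement function is what makes repeated calls no-ops, and the saved reference to the original method keeps the wrapper from stacking on itself.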

## Tested With

- Nemotron-3-Nano-30B (Mamba hybrid)
- Transformers 4.46–5.3
- PEFT 0.12–0.18
- bitsandbytes 0.42–0.49
- PyTorch 2.5–2.10
- CUDA 12.2–12.8
- Volta (GV100), Pascal (P100), Ampere (A100)

## License

MIT

## Citation

```bibtex
@software{bond2026qpatch,
  author = {Bond, Andrew H.},
  title = {qpatch: Automatic fixes for quantized model fine-tuning},
  year = {2026},
  url = {https://github.com/ahb-sjsu/qpatch},
}
```
