Metadata-Version: 2.4
Name: coremateai
Version: 0.1.1
Summary: Torch and CUDA helper toolkit for AI training and inference workflows.
Author: hustle
License-Expression: MIT
Keywords: python,ai,cuda,torch,machine-learning,training,inference
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Provides-Extra: ai
Requires-Dist: numpy>=1.24; extra == "ai"
Requires-Dist: torch>=2.2; extra == "ai"

# coremate

CoreMate is a lightweight Python toolkit focused on Torch and CUDA optimization for AI workflows.

PyPI release page: <https://pypi.org/project/coremateai/0.1.1/>

## Installation

```bash
pip install coremateai==0.1.1
```

Optional extras (the `ai` extra pulls in `numpy` and `torch`; `dev` adds `pytest`):

```bash
pip install "coremateai[ai]"
```

If you skip the `ai` extra, install PyTorch yourself, choosing the build that matches your CUDA runtime (see pytorch.org for CUDA-specific install commands):

```bash
pip install torch
```

## What It Does

- Detects AI stack details (`torch`, CUDA, GPU, driver).
- Recommends training presets by task (`cv`, `nlp`, `llm`, etc.).
- Builds full training plans (`build_training_plan`) with LR, batch and accumulation heuristics.
- Applies reproducibility helpers (`set_global_seed`).
- Tunes Torch runtime (`cudnn`, TF32, matmul precision, threads).
- Applies CUDA allocator/env defaults.
- Provides one-shot optimization (`optimize_torch_cuda`).
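
The batch and accumulation heuristics mentioned above can be illustrated with a small sketch. The function below is hypothetical, showing only the arithmetic such a planner typically uses to hit a target global batch size; it is not coremate's actual implementation:

```python
# Illustrative only: the kind of accumulation arithmetic a planner like
# build_training_plan typically performs (not coremate's actual code).
import math

def accumulation_steps(target_global_batch: int, per_device_batch: int, num_devices: int) -> int:
    """Gradient-accumulation steps needed to reach a target global batch size."""
    # samples seen per optimizer step without any accumulation
    effective = per_device_batch * num_devices
    return max(1, math.ceil(target_global_batch / effective))

# e.g. a global batch of 64 with a per-GPU batch of 8 on one GPU
# needs 8 accumulation steps
print(accumulation_steps(64, 8, 1))  # → 8
```

The `max(1, ...)` guard matters when the hardware already exceeds the target: accumulation can only multiply the effective batch, never shrink it.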

## Python Usage

```python
from coremate import (
    detect_ai_stack,
    recommend_training_preset,
    build_training_plan,
    set_global_seed,
    tune_torch_runtime,
    apply_cuda_env_defaults,
    optimize_torch_cuda,
)

stack = detect_ai_stack()
print(stack["cuda_available"], stack["torch_version"])

preset = recommend_training_preset(task="llm", aggressive=True, hardware=stack)
print(preset.batch_size, preset.precision)

plan = build_training_plan(
    task="llm",
    model_scale="base",
    target_global_batch_size=64,
    aggressive=True,
    hardware=stack,
)
print(plan.global_batch_size, plan.learning_rate)

set_global_seed(42, deterministic=True)
apply_cuda_env_defaults(max_split_size_mb=256, expandable_segments=True)
tune_torch_runtime(seed=42, deterministic=True, benchmark=True, allow_tf32=True)

result = optimize_torch_cuda(
    task="llm",
    model_scale="base",
    target_global_batch_size=64,
    aggressive=True,
    seed=42,
    deterministic=True,
)
print(result["plan"]["global_batch_size"], result["tune_error"])
```
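
As background for `set_global_seed`, here is a stdlib-only sketch of why fixed seeding makes runs reproducible; coremate presumably also seeds NumPy and Torch, which this sketch does not cover:

```python
import random

def seeded_draws(seed: int, n: int = 3) -> list:
    # a dedicated generator avoids touching the global random state
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# identical seeds yield identical sequences, which is the basis
# of reproducible training runs
assert seeded_draws(42) == seeded_draws(42)
assert seeded_draws(42) != seeded_draws(43)
```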

## CLI Usage

```bash
coremate ai report
coremate ai recommend --task llm --aggressive
coremate ai plan --task llm --model-scale base --target-global-batch 64 --aggressive
coremate ai seed --seed 42 --deterministic
coremate ai env --max-split-size-mb 256 --expandable-segments
coremate ai tune --seed 42 --deterministic --benchmark --allow-tf32
coremate ai optimize --task llm --model-scale base --target-global-batch 64 --aggressive --seed 42 --deterministic
```

## Testing

Install the `dev` extra (which provides `pytest`), then run the test suite:

```bash
pip install "coremateai[dev]"
python -m pytest
```
