Metadata-Version: 2.4
Name: rex-framework
Version: 0.1.4
Summary: Remote Executable eXecution — inference with remotely-stored model weights
Author: Rohan R
License-Expression: Apache-2.0
Project-URL: Repository, https://github.com/rotsl/rex-framework.git
Project-URL: Documentation, https://rotsl.github.io/rex-framework/
Project-URL: Issues, https://github.com/rotsl/rex-framework/issues
Keywords: ml,inference,remote-weights,streaming,pytorch
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: safetensors>=0.4.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: pydantic>=2.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: lz4>=4.3.0
Requires-Dist: zstandard>=0.21.0
Requires-Dist: click>=8.1.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: structlog>=23.0.0
Requires-Dist: huggingface_hub>=0.23.0
Requires-Dist: jsonschema>=4.21.0
Provides-Extra: pytorch
Requires-Dist: torch>=2.0.0; python_version < "3.14" and extra == "pytorch"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.1.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: mkdocs>=1.5.0; extra == "dev"
Requires-Dist: mkdocs-material>=9.4.0; extra == "dev"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "dev"
Provides-Extra: google-drive
Requires-Dist: google-api-python-client>=2.100.0; extra == "google-drive"
Requires-Dist: google-auth-oauthlib>=1.1.0; extra == "google-drive"
Provides-Extra: onedrive
Requires-Dist: msal>=1.24.0; extra == "onedrive"
Provides-Extra: bench
Requires-Dist: matplotlib>=3.7.0; extra == "bench"
Requires-Dist: pandas>=2.1.0; extra == "bench"
Requires-Dist: jinja2>=3.1.0; extra == "bench"
Provides-Extra: jax
Requires-Dist: jax>=0.4.20; extra == "jax"
Provides-Extra: tensorflow
Requires-Dist: tensorflow>=2.15.0; extra == "tensorflow"
Requires-Dist: h5py>=3.10.0; extra == "tensorflow"
Provides-Extra: all
Requires-Dist: rex-framework[bench,dev,google-drive,jax,onedrive,pytorch,tensorflow]; extra == "all"
Dynamic: license-file

# Rex Framework Package Overview

[![PyPI Downloads](https://static.pepy.tech/personalized-badge/rex-framework?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/rex-framework)

Rex Framework enables inference with remotely stored model weights without downloading full model checkpoints to local storage. Only the chunks needed for a given inference pass are fetched; the full model never resides in local memory or on disk.

Package intent:
- Primary: enable end users to run Rex for conversion, serving, and inference workflows.
- Secondary: support validation-oriented usage in CI and application test environments.

This package is intended for:
- Cloud-first inference where model chunks are fetched on demand.
- Memory-bounded environments where full checkpoint residency is undesirable.
- Notebook workflows, including Kaggle and Google Colab.

---

## What You Get In This Package

- Python API for loading Rex manifests and running inference.
- CLI tools for conversion, validation, inspection, serving, benchmarking, and demo runs.
- Optional extras for PyTorch, cloud storage integrations, and benchmark tooling.
- A runtime-focused distribution: the repository's full test suite is not shipped in the PyPI artifacts.

---

## Install

**Minimal package** (no PyTorch, useful for manifest validation and storage testing):

```bash
pip install rex-framework
```

**Recommended for real inference workloads:**

```bash
pip install "rex-framework[pytorch]"
```

**With every optional feature** (framework backends, cloud storage, benchmarking, and dev tooling):

```bash
pip install "rex-framework[all]"
```

**Available extras:**

| Extra | What it adds |
|---|---|
| `pytorch` | `torch>=2.0.0` for inference |
| `google-drive` | Google Drive storage backend |
| `onedrive` | OneDrive storage backend |
| `bench` | Benchmarking and profiling tools |
| `jax` | JAX framework support |
| `tensorflow` | TensorFlow framework support (includes `h5py`) |
| `dev` | Test, lint, type-check, and docs tooling |
| `all` | Every extra listed above |

**Python compatibility notes:**
- `numpy` is auto-installed with the package (no separate install needed).
- The `pytorch` extra supports Python `3.10` to `3.13`.
- PyTorch wheels are not published on PyPI for Python `3.14`.
- On some platforms (for example, macOS x86_64), Python `3.13` may still lack compatible `torch` wheels. Use Python `3.11` in that case.

**Verify your install** (the `torch` import requires the `pytorch` extra):

```bash
python -c "import torch, rex; print(torch.__version__); print(rex.__version__)"
```
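
If you installed the minimal package without the `pytorch` extra, verify just the core import instead:

```python
# Works with the minimal install (no pytorch extra required).
import rex

print(rex.__version__)
```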

---

## How Rex Finds Your Model: Manifest and Chunk Paths

Rex does not load a model from a single checkpoint file. Instead, it reads a **manifest** (a JSON file describing chunk locations, hashes, and metadata) and fetches individual chunks on demand from a **base URL**. Pointing Rex at the right manifest and base URL is the essential setup step.

### What a Rex manifest is

A manifest is a JSON file (`manifest.json`) generated by `rex-convert`. It contains:
- Model metadata (architecture, dtype, total size).
- A list of chunks: each chunk has a relative file path, byte offset, size, and SHA-256 hash.
- The expected base URL where chunks are hosted.

Chunks are served separately (e.g., as files in a directory) and fetched via HTTP Range requests. You do **not** need to host a special server — any server that supports `Range` headers works (nginx, S3, Google Drive direct links, OneDrive, or `rex-serve`).
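
To make the fetch path concrete, here is a minimal sketch of what a single chunk fetch looks like at the HTTP level, assuming a chunk is hosted at the URL below. The offset, length, and expected hash come from the manifest; Rex performs this fetch-and-verify step internally, so you never need to write it yourself.

```python
# Minimal sketch of a chunk fetch via an HTTP Range request (illustration only;
# the URL and byte range are placeholders, and Rex does this for you internally).
import hashlib
import urllib.request

chunk_url = "http://localhost:8080/chunk_000.bin"  # base_url + relative chunk path
start, size = 0, 4 * 1024 * 1024                   # byte offset and length from the manifest

req = urllib.request.Request(
    chunk_url,
    headers={"Range": f"bytes={start}-{start + size - 1}"},
)
with urllib.request.urlopen(req) as resp:
    data = resp.read()

# Compare against the SHA-256 recorded for this chunk in the manifest.
print("sha256:", hashlib.sha256(data).hexdigest())
```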

### Step 1 — Convert your model to Rex format

```bash
rex-convert /path/to/model.pt \
  --output ./rex_output \
  --framework pytorch \
  --model-id my-model
```

This produces:
```
rex_output/
  manifest.json        ← the manifest you will point load_model at
  weights/
    chunk_000.bin
    chunk_001.bin
    ...
```
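
Because the manifest is plain JSON, you can take a quick look at it with nothing but the standard library. The field names below (`chunks`, `size`) are assumptions about the schema made for illustration; `rex-inspect` is the supported way to examine a manifest.

```python
# Peek at the generated manifest with plain Python (field names are assumed).
import json

with open("./rex_output/manifest.json") as f:
    manifest = json.load(f)

chunks = manifest.get("chunks", [])
print("chunk count:", len(chunks))
print("total bytes:", sum(c.get("size", 0) for c in chunks))
```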

### Step 2 — Host the chunk files

**Option A — local HTTP server (for testing):**

```bash
rex-serve --dir ./rex_output/weights --port 8080
```

Chunks are now reachable at `http://localhost:8080/chunk_000.bin`, etc.

**Option B — any static HTTP host:**

Upload `rex_output/weights/` to any static host (nginx, S3, Cloudflare R2, GitHub Releases, Google Drive folder with public sharing). Note the base URL.
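
Whichever host you pick, confirm that it actually honours `Range` requests before pointing Rex at it. A quick check, assuming one of your uploaded chunk files is reachable at the URL below (a `206 Partial Content` status means partial fetches work):

```python
# Check that the host returns 206 Partial Content for Range requests.
# Replace the URL with one of your uploaded chunk files.
import urllib.request

url = "https://my-host.example.com/weights/chunk_000.bin"
req = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})

with urllib.request.urlopen(req) as resp:
    print("status:", resp.status, "| Range supported:", resp.status == 206)
```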

### Step 3 — Point `load_model` at the manifest

```python
from rex.api.config import RexConfig
from rex.api.load import load_model

config = RexConfig()
config.storage.base_url = "http://localhost:8080"   # ← where chunks are hosted

runtime = load_model("./rex_output/manifest.json", config=config)
```

`base_url` tells Rex how to resolve relative chunk paths from the manifest. Every chunk path in `manifest.json` is appended to `base_url` when fetching.

**Remote manifest (manifest itself is also hosted):**

```python
config.storage.base_url = "https://my-host.example.com/weights"
runtime = load_model("https://my-host.example.com/weights/manifest.json", config=config)
```

**Environment variable alternative:**

```bash
export REX_STORAGE_URL=https://my-host.example.com/weights
python your_script.py
```

### Step 4 — Run inference

```python
import numpy as np
from rex.api.generate import run_inference_sync

input_data = np.random.randn(1, 768).astype(np.float32)
output, metrics = run_inference_sync("./rex_output/manifest.json", input_data)
print(f"Inference time: {metrics.total_time_ms:.1f} ms")
```

---

## Storage Backends

Rex supports multiple storage backends. Set `config.storage.base_url` to the appropriate URL scheme:

| Backend | URL format | Extra required |
|---|---|---|
| Local HTTP / `rex-serve` | `http://localhost:8080` | none |
| Remote HTTP/HTTPS | `https://example.com/weights` | none |
| Google Drive | `gdrive://folder-id` | `google-drive` |
| OneDrive | `onedrive://drive-id/path` | `onedrive` |
| iCloud | `icloud://path/to/weights` | none |
| Local filesystem | `file:///abs/path/to/weights` | none |
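
For example, switching the earlier loading code from the local test server to a Google Drive folder only changes the base URL (and requires the `google-drive` extra). The folder ID below is a placeholder:

```python
# Google Drive backend sketch (requires: pip install "rex-framework[google-drive]").
# "YOUR_FOLDER_ID" is a placeholder for the shared folder containing your chunk files.
from rex.api.config import RexConfig
from rex.api.load import load_model

config = RexConfig()
config.storage.base_url = "gdrive://YOUR_FOLDER_ID"

runtime = load_model("./rex_output/manifest.json", config=config)
```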

**Authenticated endpoints** (e.g., private S3 or token-gated APIs):

```python
config.storage.auth_token = "Bearer YOUR_TOKEN"
```

Or via environment variable:

```bash
export REX_AUTH_TOKEN="Bearer YOUR_TOKEN"
```

---

## Notebook Usage — Kaggle

Kaggle notebooks run on isolated kernels; enable internet access in the notebook settings so chunks can be fetched. The recommended pattern is to convert your model beforehand, host the chunks somewhere reachable (an HTTPS URL, a public Google Drive folder, or a Kaggle Dataset), then install Rex and load from that URL.

### Install in a Kaggle cell

```python
# Cell 1 — install
!pip install "rex-framework[pytorch]" -q
import rex, torch
print(rex.__version__, torch.__version__)
```

### Load from an HTTPS host

```python
# Cell 2 — configure and load
from rex.api.config import RexConfig
from rex.api.load import load_model

MANIFEST_URL = "https://your-static-host.com/rex_output/manifest.json"
CHUNKS_BASE_URL = "https://your-static-host.com/rex_output/weights"

config = RexConfig()
config.storage.base_url = CHUNKS_BASE_URL
config.cache.max_memory_cache_bytes = 512 * 1024 * 1024  # 512 MB limit

runtime = load_model(MANIFEST_URL, config=config)
```

### Load from a Kaggle Dataset

Upload your `rex_output/` directory as a Kaggle Dataset. Kaggle mounts datasets at `/kaggle/input/<dataset-name>/`.

```python
# Cell 2 — load from Kaggle Dataset mount
from rex.api.config import RexConfig
from rex.api.load import load_model

MANIFEST_PATH = "/kaggle/input/my-rex-model/manifest.json"
CHUNKS_BASE_URL = "file:///kaggle/input/my-rex-model/weights"

config = RexConfig()
config.storage.base_url = CHUNKS_BASE_URL

runtime = load_model(MANIFEST_PATH, config=config)
```

### Add Kaggle Secrets for authenticated endpoints

```python
from kaggle_secrets import UserSecretsClient

secrets = UserSecretsClient()
token = secrets.get_secret("REX_AUTH_TOKEN")

config.storage.auth_token = f"Bearer {token}"
```

### Run inference on Kaggle

```python
# Cell 3 — inference
import numpy as np
from rex.api.generate import run_inference_sync

input_data = np.random.randn(1, 768).astype(np.float32)
output, metrics = run_inference_sync(MANIFEST_PATH, input_data)
print(f"Output shape: {output.shape}")
print(f"Inference time: {metrics.total_time_ms:.1f} ms")
```

---

## Notebook Usage — Google Colab

Google Colab provides a transient VM with internet access. The same manifest/chunk remote loading pattern applies. Colab T4 or A100 GPUs can be used if your Rex model targets CUDA.

### Install in Colab

```python
# Cell 1 — install
!pip install "rex-framework[pytorch]" -q
import rex, torch
print(rex.__version__, torch.__version__)
```

### Load from an HTTPS host

```python
# Cell 2 — configure and load
from rex.api.config import RexConfig
from rex.api.load import load_model

MANIFEST_URL = "https://your-static-host.com/rex_output/manifest.json"
CHUNKS_BASE_URL = "https://your-static-host.com/rex_output/weights"

config = RexConfig()
config.storage.base_url = CHUNKS_BASE_URL
config.cache.max_memory_cache_bytes = 1 * 1024 * 1024 * 1024  # 1 GB (Colab has more RAM)
config.scheduler.enable_prefetch = True
config.scheduler.prefetch_window = 4

runtime = load_model(MANIFEST_URL, config=config)
```

### Load from Google Drive in Colab

If you uploaded your `rex_output/` to your Google Drive, mount it and point Rex at the local path:

```python
# Cell 2a — mount Google Drive
from google.colab import drive
drive.mount("/content/drive")

# Cell 2b — load from mounted Drive path
from rex.api.config import RexConfig
from rex.api.load import load_model

MANIFEST_PATH = "/content/drive/MyDrive/rex_output/manifest.json"
CHUNKS_BASE_URL = "file:///content/drive/MyDrive/rex_output/weights"

config = RexConfig()
config.storage.base_url = CHUNKS_BASE_URL

runtime = load_model(MANIFEST_PATH, config=config)
```

### Use Colab Secrets for tokens

```python
from google.colab import userdata

config.storage.auth_token = f"Bearer {userdata.get('REX_AUTH_TOKEN')}"
```

### GPU inference in Colab

Rex will use the available CUDA device automatically when PyTorch detects a GPU. Confirm your runtime type is set to **T4 GPU** or **A100** in Colab's Runtime menu.

```python
import torch
print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU")
```

---

## Core Principle

Rex executes with bounded local residency by streaming model chunks from remote storage through HTTP Range fetches and cache-aware scheduling. At no point does the full model need to exist locally.

---

## Feature Controls (Quick Reference)

Control Rex behaviour through `RexConfig`:

```python
from rex.api.config import RexConfig

config = RexConfig()

# How much local memory the cache can use
config.cache.max_memory_cache_bytes = 512 * 1024 * 1024

# Fraction of the full model allowed locally at any time (Rex invariant)
config.cache.max_local_fraction_of_model = 0.4

# Cache eviction policy: lru | lfu | weighted_utility
config.cache.policy = "weighted_utility"

# Prefetch ahead of current execution
config.scheduler.enable_prefetch = True
config.scheduler.prefetch_window = 4

# Execution planning mode: graph | sequential
config.scheduler.scheduler_mode = "graph"

# Storage concurrency
config.storage.max_concurrent_fetches = 4
config.storage.adaptive_concurrency = True

# Logging
config.observability.log_level = "INFO"   # DEBUG | INFO | WARNING | ERROR
config.observability.log_format = "console"  # console | json | quiet
```
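
Pass the configured object to `load_model(..., config=config)` exactly as in the earlier examples; any field you do not set keeps its default value.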

For all available config fields and preset profiles (debug, throughput-oriented), see:
- [https://rotsl.github.io/rex-framework/](https://rotsl.github.io/rex-framework/)

---

## CLI Quick Reference

| Command | Purpose |
|---|---|
| `rex-convert` | Convert a PyTorch checkpoint to Rex format |
| `rex-serve` | Serve chunk files with HTTP Range support |
| `rex-validate` | Validate a manifest file |
| `rex-inspect` | Inspect a manifest (verbose chunk listing) |
| `rex-benchmark` | Run latency/throughput benchmark |
| `rex-run-demo` | End-to-end demo run |

---

## Package Guide

For full CLI and API reference, preset configuration profiles, and environment variable documentation, see:
- [https://rotsl.github.io/rex-framework/](https://rotsl.github.io/rex-framework/)

For repository development details and architecture notes, refer to the repository documentation rather than this package overview.
