Metadata-Version: 2.4
Name: anydeploy
Version: 0.2.2
Summary: CLI tool and library to export ML models to production formats and containerize them with Docker
Project-URL: Homepage, https://www.nrl.ai
Project-URL: Repository, https://github.com/vietanhdev/anydeploy
Project-URL: Documentation, https://github.com/vietanhdev/anydeploy#readme
Project-URL: Issues, https://github.com/vietanhdev/anydeploy/issues
Author-email: Viet-Anh Nguyen <vietanh.dev@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: deployment,docker,machine-learning,model-serving,onnx,tflite,torchscript
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Requires-Dist: click>=8.0
Requires-Dist: numpy>=1.20
Requires-Dist: pyyaml>=6.0
Provides-Extra: all
Requires-Dist: fastapi>=0.68; extra == 'all'
Requires-Dist: onnx>=1.10; extra == 'all'
Requires-Dist: onnxruntime>=1.10; extra == 'all'
Requires-Dist: tensorflow>=2.5; extra == 'all'
Requires-Dist: torch>=1.9; extra == 'all'
Requires-Dist: uvicorn>=0.15; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: onnx
Requires-Dist: onnx>=1.10; extra == 'onnx'
Requires-Dist: onnxruntime>=1.10; extra == 'onnx'
Provides-Extra: serve
Requires-Dist: fastapi>=0.68; extra == 'serve'
Requires-Dist: uvicorn>=0.15; extra == 'serve'
Provides-Extra: tflite
Requires-Dist: tensorflow>=2.5; extra == 'tflite'
Provides-Extra: torch
Requires-Dist: torch>=1.9; extra == 'torch'
Description-Content-Type: text/markdown

<h1 align="center">anydeploy</h1>
<p align="center"><em>Deploy ML models anywhere</em></p>

![PyPI](https://img.shields.io/pypi/v/anydeploy)
![Python](https://img.shields.io/pypi/pyversions/anydeploy)
![License](https://img.shields.io/pypi/l/anydeploy)

**Export ML models to production formats (ONNX, TFLite, TorchScript) and deploy them locally or at the edge.**

`anydeploy` makes model deployment easy. Convert your trained models to optimized inference formats, benchmark performance, validate correctness, generate serving code, and containerize everything -- all from a single CLI or Python API.

**Edge-first deployment.** Supports ONNX Runtime (CPU/GPU/edge), TFLite (mobile/edge), and llama.cpp (local LLM serving). All deployment targets work completely offline.

Built and maintained by [Viet-Anh Nguyen](https://github.com/vietanhdev) at [NRL.ai](https://www.nrl.ai).

## Installation

```bash
# Core (CLI + config + benchmarking)
pip install anydeploy

# With specific framework support (quotes keep shells like zsh from expanding the brackets)
pip install "anydeploy[torch]"    # PyTorch + TorchScript
pip install "anydeploy[onnx]"     # ONNX + ONNX Runtime
pip install "anydeploy[tflite]"   # TensorFlow Lite
pip install "anydeploy[serve]"    # FastAPI serving

# Everything
pip install "anydeploy[all]"
```

## Quick Start

### CLI

```bash
# Export a PyTorch model to ONNX
anydeploy export model.pt --format onnx --input-shape 1,3,224,224

# Export to TFLite
anydeploy export model.pt --format tflite --input-shape 1,3,224,224

# Benchmark an exported model
anydeploy benchmark model.onnx --runs 100

# Serve a model with FastAPI
anydeploy serve model.onnx --backend fastapi --port 8000

# Generate a Docker container for deployment
anydeploy dockerize model.onnx --base python:3.11-slim
```

### Python API

```python
import anydeploy

# Export a model
anydeploy.export(model, format="onnx", input_shape=(1, 3, 224, 224))

# Benchmark performance
result = anydeploy.benchmark("model.onnx", runs=100)
print(f"Mean latency: {result.mean_latency_ms:.2f} ms")
print(f"P95 latency:  {result.p95_latency_ms:.2f} ms")
print(f"Throughput:   {result.throughput:.1f} inferences/sec")

# Validate exported model against original
report = anydeploy.validate(original_model, "model.onnx", test_input)
print(f"Max difference: {report.max_diff}")
print(f"Passed: {report.passed}")

# Generate Dockerfile and serving code
from anydeploy.config import DockerConfig
docker_cfg = DockerConfig(base_image="python:3.11-slim")
anydeploy.dockerize("model.onnx", docker_cfg)

# Register a custom exporter
from anydeploy.export.base import BaseExporter
class MyExporter(BaseExporter):
    def export(self, model, output_path, config=None):
        ...
anydeploy.register_exporter("myformat", MyExporter)
```

## Export Format Comparison

| Format      | Framework              | Hardware       | Optimization       | File Size |
|-------------|------------------------|----------------|--------------------|-----------|
| ONNX        | Any (via ONNX Runtime) | CPU, GPU, Edge | Graph optimization | Medium    |
| TFLite      | TensorFlow             | Mobile, Edge   | Quantization       | Small     |
| TorchScript | PyTorch                | CPU, GPU       | JIT compilation    | Large     |
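
To pick a format for a specific target, it can help to benchmark the same model in each exported form using the `anydeploy.benchmark` API from the Quick Start. A minimal sketch, assuming you have already exported the model to the placeholder file names used below:

```python
import anydeploy

# Placeholder paths: the same model exported to each format beforehand.
candidates = ["model.onnx", "model.tflite", "model_ts.pt"]

for path in candidates:
    result = anydeploy.benchmark(path, runs=100)
    print(
        f"{path:>14}: {result.mean_latency_ms:6.2f} ms mean, "
        f"{result.p95_latency_ms:6.2f} ms p95, "
        f"{result.throughput:7.1f} inferences/sec"
    )
```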

## Serving

`anydeploy` generates production-ready serving code for multiple backends:

```bash
# FastAPI server for ONNX/TFLite/TorchScript models
anydeploy serve model.onnx --backend fastapi --port 8000

# llama.cpp server for GGUF language models (edge LLM deployment)
anydeploy serve model.gguf --backend llamacpp --port 8080
```

### FastAPI Backend

Creates a FastAPI application with:
- `/predict` endpoint accepting JSON or binary input
- `/health` health check endpoint
- Automatic input validation
- Configurable batch size
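
Once the server is running (for example via `anydeploy serve model.onnx --backend fastapi --port 8000`), the endpoints can be exercised with any HTTP client. A minimal client sketch using `requests`; the exact JSON schema expected by `/predict` is an assumption here, so check the generated `serve.py` for the real field names:

```python
import requests

BASE = "http://localhost:8000"

# Health check endpoint generated by anydeploy.
print(requests.get(f"{BASE}/health").json())

# Prediction request. The "inputs" field and its nested-list layout are
# assumptions; inspect the generated serve.py for the exact schema.
payload = {"inputs": [[0.1, 0.2, 0.3]]}  # placeholder data; match your model's input shape
resp = requests.post(f"{BASE}/predict", json=payload)
print(resp.status_code, resp.json())
```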

### llama.cpp Backend

Creates deployment scripts for serving GGUF language models locally:
- Shell script to launch llama.cpp server
- Dockerfile for containerized LLM serving
- OpenAI-compatible `/v1/chat/completions` endpoint
- Works on CPU, GPU, and edge devices
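
Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal sketch with `requests`, assuming the server was started on port 8080 as in the example above; the `model` field is a placeholder, since the server answers with whichever GGUF it was launched with:

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "model.gguf",  # placeholder; the loaded GGUF is used regardless
        "messages": [
            {"role": "user", "content": "Summarize edge deployment in one sentence."}
        ],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```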

## Docker Deployment

Generate a complete Docker setup for your model:

```bash
anydeploy dockerize model.onnx --base python:3.11-slim --port 8000
```

This creates:
- `Dockerfile` with optimized layers
- `serve.py` FastAPI application
- `requirements.txt` with pinned dependencies
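
The same artifacts can be produced from Python with the `dockerize` API shown in the Quick Start. A small sketch that also sanity-checks the output; the assumption that the files land in the current working directory is mine, so adjust the check if your setup writes them elsewhere:

```python
from pathlib import Path

import anydeploy
from anydeploy.config import DockerConfig

anydeploy.dockerize("model.onnx", DockerConfig(base_image="python:3.11-slim"))

# Sanity-check the generated artifacts (assumed to be written to the
# working directory).
for name in ("Dockerfile", "serve.py", "requirements.txt"):
    print(f"{name}: {'found' if Path(name).exists() else 'missing'}")
```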

## Extensibility

`anydeploy` uses a plugin architecture. You can register custom exporters and serving backends:

```python
import anydeploy
from anydeploy.export.base import BaseExporter

class CoreMLExporter(BaseExporter):
    format_name = "coreml"

    def export(self, model, output_path, config=None):
        # Your export logic
        ...

    def validate_model(self, model):
        return True

anydeploy.register_exporter("coreml", CoreMLExporter)
```
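
Once registered, the new format should be selectable like the built-in ones, assuming registered names are routed through the `format` argument of `anydeploy.export` as in the Quick Start:

```python
# Continues the example above; `model` is your trained model object.
anydeploy.export(model, format="coreml", input_shape=(1, 3, 224, 224))
```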

See [CONTRIBUTING.md](CONTRIBUTING.md) for details on adding new exporters and backends.

## Local-First / Edge AI

This package is designed for edge and local deployment. All export formats
(ONNX, TFLite, TorchScript) produce models that run completely offline.
The llama.cpp backend enables local LLM serving without any cloud dependencies.

```bash
# Export for edge deployment
anydeploy export model.pt --format onnx        # ONNX Runtime (CPU/GPU/edge)
anydeploy export model.pt --format tflite      # TFLite (mobile/edge)

# Serve an LLM locally
anydeploy serve model.gguf --backend llamacpp
```

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

MIT License. See [LICENSE](LICENSE) for details.

## Links

- [NRL.ai](https://www.nrl.ai)
- [GitHub](https://github.com/vietanhdev/anydeploy)
- [PyPI](https://pypi.org/project/anydeploy/)
