Metadata-Version: 2.4
Name: synth-ai
Version: 0.4.3
Summary: Serverless Posttraining for Agents - Core AI functionality and tracing
Author-email: Synth AI <josh@usesynth.ai>
License: MIT
Project-URL: Homepage, https://github.com/synth-laboratories/synth-ai
Project-URL: Repository, https://github.com/synth-laboratories/synth-ai
Project-URL: Issues, https://github.com/synth-laboratories/synth-ai/issues
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.1
Requires-Dist: requests>=2.32.3
Requires-Dist: tqdm>=4.66.4
Requires-Dist: typing_extensions>=4.0.0
Requires-Dist: rich>=13.9.0
Requires-Dist: openai>=1.99.0
Requires-Dist: anthropic>=0.42.0
Requires-Dist: langfuse<3.0.0,>=2.53.9
Requires-Dist: opentelemetry-api>=1.26.0
Requires-Dist: opentelemetry-sdk>=1.26.0
Requires-Dist: groq>=0.30.0
Requires-Dist: google-genai>=1.26.0
Requires-Dist: together>=1.5.21
Requires-Dist: mistralai>=1.9.2
Requires-Dist: fastapi>=0.115.12
Requires-Dist: uvicorn>=0.34.2
Requires-Dist: numpy>=2.2.3
Requires-Dist: networkx>=3.4.2
Requires-Dist: sqlalchemy>=2.0.42
Requires-Dist: celery>=5.4.0
Requires-Dist: redis>=6.2.0
Requires-Dist: aiosqlite>=0.21.0
Requires-Dist: libsql>=0.1.8
Requires-Dist: pynacl>=1.5.0
Requires-Dist: click<8.2,>=8.1.7
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: modal<2.0.0,>=1.1.4
Requires-Dist: docker>=7.0.0
Requires-Dist: mcp>=1.21.0
Requires-Dist: ruff>=0.12.9
Requires-Dist: tomli_w>=1.0.0
Requires-Dist: dspy>=3.0.4
Requires-Dist: setuptools>=80.9.0
Requires-Dist: gymnasium>=0.26.2
Requires-Dist: gepa>=0.0.17
Requires-Dist: datasets>=4.0.0
Provides-Extra: dev
Requires-Dist: build>=1.2.2.post1; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: keyring>=24.0.0; extra == "dev"
Requires-Dist: pytest>=8.3.3; extra == "dev"
Requires-Dist: pytest-xdist>=3.6.1; extra == "dev"
Requires-Dist: pytest-timeout>=2.3.1; extra == "dev"
Requires-Dist: pytest-asyncio>=0.24.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pyright>=1.1.350; extra == "dev"
Requires-Dist: coverage[toml]>=7.3.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: papermill>=2.6.0; extra == "dev"
Requires-Dist: nest_asyncio>=1.6.0; extra == "dev"
Provides-Extra: research
Requires-Dist: crafter>=1.8.3; extra == "research"
Requires-Dist: datasets>=4.0.0; extra == "research"
Provides-Extra: swe
Requires-Dist: morphcloud>=0.1.3; extra == "swe"
Requires-Dist: swebench>=2.3.0; extra == "swe"
Provides-Extra: all
Requires-Dist: crafter>=1.8.3; extra == "all"
Requires-Dist: datasets>=4.0.0; extra == "all"
Requires-Dist: morphcloud>=0.1.3; extra == "all"
Requires-Dist: swebench>=2.3.0; extra == "all"
Requires-Dist: pyboy>=2.6.0; extra == "all"
Requires-Dist: transformers>=4.56.1; extra == "all"
Requires-Dist: redis>=6.2.0; extra == "all"
Provides-Extra: analytics
Requires-Dist: pandas>=2.2.3; extra == "analytics"
Dynamic: license-file

# Synth

[![Python](https://img.shields.io/badge/python-3.11+-blue)](https://www.python.org/)
[![PyPI](https://img.shields.io/pypi/v/synth-ai.svg)](https://pypi.org/project/synth-ai/)
[![PyPI Main](https://img.shields.io/badge/main-0.4.1-blue)](https://pypi.org/project/synth-ai/0.4.1/)
[![PyPI Nightly](https://img.shields.io/badge/nightly-0.4.0-orange)](https://pypi.org/project/synth-ai/)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
![Coverage](https://img.shields.io/badge/coverage-28.65%25-yellow)
![Tests](https://img.shields.io/badge/tests-847%20passing-brightgreen)

Serverless Posttraining APIs for Developers

<p align="center">
  <picture align="center">
    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/synth-laboratories/synth-ai/main/assets/langprobe_v2_dark.png">
    <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/synth-laboratories/synth-ai/main/assets/langprobe_v2_light.png">
    <img alt="Shows a bar chart comparing prompt optimization performance across Synth GEPA, Synth MIPRO, GEPA (lib), DSPy MIPRO, and DSPy GEPA with baseline vs optimized." src="https://raw.githubusercontent.com/synth-laboratories/synth-ai/main/assets/langprobe_v2_light.png">
  </picture>
</p>

<p align="center">
  <i>Average accuracy on <a href="https://arxiv.org/abs/2502.20315">LangProBe</a> prompt optimization benchmarks.</i>
</p>

## Highlights

- 🚀 Train across SFT, RL, and prompt optimization by standing up a single cloudflared FastAPI wrapper around your code. No production code churn.
- ⚡️ Parallelize training and achieve 80% GPU utilization via PipelineRL
- 🗂️ Train prompts and models across multiple experiments
- 🛠️ Spin up experiment queues and datastores locally for dev work
- 🔩 Run serverless training via the CLI or programmatically
- 🏢 Scale GPU-based model training to 64 H100s seamlessly
- 💾 Use GEPA-calibrated verifiers for fast, accurate rubric scoring
- 🖥️ Support HTTP-based training across all programming languages
- 🤖 CLI utilities tuned for use with Claude Code, Codex, Opencode

## Getting Started

```bash
# Use with OpenAI Codex
uvx synth-ai codex
```

```bash
# Use with Opencode
uvx synth-ai opencode
```

Synth is maintained by the developers behind the [MIPROv2](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=jauNVA8AAAAJ&citation_for_view=jauNVA8AAAAJ:u5HHmVD_uO8C) prompt optimizer.

## Documentation

**[docs.usesynth.ai](https://docs.usesynth.ai)**

## In-Process Runner (SDK)

Run GEPA/MIPRO/RL jobs against a tunneled task app without the CLI:

```python
import asyncio
import os
from synth_ai.sdk.task import run_in_process_job

result = asyncio.run(
    run_in_process_job(
        job_type="prompt_learning",
        config_path="configs/style_matching_gepa.toml",
        task_app_path="task_apps/style_matching_task_app.py",
        overrides={"prompt_learning.gepa.rollout.budget": 4},
        backend_url=os.getenv("TARGET_BACKEND_BASE_URL"),  # resolves envs automatically
    )
)
print(result.job_id, result.status.get("status"))
```

## Zero-Shot Verifiers (SDK)

Run a built-in verifier graph with rubric criteria passed at runtime:

```python
import asyncio
import os
from synth_ai.sdk.graphs import VerifierClient

async def run_verifier():
    client = VerifierClient(
        base_url=os.environ["SYNTH_BACKEND_BASE"],
        api_key=os.environ["SYNTH_API_KEY"],
    )
    result = await client.evaluate(
        job_id="zero_shot_verifier_single",
        session_trace={"session_id": "s", "event_history": []},
        rubric={
            "event": [{"id": "accuracy", "weight": 1.0, "description": "Correctness"}],
            "outcome": [{"id": "task_completion", "weight": 1.0, "description": "Completed task"}],
        },
        options={"event": True, "outcome": True, "model": "gpt-5-nano"},
        policy_name="my_policy",
        task_app_id="my_task",
    )
    return result

asyncio.run(run_verifier())
```
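For intuition, each rubric criterion pairs an `id` with a `weight`. A minimal sketch of weight-normalized aggregation is below; this is a hypothetical mental model only, not the hosted verifier's actual scoring logic.

```python
# Hypothetical sketch of weighted rubric aggregation; the hosted
# verifier's real scoring pipeline may differ.
def weighted_score(criteria: list[dict], scores: dict[str, float]) -> float:
    """Combine per-criterion scores using rubric weights, normalized."""
    total_weight = sum(c["weight"] for c in criteria)
    if total_weight == 0:
        return 0.0
    return sum(c["weight"] * scores.get(c["id"], 0.0) for c in criteria) / total_weight

rubric_event = [
    {"id": "accuracy", "weight": 1.0, "description": "Correctness"},
    {"id": "style", "weight": 0.5, "description": "Formatting"},
]
print(weighted_score(rubric_event, {"accuracy": 1.0, "style": 0.0}))  # ≈ 0.667
```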

You can also call arbitrary graphs directly:

```python
import asyncio
from synth_ai.sdk.graphs import GraphCompletionsClient

async def run_graph():
    client = GraphCompletionsClient(base_url="https://api.usesynth.ai", api_key="...")
    return await client.run(
        graph={"kind": "zero_shot", "verifier_type": "zero_shot_verifier_mapreduce"},
        input_data={"session_trace": {"session_id": "s", "event_history": []}, "rubric": {"event": [], "outcome": []}},
    )

resp = asyncio.run(run_graph())
```

## GraphGen: Train Custom Verifier and RLM Graphs

Train custom verifier and RLM graphs using GraphGen:

```python
from synth_ai.sdk.api.train.graphgen import GraphGenJob

# Train a verifier graph (judge/scorer)
verifier_job = GraphGenJob.from_dataset(
    dataset="verifier_dataset.json",
    graph_type="verifier",
    policy_models=["gpt-4.1"],
    proposer_effort="medium",  # Use "medium" (gpt-4.1) or "high" (gpt-5.2)
    rollout_budget=200,
)
verifier_job.submit()
result = verifier_job.stream_until_complete(timeout=3600.0)

# Run inference with trained verifier
judgment = verifier_job.run_verifier(
    session_trace=my_trace,
    context={"rubric": my_rubric},
)
print(f"Score: {judgment.score}, Reasoning: {judgment.reasoning}")
```

```python
# Train an RLM graph (massive context via tools)
rlm_job = GraphGenJob.from_dataset(
    dataset="rlm_dataset.json",
    graph_type="rlm",
    configured_tools=[
        {"name": "materialize_context", "kind": "rlm_materialize", "stateful": True},
        {"name": "local_grep", "kind": "rlm_local_grep", "stateful": False},
        {"name": "codex_exec", "kind": "daytona_exec", "stateful": True},
    ],
    policy_models=["gpt-4.1"],
    proposer_effort="medium",
    rollout_budget=100,
)
rlm_job.submit()
result = rlm_job.stream_until_complete(timeout=3600.0)

# Run inference with trained RLM graph
output = rlm_job.run_inference({"query": "Find relevant sections", "context": large_document})
```

**Graph Types:**
- **`verifier`**: Trains a judge/scorer that evaluates traces and returns structured rewards
- **`rlm`**: Trains a graph optimized for massive contexts (1M+ tokens) using tool-based search
- **`policy`**: Trains a standard input→output graph (default)

**RLM Tools:**
- `materialize_context` - Store input fields for fast searching (~1ms local)
- `local_grep` - Regex search on materialized content (~1ms)
- `local_search` - Substring search (~1ms)
- `query_lm` - Sub-LM calls for processing chunks
- `codex_exec` - Shell execution for complex operations
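As a rough local illustration of what `materialize_context` plus `local_grep` do conceptually (assumed behavior; the hosted tool implementations are not shown here), consider a toy store that keeps input fields in memory and runs regex searches against them:

```python
import re

# Toy illustration (assumed behavior) of materialize_context + local_grep:
# store input fields once, then regex-search the stored text line by line.
class ToyContextStore:
    def __init__(self) -> None:
        self._fields: dict[str, str] = {}

    def materialize(self, name: str, text: str) -> None:
        """Store a named input field for later searching."""
        self._fields[name] = text

    def grep(self, pattern: str) -> list[tuple[str, str]]:
        """Return (field, line) pairs whose line matches the regex."""
        regex = re.compile(pattern)
        return [
            (name, line)
            for name, text in self._fields.items()
            for line in text.splitlines()
            if regex.search(line)
        ]

store = ToyContextStore()
store.materialize("doc", "alpha\nbeta section\ngamma")
print(store.grep(r"section"))  # [('doc', 'beta section')]
```

Because the content is materialized once and searched in memory, repeated queries stay cheap, which is the point of the ~1ms latencies quoted above.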

**When to use RLM:**
- Context exceeds ~100K tokens (too large for prompt)
- You need to search/filter large datasets
- RAG-style workflows over massive corpora
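A crude way to apply the ~100K-token threshold above is a character-count heuristic (hypothetical helper; ~4 characters per token is a common rule of thumb for English text, not an exact tokenizer):

```python
RLM_TOKEN_THRESHOLD = 100_000  # rough cutoff discussed above

def estimated_tokens(text: str) -> int:
    # Heuristic: ~4 characters per token for English text.
    return len(text) // 4

def should_use_rlm(context: str) -> bool:
    """Return True when the context likely exceeds the prompt-size cutoff."""
    return estimated_tokens(context) > RLM_TOKEN_THRESHOLD

print(should_use_rlm("x" * 1_000_000))  # True: ~250K estimated tokens
```

For a precise count, run your model's actual tokenizer instead of this approximation.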
