Metadata-Version: 2.4
Name: layercode-gym
Version: 0.8.0
Summary: Utilities for simulating LayerCode voice agent conversations
Author: Jan Siml
Author-email: Jan Siml <49557684+svilupp@users.noreply.github.com>
License-Expression: MIT
License-File: LICENSE
Requires-Dist: logfire>=4.14.2
Requires-Dist: openai>=2.6.1
Requires-Dist: pydantic-ai>=1.9.1
Requires-Dist: pydantic>=2.10.0
Requires-Dist: pydub>=0.25.1
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: textprompts>=1.1.0
Requires-Dist: tqdm>=4.67.1
Requires-Dist: httpx>=0.27.2
Requires-Dist: websockets>=12.0
Requires-Dist: audioop-lts ; python_full_version >= '3.13'
Requires-Python: >=3.11
Project-URL: Bug Tracker, https://github.com/svilupp/layercode-gym/issues
Project-URL: Documentation, https://svilupp.github.io/layercode-gym
Project-URL: Homepage, https://github.com/svilupp/layercode-gym
Description-Content-Type: text/markdown

# LayerCode Gym

[![CI](https://github.com/svilupp/layercode-gym/actions/workflows/ci.yml/badge.svg)](https://github.com/svilupp/layercode-gym/actions/workflows/ci.yml)
[![Docs](https://github.com/svilupp/layercode-gym/actions/workflows/docs.yml/badge.svg)](https://github.com/svilupp/layercode-gym/actions/workflows/docs.yml)
[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://svilupp.github.io/layercode-gym)

A testing toolkit for voice AI agents built on [Layercode.com](https://layercode.com). Quickly spin up a testing environment to run through hundreds of scenarios and understand how your agent will perform in production.

**Note:** This is an unofficial, community-maintained project.

Perfect for regression testing, load testing, and automated evaluation of your voice AI agents.

## Features

- **Three User Simulator Types**: Fixed text, pre-recorded audio, or AI-driven personas
- **Smart Wait Handling**: AI personas intelligently wait when assistants need processing time
- **Captured Analytics**: Full transcripts with TTFAB, latency stats, and audio recordings
- **LogFire Integration**: Real-time observability and debugging
- **Batch Testing**: Run hundreds of conversations concurrently
- **CLI & Python API**: Quick testing via CLI or programmatic control, plus `api-agents` CLI to swap webhook URLs for CI
- **LLM-as-Judge**: Bring your own quality evaluation with customizable criteria as a conversational hook
- **GitHub Actions Integration**: Automated CI/CD testing with parallel persona execution

See `examples/` for reference!

## Quick Start

**Prerequisites:** Backend server configured in [Layercode dashboard](https://dash.layercode.com).

No server yet? Launch one quickly:
```bash
uvx layercode-create-app run --tunnel --unsafe-update-webhook
# Displays tunnel URL to enter in Layercode dashboard
```
!! Caution: `--unsafe-update-webhook` automatically updates the webhook URL in the Layercode dashboard!

### CLI Quick Test (No Installation)

```bash
# Set environment
export SERVER_URL="http://localhost:8001"
export LAYERCODE_AGENT_ID="your_agent_id"

# Run instantly with uvx (no installation)
uvx layercode-gym run --text "Hello, I need help with my account"

# Multiple messages
uvx layercode-gym run --text "Hi" --text "Tell me more" --text "Goodbye"

# Audio file
uvx layercode-gym run --file recording.wav

# AI agent with persona
uvx layercode-gym run --agent \
  --persona-background "You are a frustrated customer" \
  --persona-intent "Cancel your subscription"
```

Run `uvx layercode-gym --help` to see available commands, or `uvx layercode-gym run --help` for all run options.

### Manage Agent Webhooks (for CI)

```bash
# List all agents
uvx layercode-gym api-agents list

# Get agent details (use --json for full pipeline config)
uvx layercode-gym api-agents get --agent-id ag-123

# Update webhook URL (useful for PR testing)
uvx layercode-gym api-agents update --agent-id ag-123 --webhook-url https://pr-backend.com/webhook
```

### Cloudflare Tunnel (for Local Development)

Quickly expose your local server to the internet with a Cloudflare tunnel. This is useful for testing webhooks without deploying your backend.

**Requires:** [cloudflared](https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/installation/) to be installed.

```bash
# Basic tunnel - displays URL to copy manually
uvx layercode-gym tunnel --port 8000

# Or specify a full URL directly
uvx layercode-gym tunnel --url http://localhost:8000

# Auto-update agent webhook (recommended for development)
uvx layercode-gym tunnel --port 8000 --unsafe-update-webhook

# Explicit agent ID (overrides LAYERCODE_AGENT_ID env var)
uvx layercode-gym tunnel --port 8000 --agent-id ag-123456 --unsafe-update-webhook
```

When using `--unsafe-update-webhook`:
1. The tunnel starts and gets a base URL (e.g., `https://random-words.trycloudflare.com`)
2. The agent path is appended to create the full webhook URL (e.g., `https://random-words.trycloudflare.com/api/agent`)
3. Your agent's webhook URL is automatically updated
4. When you stop the tunnel (Ctrl+C), the original webhook URL is restored

**Agent path resolution:** `--agent-path` flag → `LAYERCODE_AGENT_PATH` env var → path from existing webhook → default `/api/agent`

**Environment Variables:**
- `LAYERCODE_AGENT_ID` - Default agent ID for webhook updates
- `LAYERCODE_API_KEY` - API key for webhook updates (required for `--unsafe-update-webhook`)
- `LAYERCODE_AGENT_PATH` - Path to append to tunnel URL (default: extracted from existing webhook, or `/api/agent`)

> **Warning:** `--unsafe-update-webhook` modifies your agent's configuration. Only use with development/test agents, not production.

See [tunnel documentation](docs/tunnel.md) for more details.

### Python API

```bash
# Install
uv add layercode-gym

# Set environment
export SERVER_URL="http://localhost:8001"
export LAYERCODE_AGENT_ID="your_agent_id"
export OPENAI_API_KEY="sk-..."  # For TTS and AI personas
```

```python
from layercode_gym import LayercodeClient, UserSimulator

# Simple text messages
simulator = UserSimulator.from_text(
    messages=["Hello!", "Tell me about pricing", "Thank you"],
    send_as_text=True
)

client = LayercodeClient(simulator=simulator)
conversation_id = await client.run()
```

## Architecture

```
┌─────────────┐                    ┌──────────────┐
│  Your Test  │──1. Authorize──────▶│ Your Backend │
│    Code     │                     │   Server     │
└─────────────┘                     └──────────────┘
       │                                    │
       │                             2. Return
       │                           client_session_key
       │                                    │
       └──────3. Connect with key───────────┘
                      │
                      ▼
              ┌──────────────┐
              │  Layercode   │
              │   Platform   │
              └──────────────┘
```

**Flow:**
1. Client authorizes through YOUR backend server (`SERVER_URL`)
2. Backend returns `client_session_key` from LayerCode
3. Client connects to LayerCode WebSocket with that key

The client never hits LayerCode's API directly - it always goes through your backend first.

## User Simulators

Three types for different testing needs:

### 1. Fixed Text Messages
Fastest option, perfect for regression testing:
```python
simulator = UserSimulator.from_text(
    messages=["Hello", "Tell me more", "Goodbye"],
    send_as_text=True  # or False to use TTS
)
```

### 2. Pre-recorded Audio Files
Test transcription and audio handling:
```python
from pathlib import Path

simulator = UserSimulator.from_files(
    files=[Path("greeting.wav"), Path("question.wav")]
)
```

### 3. AI Agent Personas
Realistic, dynamic conversations using PydanticAI:
```python
from layercode_gym import Persona

simulator = UserSimulator.from_agent(
    persona=Persona(
        background_context="You are a 35-year-old small business owner",
        intent="You want to understand pricing and features"
    ),
    model="openai:gpt-5-mini",
    max_turns=5
)
```

## Examples

The `examples/` directory contains ready-to-run scripts:

- **01_text_messages.py** - Simple text conversation for quick testing
- **02_audio_file.py** - Stream pre-recorded audio to test transcription
- **03_agent_persona.py** - AI-driven user with dynamic responses
- **04_callbacks_judge.py** - CriteriaJudge for automated pass/fail evaluation
- **05_batch_evaluation.py** - Run multiple conversations concurrently
- **06_outdoor_shop_eval.py** - Custom data processor with domain-specific criteria
- **07_custom_judge.py** - Build your own judge with custom PydanticAI output types
- **08_long_running_task.py** - Testing agents with wait handling for slow operations

Run any example:
```bash
python examples/01_text_messages.py
```

See [full documentation](https://svilupp.github.io/layercode-gym/examples) for detailed explanations.

## LLM-as-Judge Evaluation

Evaluate conversations against pass/fail criteria using `CriteriaJudge`:

```python
from layercode_gym import CriteriaJudge, LayercodeClient, Settings

judge = CriteriaJudge(
    criteria=[
        "Did the agent answer all user questions?",
        "Was the agent polite and professional?",
        "Did the conversation flow naturally?"
    ],
    # Note: gpt-5-mini is fast/cheap for testing; use gpt-5 for production
    model="openai:gpt-5-mini"
)

async def on_end(log):
    result = await judge.evaluate(log)
    print(f"Overall: {'PASS' if result.overall_pass else 'FAIL'}")
    judge.save_results(result, log.conversation_id, Settings.load().output_root)

client = LayercodeClient(
    simulator=simulator,
    conversation_callback=on_end
)
```

Results saved to `conversations/<id>/judge_evaluation.json` with full evaluation metadata:

```json
{
  "schema_version": "1.0",
  "evaluated_at": "2025-12-05T13:15:41.124793+00:00",
  "model": "openai:gpt-5-mini",
  "criteria": [{"id": 1, "criterion": "Did the agent answer all user questions?"}],
  "additional_context": "Optional context provided to the judge",
  "judgment": {
    "criteria_results": [{"criterion_id": 1, "passed": true}],
    "overall_pass": true,
    "reasoning": "The agent answered all questions clearly..."
  },
  "results_summary": [{"id": 1, "criterion": "...", "passed": true}]
}
```

## Batch Testing

Run hundreds of conversations concurrently:

```python
import asyncio
from tqdm.asyncio import tqdm_asyncio

scenarios = ["Message 1", "Message 2", "Message 3"]
tasks = [run_conversation(msg) for msg in scenarios]

results = await tqdm_asyncio.gather(*tasks, desc="Running conversations")
```

See `examples/05_batch_evaluation.py` for the complete pattern.

## GitHub Actions CI/CD

Run automated tests in your CI pipeline with multiple personas in parallel:

```yaml
- uses: ./.github/actions/layercode-gym-test
  with:
    personas: |
      - background: You are a potential customer
        intent: Learn about pricing and features

      - background: You are a frustrated user
        intent: Get help with a problem
    judge-enabled: true
    judge-criteria: |
      - Did the agent provide clear and helpful responses?
    server-url: ${{ secrets.SERVER_URL }}
    layercode-agent-id: ${{ secrets.LAYERCODE_AGENT_ID }}
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```

**Features:**
- Run multiple personas in parallel for maximum speed
- Automated quality evaluation with LLM judge
- Detailed artifacts with transcripts and audio recordings
- Optional LogFire observability integration

**Tip:** Use the `api-agents` CLI to update your agent's webhook URL for PR testing:
```bash
# Point agent to PR-specific backend before running tests
layercode-gym api-agents update --agent-id ag-123 --webhook-url https://pr-456.example.com/webhook

# Restore original after tests
layercode-gym api-agents update --agent-id ag-123 --webhook-url https://production.example.com/webhook
```

See [GitHub Actions documentation](docs/github-action.md) for complete setup guide, or [`api-agents` CLI docs](docs/api-agents.md) for webhook management.

## Conversation Outputs

After each conversation:

```
conversations/<conversation_id>/
├── transcript.json          # Full log with timing metrics
├── conversation_mix.wav     # Combined audio (user + assistant)
├── user_0.wav              # Individual user turns
├── assistant_0.wav         # Individual assistant turns
└── judge_evaluation.json   # CriteriaJudge results (if enabled)
```

Transcript includes TTFAB, latency stats, turn counts, and full message history.

## Custom Implementations

### Custom TTS Engine

```python
from layercode_gym.simulator import TTSEngineProtocol
from pathlib import Path

class MyTTSEngine(TTSEngineProtocol):
    async def synthesize(self, text: str, **kwargs) -> Path:
        # Your TTS service (ElevenLabs, Azure, etc.)
        return audio_file_path

simulator = UserSimulator.from_text(
    messages=["Hello!"],
    send_as_text=False,
    tts_engine=MyTTSEngine()
)
```

### Custom LLM for Agents

Use any LLM supported by PydanticAI. **Important:** You must define the system prompt with proper placeholders.

```python
from pydantic_ai import Agent
from textprompts import TextTemplates

# Load the required prompt template
templates = TextTemplates("src/layercode_gym/simulator/prompts")
system_prompt = templates.render(
    "basic_agent.txt",
    background_context="Your background",
    intent="Your intent"
)

# Create custom agent with proper system prompt
agent = Agent(
    "anthropic:claude-3-5-sonnet",
    system_prompt=system_prompt
)

simulator = UserSimulator.from_agent(agent=agent, deps=my_deps)
```

**Available models:**
- `openai:gpt-5` / `openai:gpt-5-mini`
- `anthropic:claude-3-5-sonnet`
- `ollama:llama3` (local)
- `gemini:gemini-1.5-pro`

**Prompt requirements:** The system prompt must include `{background_context}` and `{intent}` placeholders. See `src/layercode_gym/simulator/prompts/basic_agent.txt` for the default template.

### Custom Simulator

Full control via protocol implementation:

```python
from layercode_gym.simulator import UserSimulatorProtocol, UserRequest, UserResponse

class MyCustomSimulator(UserSimulatorProtocol):
    async def get_response(self, request: UserRequest) -> UserResponse | None:
        # Your logic here
        return UserResponse(text="Hello!", audio_path=None, data=())
```

## Environment Variables

**Required:**
```bash
SERVER_URL="http://localhost:8001"       # Your backend server
LAYERCODE_AGENT_ID="your_agent_id"       # LayerCode agent ID
```

**Optional:**
```bash
OPENAI_API_KEY="sk-..."                  # For TTS and AI agents
OPENAI_TTS_MODEL="gpt-4o-mini-tts"       # TTS model
OPENAI_TTS_VOICE="coral"                 # Voice (alloy, echo, fable, onyx, nova, shimmer, coral)
LAYERCODE_OUTPUT_ROOT="./conversations"  # Save location
LOGFIRE_TOKEN="..."                      # Enable LogFire observability
```

## LogFire Integration

Real-time observability and debugging with [LogFire](https://logfire.pydantic.dev/):

```bash
export LOGFIRE_TOKEN="your_token_here"
```

Automatically instruments PydanticAI and OpenAI calls, providing:
- Real-time conversation tracking
- Performance metrics and spans
- Error tracking with stack traces
- Beautiful UI for exploring conversations

## Type Safety

Enforces `mypy --strict` throughout. All event schemas use `TypedDict` or dataclasses.

```bash
uv run mypy src/layercode_gym
```

## Related Projects

- **[layercode-create-app](https://github.com/svilupp/layercode-create-app)** - CLI to scaffold LayerCode backends with tunneling
- **[layercode-examples](https://github.com/svilupp/layercode-examples)** - Agent patterns and integration recipes

## Documentation

Full documentation at [svilupp.github.io/layercode-gym](https://svilupp.github.io/layercode-gym)

- [Getting Started](https://svilupp.github.io/layercode-gym/getting-started)
- [Core Concepts](https://svilupp.github.io/layercode-gym/concepts)
- [Examples](https://svilupp.github.io/layercode-gym/examples)
- [API Reference](https://svilupp.github.io/layercode-gym/api-reference)
- [Advanced Usage](https://svilupp.github.io/layercode-gym/advanced)

## Contributing

This is a minimal, focused toolkit. Extensions should be done via:
- Custom simulator strategies (implement `UserSimulatorProtocol`)
- Custom callbacks (implement `TurnCallback` or `ConversationCallback`)
- Custom TTS engines (implement `TTSEngineProtocol`)

Keep the core simple and extensible.

## License

MIT - See [LICENSE](LICENSE) file for details.
