Metadata-Version: 2.4
Name: fluxloop-cli
Version: 0.4.0
Summary: FluxLoop CLI — Agent evaluation framework
Project-URL: Homepage, https://github.com/chuckgu/fluxloop
Project-URL: Documentation, https://docs.fluxloop.dev
Project-URL: Repository, https://github.com/chuckgu/fluxloop
Project-URL: Issues, https://github.com/chuckgu/fluxloop/issues
Author-email: FluxLoop Team <team@fluxloop.dev>
License: Apache-2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: claude-agent-sdk>=0.1.0
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: rich>=13.0
Requires-Dist: ruamel-yaml>=0.18
Requires-Dist: typer>=0.9
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# FluxLoop CLI

Command-line interface for the FluxLoop agent evaluation framework.

## Installation

```bash
pip install fluxloop-cli
```

## Quick Start

```bash
# Authenticate
fluxloop auth login

# Create a project and scenario
fluxloop projects create --name my-project
fluxloop init scenario smoke-test
fluxloop scenarios create --name smoke-test --goal "Validate agent accuracy"

# Validate a skill, then test it
fluxloop skill validate ./SKILL.md
fluxloop skill test ./SKILL.md --input "Generate a summary" --copy-files src/

# Full test workflow (pull inputs → run → push results)
fluxloop test --scenario smoke-test
```

## Commands

### Core

| Command | Description |
|---------|-------------|
| `fluxloop run` | Run agent over configured inputs using the selected executor |
| `fluxloop skill validate` | Validate SKILL.md against static contracts |
| `fluxloop skill test` | Execute a skill in sandbox with behavior contract evaluation |
| `fluxloop skill benchmark` | Run N benchmark iterations and report stats |
| `fluxloop init scenario <name>` | Scaffold a new scenario directory |
| `fluxloop context show` | Display current project, scenario, and workspace state |
| `fluxloop test` | Full workflow: pull → run → push |
| `fluxloop test results` | View local or remote test results |
| `fluxloop evaluate` | Trigger server-side evaluation and wait for completion |

### Authentication & Projects

| Command | Description |
|---------|-------------|
| `fluxloop auth login` | Authenticate via device code flow |
| `fluxloop auth logout` | Remove stored credentials |
| `fluxloop auth status` | Show login state and token expiry |
| `fluxloop projects list` | List available projects |
| `fluxloop projects create` | Create a new project |
| `fluxloop projects select` | Set active project |
| `fluxloop apikeys create` | Generate an API key (saved to `.fluxloop/.env`) |
| `fluxloop apikeys list` | List existing API keys |

### Scenarios & Data Pipeline

| Command | Description |
|---------|-------------|
| `fluxloop scenarios create` | Create a scenario on the server |
| `fluxloop scenarios select` | Set active scenario locally |
| `fluxloop scenarios refine` | Refine scenario contracts |
| `fluxloop personas suggest` | Generate user personas via LLM |
| `fluxloop inputs synthesize` | Generate test inputs from personas |
| `fluxloop inputs list` | List available input sets |
| `fluxloop inputs qc` | Quality-check generated inputs |
| `fluxloop inputs refine` | Refine inputs iteratively |
| `fluxloop bundles publish` | Publish input sets as versioned bundles |
| `fluxloop bundles list` | List published bundles |
| `fluxloop manifests show` | Display current manifest |
| `fluxloop manifests publish` | Publish manifest to server |
| `fluxloop data push` | Upload knowledge or ground-truth data |
| `fluxloop data bind` | Bind uploaded data to a scenario |
| `fluxloop data gt status` | Check ground-truth materialization status |
| `fluxloop intent refine` | Refine agent profile and test intent |

### Sync

| Command | Description |
|---------|-------------|
| `fluxloop sync pull` | Pull bundle (inputs, personas, criteria) from server |
| `fluxloop sync push` | Upload test results to server |

## Configuration

Scenario configuration lives in YAML files under `scenarios/<name>/configs/`:

```
scenarios/
  smoke-test/
    configs/
      simulation.yaml    # Runner, iterations, conversation settings
      input.yaml         # Input source and items
      scenario.yaml      # Scenario metadata
    contracts/
      static.yaml        # SKILL.md structure rules
      behavior.yaml      # Execution assertions
    pulled/              # Data from sync pull
```

### Runner Types

Configure the executor in `simulation.yaml`:

**Function** — call a Python handler directly:

```yaml
runner:
  type: function
  target: "my_agent:handler"
  timeout_seconds: 30
```
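
The `target` value reads like the common `module:attribute` form, so `my_agent:handler` would point at a `handler` callable in `my_agent.py`. The sketch below shows one way such a module could look. The signature (one string in, one string out) is an assumption made for illustration; this README does not specify what the function runner actually passes or expects.

```python
# my_agent.py: hypothetical module for target "my_agent:handler".
# Assumption: the runner passes the input text and expects a string back.


def handler(text: str) -> str:
    """Return a trivial reply; replace the body with a call into your agent."""
    cleaned = text.strip()
    if not cleaned:
        return "No input provided."
    return f"Received {len(cleaned.split())} words: {cleaned}"


print(handler("Hello, summarize this document"))
```

Keeping the handler a plain function with no import-time side effects makes it easy to unit-test outside of FluxLoop.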

**Skill** — run a SKILL.md in Claude Agent SDK sandbox:

```yaml
runner:
  type: skill
  skill_path: ./SKILL.md
  harness: claude
  allowed_tools: ["Read", "Write", "Shell"]
  skill_max_turns: 10
  budget: 0.50
```

**Process** — invoke a subprocess via NDJSON protocol:

```yaml
runner:
  type: process
  command: ["python", "agent.py"]
  protocol: ndjson
```
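
NDJSON means one JSON object per line, so an agent process can be a simple read-reply loop over its standard streams. The sketch below illustrates that loop. The `input`, `output`, and `error` field names are assumptions, since the message schema is not documented here, and the `serve` helper is hypothetical.

```python
import io
import json
from typing import TextIO


def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON object per line and write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        try:
            request = json.loads(line)
        except json.JSONDecodeError as exc:
            reply = {"error": f"invalid JSON: {exc.msg}"}
        else:
            if isinstance(request, dict):
                # "input" and "output" are assumed field names, not a documented schema.
                reply = {"output": f"echo: {request.get('input', '')}"}
            else:
                reply = {"error": "expected a JSON object"}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # flush per message so the parent process is not left waiting


# Demo with in-memory streams; a real agent.py would call serve(sys.stdin, sys.stdout).
demo_out = io.StringIO()
serve(io.StringIO('{"input": "hi"}\nnot json\n'), demo_out)
print(demo_out.getvalue(), end="")
```

Flushing after every reply matters for a line-oriented protocol: without it, output can sit in the child's buffer while the parent waits for a line.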

### Input Sources

```yaml
input:
  source: inline          # inline | generated | bundle | pulled
  items:
    - text: "Hello, summarize this document"
    - text: "What are the key takeaways?"
```

When `source: pulled`, inputs are loaded from `pulled/inputs.json` after `sync pull`.

### Environment Variable Substitution

YAML config values support `${VAR}` syntax, resolved from environment variables.
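
As an illustration of this kind of substitution, here is a minimal sketch of `${VAR}` expansion over a single string. It is not FluxLoop's implementation: the `substitute` helper is hypothetical, and leaving unset variables untouched is an assumption, since the behavior for missing variables is not stated here.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} in value with env[NAME]."""
    source = os.environ if env is None else env

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Assumption: an unset variable is left as-is rather than raising.
        return source.get(name, match.group(0))

    return _VAR.sub(replace, value)


print(substitute("${API_BASE}/v1", {"API_BASE": "https://example.test"}))
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function deterministic in tests.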

## Contracts

### Static Contract

Validates SKILL.md structure before execution:

- Required sections (e.g., `# Purpose`, `# Instructions`)
- File size limits
- Encoding checks
- Forbidden pattern detection
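
The checks listed above can be sketched as a single function that returns a list of problems. This is an illustrative approximation only: the `check_skill` name, the default size limit, and the example forbidden pattern are all assumptions, and the real rules come from `contracts/static.yaml`.

```python
import re


def check_skill(
    raw: bytes,
    required=("# Purpose", "# Instructions"),
    max_bytes=64_000,  # assumed limit, for illustration only
    forbidden=(r"(?i)api[_-]?key\s*=",),  # assumed example pattern
) -> list:
    """Return human-readable problems; an empty list means the file passes."""
    problems = []
    if len(raw) > max_bytes:
        problems.append(f"file exceeds {max_bytes} bytes")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # The remaining checks need text, so stop here.
        return problems + ["file is not valid UTF-8"]
    headings = {line.strip() for line in text.splitlines() if line.startswith("#")}
    for section in required:
        if section not in headings:
            problems.append(f"missing section: {section}")
    for pattern in forbidden:
        if re.search(pattern, text):
            problems.append(f"forbidden pattern matched: {pattern}")
    return problems


print(check_skill(b"# Purpose\nSummarize files.\n"))
```

Collecting every problem instead of stopping at the first one lets an author fix a SKILL.md in a single pass.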

### Behavior Contract

Asserts conditions on execution results:

- `tool_called` / `tool_not_called`
- `turn_count` (min/max)
- `output_contains` / `output_matches`
- `file_exists`
- `cost_below` / `duration_below`
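
To make the assertion types concrete, the sketch below evaluates a few of them against one execution result. The result fields (`tools`, `output`, `cost`, `duration`) and the one-key-per-assertion layout are assumptions for illustration; the actual `behavior.yaml` schema is not shown in this README, and `turn_count` and `file_exists` are left out for brevity.

```python
import re

# Each check takes (result, expected) and returns True when the assertion holds.
CHECKS = {
    "tool_called": lambda r, v: v in r["tools"],
    "tool_not_called": lambda r, v: v not in r["tools"],
    "output_contains": lambda r, v: v in r["output"],
    "output_matches": lambda r, v: re.search(v, r["output"]) is not None,
    "cost_below": lambda r, v: r["cost"] < v,
    "duration_below": lambda r, v: r["duration"] < v,
}


def evaluate(result: dict, assertions: list) -> list:
    """Return one failure message per assertion that does not hold."""
    failures = []
    for assertion in assertions:
        for name, expected in assertion.items():
            if not CHECKS[name](result, expected):
                failures.append(f"{name}: {expected!r}")
    return failures


# Hypothetical execution result and assertions.
result = {"tools": ["Read", "Write"], "output": "Summary: done", "cost": 0.12, "duration": 4.2}
assertions = [
    {"tool_called": "Read"},
    {"tool_not_called": "Shell"},
    {"output_matches": r"^Summary"},
    {"cost_below": 0.10},
]
print(evaluate(result, assertions))
```

Here only the cost assertion fails, because the assumed cost of 0.12 is not below the 0.10 threshold.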

## Authentication

FluxLoop uses OAuth device code flow for interactive login:

```bash
fluxloop auth login              # Opens browser for approval
fluxloop auth login --no-browser # Manual code entry
fluxloop auth login --no-wait    # Save pending, resume later
fluxloop auth login --resume     # Resume pending login
```

Tokens are stored in `~/.fluxloop/auth.json`. For CI environments, use `FLUXLOOP_API_KEY` instead.

## Environment Variables

| Variable | Purpose |
|----------|---------|
| `FLUXLOOP_API_URL` | Backend API base URL |
| `FLUXLOOP_API_KEY` | API key for authenticated requests |
| `FLUXLOOP_SYNC_API_KEY` | API key specifically for sync operations |
| `ANTHROPIC_API_KEY` | Required for multi-turn UserSimulator |
| `OPENAI_API_KEY` | Alternative provider for UserSimulator |

Workspace-level variables can also be set in `.fluxloop/.env`.

## Output

Test runs produce standardized output in `.fluxloop/results/<experiment>-<timestamp>/`:

| File | Content |
|------|---------|
| `trace_summary.jsonl` | Per-run execution traces (tool calls, tokens, cost) |
| `summary.json` | Aggregated statistics (success rate, duration, cost) |
| `errors.json` | Failure inventory with diagnostics |
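
Because `trace_summary.jsonl` holds one JSON record per line, it can be post-processed with a few lines of Python. The sketch below computes simple aggregates. The `success` and `cost` field names are assumptions, since the trace schema is not documented here, and the demo writes a throwaway file rather than reading real results.

```python
import json
import tempfile
from pathlib import Path


def summarize(trace_path: Path) -> dict:
    """Aggregate per-run records from a JSON Lines trace file."""
    lines = trace_path.read_text(encoding="utf-8").splitlines()
    runs = [json.loads(line) for line in lines if line.strip()]
    total = len(runs)
    # "success" and "cost" are assumed field names, not a documented schema.
    passed = sum(1 for run in runs if run.get("success"))
    return {
        "runs": total,
        "success_rate": passed / total if total else 0.0,
        "total_cost": sum(run.get("cost", 0.0) for run in runs),
    }


# Demo against a temporary file with two made-up records.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "trace_summary.jsonl"
    path.write_text(
        '{"success": true, "cost": 0.25}\n{"success": false, "cost": 0.5}\n',
        encoding="utf-8",
    )
    stats = summarize(path)
print(stats)
```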

## Developing

```bash
cd cli/python

# Install dependencies
uv sync --group dev

# Run in development mode
uv run fluxloop --help

# Run tests
uv run pytest

# Lint
uv run ruff check .
```

## Building & Publishing

```bash
# Build
uv build

# Publish to PyPI
uv publish
# or
twine upload dist/*
```

## Tech Stack

| Library | Purpose |
|---------|---------|
| [Typer](https://typer.tiangolo.com) | CLI framework |
| [Pydantic](https://docs.pydantic.dev) | Data validation |
| [ruamel.yaml](https://yaml.readthedocs.io) | YAML parsing |
| [httpx](https://www.python-httpx.org) | HTTP client |
| [Rich](https://rich.readthedocs.io) | Terminal output formatting |
| [claude-agent-sdk](https://pypi.org/project/claude-agent-sdk/) | Skill execution in Claude sandbox |

## License

Apache-2.0
