Metadata-Version: 2.4
Name: vibeval
Version: 0.7.3
Summary: vibeval (Vibe Coding Eval) — AI application testing framework
Project-URL: Homepage, https://github.com/SandyKidYao/vibeval
Project-URL: Repository, https://github.com/SandyKidYao/vibeval
Project-URL: Issues, https://github.com/SandyKidYao/vibeval/issues
Author-email: SandyKid <yupengyao912@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# vibeval — Vibe Coding Eval

A fast evaluation framework for AI applications. Install Claude Code and the vibeval CLI to get an end-to-end workflow from code analysis to test generation to evaluation.

## What Problem Does It Solve?

Traditional software testing frameworks cannot assess the quality of AI outputs; traditional AI evaluation platforms rely on dataset construction and cannot keep up with the pace of feature iteration. vibeval strikes a balance between the two:

- Analyzes your code via VibeCoding to quickly generate synthetic data and test cases
- Dual-layer evaluation: deterministic rules plus LLM semantic judgment
- Cross-version comparison to track quality changes over time
- Language-agnostic: generated test code adapts to your project's framework without depending on the vibeval package
- Per-tool validation for Agent projects (custom tools, MCP tools, sub-agents), with a 5-dimension coverage matrix enforced by the Evaluator
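To make the dual-layer idea concrete, here is a minimal sketch in plain Python (illustrative only, not vibeval's actual API): a cheap deterministic rule layer runs first, and only outputs that pass it reach the more expensive semantic judge.

```python
# Hypothetical sketch of dual-layer evaluation — function names and
# thresholds are illustrative, not vibeval's real API.

def rule_layer(output: str) -> bool:
    """Layer 1: cheap deterministic checks (format, length, required content)."""
    return output.strip() != "" and len(output) <= 2000

def llm_judge(output: str) -> float:
    """Layer 2: semantic judgment. Stubbed here; a real judge would call
    an LLM with a rubric and return its score."""
    return 4.5 if "action items" in output else 2.0

def evaluate(output: str, pass_threshold: float = 4.0) -> dict:
    if not rule_layer(output):
        return {"passed": False, "reason": "rule_layer"}
    score = llm_judge(output)
    return {"passed": score >= pass_threshold, "score": score}

print(evaluate("Summary with action items for each owner."))
# → {'passed': True, 'score': 4.5}
```

The ordering is the point: deterministic rules filter out malformed outputs for free, so LLM judgment tokens are spent only on outputs worth scoring.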

## Prerequisites

- [Claude Code](https://claude.ai/code)
- Python 3.10+

## Installation

```bash
# Install the vibeval CLI
pip install vibeval

# Install the Claude Code plugin (run this inside Claude Code)
/install-plugin https://github.com/SandyKidYao/vibeval
```

## Usage

Before first use, verify that the LLM provider is set up correctly:

```bash
vibeval check
```

Then run the unified workflow inside Claude Code:

```
/vibeval meeting_summary
```

The `/vibeval` command detects your project state and guides you through the appropriate phase:

- **New project** — Scans for AI code, suggests features to test, runs the full pipeline
- **In progress** — Verifies existing artifacts, continues from where you left off
- **Complete** — Detects code changes for incremental updates, or lets you re-run, add tests, or modify designs

Each phase (analyze → design → code → synthesize → run) pauses for your review before continuing. Every step produces editable intermediate files.

### Cross-Version Comparison

```bash
# Statistical comparison
vibeval diff meeting_summary run_a run_b

# LLM deep comparison
vibeval compare meeting_summary run_a run_b
```
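Conceptually, the statistical comparison boils down to pass-rate deltas plus per-case regressions and fixes between two runs. A rough sketch of that kind of comparison (not vibeval's implementation; the run format here is invented for illustration):

```python
# Rough sketch of a statistical run comparison — data shape is invented
# for illustration, not vibeval's actual run format.

def diff_runs(run_a: dict[str, bool], run_b: dict[str, bool]) -> dict:
    """Compare two runs keyed by test-case id; values are pass/fail."""
    shared = run_a.keys() & run_b.keys()
    regressions = sorted(k for k in shared if run_a[k] and not run_b[k])
    fixes = sorted(k for k in shared if not run_a[k] and run_b[k])
    pass_rate = lambda run: sum(run.values()) / len(run) if run else 0.0
    return {
        "pass_rate_a": pass_rate(run_a),
        "pass_rate_b": pass_rate(run_b),
        "regressions": regressions,  # passed in A, failed in B
        "fixes": fixes,              # failed in A, passed in B
    }

run_a = {"t1": True, "t2": True, "t3": False}
run_b = {"t1": True, "t2": False, "t3": True}
print(diff_runs(run_a, run_b))
```

`vibeval compare` goes a step further and asks an LLM to explain *why* outputs differ, rather than just counting pass/fail flips.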

### Interactive Dashboard

```bash
vibeval serve --open
```

Launches a web dashboard where you can browse all features, view test results and traces, visualize trends across runs, and manage datasets and judge specs.

### Data Validation

```bash
# Validate datasets and results against protocol format
vibeval validate meeting_summary
```

Checks manifest structure, judge specs, data item fields, `_mock_context`, trace format, and cross-references before you run the judge.
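The kind of check this performs can be sketched as a field validation over each data item. This is a hypothetical example with invented field names; the real protocol format is defined by vibeval, not by this snippet:

```python
# Hypothetical sketch of a protocol-style field check — REQUIRED_FIELDS
# is illustrative; vibeval defines the actual protocol format.
REQUIRED_FIELDS = {"id", "input", "expected"}

def validate_item(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item is well-formed."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - item.keys())]
    mock = item.get("_mock_context")
    if mock is not None and not isinstance(mock, dict):
        problems.append("_mock_context must be a mapping")
    return problems

print(validate_item({"id": "t1", "input": "hi"}))
# → ['missing field: expected']
```

Running validation before the judge catches structural problems cheaply, instead of discovering them mid-evaluation after LLM calls have already been spent.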

### Other Commands

```bash
# Show evaluation summary
vibeval summary meeting_summary latest

# List features and runs
vibeval features
vibeval runs meeting_summary

# See all commands
vibeval --help
```

## License

MIT
