Metadata-Version: 2.4
Name: python-harness
Version: 0.0.11
Summary: An agentic codebase evaluation and evolution tool for Python projects.
Author-email: Mingli Yuan <mingli.yuan@gmail.com>
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: openai>=1.0.0
Requires-Dist: anthropic>=0.18.0
Requires-Dist: tenacity>=8.2.0
Requires-Dist: tiktoken>=0.6.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pytest>=8.0.0
Requires-Dist: pytest-cov>=4.1.0
Requires-Dist: ruff>=0.3.0
Requires-Dist: mypy>=1.9.0
Requires-Dist: radon>=6.0.1
Provides-Extra: dev
Requires-Dist: ty>=0.0.1; extra == "dev"
Dynamic: license-file

# Python Harness

An agentic codebase evaluation and evolution tool for Python projects.

`python-harness` is designed to be a standard, universal tool in the spirit of `pytest` or `ruff`, but instead of only checking syntax or running tests, it evaluates the **architecture, readability, and governance** of your codebase using both static analysis and LLMs (DeepSeek/OpenAI).

## Features

1. **Hard Evaluation (First Fence)**: Enforces strict rules using `ruff`, `mypy`, and `ty`, and evaluates Cyclomatic Complexity (CC) and Maintainability Index (MI) via `radon` (see the sketch after this list).
2. **Governance QC (Second Fence)**: Checks whether the changes violate core project governance or attempt to bypass the evaluation rules themselves.
3. **Soft Evaluation (Third Fence)**:
   - Calculates architecture metrics like Fan-out (coupling).
   - Generates a holistic package understanding using LLMs.
   - Performs "Blind QA": Randomly samples functions/classes and tests the LLM's ability to understand them without context.
4. **Actionable Output**: Synthesizes the evaluation into a final `Pass/Fail` verdict with exactly 3 concrete, actionable refactoring suggestions.
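
The first fence is roughly what you would get from running those tools by hand. The sketch below is for orientation only; the exact flags the harness passes are internal, and these commands just use common options for `ruff`, `mypy`, and `radon`:

```bash
# Lint and type-check (the hard rules)
ruff check src/
mypy src/

# Cyclomatic Complexity and Maintainability Index via radon
radon cc src/ -s -a   # per-function CC grades plus the average
radon mi src/ -s      # per-module MI scores
```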

## Installation

You can install `python-harness` using `uv` or `pip`:

```bash
uv pip install python-harness
```
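
Or, equivalently, with `pip`:

```bash
pip install python-harness
```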

## Configuration

`python-harness` uses an LLM for its soft evaluation. Create a `.env` file in the root of your project:

```env
LLM_API_KEY=your_api_key_here
LLM_BASE_URL=https://api.deepseek.com/v1
LLM_MODEL_NAME=deepseek-reasoner
LLM_MINI_MODEL_NAME=deepseek-chat
```

*(Note: if you don't provide an API key, the harness safely falls back to mock mode.)*
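
If you prefer not to keep a `.env` file, the same variables can usually be exported in the shell instead. This is a sketch assuming the harness reads these names from the process environment, which is the standard `python-dotenv` pattern (`load_dotenv` does not override variables that are already set):

```bash
# Equivalent configuration via shell environment variables
export LLM_API_KEY=your_api_key_here
export LLM_BASE_URL=https://api.deepseek.com/v1
export LLM_MODEL_NAME=deepseek-reasoner
export LLM_MINI_MODEL_NAME=deepseek-chat
```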

## Usage

### 1. Measure

To evaluate your codebase, run the `measure` command in your project directory:

```bash
harness measure .
```

This runs the full three-fence evaluation and outputs a report with a final verdict and the top three improvement suggestions.

### 2. Refine (Evolution Loop - WIP)

The `refine` command runs an agentic edit-test-improve loop: it takes the suggestions generated by `measure`, automatically creates branches (variants), applies the changes, runs the tests with `pytest`, and keeps the best variant.

```bash
harness refine . --steps 1 --max-retries 3
```

## License

MIT License. See [LICENSE](LICENSE) for more details.

