Metadata-Version: 2.4
Name: onecode-cli
Version: 0.1.6
Summary: Multi-agent codebase evaluation and reliability optimization.
Author-email: Shoaib Rahman <shoaibeee@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/shoaibur/OneCode
Project-URL: Repository, https://github.com/shoaibur/OneCode.git
Project-URL: Issues, https://github.com/shoaibur/OneCode/issues
Keywords: codebase,evaluation,llm,agents,reliability
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: openai>=1.0.0
Requires-Dist: anthropic>=0.7.0
Requires-Dist: faiss-cpu>=1.7.0
Requires-Dist: ragas>=0.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: fastapi>=0.104.0; extra == "dev"
Requires-Dist: uvicorn>=0.24.0; extra == "dev"

# OneCode - Agentic Codebase Evaluation

[![Release v0.1.6](https://img.shields.io/badge/release-v0.1.6-brightgreen.svg)](https://github.com/shoaibur/OneCode/releases)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![PyPI: onecode-cli](https://img.shields.io/badge/PyPI-onecode--cli-informational.svg)](https://pypi.org/project/onecode-cli/)
[![Platform: All](https://img.shields.io/badge/platform-linux%20%7C%20macos%20%7C%20windows-lightgrey.svg)](#)

Quantify agent and GenAI reliability • Analyze, search, and refactor codebases • Run and debug code using natural language • Intelligent code retrieval via semantic knowledge graphs • Track agent improvements over time.

---

## Quick Navigation

- [Evaluation Metrics](#evaluation-metrics)
- [Example: Evaluation Output](#example-evaluation-output)
- [Installation](#installation)
- [Setup](#setup)
- [How to run](#how-to-run)
- [Example queries](#example-queries)
- [Development](#development)
- [License](#license)

---

## Evaluation Metrics

OneCode evaluates agents using industry-standard metrics:

**Core GenAI Reliability**
- Faithfulness - How well the output's claims are supported by the provided context
- Hallucination - How much the output diverges from the context; lower is better
- Answer Accuracy - How closely the answer matches the ground truth

**Agent-Specific**
- Agent Goal Accuracy - Did the agent achieve its intended objective?
- Tool Call F1 - Precision and recall of tool invocations

**Quality & Coherence**
- Answer Relevancy - How relevant the output is to the input question
- Response Groundedness - How grounded the response is in retrieved context

**Retrieval Quality**
- Context Precision - Share of retrieved context chunks that are actually relevant
- Context Recall - Share of all relevant context chunks that were actually retrieved (see the sketch after this list)
- Context Relevance - How relevant the retrieved context is to the question
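
The agent and retrieval ratios above are ordinary precision/recall quantities. As a minimal plain-Python illustration (not OneCode's internal implementation; the chunk and tool names below are made up), Context Precision/Recall and Tool Call F1 reduce to:

```python
def precision_recall_f1(predicted: set, relevant: set) -> tuple:
    """Generic precision/recall/F1 over two sets of items.

    Used here to illustrate Context Precision/Recall (items = retrieved
    context chunks) and Tool Call F1 (items = tool invocations).
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Context Precision / Recall: which retrieved chunks were actually relevant?
retrieved_chunks = {"auth.py:login", "auth.py:logout", "db.py:connect"}
relevant_chunks = {"auth.py:login", "auth.py:validate_token"}
print(precision_recall_f1(retrieved_chunks, relevant_chunks))  # (0.33..., 0.5, 0.4)

# Tool Call F1: which tool invocations matched the expected ones?
called_tools = {"read_file", "run_tests"}
expected_tools = {"read_file", "run_tests", "git_diff"}
print(precision_recall_f1(called_tools, expected_tools))  # (1.0, 0.66..., 0.8)
```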

### Context-Aware Datasets

OneCode automatically generates test datasets tailored to each module by analyzing its purpose and code. These datasets are:

- Automatically refreshed when module code changes (a sketch of this refresh logic follows below)
- Reused consistently across evaluation runs for reliable trend tracking
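
One way to picture the refresh behaviour is content-based cache invalidation: regenerate the dataset only when the module's source changes, otherwise reuse the cached samples so runs stay comparable. The sketch below is purely illustrative; the cache layout and the `generate_dataset` callable are hypothetical, not OneCode's API:

```python
import hashlib
import json
from pathlib import Path

def dataset_for(module_path: Path, cache_dir: Path, generate_dataset) -> list:
    """Reuse a cached test dataset until the module's source changes.

    `generate_dataset` stands in for whatever builds the context-aware
    samples; it is a hypothetical callable used only for this sketch.
    """
    source_hash = hashlib.sha256(module_path.read_bytes()).hexdigest()
    cache_file = cache_dir / f"{module_path.stem}.dataset.json"

    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if cached["source_hash"] == source_hash:
            return cached["samples"]          # unchanged code: reuse for trend tracking

    samples = generate_dataset(module_path)   # code changed (or first run): regenerate
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"source_hash": source_hash, "samples": samples}))
    return samples
```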

---

## Example: Evaluation Output

```
You: evaluate the summarizer agent

Here is the complete evaluation report for the Summarizer Agent
(agents/summarizer.py):

Metric Scores (5 samples)
- Hallucination: 0.90 ✗ Critical (lower is better)
- Answer Accuracy: 0.45 ⚠ Needs Improvement
- Context Precision: 0.27 ✗ Critical
- Answer Relevancy: 0.39 ✗ Critical
- Faithfulness: 0.10 ✗ Critical

Root Cause Analysis:
Faithfulness (0.10) & Response Groundedness (0.10) — The agent is 
largely fabricating content rather than grounding summaries in the 
provided input. This is a fundamental failure for a summarizer.

Comparison with Prior Run (3 days ago):
- Faithfulness: 0.10 (no change)
- Answer Accuracy: 0.45 (↑ +0.05 improvement)
- Context Precision: 0.27 (↓ -0.12 regression)

Recommendations:
1. Add input validation to reject malformed text
2. Implement a grounding constraint that requires citations
3. Test with diverse document types
```
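
Scores of this kind come from a RAGAS-style evaluation (ragas is one of OneCode's declared dependencies, and the example queries below refer to "RAGAS metrics"). As a rough sketch only, assuming the ragas 0.1.x API and an OPENAI_API_KEY in the environment, such an evaluation looks roughly like this; exact column names and entry points vary between ragas versions:

```python
from datasets import Dataset                      # installed alongside ragas
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One toy sample in the schema ragas 0.1.x expects.
data = Dataset.from_dict({
    "question": ["What does agents/summarizer.py do?"],
    "answer": ["It produces short summaries of input documents."],
    "contexts": [["summarizer.py: builds extractive summaries from provided text."]],
    "ground_truth": ["It summarizes the documents passed to it."],
})

# ragas calls an LLM and an embedding model, so OPENAI_API_KEY must be set.
result = evaluate(data, metrics=[faithfulness, answer_relevancy, context_precision, context_recall])
print(result)  # per-metric scores, e.g. {'faithfulness': ..., 'answer_relevancy': ..., ...}
```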

### Accountability & Comparative Analysis

Track agent improvements over time and compare across versions:

```
You: how does this agent compare to last week's version?
→ Shows metrics side-by-side with delta (+/- changes)

You: which agents regressed in the last evaluation?
→ Flags agents with metric drops and explains why

You: show me the evaluation history for the coder agent
→ Displays trend chart showing faithfulness, accuracy over time
```
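
Under the hood, a version-to-version comparison is just a diff of two score dictionaries. A minimal plain-Python sketch, reusing the numbers from the report above (illustrative only, not OneCode's report format):

```python
def compare_runs(previous: dict, current: dict) -> dict:
    """Return per-metric deltas between two evaluation runs."""
    return {
        metric: round(current[metric] - previous[metric], 2)
        for metric in current
        if metric in previous
    }

last_week = {"faithfulness": 0.10, "answer_accuracy": 0.40, "context_precision": 0.39}
this_week = {"faithfulness": 0.10, "answer_accuracy": 0.45, "context_precision": 0.27}

deltas = compare_runs(last_week, this_week)
regressions = [metric for metric, delta in deltas.items() if delta < 0]
print(deltas)        # {'faithfulness': 0.0, 'answer_accuracy': 0.05, 'context_precision': -0.12}
print(regressions)   # ['context_precision']
```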

---

## Installation

**From PyPI:**
```bash
pip install onecode-cli
```

---

## Setup

**Configure environment**

Provide API keys using one of two methods:

**Method A: Create .env file**
Add API keys to .env in your project or home directory. OPENAI_API_KEY is always required (used for embeddings):
```
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...   # only needed for Claude models
```

**Method B: Export environment variables**
```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...   # only needed for Claude models
```
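
For reference, the two methods can be combined into a single lookup: prefer an exported variable, otherwise read the key from a .env file. The function below is an illustrative sketch with an assumed precedence order, not OneCode's actual configuration loading:

```python
import os
from pathlib import Path
from typing import Optional

def find_api_key(name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Illustrative key lookup; the precedence shown here is an assumption."""
    # An exported environment variable (Method B) wins if present.
    if os.environ.get(name):
        return os.environ[name]

    # Otherwise look for a .env file in the project, then in the home directory (Method A).
    for candidate in (Path.cwd() / ".env", Path.home() / ".env"):
        if candidate.is_file():
            for line in candidate.read_text().splitlines():
                stripped = line.strip()
                if stripped.startswith(f"{name}="):
                    value = stripped.split("=", 1)[1]
                    # Drop an inline comment such as "# only needed for Claude models".
                    return value.split("#", 1)[0].strip()
    return None
```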

---

## How to run

After installation, the onecode command is available globally:

```bash
# Default model (claude-sonnet-4-6) with explicit path
onecode ~/path/to/project

# Use current directory (default if no path specified)
onecode

# Specify a different model
onecode --model gpt-4o

# From within the codebase directory (same as above)
cd ~/myproject
onecode
```

**Verify the installation:**
```bash
onecode --help
```

**Startup output** (identical on first and subsequent runs):
```
$ onecode ~/myproject
OneCode - Codebase Analyzer
----------------------------------------
Model:    claude-sonnet-4-6
Indexing: /Users/you/myproject
Ready:    42 nodes (class:12, file:18, function:12) | 42 embeddings

Type a question or task (or 'exit' to quit).
----------------------------------------

You: 
```
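
The node and embedding counts in the banner reflect the semantic index OneCode builds over the codebase. Since openai and faiss-cpu are declared dependencies, indexing of this kind can be sketched as follows; the embedding model, the node descriptions, and the overall pipeline are assumptions for illustration, not OneCode's actual implementation:

```python
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Nodes extracted from the codebase: files, classes, and functions with a text description.
nodes = [
    "function connect_db in db.py: opens a pooled database connection",
    "class Summarizer in agents/summarizer.py: produces document summaries",
    "file cli.py: argument parsing and the interactive prompt loop",
]

# Embed each node (the model name is an assumption for this sketch).
response = client.embeddings.create(model="text-embedding-3-small", input=nodes)
vectors = np.array([item.embedding for item in response.data], dtype="float32")

# Store the vectors in a FAISS index for semantic retrieval.
index = faiss.IndexFlatL2(int(vectors.shape[1]))
index.add(vectors)

# Retrieve the nodes closest to a natural-language query.
query = client.embeddings.create(model="text-embedding-3-small", input=["where is the retry logic?"])
query_vec = np.array([query.data[0].embedding], dtype="float32")
distances, ids = index.search(query_vec, 2)
print([nodes[i] for i in ids[0]])
```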

---

## Example queries

### Evaluate code quality with RAGAS metrics
```
You: evaluate the codebase
You: what is the accuracy of the coder agent?
You: compare this run with the previous evaluation
```

### Understand the codebase
```
You: what does this codebase do?
You: explain the authentication flow
You: what agents/modules are in this project?
```

### Find specific code
```
You: search for all calls to connect_db
You: where is the retry logic implemented?
You: find all async functions
```

### Write and modify code
```
You: add input validation to the login function
You: write a utility function that validates emails
You: refactor the parse_config function to handle missing keys gracefully
```

### Write, run, and debug
```
You: create a function that reverses a string, write a test for it, and run the test
You: add a health check endpoint and run the server to verify it starts
You: debug why the executor agent is failing on error handling
```

### File management
```
You: rename src/helpers.py to src/utils.py
You: delete the tmp/ directory
You: move all test files into a tests/ directory
```

### Git operations
```
You: show git status
You: show the diff of uncommitted changes
You: commit all staged files with message "add retry logic"
```

