Metadata-Version: 2.4
Name: agentdx
Version: 0.1.0a2
Summary: Open-source diagnostic SDK for detecting failure pathologies in AI agent systems
Author: Anmol Parimoo
License-Expression: MIT
Project-URL: Homepage, https://github.com/mldeepsystems/agentdx
Project-URL: Repository, https://github.com/mldeepsystems/agentdx
Project-URL: Issues, https://github.com/mldeepsystems/agentdx/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# agentdx

**Open-source diagnostic SDK for detecting failure pathologies in AI agent systems.**

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/agentdx.svg)](https://pypi.org/project/agentdx/)

---

agentdx detects operational failure modes in AI agent systems: failures that observability tools miss because they occur at the reasoning level rather than the infrastructure level. It analyses agent execution traces and produces structured diagnostic reports that identify specific pathologies.

## Why agentdx?

Existing agent observability tools (LangSmith, Arize, Datadog LLM Observability) excel at **tracing**: they show you what happened. agentdx does **diagnosis**: it tells you what went wrong and why.

| Tool Type | What it answers | Examples |
|-----------|----------------|---------|
| Tracing / Observability | "What calls did the agent make?" | LangSmith, Arize, Datadog |
| **Diagnostics** | **"Why did the agent fail?"** | **agentdx** |

## The Seven Pathologies

agentdx detects seven operational failure modes, designed to align with the [OWASP Top 10 for Agentic Applications](https://genai.owasp.org/) and [UC Berkeley's MAST framework](https://arxiv.org/abs/2503.13657):

| Pathology | What it means |
|-----------|--------------|
| **Context Erosion** | Agent loses critical context over long conversations or multi-step tasks |
| **Tool Thrashing** | Agent repeatedly calls tools with ineffective or contradictory parameters |
| **Instruction Drift** | Agent gradually deviates from its original instructions or mandate |
| **Recovery Blindness** | Agent fails to detect or recover from errors in its own execution |
| **Hallucinated Tool Success** | Agent treats a failed tool call as successful and proceeds on false premises |
| **Goal Hijacking** | Agent's objective is altered by adversarial input or environmental manipulation |
| **Silent Degradation** | Agent's output quality deteriorates without triggering explicit errors |

## Quick Start

```bash
pip install agentdx
```

```python
from agentdx import Diagnoser, JSONParser

# Parse an agent execution trace (file path, dict, or message list)
parser = JSONParser()
trace = parser.parse("path/to/trace.json")

# Run all detectors
diagnoser = Diagnoser()
report = diagnoser.diagnose(trace)

# View results
print(report.summary())
report.to_json("diagnostic_report.json")
report.to_markdown("diagnostic_report.md")
```

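The trace file is a JSON object with a `messages` array. Each message has a `role`, a `content` string, and an optional `tool_calls` list. In the example below, each tool call records its `tool_name`, `arguments`, `result`, and a `success` flag: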

```json
{
  "trace_id": "my-trace",
  "messages": [
    {"role": "user", "content": "Find the best Python testing framework."},
    {
      "role": "assistant",
      "content": "Let me search for that.",
      "tool_calls": [
        {
          "tool_name": "web_search",
          "arguments": {"query": "best Python testing framework"},
          "result": "No relevant results found.",
          "success": true
        }
      ]
    }
  ]
}
```
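Before handing a trace to the parser, it can help to check that it has the shape described above. The following is a minimal sketch using only the standard library. `validate_trace` is a hypothetical helper written for this example and is not part of agentdx, and the set of fields it treats as required is an assumption based on the sample trace.

```python
import json
import tempfile
from pathlib import Path


def validate_trace(trace: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right.

    Hypothetical helper (not part of agentdx). The required fields are an
    assumption drawn from the sample trace in this README.
    """
    messages = trace.get("messages")
    if not isinstance(messages, list):
        return ["'messages' must be a list"]

    problems = []
    for i, message in enumerate(messages):
        for field in ("role", "content"):
            if field not in message:
                problems.append(f"messages[{i}] is missing '{field}'")
        # 'tool_calls' is optional, so a missing key is treated as no calls.
        for j, call in enumerate(message.get("tool_calls", [])):
            if "tool_name" not in call:
                problems.append(
                    f"messages[{i}].tool_calls[{j}] is missing 'tool_name'"
                )
    return problems


trace = {
    "trace_id": "my-trace",
    "messages": [
        {"role": "user", "content": "Find the best Python testing framework."},
        {
            "role": "assistant",
            "content": "Let me search for that.",
            "tool_calls": [
                {
                    "tool_name": "web_search",
                    "arguments": {"query": "best Python testing framework"},
                    "result": "No relevant results found.",
                    "success": True,
                }
            ],
        },
    ],
}

problems = validate_trace(trace)

# Write the trace to disk and read it back to confirm it round-trips as JSON.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "trace.json"
    path.write_text(json.dumps(trace, indent=2), encoding="utf-8")
    reloaded = json.loads(path.read_text(encoding="utf-8"))
```

If `problems` is empty, the written file can then be passed to `JSONParser().parse(...)` as in the Quick Start snippet.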

## Architecture

agentdx is designed around a three-tier detection architecture. Only Tier 1 is implemented in v0.1.0; Tiers 2 and 3 are planned.

```
Trace Input → Parser → Normalised Trace → Detectors → Diagnostic Report
                                              │
                                   ┌──────────┼──────────┐
                                   │          │          │
                              Rule-Based   ML Model   LLM-as-Judge
                              (< 10ms)    (< 100ms)   (async)
```

**Tier 1 — Rule-Based** (v0.1.0): Deterministic pattern matching on trace structure. Fast, interpretable, zero external dependencies.

**Tier 2 — ML Classifier** (planned): Trained classifiers for pathologies that require statistical pattern recognition.

**Tier 3 — LLM-as-Judge** (planned): Asynchronous LLM evaluation for complex, context-dependent pathology detection.
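To give a feel for what a Tier 1 check might look like, here is a minimal sketch of a deterministic rule over the raw trace dictionary. It flags tool calls that are repeated with identical arguments, which is one plausible signal of Tool Thrashing. This is an illustration only: `find_repeated_tool_calls` and its `threshold` parameter are hypothetical and do not reflect agentdx's actual detectors or their API.

```python
import json
from collections import Counter


def find_repeated_tool_calls(trace: dict, threshold: int = 3) -> list[dict]:
    """Flag (tool, arguments) pairs that occur at least `threshold` times.

    Hypothetical rule for illustration; not agentdx's real detector.
    """
    counts: Counter = Counter()
    for message in trace.get("messages", []):
        for call in message.get("tool_calls", []):
            # Serialise arguments with sorted keys so that equal dicts
            # produce the same hashable key regardless of key order.
            key = (
                call.get("tool_name"),
                json.dumps(call.get("arguments", {}), sort_keys=True),
            )
            counts[key] += 1

    return [
        {"tool_name": name, "arguments": json.loads(args), "count": n}
        for (name, args), n in counts.items()
        if n >= threshold
    ]


# A small trace in which the agent issues the same search three times.
repeated_call = {
    "tool_name": "web_search",
    "arguments": {"query": "best Python testing framework"},
    "result": "No relevant results found.",
    "success": True,
}
example_trace = {
    "trace_id": "thrash-example",
    "messages": [
        {"role": "user", "content": "Find the best Python testing framework."},
        {"role": "assistant", "content": "Searching.", "tool_calls": [repeated_call]},
        {"role": "assistant", "content": "Searching again.", "tool_calls": [repeated_call]},
        {"role": "assistant", "content": "One more try.", "tool_calls": [repeated_call]},
    ],
}

findings = find_repeated_tool_calls(example_trace)
```

A rule like this is cheap and easy to explain, but it only catches exact repeats. Near-duplicate or contradictory parameters would need fuzzier matching, which is the kind of case the planned higher tiers are described as targeting.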

## Supported Frameworks

| Framework | Parser | Status |
|-----------|--------|--------|
| Raw JSON traces | `JSONParser` | v0.1.0 |
| LangChain / LangGraph | `LangChainParser` | Planned |
| CrewAI | `CrewAIParser` | Planned |
| AutoGen / AG2 | `AutoGenParser` | Planned |
| OpenAI Agents SDK | `OpenAIAgentParser` | Planned |

## Output Formats

- **JSON** — machine-readable diagnostic report
- **Markdown** — human-readable summary with severity ratings

## Research

agentdx is developed by [MLDeep Systems](https://mldeep.io), an AI and data consulting firm specialising in agent reliability. See [mldeep.io/research](https://mldeep.io/research) for related publications.

The failure pathology taxonomy draws on:

- OWASP GenAI Security Project. (2025). *Top 10 for Agentic Applications.* [genai.owasp.org](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/)
- Cemri, M., Pan, M., et al. (2025). *Why Do Multi-Agent LLM Systems Fail?* arXiv:2503.13657. [arxiv.org](https://arxiv.org/abs/2503.13657)

## Contributing

We welcome contributions. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## Citation

If you use agentdx in your research, please cite:

```bibtex
@software{mldeepsystemsagentdx,
  author    = {Parimoo, Anmol},
  title     = {agentdx: An Open-Source Diagnostic SDK for AI Agent Reliability},
  year      = {2026},
  url       = {https://github.com/mldeepsystems/agentdx},
  license   = {MIT}
}
```

## License

MIT. See [LICENSE](LICENSE) for details.
