Metadata-Version: 2.4
Name: contextinator
Version: 1.2.3
Summary: Intelligent Codebase Understanding for AI Agents - Transform any codebase into semantically-aware, searchable knowledge
Author-email: STARTHACK Team <founders@starthack.io>
License: Apache-2.0
Project-URL: Homepage, https://github.com/starthackHQ/Contextinator
Project-URL: Documentation, https://github.com/starthackHQ/Contextinator/docs
Project-URL: Repository, https://github.com/starthackHQ/Contextinator
Project-URL: Issues, https://github.com/starthackHQ/Contextinator/issues
Keywords: ai,code-analysis,ast,embeddings,vector-search,semantic-search,codebase,tree-sitter,chromadb
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tree-sitter>=0.25.0
Requires-Dist: chromadb>=1.3.0
Requires-Dist: openai>=2.6.0
Requires-Dist: tiktoken>=0.12.0
Requires-Dist: python-dotenv>=1.2.0
Requires-Dist: flask>=3.1.0
Requires-Dist: requests>=2.32.0
Requires-Dist: rich>=13.0.0
Requires-Dist: toon-format>=0.9.0b1
Requires-Dist: nbformat>=5.0.0
Requires-Dist: tree-sitter-python>=0.25.0
Requires-Dist: tree-sitter-javascript>=0.25.0
Requires-Dist: tree-sitter-typescript>=0.23.0
Requires-Dist: tree-sitter-java>=0.23.0
Requires-Dist: tree-sitter-go>=0.25.0
Requires-Dist: tree-sitter-rust>=0.24.0
Requires-Dist: tree-sitter-cpp>=0.23.0
Requires-Dist: tree-sitter-c>=0.24.0
Requires-Dist: tree-sitter-c-sharp>=0.23.0
Requires-Dist: tree-sitter-php>=0.24.0
Requires-Dist: tree-sitter-bash>=0.25.0
Requires-Dist: tree-sitter-sql>=0.3.0
Requires-Dist: tree-sitter-kotlin>=1.1.0
Requires-Dist: tree-sitter-yaml>=0.7.0
Requires-Dist: tree-sitter-markdown>=0.5.0
Requires-Dist: tree-sitter-json>=0.24.0
Requires-Dist: tree-sitter-toml>=0.7.0
Requires-Dist: tree-sitter-swift>=0.0.1
Requires-Dist: tree-sitter-solidity>=1.2.0
Requires-Dist: tree-sitter-lua>=0.2.0
Requires-Dist: tree-sitter-html>=0.23.0
Requires-Dist: tree-sitter-css>=0.25.0
Requires-Dist: tree-sitter-zig>=1.1.0
Requires-Dist: tree-sitter-ruby>=0.23.0
Requires-Dist: tree-sitter-scala>=0.24.0
Requires-Dist: tree-sitter-haskell>=0.23.0
Requires-Dist: tree-sitter-ocaml>=0.24.0
Requires-Dist: tree-sitter-elixir>=0.3.0
Requires-Dist: tree-sitter-hcl>=1.2.0
Requires-Dist: tree-sitter-make>=1.1.0
Requires-Dist: tree-sitter-xml>=0.7.0
Requires-Dist: ruff>=0.14.10
Provides-Extra: extended-parsers
Requires-Dist: tree-sitter-dockerfile>=0.0.0a1; platform_system != "Windows" and extra == "extended-parsers"
Requires-Dist: tree-sitter-embedded-template>=0.25.0; extra == "extended-parsers"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: myst-parser>=2.0.0; extra == "docs"
Dynamic: license-file

<img src="https://raw.githubusercontent.com/starthackHQ/Contextinator/main/docs/banner.webp" alt="Contextinator" width="100%" />
<br />
<p align="center">
Turn any codebase into semantically-aware, searchable knowledge for AI-powered workflows.
</p>

### Key Features

- **AST-Powered Chunking** - Extract functions, classes, and methods from 23+ programming languages
- **Parent-Child Relationships** - Preserve each chunk's place in the code hierarchy so retrieved results carry their full context
- **Semantic Search** - Find relevant code using natural language queries
- **Multiple Search Modes** - Semantic, symbol-based, pattern matching, and hybrid search
- **Smart Deduplication** - Hash-based detection of duplicate code
- **TOON Format Export** - Token-efficient output format for LLM prompts (40-60% token savings)
- **Full Pipeline Automation** - One command to chunk, embed, and store
- **Docker-Ready** - ChromaDB server included

### Use Cases

| **Agentic AI Systems**                              | **RAG Applications**                                     | **Code Intelligence**                      |
| --------------------------------------------------- | -------------------------------------------------------- | ------------------------------------------ |
| Dynamic code retrieval for autonomous coding agents | High-precision code retrieval for question answering     | Cross-repository code search and discovery |
| Context provision for code generation               | Context injection for code explanation and documentation | Duplicate and similar code detection       |
| Multi-step reasoning over large codebases           | Semantic code search across repositories                 | Legacy codebase analysis and understanding |
| Tool integration for agent frameworks               | Parent-child relationship tracking for complete context  | MCP-compliant async architecture           |

---

## Getting Started

### Prerequisites

- Python 3.10 or higher
- Docker (for ChromaDB)
- OpenAI API key (for embeddings)

### Installation

```bash
pip install contextinator
```

Verify the installation _(requires ChromaDB and an OpenAI API key to be set up)_:

```bash
contextinator --help
```

For detailed setup and configuration, see [`USAGE.md`](https://github.com/starthackHQ/Contextinator/blob/main/USAGE.md).

### Quick Start

1. Index a repository:

```bash
contextinator chunk-embed-store-embeddings \
  --repo-url https://github.com/user/repo \
  --save \
  --collection-name MyRepo
```

2. Search your codebase:

```bash
# Natural language semantic search
contextinator search "authentication logic" -c MyRepo

# Find specific functions
contextinator symbol authenticate_user -c MyRepo

# Export results in TOON format for LLM consumption
contextinator search "error handling" -c MyRepo --toon results.json
```

For comprehensive CLI and Python API documentation, see [`USAGE.md`](https://github.com/starthackHQ/Contextinator/blob/main/USAGE.md).

## Acknowledgements

Built with and inspired by amazing open-source projects:

### Core Technologies

- **[tree-sitter](https://github.com/tree-sitter/tree-sitter)** - Incremental parsing system for AST generation
- **[ChromaDB](https://github.com/chroma-core/chroma)** - AI-native embedding database
- **[OpenAI](https://openai.com)** - Embedding generation API

### Inspired By

- **[Serena](https://github.com/oraios/serena)** - Code intelligence and semantic search
- **[Continue](https://github.com/continuedev/continue)** - AI-powered code assistant
- **[Tabby](https://github.com/TabbyML/tabby)** - Self-hosted AI coding assistant
- **[Semantic Code Search](https://github.com/sturdy-dev/semantic-code-search)** - Code search and retrieval
- **[Aider](https://github.com/Aider-AI/aider)** - AI pair programming in the terminal
- **[VS Code Copilot Chat](https://github.com/microsoft/vscode-copilot-chat)** - Conversational AI for code

## License

Licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/starthackHQ/Contextinator/blob/main/LICENSE) for details.

<h1 align="left">TL;DR <img src="https://raw.githubusercontent.com/starthackHQ/Contextinator/main/docs/0banner.png" alt="Contextinator" width="30" /></h1>

Contextinator is a code intelligence tool that uses Abstract Syntax Tree (AST) parsing to extract semantic code chunks, generates embeddings, and stores them in a vector database. This enables AI systems to understand, navigate, and reason about codebases with precision.
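
The chunk, embed, store, and search flow described above can be sketched with a toy in-memory example. Everything here is an assumption made for illustration: `toy_embed` is a plain term-frequency vector standing in for real embeddings (so it matches on shared words, not meaning), `InMemoryStore` stands in for a vector database such as ChromaDB, and none of these names come from Contextinator's API.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self._items: list[tuple[str, str, Counter]] = []

    def add(self, chunk_id: str, source: str) -> None:
        # Embed once at insert time and keep the vector next to the source.
        self._items.append((chunk_id, source, toy_embed(source)))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the embedded query.
        q = toy_embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [chunk_id for chunk_id, _, _ in ranked[:k]]


store = InMemoryStore()
store.add("auth.authenticate_user",
          "def authenticate_user(password): return check_hash(password)")
store.add("config.parse_config",
          "def parse_config(path): return load_yaml(path)")

hits = store.search("authenticate user password")
```

With a real embedding model, a query such as "login check" could also match the first chunk even without shared words, which is the difference between semantic and purely lexical search.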

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=starthackHQ/Contextinator&type=Date)](https://star-history.com/#starthackHQ/Contextinator&Date)
