Metadata-Version: 2.4
Name: codebase-mcp
Version: 0.1.2
Summary: Persistent, portable codebase intelligence MCP server with incremental indexing and decision memory
License: MIT
Keywords: agent,ai,codebase,indexer,mcp,tree-sitter
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Requires-Dist: click>=8.0
Requires-Dist: mcp[cli]>=1.0.0
Requires-Dist: rich>=13.0
Requires-Dist: tree-sitter-language-pack>=0.1.0
Requires-Dist: tree-sitter>=0.23.0
Provides-Extra: all
Requires-Dist: pyyaml>=6.0; extra == 'all'
Requires-Dist: watchdog>=4.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: langs
Requires-Dist: tree-sitter-go; extra == 'langs'
Requires-Dist: tree-sitter-javascript; extra == 'langs'
Requires-Dist: tree-sitter-python; extra == 'langs'
Requires-Dist: tree-sitter-rust; extra == 'langs'
Requires-Dist: tree-sitter-typescript; extra == 'langs'
Provides-Extra: watch
Requires-Dist: watchdog>=4.0; extra == 'watch'
Provides-Extra: yaml
Requires-Dist: pyyaml>=6.0; extra == 'yaml'
Description-Content-Type: text/markdown

# codebase-mcp

A local MCP server that gives any AI agent or IDE a persistent, structured understanding of your codebase. It indexes your project once, stores everything in a local SQLite database, and answers structural questions instantly without the agent having to read any files.

---

## The problem it solves

Every time you start a new session in Claude Code, Cursor, Cline, or any other AI tool, the agent has to re-read your files to understand the codebase. On large projects this burns thousands of tokens just on orientation, and the agent still only sees a shallow surface. It cannot answer questions like "what calls this function", "what changed since yesterday", or "what decisions were made and why" without reading everything again.

codebase-mcp solves this by:

- Parsing your code once with tree-sitter (Python, TypeScript, JavaScript, Go, Rust, and 50+ more languages)
- Storing every function, class, method, import, and call site in a local database
- Keeping that database up to date incrementally (only changed files are re-parsed)
- Exposing a set of MCP tools so any agent can query the structure without reading files
- Persisting architectural decisions, notes, and session history across every agent and IDE you use
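
The incremental update step in the list above can be illustrated with a minimal sketch. It assumes file contents are hashed with SHA-256 and compared against digests saved at the last index run. The `stored` dictionary, the function names, and the `*.py` filter are hypothetical stand-ins for illustration, not the project's actual implementation.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_reindex(root: Path, stored: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare files on disk with the digests recorded at the last index run.

    `stored` maps relative paths to digests (a hypothetical in-memory
    stand-in for the database). Returns (paths to re-parse, deleted paths).
    """
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*.py"))  # sketch only: limited to Python files
        if p.is_file()
    }
    # A file needs re-parsing if it is new or its digest differs.
    changed = [rel for rel, digest in current.items() if stored.get(rel) != digest]
    # Entries with no file on disk should be dropped from the index.
    deleted = [rel for rel in stored if rel not in current]
    return changed, deleted
```

Unchanged files never reach the parser, which is why a second index run on a mostly unchanged project is much cheaper than the first.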

---

## How it is different from standard approaches

| | Standard approach | codebase-mcp |
|---|---|---|
| Codebase understanding | Agent reads files in context window | Pre-indexed, queried via tool calls |
| Cost per session | Hundreds to thousands of tokens on orientation | Near zero — index is already built |
| Call graph | Not available | Full caller/callee resolution with 3 strategies |
| Decisions and notes | Lost when context resets | Stored in database, searchable forever |
| Switching agents/IDEs | Start over from scratch | Export once, import anywhere |
| Multi-language | Depends on the agent | 50+ languages via tree-sitter |
| Incremental updates | Full re-read every time | SHA-256-based, only changed files are re-parsed |

The core idea is that the agent should never read source files to understand structure. It should call tools and get structured answers back. Reading files is for when you actually need to see the code, not for orientation.

---

## Installation

Requires Python 3.11 or later.

```bash
pip install git+https://github.com/vatsal2025/CodeBase.git
```

Or clone and install in editable mode for development:

```bash
git clone https://github.com/vatsal2025/CodeBase.git
cd CodeBase
pip install -e .
```

---

## Registering with your IDE or agent

Run this once after installation. It writes the MCP server configuration into the config files for every supported tool automatically.

```bash
codebase-mcp setup
```

To target a specific tool:

```bash
codebase-mcp setup --ide claude-code
codebase-mcp setup --ide cursor
codebase-mcp setup --ide windsurf
codebase-mcp setup --ide cline
codebase-mcp setup --ide zed
```

To register globally with Claude Code (available across all projects):

```bash
codebase-mcp setup --ide claude-code --global
```

After setup, restart your IDE or agent. The MCP server named `codebase-intel` will appear in the tool list.

---

## Indexing your project

Before the agent can use the tools, you need to build the index. You can do this from the terminal or let the agent do it on first run.

From the terminal:

```bash
cd /path/to/your/project
codebase-mcp index .
```

Force a full re-index (ignores hash cache):

```bash
codebase-mcp index . --full
```

The index is stored at `.codebase-mcp/index.db` inside your project directory. On typical projects it builds in under 30 seconds.

---

## Tools reference

These are the tools the agent has access to. A well-configured agent should call these instead of reading files.

### Session start

**session_bootstrap(project_root)**
Call this at the start of every session. Returns project stats, most-referenced files, active decisions, recent notes, and whether the index needs updating. One call gives the agent a complete orientation with minimal tokens.

**what_changed(project_root)**
Returns which files changed since the last index run, with a diff of added and removed symbols. Use this when returning to a project after a gap.

**index_project(project_root, full_reindex)**
Builds or updates the index. Only changed files are re-parsed. Call this when session_bootstrap reports stale files.

**get_index_status(project_root)**
Check staleness without triggering a re-index.

### Structural queries

**search_symbols(query, kind, language, limit)**
Full-text search across all symbols. Finds functions, classes, methods, structs, interfaces, and traits by name or docstring content. Works across all languages.

```
search_symbols("authenticate")
search_symbols("User", kind="class")
search_symbols("validate", language="typescript")
```

**get_symbol(qualified_name)**
Get complete details for one symbol: its signature, docstring, file location, and its full list of callers and callees.

```
get_symbol("src.auth.jwt.verify_token")
```

**get_file_outline(path)**
Get the complete structure of a file — all symbols organized hierarchically (methods grouped under their class), with signatures and line ranges. Does not require reading the file.
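
To make the shape of a hierarchical outline concrete, here is a rough sketch. It uses Python's standard-library `ast` module rather than tree-sitter, so it handles Python source only, and the dictionary layout it returns is an assumption for illustration, not the tool's actual response format.

```python
import ast


def outline(source: str) -> list[dict]:
    """Return top-level functions and classes, with methods nested under classes."""

    def describe(node, kind: str) -> dict:
        return {
            "kind": kind,
            "name": node.name,
            "lines": (node.lineno, node.end_lineno),  # 1-based, inclusive
            "children": [],
        }

    functions = (ast.FunctionDef, ast.AsyncFunctionDef)
    result = []
    for node in ast.parse(source).body:
        if isinstance(node, functions):
            result.append(describe(node, "function"))
        elif isinstance(node, ast.ClassDef):
            entry = describe(node, "class")
            # Group each method under the class that defines it.
            entry["children"] = [
                describe(child, "method")
                for child in node.body
                if isinstance(child, functions)
            ]
            result.append(entry)
    return result
```

An agent that receives this kind of structure can jump straight to a line range instead of scanning the whole file.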

**get_file_context(path)**
Everything about a file in one call: its outline, who imports it, decisions linked to it, and notes attached to it.

**get_call_graph(qualified_name, depth)**
Trace callers and callees recursively. Shows exactly what calls a function and what that function calls, across files and languages.

**find_references(name)**
Find every place a symbol is used across the codebase.

**search_code(pattern, file_pattern, language)**
Grep-style search across source files. Returns matching lines with context. Use this when you need to see actual code, not just structure.

**find_todos()**
Returns all TODO, FIXME, HACK, BUG, and NOTE comments in the entire codebase in a single call.
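
As a rough illustration of what such a scan involves, the sketch below matches the five tags in `#` and `//` comments with a regular expression. The supported comment syntaxes and the returned tuple shape are assumptions. A real implementation would likely use the parsed comment nodes so that a `#` inside a string literal is not mistaken for a comment.

```python
import re

# The tags come from the tool description; handling only "#" and "//"
# comments is a simplification made for this sketch.
TODO_RE = re.compile(r"(?:#|//)\s*(TODO|FIXME|HACK|BUG|NOTE)\b[:\s]*(.*)")


def find_todo_comments(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, tag, text) for each tagged comment in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_RE.search(line)
        if match:
            hits.append((lineno, match.group(1), match.group(2).strip()))
    return hits
```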

**query_symbols_sql(sql)**
Run a raw SQL query against the symbol database for advanced filtering. Use this for anything the other search tools cannot express.
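
The table and column names in the index are not documented here, so it helps to inspect the schema before writing a query. The sketch below relies only on standard SQLite features (`sqlite_master` and `PRAGMA table_info`). The helper name `list_tables` is hypothetical.

```python
import sqlite3


def list_tables(db_path: str) -> dict[str, list[str]]:
    """Map each table in a SQLite database to its column names."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

Calling `list_tables(".codebase-mcp/index.db")` from the project root should list whatever tables the index actually contains. Use those names, rather than guessed ones, in the SQL passed to `query_symbols_sql`.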

### Knowledge persistence

**add_decision(title, body, category, session_id)**
Record an architectural decision. Categories: architecture, security, performance, api, database, general. These persist across every session, agent, and IDE.

```
add_decision(
  title="Use JWT for stateless auth",
  body="Chosen over sessions to support horizontal scaling. HS256 with 1h expiry.",
  category="security"
)
```

**search_decisions(query, category, status)**
Search recorded decisions by keyword, category, or status (active/superseded/deprecated). Always check this at session start to recover context from previous sessions.

**update_decision(decision_id, status, body)**
Mark a decision as superseded or deprecated when the approach changes.

**add_note(body, scope, scope_ref)**
Attach a note to the whole project, a specific file, or a specific symbol. Notes persist and are returned by get_file_context.

```
add_note("Token refresh logic is intentionally synchronous — see issue #42", scope="file", scope_ref="src/auth/jwt.py")
```

**get_notes(scope, scope_ref)**
Retrieve notes for the project, a file, or a symbol.

### Knowledge transfer

**export_context(project_root, output)**
Export decisions, notes, and optionally the full symbol index to a JSON file. Use this before switching agents or onboarding a new team member.

**import_context(import_file, project_root)**
Import an exported context file. Merges decisions and notes into the local database.

**create_handoff(project_root, output)**
Create a complete handoff package: context export plus a human-readable summary of the project state, top files, and active decisions. Use this when switching from one agent to another.

**index_github_repo(url)**
Clone a GitHub repository, index it, and return a bootstrap summary. Use this to explore any open source project without manually cloning.

---

## How to use it to its full potential

### At the start of every session

The agent should always call `session_bootstrap` first, before reading any files. If the index is stale, it should call `index_project` immediately afterwards. It should then call `search_decisions` to recover context from previous sessions.

A good agent prompt to enforce this:

```
Before doing anything else in this project:
1. Call session_bootstrap to orient yourself
2. If index_stale is true, call index_project
3. Call search_decisions to review past decisions
4. Never read a source file to understand structure — use search_symbols, get_file_outline, or get_call_graph instead
```
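
For a custom agent harness, the same sequence can be enforced in code instead of in a prompt. The sketch below is only an outline of the control flow. `call_tool` is a hypothetical callable that wraps whatever MCP client you use, and the assumptions that responses are dictionaries, that staleness is reported under an `index_stale` key, and that `search_decisions` accepts a status filter on its own should be checked against the real tool output.

```python
from typing import Any, Callable

# Hypothetical adapter: (tool name, arguments) -> decoded tool result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def start_session(call_tool: ToolCaller, project_root: str) -> dict[str, Any]:
    """Run the orientation steps in order and return what was learned."""
    summary = call_tool("session_bootstrap", {"project_root": project_root})
    if summary.get("index_stale"):
        # Refresh incrementally only when the bootstrap reports stale files.
        call_tool(
            "index_project",
            {"project_root": project_root, "full_reindex": False},
        )
    # Recover decisions recorded in earlier sessions.
    decisions = call_tool("search_decisions", {"status": "active"})
    return {"summary": summary, "decisions": decisions}
```

Keeping the tool caller injectable makes the ordering easy to test with a fake before wiring it to a live server.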

### Recording decisions as you work

Every significant decision made during a session should be recorded immediately with `add_decision`. This is the most important habit. When you or a future agent returns to the project, `search_decisions` recovers this context in one call instead of re-deriving it by reading code.

What is worth recording:
- Why a library or framework was chosen
- Why a design pattern was picked over alternatives
- Security constraints or compliance requirements
- Non-obvious performance decisions
- Anything that would take more than 5 minutes to figure out from reading the code

### Using the call graph

Before modifying a function, call `get_symbol` with its qualified name to see its callers. This tells you the blast radius of any change without reading files. `get_call_graph` with depth > 1 traces multi-level dependencies.
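
The idea of a depth-limited blast radius can be sketched as a breadth-first walk over caller edges. The `callers` mapping below is a hypothetical in-memory stand-in for the indexed call graph, and the function name is illustrative only.

```python
from collections import deque


def callers_within(
    callers: dict[str, list[str]], target: str, depth: int
) -> dict[str, int]:
    """Return direct and indirect callers of `target` up to `depth` levels,
    mapped to their distance from `target`."""
    found: dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:
        name, level = queue.popleft()
        if level == depth:
            continue  # do not expand beyond the requested depth
        for caller in callers.get(name, []):
            # Skip the target itself and anything already seen (handles cycles).
            if caller != target and caller not in found:
                found[caller] = level + 1
                queue.append((caller, level + 1))
    return found
```

With `depth=1` this gives only the direct callers. Larger depths show how far a signature change could ripple.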

### Watching for changes

If you run the watcher, the index stays current automatically:

```bash
codebase-mcp serve --watch
```

From within a session, you can also start it via the tool:

```
start_file_watcher(project_root="...")
```

---

## Transferring knowledge when switching platforms

The index and all knowledge (decisions, notes, session history) live in `.codebase-mcp/index.db` inside your project directory. There are three ways to transfer this to another platform or agent.

### Option 1: Commit the database to git

Commit `.codebase-mcp/index.db` to git instead of ignoring it. Anyone who clones the repository gets the full index, all decisions, and all notes immediately. No re-indexing required.

Remove the exclusion from your `.gitignore`:

```
# Remove or comment out this line:
# .codebase-mcp/
```

Then commit:

```bash
git add .codebase-mcp/index.db
git commit -m "Add codebase index and decision log"
```

This is the recommended approach for teams. New developers get the full context on clone.

### Option 2: Export and import

Export from the source machine:

```bash
codebase-mcp export . --output context.json
```

Import on the destination:

```bash
codebase-mcp import context.json /path/to/project
```

The export includes decisions, notes, and optionally the full symbol index. You can share it as a file attachment, a gist, or through any file transfer mechanism.

The agent can also do this directly:

```
export_context(project_root="/path/to/project", output="context.json")
import_context(import_file="context.json", project_root="/path/to/project")
```

### Option 3: Create a handoff package

When switching from one agent or IDE to another mid-session:

```bash
codebase-mcp handoff . --output handoff/
```

Or via tool:

```
create_handoff(project_root="...")
```

The handoff includes the export JSON plus a written summary of current project state, active decisions, and recent changes. Give this to the new agent at session start.

### What transfers and what does not

| Data | Transfers | Notes |
|---|---|---|
| Decisions | Yes | All statuses |
| Notes | Yes | All scopes |
| Symbol index | Optional | Rebuilt automatically by index_project |
| Call graph | Rebuilt from index | Run index_project after import |
| Session history | No | Sessions are local only |

---

## Configuration

The config file lives at `.codebase-mcp/config.json` inside your project. It is created automatically on first index with sensible defaults.

Key settings:

```json
{
  "project_root": "/path/to/project",
  "exclude_patterns": [
    "**/.git/**",
    "**/node_modules/**",
    "**/__pycache__/**",
    "**/dist/**",
    "**/build/**",
    "**/*.min.js",
    "**/*.map"
  ],
  "max_file_size_kb": 500,
  "include_extensions": []
}
```

`exclude_patterns` accepts standard glob patterns. `include_extensions` restricts indexing to specific file types if set.
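
To show how these three settings might interact, here is a minimal sketch of a per-file filter. It is not the tool's implementation. It assumes that an empty `include_extensions` list means no restriction, that extensions are written with a leading dot, and that patterns behave like Python's `fnmatch` (where `*` also matches `/`). The tool's real glob semantics may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def should_index(rel_path: str, size_bytes: int, config: dict) -> bool:
    """Decide whether a file (POSIX path relative to the project root) is indexed."""
    # Prefix a slash so a leading "**/" also matches files at the project root.
    candidate = "/" + rel_path
    if any(fnmatchcase(candidate, pat) for pat in config.get("exclude_patterns", [])):
        return False
    if size_bytes > config.get("max_file_size_kb", 500) * 1024:
        return False
    allowed = config.get("include_extensions", [])
    # Assumption: an empty list applies no extension filter.
    return not allowed or PurePosixPath(rel_path).suffix in allowed
```

Exclusions are checked first so that large generated directories such as `node_modules` are rejected before any size or extension test.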

---

## Supported languages

Full tree-sitter parsing (functions, classes, methods, imports, call graph):

- Python
- TypeScript and JavaScript (including JSX/TSX)
- Go
- Rust

Universal parser (symbols and structure, no call graph):

- Java, Kotlin, Swift, C, C++, C#, Ruby, PHP, Scala, Dart, Lua, Bash, SQL, HTML, CSS, YAML, TOML, JSON, Dockerfile, Makefile, and 30+ more via tree-sitter-language-pack

---
