Metadata-Version: 2.4
Name: mcp-server-thoth
Version: 0.4.0
Summary: MCP server for persistent codebase memory with semantic search and development tracking
Project-URL: Homepage, https://github.com/braininahat/thoth
Project-URL: Bug Tracker, https://github.com/braininahat/thoth/issues
Project-URL: Source Code, https://github.com/braininahat/thoth
Author-email: Varun Shijo <varunshi@buffalo.edu>
Maintainer-email: Varun Shijo <varun.shijo@gmail.com>
License: MIT
License-File: LICENSE
Keywords: analysis,codebase,mcp,memory,semantic-search,visualization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: <3.13,>=3.10
Requires-Dist: aiosqlite>=0.20.0
Requires-Dist: chromadb>=0.4.0
Requires-Dist: click>=8.1.0
Requires-Dist: fastapi>=0.100.0
Requires-Dist: httpx>=0.24.0
Requires-Dist: mcp>=1.1.0
Requires-Dist: networkx>=3.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pytz>=2025.2
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: tzlocal>=5.3.1
Requires-Dist: uvicorn>=0.30.0
Requires-Dist: vllm>=0.8.5
Provides-Extra: cache
Requires-Dist: redis>=5.0.0; extra == 'cache'
Provides-Extra: dashboard
Requires-Dist: gradio>=5.0.0; extra == 'dashboard'
Requires-Dist: plotly>=5.0.0; extra == 'dashboard'
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# Thoth

MCP server providing persistent codebase memory with semantic search for AI assistants.

<p align="center">
  <a href="https://pypi.org/project/mcp-server-thoth/">
    <img src="https://img.shields.io/pypi/v/mcp-server-thoth.svg" alt="PyPI">
  </a>
  <a href="https://github.com/braininahat/thoth/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License">
  </a>
  <a href="https://pypi.org/project/mcp-server-thoth/">
    <img src="https://img.shields.io/pypi/pyversions/mcp-server-thoth.svg" alt="Python Versions">
  </a>
</p>

## Overview

Thoth indexes code repositories using AST parsing and provides tools for symbol lookup, cross-repository navigation, and architecture visualization. v0.2.0 added semantic search using local embeddings. v0.3.0 introduced **development memory**, which records coding attempts and their outcomes so later sessions can learn from them. v0.4.0 brings **architectural separation**: embeddings run in their own service, so the MCP server starts almost instantly.

The index persists in `~/.thoth/`, giving Claude and other MCP-compatible assistants memory across conversations.
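
As a rough illustration of what AST-based indexing involves, the sketch below uses Python's standard `ast` module to pull function and class definitions out of a source string. The `extract_symbols` helper and the sample source are hypothetical and are not Thoth's actual indexer, which also records imports, calls, and parent relationships.

```python
import ast


def extract_symbols(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, line) for every function and class in the source."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno))
    # ast.walk is breadth-first, so sort by line to get source order.
    return sorted(symbols, key=lambda s: s[2])


# Hypothetical sample input, for illustration only.
example = '''
class Store:
    def get(self, key):
        return key

async def main():
    pass
'''

for kind, name, line in extract_symbols(example):
    print(f"{kind:8} {name:6} line {line}")
```

Because the parser works on syntax trees rather than text matches, each symbol comes with an exact line number, which is what makes precise jump-to-definition possible.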

## Features

- 🚀 **Instant Startup**: MCP server starts in <1 second with separated embedding service (v0.4.0)
- 🔍 **Semantic Search**: Find code using natural language queries with local embeddings
- 🧠 **Persistent Memory**: Code understanding persists between conversations
- 📝 **Development Memory**: Track all coding attempts and learn from failures
- 🔗 **Cross-Repository**: Navigate dependencies across multiple related repositories
- 📊 **Visualizations**: Generate architecture diagrams and dependency graphs
- ⚡ **Fast Indexing**: AST-based parsing with incremental updates
- 🎯 **Precise Navigation**: Jump to exact definitions, find all callers
- 🔧 **Local-First**: All processing happens locally, no cloud dependencies

## Installation

### Requirements

- Python 3.10 to 3.12 (Python 3.13 is not yet supported by some dependencies)
- For semantic search: ~500MB disk space for embedding model

### Quick Start

```bash
# Build Thoth (run from a checkout of the repository)
uv build

# Initialize (sets up database and starts embedding server)
uv run thoth-cli init

# Source environment variables
source ~/.thoth/env

# Index your first repository
uv run thoth-cli index myproject /path/to/repo

# Register the server with the Claude Code CLI
claude mcp add thoth -s user -- uvx --python 3.12 mcp-server-thoth
```

That's it! The `init` command automatically:
- Creates the database
- Starts the Text Embeddings Inference (TEI) server for high-quality semantic search
- Sets up environment variables
- Verifies the installation

### Architecture

Thoth splits its work across separate services so that the MCP server itself stays lightweight:

- **MCP Server**: Lightweight, starts in <1 second (previously 30+ seconds)
- **TEI Server**: Handles embeddings (Qwen3-Embedding-0.6B model)
- **ChromaDB Server**: Vector storage as a dedicated service

### Manual Setup (Advanced)

If you prefer to manage services manually:

```bash
# Initialize without starting services
uv run thoth-cli init --no-start-services

# Start TEI server manually
./scripts/run_tei_server.sh

# Set environment variables
export THOTH_EMBEDDING_SERVER_URL=http://localhost:8765

# Check status
uv run thoth-cli status
```
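
When services are managed manually, it can help to check how a client would address the embedding server. The sketch below builds (but does not send) an embedding request from `THOTH_EMBEDDING_SERVER_URL`. It assumes the server exposes TEI's `POST /embed` route with an `{"inputs": [...]}` payload; using `http://localhost:8765` as a fallback when the variable is unset is also an assumption, and `embed_request` is a hypothetical helper rather than part of Thoth.

```python
import json
import os
import urllib.request

# Assumption: fall back to the URL shown in the export line above.
DEFAULT_URL = "http://localhost:8765"


def embed_request(texts, env=os.environ):
    """Build a POST request for the embedding server without sending it."""
    base = env.get("THOTH_EMBEDDING_SERVER_URL", DEFAULT_URL).rstrip("/")
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/embed",  # assumed TEI-style route
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = embed_request(["def connect(): ..."])
print(req.get_method(), req.full_url)
```

With the server running, passing `req` to `urllib.request.urlopen(req, timeout=5)` and decoding the JSON response should yield one vector per input string.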

### First-Time Setup

Before using Thoth with Claude, run the initialization:

```bash
thoth-cli init
```

This will:
- ✅ Set up the database
- ✅ Create necessary directories
- ✅ Verify the installation

### Claude Desktop

Add to your configuration file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/claude/claude_desktop_config.json`

#### Configuration:
```json
{
  "mcpServers": {
    "thoth": {
      "command": "uvx",
      "args": ["--python", "3.12", "mcp-server-thoth"]
    }
  }
}
```
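
If the configuration file already lists other MCP servers, the Thoth entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming the file is valid JSON; `add_thoth_server` is a hypothetical helper, not a command shipped with Thoth.

```python
import json
from pathlib import Path

# The entry from the configuration block above.
THOTH_ENTRY = {"command": "uvx", "args": ["--python", "3.12", "mcp-server-thoth"]}


def add_thoth_server(config_path: Path) -> dict:
    """Merge the Thoth entry into the config file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["thoth"] = THOTH_ENTRY
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it with the path for your platform from the list above, for example `add_thoth_server(Path.home() / ".config/claude/claude_desktop_config.json")` on Linux, then restart Claude Desktop so it rereads the file.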

To index repositories, either:
1. Use the CLI: `thoth-cli index myrepo /path/to/repo`
2. Use the `index_repository` tool from within Claude

### Command Line

```bash
# Install globally
uv tool install --python 3.12 mcp-server-thoth

# Initialize Thoth (first time only)
thoth-cli init

# Index a repository
thoth-cli index myproject /path/to/repo

# Search symbols
thoth-cli search "database connection"

# List indexed repositories
thoth-cli list

# Start MCP server
mcp-server-thoth
```

## Tools

### Core Tools
- `find_definition` - Locate symbol definitions
- `get_file_structure` - Extract functions, classes, imports from a file
- `search_symbols` - Search symbols by name pattern
- `get_callers` - Find callers of a function
- `get_repositories` - List indexed repositories
- `index_repository` - Index a new repository

### Semantic Search (v0.2.0+)
- `search_semantic` - Natural language code search using embeddings
  - Example: "function that handles user authentication"
  - Returns relevant symbols ranked by semantic similarity

### Development Memory (v0.3.0+)
- `start_dev_session` - Start tracking development attempts
  - Persists across Claude conversations
  - Links attempts to specific tasks
- `track_attempt` - Record coding attempts (edit, test, refactor)
  - Automatically captures errors and solutions
  - Builds knowledge base of what works/fails
- `check_approach` - See if an approach has been tried before
  - Learn from past attempts
  - Avoid repeating mistakes
- `analyze_failure` - Get insights from past failures
  - Find solutions to similar problems
  - See common error patterns
- `analyze_patterns` - Analyze failure patterns
  - Identify problematic files
  - Get suggestions based on history

### Visualization Tools
- `generate_module_diagram` - Generate Mermaid dependency diagrams
- `generate_system_architecture` - Visualize cross-repository relationships
- `trace_api_flow` - Trace client-server communication paths
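
To give a sense of what a Mermaid dependency diagram looks like as text, the sketch below renders a list of import edges as a Mermaid flowchart. The `to_mermaid` helper and the module names are hypothetical; the output of `generate_module_diagram` may be structured differently.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render (importer, imported) module pairs as a Mermaid flowchart."""
    lines = ["graph TD"]
    for src, dst in sorted(set(edges)):
        # Use a dot-free node id and keep the real module name as the label.
        src_id, dst_id = src.replace(".", "_"), dst.replace(".", "_")
        lines.append(f'    {src_id}["{src}"] --> {dst_id}["{dst}"]')
    return "\n".join(lines)


# Hypothetical module names, for illustration only.
print(to_mermaid([
    ("app.server", "app.storage"),
    ("app.server", "app.search"),
    ("app.search", "app.storage"),
]))
```

The resulting text can be pasted into any Mermaid renderer, including Markdown viewers that support `mermaid` code fences.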

## Architecture

### Storage Backend

Thoth uses a hybrid storage approach:
- **SQLite** (`~/.thoth/index.db`): Source of truth for structured data
  - `symbols` - Functions, classes, methods with location and parent relationships
  - `imports` - Import statements with cross-repository resolution
  - `calls` - Function call graph (caller → callee mapping)
  - `files` - File metadata and content hashes for incremental updates
  - `development_sessions` - Track coding sessions across Claude conversations
  - `development_attempts` - Record all edit/test/refactor attempts
  - `failure_patterns` - Identify common failure patterns
  - `learned_solutions` - Store successful solutions for reuse

- **ChromaDB** (`~/.thoth/chroma/`): Vector storage for semantic search
  - Stores embeddings for all indexed symbols
  - Enables natural language queries

- **NetworkX**: In-memory graph for fast relationship traversal
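
The sketch below shows how a caller lookup could be answered from the `symbols` and `calls` tables with a single join. The table names come from the list above, but every column name and the sample rows are assumptions made for illustration; the real schema in `~/.thoth/index.db` may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed columns; only the table names are taken from the list above.
conn.executescript("""
CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, file TEXT, line INTEGER);
CREATE TABLE calls (caller_id INTEGER REFERENCES symbols(id),
                    callee_id INTEGER REFERENCES symbols(id));
""")
conn.executemany("INSERT INTO symbols VALUES (?, ?, ?, ?, ?)", [
    (1, "connect", "function", "db.py", 10),
    (2, "load_user", "function", "users.py", 4),
    (3, "save_user", "function", "users.py", 20),
])
conn.executemany("INSERT INTO calls VALUES (?, ?)", [(2, 1), (3, 1)])


def get_callers(conn, name):
    """Return (name, file, line) for every symbol that calls `name`."""
    rows = conn.execute("""
        SELECT caller.name, caller.file, caller.line
        FROM calls
        JOIN symbols AS callee ON callee.id = calls.callee_id
        JOIN symbols AS caller ON caller.id = calls.caller_id
        WHERE callee.name = ?
        ORDER BY caller.file, caller.line
    """, (name,))
    return rows.fetchall()


print(get_callers(conn, "connect"))
```

Keeping the call graph in SQLite means it survives restarts, while an in-memory graph can be rebuilt from these rows for faster multi-hop traversal.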

### Embedding Model

Semantic search uses **Qwen3-Embedding-0.6B** via vLLM:
- Lightweight (600M parameters, ~1.2GB on disk)
- Code-aware embeddings with instruction support
- Fast inference with GPU acceleration (optional)
- Falls back to TF-IDF when vLLM is unavailable
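
To illustrate the idea behind a TF-IDF fallback, the sketch below ranks symbol descriptions against a query using term frequency, inverse document frequency, and cosine similarity. It is written in plain Python for clarity and is not Thoth's implementation; the `tokenize` and `rank` helpers and the sample descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split on case changes and non-letters, e.g. "sendUpdate" -> send, update.
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*", text)]


def rank(query, docs):
    """Return doc names ordered by TF-IDF cosine similarity to the query."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF so that terms present in every document keep some weight.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

    def vector(c):
        return {t: k * idf[t] for t, k in c.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    scored = [(cosine(q, vector(c)), name) for name, c in counts.items()]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]


# Hypothetical symbol descriptions, for illustration only.
docs = {
    "Dashboard.subscribe_to_changes": "subscribe dashboard to change events",
    "WebSocketHandler.send_update": "send update to dashboard clients over websocket",
    "Database.connect": "open database connection",
}
print(rank("dashboard update", docs))
```

Unlike embeddings, this only matches shared words, so a query such as "real-time refresh" would find nothing here. That trade-off is why TF-IDF serves as a fallback rather than the default.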

## Performance

- **Indexing**: ~10K symbols/minute
- **Semantic Search**: <100ms for typical queries
- **Memory**: ~2GB for model + ~100MB per 100K symbols
- **Accuracy**: 0.7-0.9 relevance scores for code search
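
The figures above can be turned into a back-of-the-envelope estimate for a given repository size. This sketch treats them as linear and takes 1GB as 1,000MB; both are simplifying assumptions, and `estimate` is a hypothetical helper.

```python
def estimate(symbols: int) -> tuple[float, float]:
    """Return (indexing minutes, memory in MB) from the rough figures above."""
    minutes = symbols / 10_000                    # ~10K symbols/minute
    memory_mb = 2_000 + 100 * symbols / 100_000   # ~2GB model + ~100MB per 100K symbols
    return minutes, memory_mb


minutes, memory_mb = estimate(250_000)
print(f"~{minutes:.0f} min to index, ~{memory_mb:.0f} MB in memory")
```

For a repository with 250K symbols this works out to roughly 25 minutes of indexing and about 2.25GB of memory, most of it the model.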

## Advanced Usage

### Pre-indexing Large Repositories
For large monorepos, pre-index before adding to Claude:
```bash
thoth-cli index myrepo /path/to/large-repo
```

### Using Redis Cache (Optional)
For improved performance with multiple users:
```bash
# Install with Redis support
uv tool install "mcp-server-thoth[cache]"

# Requires Redis server running locally
```

### Dashboard (Coming Soon)
A separate `thoth-dashboard` package will provide:
- Web UI for exploring indexed code
- Interactive dependency graphs
- Real-time search interface

## Development

```bash
git clone https://github.com/braininahat/thoth
cd thoth
uv pip install -e ".[dev]"

# Run tests
pytest

# Type checking
mypy thoth
```

## Token Efficiency

Thoth dramatically reduces the tokens needed for code navigation:

- **Without Thoth**: Multiple searches + reading entire files = ~50K tokens
- **With Thoth**: Semantic search + precise results = ~2K tokens

Example:
```
User: "How does the dashboard update in real-time?"

Without Thoth:
- grep "dashboard" → 50 results
- grep "update" → 200 results  
- Read 10+ files to understand

With Thoth semantic search:
- Returns: WebSocketHandler.send_update(), Dashboard.subscribe_to_changes(), etc.
- Ranked by relevance
```

## Troubleshooting

### Python Version Issues
If you see errors about `xformers` or build failures:
```bash
# Ensure Python 3.12 is used
uvx --python 3.12 mcp-server-thoth
```

### GPU Memory
For systems with limited GPU memory:
- Embeddings are automatically moved to CPU after computation
- Set `CUDA_VISIBLE_DEVICES=-1` to force CPU-only mode

### Model Download
First run downloads the embedding model (~460MB). Use `thoth-cli init` to pre-download:
```bash
# Download model before using with Claude
thoth-cli init

# Or skip model download (disables semantic search)
thoth-cli init --skip-model
```

### MCP Timeouts
If tools time out in Claude, run `thoth-cli init` first to pre-download the model. The embedding model takes time to load on first use.

## License

MIT

## Contributing

Contributions welcome! Please check the [issues](https://github.com/braininahat/thoth/issues) page.

## Acknowledgments

- [MCP](https://modelcontextprotocol.io/) by Anthropic
- [vLLM](https://github.com/vllm-project/vllm) for fast inference
- [Qwen](https://github.com/QwenLM/Qwen) for lightweight embeddings