Metadata-Version: 2.4
Name: semanticscout
Version: 3.3.3
Summary: A language-aware semantic code search MCP server with intelligent filtering and 9.3x better dependency analysis
Author-email: Psynosaur <psynosaur@gmail.com>
Maintainer-email: Psynosaur <psynosaur@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Psynosaur/SemanticScout
Project-URL: Repository, https://github.com/Psynosaur/SemanticScout.git
Project-URL: Issues, https://github.com/Psynosaur/SemanticScout/issues
Project-URL: Documentation, https://github.com/Psynosaur/SemanticScout#readme
Keywords: mcp,context-engine,code-search,vector-database,ai-agents,semantic-search,ollama,chromadb
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp[cli]>=1.2.0
Requires-Dist: chromadb>=0.4.0
Requires-Dist: openai>=1.0.0
Requires-Dist: tree-sitter<0.22,>=0.20.0
Requires-Dist: tree-sitter-languages>=1.10.0; python_version < "3.13"
Requires-Dist: httpx>=0.25.0
Requires-Dist: pathspec>=0.11.0
Requires-Dist: watchdog>=3.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: networkx>=3.2.1
Requires-Dist: psutil>=5.9.0
Requires-Dist: sentence-transformers>=2.2.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-mock>=3.11.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-benchmark>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Provides-Extra: fast
Requires-Dist: sentence-transformers>=2.2.0; extra == "fast"
Dynamic: license-file

# SemanticScout 🔍
## Please note: this is an exploratory project, an attempt to build something usable outside the Augment Code ecosystem.
Much of the code still needs refactoring, and several key changes have yet to be implemented.

> Language-aware semantic code search for AI agents with dependency analysis

[![Version](https://img.shields.io/badge/version-3.3.3-blue)]()
[![Tests](https://img.shields.io/badge/tests-passing-brightgreen)]()
[![Coverage](https://img.shields.io/badge/coverage-55%25-green)]()
[![Python](https://img.shields.io/badge/python-3.10+-blue)]()
[![License](https://img.shields.io/badge/license-MIT-blue)]()

**SemanticScout** is a Model Context Protocol (MCP) server that provides intelligent code search for AI agents. It combines semantic search with language-aware analysis to understand code relationships, dependencies, and architecture.

## ✨ Key Features

- 🎯 **Language-Aware Analysis** - Automatic language detection with specialized dependency analysis (Rust, C#, Python, etc.)
- 🔍 **Semantic Code Search** - Natural language queries with high-accuracy results and intelligent context expansion
- 🚫 **Smart Test Filtering** - Automatically excludes test files from results using multi-strategy detection
- 🗂️ **Git Integration** - Smart filtering of untracked files and incremental indexing (5-10x faster updates)
- 🧠 **Hybrid Retrieval** - Combines semantic, symbol, and dependency-based search with AST parsing
- ⚡ **High Performance** - Local embeddings (sentence-transformers), <100ms queries, <2s per file indexing
- 🌐 **Multi-Language** - TypeScript, JavaScript, Python, Java, C#, Go, Rust, Ruby, PHP, C, C++
- 🤖 **MCP Ready** - Works with Claude Desktop and other MCP clients out of the box
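The test-filtering feature above can be sketched as a multi-strategy path check. The function and patterns below are illustrative, not SemanticScout's actual implementation:

```python
import re
from pathlib import PurePosixPath

# Hypothetical multi-strategy detector: checks directory names plus common
# filename prefix/suffix conventions across several ecosystems.
TEST_DIRS = {"test", "tests", "__tests__", "spec"}
TEST_FILE_RE = re.compile(
    r"(^test_.*|.*_test|.*\.test|.*\.spec|.*Tests?)\.(py|ts|js|cs|go|rs)$"
)

def looks_like_test_file(path: str) -> bool:
    p = PurePosixPath(path)
    # Strategy 1: any parent directory named like a test folder
    if any(part.lower() in TEST_DIRS for part in p.parts[:-1]):
        return True
    # Strategy 2: filename matches a known test-naming convention
    return bool(TEST_FILE_RE.match(p.name))
```

Combining several weak signals like this is what lets a filter catch `test_auth.py`, `auth.spec.ts`, and `AuthTests.cs` alike without a per-language plugin.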

## 🚀 Quick Start

Get started in **under 2 minutes** with zero configuration required!

### Prerequisites

- **uv** - [Install uv](https://docs.astral.sh/uv/getting-started/installation/)
- **Claude Desktop** - [Install Claude Desktop](https://claude.ai/download)

### Setup

1. **Configure Claude Desktop** - Add to your MCP configuration file:

**Windows:** `%APPDATA%\Claude\claude_desktop_config.json`
**Mac:** `~/Library/Application Support/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"]
    }
  }
}
```

2. **Restart Claude Desktop** - SemanticScout will be automatically downloaded and ready to use!

**✨ What you get:**
- Language-aware analysis with automatic project detection
- Fast local embeddings (sentence-transformers, no Ollama needed)
- Smart test file filtering and git integration
- All data stored in `~/semanticscout/`

> **Note:** Use Python 3.12 for best compatibility. Some dependencies don't yet support Python 3.13.

## 📖 Usage

Once configured, use natural language to interact with SemanticScout through Claude:

### Example Conversations

**Index a codebase:**
```
You: "Index my codebase at /workspace"
Claude: [Calls index_codebase tool and shows indexing progress]
```

**Search for code:**
```
You: "Find the authentication logic"
Claude: [Calls search_code tool and shows relevant code snippets]
```

**Advanced queries:**
```
You: "Show me dependency injection configuration"
Claude: [Automatically detects architectural query and expands coverage]
```

### Available Tools

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `index_codebase` | Index a codebase with language-aware analysis | `path`, `incremental` |
| `search_code` | Search with natural language + smart filtering | `query`, `collection_name`, `exclude_test_files` |
| `find_symbol` | Find symbols with language-aware lookup | `symbol_name`, `collection_name` |
| `trace_dependencies` | Trace dependency chains | `file_path`, `collection_name`, `depth` |
| `list_collections` | List all indexed codebases | None |
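The depth-limited traversal behind a tool like `trace_dependencies` can be pictured as a breadth-first walk over an import graph. The graph shape and function below are illustrative only:

```python
from collections import deque

def trace_dependencies(graph: dict[str, list[str]],
                       file_path: str, depth: int = 2) -> set[str]:
    """Breadth-first walk of the import graph, stopping after `depth` hops."""
    seen = {file_path}
    frontier = deque([(file_path, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # do not expand past the requested depth
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                frontier.append((dep, d + 1))
    return seen - {file_path}
```

With `depth=2`, a chain `app.py → auth.py → db.py → config.py` yields `{auth.py, db.py}` but stops before `config.py`.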

### Advanced Features

- **Incremental Indexing**: Use `incremental=True` for 5-10x faster updates on existing codebases
- **Test Filtering**: Set `exclude_test_files=False` to include test files in search results
- **Coverage Modes**: Use `coverage_mode` for different result depths (focused/balanced/comprehensive/exhaustive)
- **Real-time Updates**: Process file change events from editors automatically
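The idea behind incremental indexing is to compare content hashes against the previous run and re-embed only what changed. The manifest format here is hypothetical:

```python
import hashlib

def changed_files(files: dict[str, bytes],
                  manifest: dict[str, str]) -> list[str]:
    """Return paths whose content hash differs from the stored manifest."""
    out = []
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if manifest.get(path) != digest:
            out.append(path)
            manifest[path] = digest  # remember for the next run
    return out
```

Skipping unchanged files is where the claimed 5-10x speedup on re-index comes from: embedding dominates indexing time, and most files don't change between runs.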

## 🔧 Configuration

### Default Setup (Recommended)
The default configuration works great for most users - no additional setup needed!

### Custom Embedding Models
To use a different sentence-transformers model:

```json
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"sentence-transformers\",\"model\":\"all-mpnet-base-v2\"}}"
      }
    }
  }
}
```
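Conceptually, `SEMANTICSCOUT_CONFIG_JSON` is a JSON string deep-merged over the built-in defaults. The sketch below shows the mechanism; the default values are illustrative, not the server's actual defaults:

```python
import json

# Illustrative defaults, not SemanticScout's actual configuration.
DEFAULTS = {"embedding": {"provider": "sentence-transformers",
                          "model": "all-MiniLM-L6-v2"}}

def load_config(env: dict[str, str]) -> dict:
    """Deep-merge the JSON override from the environment onto the defaults."""
    override = json.loads(env.get("SEMANTICSCOUT_CONFIG_JSON", "{}"))
    config = json.loads(json.dumps(DEFAULTS))  # cheap deep copy

    def merge(dst: dict, src: dict) -> None:
        for key, value in src.items():
            if isinstance(value, dict) and isinstance(dst.get(key), dict):
                merge(dst[key], value)  # recurse into nested sections
            else:
                dst[key] = value

    merge(config, override)
    return config
```

A deep merge (rather than a top-level replace) is what lets you override just `embedding.model` without restating the provider.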

### Ollama (Optional - GPU Acceleration)
For GPU acceleration with Ollama:

```bash
# Start Ollama and pull model
ollama serve
ollama pull nomic-embed-text
```

```json
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "nomic-embed-text",
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"ollama\"}}"
      }
    }
  }
}
```

## 🐛 Troubleshooting

### Common Issues

**Python Version Error:** Use Python 3.12 for best compatibility (some dependencies don't support 3.13 yet)

**Ollama Not Available:** The default uses sentence-transformers (no Ollama needed). Only configure Ollama if you want GPU acceleration.

**Rate Limits:** Adjust limits with environment variables:
```json
"env": {
  "MAX_INDEXING_REQUESTS_PER_HOUR": "20",
  "MAX_SEARCH_REQUESTS_PER_MINUTE": "200"
}
```

## 📚 Documentation

- **[API Reference](docs/API_REFERENCE.md)** - Complete tool documentation
- **[User Guide](docs/USER_GUIDE.md)** - Examples and best practices
- **[Configuration](docs/CONFIGURATION.md)** - Advanced configuration options
- **[Performance Tuning](docs/PERFORMANCE_TUNING.md)** - Optimization guide

## 🏗️ Architecture

SemanticScout combines multiple technologies for intelligent code search:

- **Language Detection** → **AST Parsing** (tree-sitter) → **Symbol Extraction**
- **Semantic Chunking** → **Embeddings** (sentence-transformers/Ollama) → **Vector Storage** (ChromaDB)
- **Dependency Analysis** → **Graph Storage** (NetworkX) → **Symbol Tables** (SQLite)
- **Hybrid Search** → **Context Expansion** → **Smart Filtering**
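The hybrid-search stage can be pictured as weighted score fusion across the semantic, symbol, and dependency signals. The weights and function below are illustrative, not the actual ranking formula:

```python
def fuse_scores(semantic: dict[str, float],
                symbol: dict[str, float],
                dependency: dict[str, float],
                weights: tuple[float, float, float] = (0.6, 0.25, 0.15),
                ) -> list[tuple[str, float]]:
    """Combine per-chunk scores from each retriever into one ranked list."""
    w_sem, w_sym, w_dep = weights
    chunks = set(semantic) | set(symbol) | set(dependency)
    fused = {c: w_sem * semantic.get(c, 0.0)
                + w_sym * symbol.get(c, 0.0)
                + w_dep * dependency.get(c, 0.0)
             for c in chunks}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Fusing signals this way lets a chunk that is only moderately similar semantically still rank highly when it also matches a symbol name or sits on the queried dependency path.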

## 🤝 Contributing

Contributions welcome! See our [contributing guide](CONTRIBUTING.md) for details.

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.

---

**Built with ❤️ for the AI agent ecosystem**

