Metadata-Version: 2.4
Name: index1
Version: 2.0.1
Summary: AI-native project knowledge base
Author: gladego
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/gladego/index1
Project-URL: Repository, https://github.com/gladego/index1
Keywords: ai,knowledge-base,mcp,semantic-search,bm25,rag
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Documentation
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx[socks]>=0.27
Requires-Dist: sqlite-vec==0.1.6
Requires-Dist: mcp>=1.0
Requires-Dist: watchdog>=4.0
Requires-Dist: click>=8.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: flask>=3.0
Requires-Dist: fastembed>=0.4
Provides-Extra: chinese
Requires-Dist: jieba>=0.42; extra == "chinese"
Provides-Extra: codegraph
Requires-Dist: tree-sitter-language-pack>=0.13; extra == "codegraph"
Requires-Dist: grep-ast>=0.9; extra == "codegraph"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: jieba>=0.42; extra == "dev"
Requires-Dist: tree-sitter-language-pack>=0.13; extra == "dev"
Requires-Dist: grep-ast>=0.9; extra == "dev"
Dynamic: license-file

# index1

[English](README.md) | [中文](README.zh-CN.md)

AI-native project knowledge base. BM25 + vector hybrid search with responses in under 200 ms.

![index1 vs grep real-world comparison](assets/index1-ab-real.gif)

**index1 tested in a real-world Claude Code session!** The video compares Claude Code using index1 alongside its built-in grep against grep alone:

https://github.com/user-attachments/assets/b689b0bb-b767-4fc8-9055-cc3ae872559e

## Install

**One-click** (recommended):

```bash
# macOS / Linux
curl -sSL https://raw.githubusercontent.com/gladego/index1/main/scripts/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/gladego/index1/main/scripts/install.ps1 | iex
```

The script auto-detects Python, installs via pipx, sets up Ollama, and creates a default config.

**Manual install**:

```bash
pipx install index1    # recommended
# or: pip install index1
```

> **Note**: Homebrew-managed Python on macOS blocks global `pip install` by default (PEP 668, externally managed environments). Use [pipx](https://pipx.pypa.io/) instead:
> - macOS: `brew install pipx`
> - Linux: `pip install --user pipx && pipx ensurepath`
> - Windows: `scoop install pipx` or `pip install --user pipx`

## Quick Start

```bash
ollama pull nomic-embed-text      # optional, for semantic search
index1 index ./docs ./src
index1 search "how to use the liquidation API"
```

> Ollama is optional. Without it, index1 falls back to BM25-only search.

## AI Tool Integration

### Claude Code

Add `.mcp.json` to your project root:

```json
{
  "mcpServers": {
    "index1": {
      "type": "stdio",
      "command": "index1",
      "args": ["serve"]
    }
  }
}
```

Restart Claude Code — five `docs_*` tools will be available (`docs_search`, `docs_get`, `docs_status`, `docs_reindex`, `docs_config`).

> Full setup guide: [Claude Code integration](docs/integration-claude-code.md) — MCP config, search strategy, CLAUDE.md setup, context-saving tips

### Other AI Tools (OpenClaw, Cursor, Windsurf, Cline...)

**MCP-compatible tools**: Add the same config above to your tool's MCP settings.

**CLI mode** (works with any tool):

```bash
index1 search "how does authentication work"
index1 get <chunk_id>
```

> Full setup guide: [Other AI agents integration](docs/integration-other-agents.md) — per-tool config, CLI usage, Web UI

### Ollama (recommended)

```bash
# macOS
brew install ollama && ollama pull nomic-embed-text

# Linux
curl -fsSL https://ollama.ai/install.sh | sh && ollama pull nomic-embed-text

# Windows — download from https://ollama.ai/download, then:
ollama pull nomic-embed-text
```

> | Model | Dim | Disk | RAM | Best for |
> |-------|-----|------|-----|----------|
> | `all-minilm` | 384 | ~45 MB | ~250 MB | English, low-resource machines |
> | `nomic-embed-text` (default) | 768 | ~270 MB | ~500 MB | English + Chinese, general use |
> | `bge-m3` | 1024 | ~1.2 GB | ~1.2 GB | Chinese-optimized, 100+ languages |
>
> Without Ollama, index1 falls back to BM25-only search (no semantic/cross-language support).

## CLI Commands

```bash
index1 index <paths...>          # Index files/directories
index1 search <query>            # Hybrid search
index1 status                    # View index statistics
index1 config [key] [value]      # View/modify configuration
index1 serve                     # Start MCP Server (stdio)
index1 web                       # Start Web UI (port 6888)
```

## Supported File Types

`.md` `.markdown` `.py` `.rs` `.js` `.ts` `.jsx` `.tsx` `.txt`

Each type uses structure-aware chunking: headings for Markdown, AST for Python, regex patterns for Rust/JS/TS.
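As an illustration of the Markdown case, heading-based chunking can be sketched in a few lines (a minimal sketch, not index1's actual implementation; the function name `chunk_markdown` is hypothetical):

```python
import re

def chunk_markdown(text: str) -> list[dict]:
    """Split Markdown into one chunk per heading section."""
    chunks, current = [], {"heading": None, "lines": []}
    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line):        # a new heading starts a new chunk
            if current["lines"] or current["heading"]:
                chunks.append(current)
            current = {"heading": line.lstrip("# ").strip(), "lines": []}
        else:
            current["lines"].append(line)
    chunks.append(current)
    return [
        {"heading": c["heading"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]

doc = "# Install\npipx install index1\n\n# Usage\nindex1 search ..."
for c in chunk_markdown(doc):
    print(c["heading"], "->", repr(c["text"]))
```

Chunking along structural boundaries keeps each chunk self-contained, so a search hit returns a coherent section rather than an arbitrary window of lines.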

## Configuration

Config file: `~/.claude-index1/config.yaml`

```yaml
embedding_model: nomic-embed-text   # Ollama model
embedding_dim: 768
ollama_url: http://localhost:11434
top_k: 10                           # Results per query
collection: default                 # Namespace isolation
```

Project-level override: `.index1.yaml` in project root.

## Architecture

```
Claude Code ──► MCP Server (stdio) ──┐
CLI ─────────────────────────────────┼──► Query Engine ──► SQLite
Web UI ──────────────────────────────┘         │           ├── FTS5 (BM25)
                                               │           └── sqlite-vec (vector)
                                               ▼
                                        Ollama Embedding
```
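The FTS5 half of the store can be exercised directly with Python's built-in `sqlite3` module (a standalone sketch, assuming your SQLite build ships with FTS5; the table and column names are illustrative, not index1's schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
con.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("docs/install.md", "install index1 with pipx"),
        ("docs/search.md", "hybrid search combines bm25 and vectors"),
    ],
)
# bm25() returns a rank where more negative means a better match
rows = con.execute(
    "SELECT path, bm25(chunks) FROM chunks WHERE chunks MATCH ? "
    "ORDER BY bm25(chunks)",
    ("search",),
).fetchall()
print(rows[0][0])  # docs/search.md
```

Because FTS5 lives inside the same SQLite file as the vector table, both halves of a hybrid query hit one database with no extra services.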

- **Storage**: Single SQLite file (`~/.claude-index1/knowledge.db`)
- **Search**: BM25 + vector with Reciprocal Rank Fusion (k=60)
- **Chunking**: Structure-aware splitting by file type
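The RRF step above can be sketched directly: each ranked list contributes 1/(k + rank) to a document's score, so documents that rank high in both the BM25 and vector lists win (a minimal sketch with k = 60 as above; the chunk IDs are illustrative):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists into one, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["chunk_a", "chunk_b", "chunk_c"]
vector_hits = ["chunk_b", "chunk_d", "chunk_a"]
print(rrf([bm25_hits, vector_hits]))  # chunk_b first: it ranks high in both lists
```

RRF needs only rank positions, never raw scores, which is why BM25 and cosine-similarity results can be fused without any score normalization.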

## Performance

| Mode | Cold | Hot (cached) |
|------|------|-------------|
| Hybrid (BM25 + Vector) | 40–180 ms | < 1 ms |
| BM25-only (no Ollama) | ~35 ms* | < 1 ms |
| Grep/Glob (native) | 4 ms | N/A |

> \* After first query. First cold query without Ollama takes ~1s due to connection timeout, then result is cached for 60s.
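The short-lived result cache mentioned above behaves like a small TTL map; a minimal sketch (illustrative only, not index1's actual cache implementation):

```python
import time

class TTLCache:
    """Map whose entries expire ttl seconds after insertion."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store: dict = {}

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:  # expired: drop and miss
            del self._store[key]
            return None
        return value

cache = TTLCache(ttl=60.0)
cache.put("query: liquidation API", ["chunk_a", "chunk_b"])
print(cache.get("query: liquidation API"))  # ['chunk_a', 'chunk_b']
```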

**Without Ollama**: 6–8x slower cold start, Chinese semantic search returns **0 results**, no cross-language support.

**Context savings**: index1 returns top-k ranked results (~400–500 tokens), whereas Grep returns all matches (~5,000–35,000 tokens for common keywords). That saves **90–99% of the LLM context window** on broad queries.
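The 90–99% range follows directly from those token counts (illustrative arithmetic using the figures above, with 450 tokens as a midpoint index1 response):

```python
index1_tokens = 450                      # midpoint of a top-k index1 response
for grep_tokens in (5_000, 35_000):      # low and high end of a broad grep result
    saving = 1 - index1_tokens / grep_tokens
    print(f"{grep_tokens:>6} grep tokens -> {saving:.0%} saved")
```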

Full benchmark and integration guides:
- [Benchmark: index1 vs native tools](docs/benchmark-vs-native-tools.md) ([中文](docs/benchmark-vs-native-tools.zh-CN.md))
- [Claude Code integration](docs/integration-claude-code.md) — MCP config, search strategy, CLAUDE.md setup, context-saving tips
- [Other AI agents integration](docs/integration-other-agents.md) — Windsurf, Cline, CLI, Web UI
- [OpenClaw integration](docs/integration-openclaw.md) | [Cursor integration](docs/integration-cursor.md)

## FAQ

**Ollama is not running / not installed?**
index1 automatically falls back to BM25-only keyword search. However, this comes with significant penalties:
- **6–8x slower** cold queries (connection timeout overhead)
- **0 results** for Chinese/Japanese/Korean semantic queries
- **No cross-language search** (Chinese query → English code)

We strongly recommend installing Ollama:

```bash
# macOS
brew install ollama
ollama pull nomic-embed-text

# Linux
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull nomic-embed-text

# Windows
# Download from https://ollama.ai/download, then:
ollama pull nomic-embed-text
```

Ollama runs locally on port 11434 (configurable). All data stays on your machine.

**Resource comparison — with vs without Ollama:**

| | Without Ollama | With Ollama (`nomic-embed-text`) |
|---|---|---|
| **Disk** | 0 | ~270 MB (model file) |
| **RAM** | 0 | ~500 MB (while running) |
| **Cold query** | ~1s (timeout) → ~35ms (cached) | 40–180 ms |
| **CJK search** | 0 results | Full semantic search |
| **Cross-language** | Not supported | Supported |
| **Search mode** | BM25 keyword only | BM25 + vector hybrid |

> Ollama only uses RAM while running. If you stop `ollama serve`, RAM is fully released. Disk usage depends on the model — `all-minilm` is only ~45 MB for machines with limited storage.

**How to switch embedding models?**

```bash
index1 config embedding_model <model-name>
index1 index --force ./docs ./src   # Rebuild index with new model
```

Rebuilding is required because embeddings from different models have different dimensions and are not comparable.

**Can I use multiple projects?**
Yes. Use `--collection` to isolate namespaces:

```bash
index1 index ./project-a -c proj_a
index1 index ./project-b -c proj_b
index1 search "query" -c proj_a
```

**Where is the database stored?**
Default: `~/.claude-index1/knowledge.db`. Override via `index1 config db_path /custom/path.db` or set `INDEX1_HOME` environment variable.

**Migrating from older versions?**
```bash
mv ~/.index1 ~/.claude-index1
```

**How to rebuild the index?**

```bash
index1 index --force ./docs ./src
```

**How to monitor file changes?**

```bash
index1 watch ./docs ./src
```

## Contributing

```bash
git clone https://github.com/gladego/index1.git
cd index1
pip install -e ".[dev]"
pytest
```

PRs welcome. Please ensure `pytest` passes before submitting.

## Changelog

### v0.1.0

- BM25 + vector hybrid search with RRF fusion
- Structure-aware chunking (Markdown, Python, Rust, JS/TS)
- MCP Server with 5 tools for Claude Code integration
- Web UI with Atom Core animated logo
- L1/L2 query cache (10min TTL)
- File watcher for auto-reindex
- Optional rerank with cosine similarity
- One-click install script

## Requirements

- Python >= 3.10
- macOS / Linux / Windows
- [Ollama](https://ollama.ai) (optional, for semantic search)

## License

[Apache 2.0](LICENSE)
