Metadata-Version: 2.3
Name: docs-kit
Version: 0.1.1
Summary: Fetch docs, embed locally, expose via MCP for AI agents.
License: MIT License
        
        Copyright (c) 2026 Docs Kit Limited
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Requires-Python: >=3.11
Requires-Dist: click>=8.0.0
Requires-Dist: fastembed>=0.6.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: pydantic-settings>=2.2.1
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: qdrant-client>=1.10.0
Requires-Dist: tomli-w>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: starlette>=0.36.0; extra == 'dev'
Requires-Dist: uvicorn>=0.27.0; extra == 'dev'
Provides-Extra: sse
Requires-Dist: starlette>=0.36.0; extra == 'sse'
Requires-Dist: uvicorn>=0.27.0; extra == 'sse'
Description-Content-Type: text/markdown

# docs-kit

[![PyPI version](https://img.shields.io/pypi/v/docs-kit)](https://pypi.org/project/docs-kit/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python](https://img.shields.io/pypi/pyversions/docs-kit)](https://pypi.org/project/docs-kit/)
[![CI](https://github.com/docs-kit/docs-kit/actions/workflows/ci.yml/badge.svg)](https://github.com/docs-kit/docs-kit/actions/workflows/ci.yml)

Fetch docs from GitBook, Mintlify, or local files, embed them locally, and expose retrieval to AI tools over MCP.

No API keys are required for the default local embedding path.

## What it does

- Fetches public docs from GitBook and Mintlify sites via `llms-full.txt` / `llms.txt` (with a `sitemap.xml` fallback for Mintlify)
- Ingests local `.md` and `.txt` files
- Stores vectors in local Qdrant by default
- Runs an MCP server over `stdio` or SSE
- Exposes MCP tools to ingest, remove, and list sources at runtime
- Installs MCP config for supported AI clients

## Install

```bash
pip install docs-kit
```

Or with `npx`:

```bash
npx docs-kit ingest https://docs.example.com
```

## Quickstart

```bash
# 1. Create a config file
docs-kit init

# 2. Ingest docs (GitBook or Mintlify — auto-detected)
docs-kit ingest https://docs.elevenlabs.io

# 3. Check the collection
docs-kit inspect

# 4. Install into your client
docs-kit install claude-code
```

## Commands

### `docs-kit init`

Create `docs-kit.yaml` in the current directory, or in the directory passed with `--dir`.

```bash
docs-kit init
docs-kit init --dir ./sandbox
```

### `docs-kit ingest <path-or-url>`

Ingest a local file, directory, or documentation URL into the vector store. GitBook and Mintlify sites are supported out of the box, and the provider is auto-detected unless you pass `--provider`.

```bash
docs-kit ingest ./docs
docs-kit ingest https://docs.example.com
docs-kit ingest https://docs.mintlify-site.com --provider mintlify
docs-kit ingest https://docs.gitbook-site.com --provider gitbook
docs-kit ingest ./docs --recreate
```

`--provider` accepts `auto` (default), `gitbook`, or `mintlify`. In `auto` mode, the fetcher tries `/llms-full.txt` → `/llms.txt` → `/sitemap.xml` in order.

### `docs-kit fetch <url>`

Download GitBook docs as Markdown without ingesting them.

```bash
docs-kit fetch https://docs.example.com
docs-kit fetch https://docs.example.com --output ./downloaded-docs
```

### `docs-kit serve`

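Run the MCP server. `stdio` is the default transport; use SSE for HTTP clients. The optional `sse` extra (`pip install "docs-kit[sse]"`) installs Starlette and Uvicorn for the SSE transport.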

```bash
docs-kit serve
docs-kit serve --transport sse --port 3001
docs-kit serve --config ./docs-kit.yaml
```

### `docs-kit install <agent>`

Install docs-kit into a supported client config.

```bash
docs-kit install claude-code
docs-kit install codex
docs-kit install claude-code --project
docs-kit install cursor --config ./docs-kit.yaml
```
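
To make the install step more concrete, here is a minimal sketch of the kind of entry an installer could merge into a JSON client config. It assumes the client reads an `mcpServers` map, as Claude Desktop and Cursor do; the exact entry docs-kit writes is not documented here, and the helper names `build_server_entry` and `merge_server` are hypothetical.

```python
import json


def build_server_entry(config_path: str) -> dict:
    """Return a stdio server entry that launches `docs-kit serve`.

    Assumption: the client starts the server as a subprocess using
    a command plus an argument list.
    """
    return {
        "command": "docs-kit",
        "args": ["serve", "--config", config_path],
    }


def merge_server(client_config: dict, name: str, entry: dict) -> dict:
    """Return a copy of client_config with `entry` added under mcpServers.

    Servers that are already configured are preserved.
    """
    merged = dict(client_config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing client config with one unrelated server.
existing = {"mcpServers": {"other-tool": {"command": "other", "args": []}}}
updated = merge_server(existing, "docs-kit", build_server_entry("docs-kit.yaml"))
print(json.dumps(updated, indent=2))
```

Merging into a copy rather than overwriting the file contents keeps any servers the user has already configured.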

### `docs-kit query <text>`

Run retrieval directly from the CLI.

```bash
docs-kit query "How do I authenticate?"
docs-kit query "getting started" --limit 3
```

### `docs-kit inspect`

Show collection and embedding configuration details.

```bash
docs-kit inspect
docs-kit inspect --config ./docs-kit.yaml
```

### `docs-kit doctor`

Check environment variables, config presence, and Qdrant connectivity.

```bash
docs-kit doctor
docs-kit doctor --config ./docs-kit.yaml
```

### `docs-kit list`

List ingested sources with their ingestion timestamps.

```bash
docs-kit list
docs-kit list --config ./docs-kit.yaml
```

### `docs-kit remove <source>`

Remove an ingested source by URL or file path.

```bash
docs-kit remove https://docs.example.com/page
docs-kit remove ./docs/getting-started.md
```

## MCP Tools

When connected to an MCP client, docs-kit exposes:

| Tool | Description |
|------|-------------|
| `search_docs(query, limit=5)` | Hybrid dense + BM25 retrieval |
| `list_sources()` | List all ingested source URLs/paths |
| `list_ingested_sources()` | List sources with ingestion timestamps |
| `get_collection_info()` | Collection stats (exists, point count) |
| `get_full_document(source)` | Retrieve full stored document by source |
| `ingest_urls(urls, provider="auto")` | Ingest comma-separated URLs at runtime |
| `remove_source(source)` | Remove a source and all its chunks |
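
The table describes `search_docs` as hybrid dense + BM25 retrieval but does not say how the two result lists are combined. One common approach is reciprocal rank fusion (RRF). The sketch below illustrates that technique under the assumption that each retriever returns an ordered list of source identifiers; it is not taken from the docs-kit implementation, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, limit=5):
    """Fuse several ranked lists into one using reciprocal rank fusion.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for stable output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:limit]]


# Hypothetical results from a dense retriever and a BM25 retriever.
dense = ["auth.md", "quickstart.md", "billing.md"]
bm25 = ["auth.md", "tokens.md", "quickstart.md"]

fused = reciprocal_rank_fusion([dense, bm25], limit=3)
print(fused)
```

RRF only needs ranks, not raw scores, which avoids having to normalise cosine similarity against BM25 scores.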

## Install Targets

Local config install is supported for:

- `claude-code`
- `claude-desktop`
- `cursor`
- `codex`

Codex aliases:

- `codex-app`

ChatGPT aliases are accepted for guidance only:

- `chatgpt`
- `chatgpt-desktop`

For `chatgpt` and `chatgpt-desktop`, this command does not write a local stdio config. Instead, the installer prints guidance for the current OpenAI flow, which uses remote MCP apps/connectors configured in ChatGPT settings.

## Configuration

`docs-kit.yaml` created by `docs-kit init`:

```yaml
embedding:
  provider: fastembed
  model: BAAI/bge-small-en-v1.5

vector_store:
  provider: qdrant
  local_path: .docs-kit/qdrant
  collection_name: knowledge_base

ingestion:
  chunk_size: 800
  chunk_overlap: 120
  bm25_model: Qdrant/bm25

mcp:
  transport: stdio
  host: localhost
  port: 3001
```
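
To show how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters; the config above does not state the unit, and docs-kit's real chunker may split on tokens or Markdown structure instead. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=800, chunk_overlap=120):
    """Split text into windows of chunk_size that overlap by chunk_overlap.

    Assumption: sizes are character counts.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 2000-character sample document made of repeating digits.
sample = "".join(str(i % 10) for i in range(2000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

With the default values each window advances by 680 characters, so the last 120 characters of one chunk are repeated at the start of the next.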

## Supported Sources

| Source | Strategy |
|--------|----------|
| GitBook sites | `/llms-full.txt` → `/llms.txt` |
| Mintlify sites | `/llms-full.txt` → `/llms.txt` → `/sitemap.xml` |
| Local `.md` files | Direct file read |
| Local `.txt` files | Direct file read |

Both GitBook and Mintlify support the [`llms.txt` standard](https://llmstxt.org), so in most cases the same auto strategy works for both. The Mintlify fetcher adds a `/sitemap.xml` fallback for sites where `llms.txt` is disabled.
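
The fallback order in the table can be sketched as a simple first-match loop. This is an illustration only: `resolve_source`, `CANDIDATE_PATHS`, and the `probe` callable are hypothetical names, and the real fetcher's availability check (status codes, content validation) is not described here. The `probe` argument stands in for an HTTP request so the sketch runs without network access.

```python
from typing import Callable, Optional

# Candidate paths per provider, in the order given in the table above.
CANDIDATE_PATHS = {
    "gitbook": ["/llms-full.txt", "/llms.txt"],
    "mintlify": ["/llms-full.txt", "/llms.txt", "/sitemap.xml"],
    "auto": ["/llms-full.txt", "/llms.txt", "/sitemap.xml"],
}


def resolve_source(
    base_url: str,
    provider: str,
    probe: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate URL that probe() reports as available."""
    base = base_url.rstrip("/")
    for path in CANDIDATE_PATHS[provider]:
        url = base + path
        if probe(url):
            return url
    return None  # no strategy worked for this site


# Hypothetical site that has llms.txt disabled and only serves a sitemap.
available = {"https://docs.example.com/sitemap.xml"}


def fake_probe(url: str) -> bool:
    return url in available


print(resolve_source("https://docs.example.com/", "mintlify", fake_probe))
print(resolve_source("https://docs.example.com/", "gitbook", fake_probe))
```

In a real fetcher, `probe` would typically issue a request and treat a successful response as available.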

## Requirements

- Python 3.11+
- Disk space for the local embedding model download
- Local Qdrant storage under `.docs-kit/` by default
