Metadata-Version: 2.4
Name: gitcontextcli
Version: 1.2.0
Summary: Compress any GitHub repo into a token-budgeted context file for LLMs
Project-URL: Homepage, https://git-context.com
Project-URL: Repository, https://github.com/RamachandraKulkarni/git-context
Author-email: Ramachandra Kulkarni <kulkarni.ramachandra.26@gmail.com>
License: MIT
Keywords: ast,cli,context,developer-tools,github,llm,tokenizer
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Documentation
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.52.0
Requires-Dist: click>=8.1.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: httpx>=0.28.0
Requires-Dist: openai>=1.30.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: python-multipart>=0.0.9
Requires-Dist: rich>=13.0.0
Requires-Dist: sse-starlette>=2.0.0
Requires-Dist: tiktoken>=0.9.0
Requires-Dist: tree-sitter-go>=0.23.0
Requires-Dist: tree-sitter-java>=0.23.0
Requires-Dist: tree-sitter-javascript>=0.23.0
Requires-Dist: tree-sitter-python>=0.23.0
Requires-Dist: tree-sitter-rust>=0.23.0
Requires-Dist: tree-sitter-typescript>=0.23.0
Requires-Dist: tree-sitter>=0.24.0
Requires-Dist: uvicorn[standard]>=0.34.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Provides-Extra: mcp
Requires-Dist: mcp>=1.0.0; extra == 'mcp'
Description-Content-Type: text/markdown

<div align="center">

<h1>⚡ git-context</h1>

<p><strong>Compress any GitHub repository into a single, token-budgeted context file - ready for Claude, GPT-4, Gemini, or any LLM.</strong></p>

<p>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/Python-3.11%20%7C%203.12%20%7C%203.13-3776AB?style=flat-square&logo=python&logoColor=white" alt="Python"></a>
  <a href="https://fastapi.tiangolo.com"><img src="https://img.shields.io/badge/FastAPI-0.115%2B-009688?style=flat-square&logo=fastapi&logoColor=white" alt="FastAPI"></a>
  <a href="https://react.dev"><img src="https://img.shields.io/badge/React-19-61DAFB?style=flat-square&logo=react&logoColor=black" alt="React"></a>
  <a href="https://tree-sitter.github.io"><img src="https://img.shields.io/badge/tree--sitter-0.24%2B-4CAF50?style=flat-square" alt="tree-sitter"></a>

</p>

<p>
  <a href="#quick-start">Quick Start</a> ·
  <a href="#pipeline-architecture">How It Works</a> ·
  <a href="#usage">Usage</a> ·
  <a href="#web-ui">Web UI</a> ·
  <a href="#composer-multi-repo-workflow">Composer</a> ·
  <a href="#optional-ai-features">Optional AI</a> ·
  <a href="#mcp-server-claude-code--desktop">MCP</a> ·
  <a href="#api">API</a> ·
  <a href="#development">Development</a>
</p>

</div>

---

## The Problem

You find a GitHub repo. Maybe it's a library you want to use, a codebase you need to debug, or a project you're trying to understand. You open ChatGPT, Claude, or Gemini and want to ask something smart about it.

Now what?

- You can't paste the whole repo - it's too large for any context window
- You manually copy a few files - but you miss the ones that matter most
- You use a web search plugin - it skims READMEs and surfaces irrelevant snippets
- You spend 20 minutes cherry-picking files, lose context, start over

**The real problem:** LLMs are powerful at reasoning about code, but getting the *right* code into the context window is entirely manual, error-prone, and slow. Nobody has time to curate a repo before every conversation.

---

## The Solution

`git-context` automates that curation. It clones any GitHub repository, runs it through a 4-phase intelligence pipeline, and produces a **single structured Markdown file** that:

- Fits precisely within your chosen token budget (32k / 64k / 128k / 200k)
- Prioritizes the files that matter most using AST analysis and import-graph centrality
- Strips all noise - build artifacts, lock files, images, generated code, binaries
- Includes the right level of detail per file - full source for entry points, signatures for utilities, outlines for docs
- Is ready to paste directly into any LLM - no preprocessing, no chunking, no embeddings

The output isn't a raw dump. It's an **intelligently compressed representation** of the repository - architecture overview, dependency map, file tree, and ranked source content - all in one document.

---

## The Idea

You're on a GitHub repo page. Change one word in the URL:

```
https://github.com/expressjs/express
                 ↓
https://git-context.com/expressjs/express
```

The app detects the path, clones the repo, runs it through a 4-phase intelligence pipeline, and streams you a structured Markdown file that fits exactly within your LLM's context window. **No copy-paste. No forms. No API keys. No data stored.**

---

## What It Produces

```markdown
# Repository Context: express

## Meta
- Repository: https://github.com/expressjs/express
- Framework: Express.js
- Token budget: 128,000
- Files scanned: 412  →  Files analyzed: 89

## Architecture Overview
express is an Express.js project written primarily in js, ts.
The repository contains 89 analyzed files across 8 top-level modules.
Core source: 61 files. Test coverage: 14 test files.
Most-imported modules: lib, middleware, test.

## File Tree
## Module Dependency Map
## Entry Points & Configuration   ← full source
## Core Source                    ← AST-extracted signatures + bodies
## Supporting Modules             ← public API signatures only
## Test Coverage Summary          ← test names + docstrings
## Documentation Outline          ← heading structure only
```

Drop that file into any LLM and ask anything about the codebase.

---

## ✦ Features

| | Feature | Detail |
|---|---|---|
| **AST** | Tree-sitter Extraction | Real symbol parsing for Python, JS/TS, Rust, Go, Java - not regex guessing |
| **Graph** | Dependency Analysis | Import graph + in-degree centrality; high-centrality files auto-promoted |
| **Budget** | Token Control | 32k / 64k / 128k / 200k - mathematically bounded, never overflows |
| **Tiers** | 6-Level Classification | Files ranked T0→T5; budget allocated proportionally per importance |
| **Detection** | Framework Recognition | Reads entry files to detect 25+ frameworks (Next.js, FastAPI, Rails, Spring…) |
| **Languages** | 15+ Supported | AST extractors for major languages, regex fallback for everything else |
| **Streaming** | Live SSE Progress | Phase-by-phase progress streamed from server to browser in real time |
| **Privacy** | Zero Storage | Repos cloned to temp dirs, deleted immediately after processing |
| **Portable** | CLI + Web + API | Terminal tool, browser UI, and REST endpoint - all from one `pip install` |
| **Composer** | Multi-repo workflow | Attach several GitHub repos, inspect their estimated bundle load, and generate one shared-budget markdown context via `/api/composer/bundle` |
| **Repository bag** | Lightweight attach flow | `POST /api/composer/inspect` returns file-count and token-load estimates before you start the full compose run |
| **Model registry** | Budget ↔ LLM alignment | Built-in profiles (context window + recommended token budget); generator can target a model so output fits how you will use it |
| **Optional AI** | Chat & reports | Ask questions about generated context, stream architecture reports - **opt-in**, requires API keys; core pipeline stays offline |
| **MCP** | Claude Code / Desktop | Optional `pip install ".[mcp]"` - expose `analyze_repo`, `ask_codebase`, resources like `codebase://context/...` |
| **Themes** | Web UI | Four themes (Terminal Dark/Light + Claude-inspired light/dark), Zinc Terminal design tokens, responsive layout |

---

## Pipeline Architecture

```
  GitHub URL
      │
      ▼
  ╔═══════════════════════════════════════════════════════╗
  ║  PHASE 1  ·  Clone & Scan                            ║
  ║                                                       ║
  ║  git clone --depth 1 --single-branch                 ║
  ║  → Walk filesystem → FileInfo(path, ext, size)       ║
  ║  → Detect entry points (main.py, index.js, go.mod…)  ║
  ║  → Measure language breakdown by byte count          ║
  ╚═══════════════════════════════╤═══════════════════════╝
                                  │  CloneScanResult
                                  ▼
  ╔═══════════════════════════════════════════════════════╗
  ║  PHASE 2  ·  Filter & Classify                       ║
  ║                                                       ║
  ║  Strip:  node_modules, dist, build, __pycache__,     ║
  ║          binaries, images, lock files, minified JS   ║
  ║          (40+ exclusion patterns, 40+ dir names)     ║
  ║                                                       ║
  ║  Rank remaining files into tiers:                    ║
  ║   T0 ████████░░  Entry points, README, configs  10%  ║
  ║   T1 ████████████████████████████  Core src     55%  ║
  ║   T2 ██████░░░░  Utils, helpers, middleware     15%  ║
  ║   T3 ██░░░░░░░░  Tests & specs                   5%  ║
  ║   T4 ██░░░░░░░░  Docs & markdown                 5%  ║
  ║   T5 ░░░░░░░░░░  Generated / migrations     skipped  ║
  ╚═══════════════════════════════╤═══════════════════════╝
                                  │  FilterResult
                                  ▼
  ╔═══════════════════════════════════════════════════════╗
  ║  PHASE 3  ·  Parse & Extract                         ║
  ║                                                       ║
  ║  Per-tier extraction strategy:                       ║
  ║   T0 → verbatim source                               ║
  ║   T1 → docstring + imports + class/fn signatures     ║
  ║         + decorators + route handlers + small bodies ║
  ║   T2 → public API signatures only                    ║
  ║   T3 → test names + docstrings (no bodies)           ║
  ║   T4 → heading outline only                          ║
  ║   T5 → skip                                          ║
  ║                                                       ║
  ║  Build import graph → in-degree centrality           ║
  ║  Promote T2 files with centrality ≥ 0.5 → T1        ║
  ╚═══════════════════════════════╤═══════════════════════╝
                                  │  extractions + dep_graph
                                  ▼
  ╔═══════════════════════════════════════════════════════╗
  ║  PHASE 4  ·  Assemble Context                        ║
  ║                                                       ║
  ║  Sort each tier by centrality (hottest files first)  ║
  ║  Fill token budget section by section                ║
  ║  Detect framework from entry-point content           ║
  ║  Render structured Markdown document                 ║
  ╚═══════════════════════════════╤═══════════════════════╝
                                  │
                                  ▼
                        repo-context.md
                   (fits your token budget)
```
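The Phase 3 promotion rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual code: `promote_by_centrality` and its input shapes are hypothetical names chosen for this example.

```python
from collections import defaultdict

def promote_by_centrality(imports, tiers, threshold=0.5):
    """Sketch of the Phase 3 promotion step: compute normalized
    in-degree centrality from an import graph, then promote T2 files
    that the rest of the codebase depends on heavily into T1.

    imports: mapping of file -> list of files it imports
    tiers:   mapping of file -> tier name ("T0".."T5")
    """
    # In-degree: how many other files import each target.
    in_degree = defaultdict(int)
    for src, targets in imports.items():
        for dst in targets:
            in_degree[dst] += 1

    # Normalize by the maximum in-degree so centrality lands in [0, 1].
    max_deg = max(in_degree.values(), default=1)
    centrality = {f: in_degree[f] / max_deg for f in tiers}

    # Promote hot T2 files to T1 (the pipeline's threshold is 0.5).
    promoted = dict(tiers)
    for f, score in centrality.items():
        if promoted.get(f) == "T2" and score >= threshold:
            promoted[f] = "T1"
    return centrality, promoted
```

The effect: a `utils.py` imported by half the codebase gets full T1 extraction (signatures plus bodies) instead of the signatures-only treatment T2 files normally receive.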

---

## Quick Start

**Requirements:** Python 3.11+, `git` on PATH

```bash
# 1. Clone and install
git clone https://github.com/RamachandraKulkarni/git-context.git
cd git-context
pip install -e .

# 2. Generate a context file
git-context generate https://github.com/expressjs/express -o express.md

# 3. Or launch the web UI
git-context serve
# → open http://localhost:3000
```

Optional: copy `.env.example` → `.env` and add `ANTHROPIC_API_KEY`, `MINIMAX_API_KEY`, `MINIMAX_TOKEN_PLAN_API_KEY`, or `NVIDIA_API_KEY` if you want **chat**, **architecture reports**, or AI-assisted compose planning. Set `GIT_CONTEXT_AI_PROVIDER=nvidia` or `GIT_CONTEXT_WORKER_PROVIDER=nvidia` to run swarm workers on NVIDIA Build/NIM. Optional MCP integration: `pip install -e ".[mcp]"`.

---

## Usage

### CLI

```bash
git-context generate <repo_url> [options]
```

| Option | Default | Description |
|:---|:---:|:---|
| `--budget` / `-b` | `128000` | Token budget: `32000` `64000` `128000` `200000` |
| `--output` / `-o` | stdout | Write output to a file path |
| `--no-tests` | - | Exclude test files from output |
| `--no-docs` | - | Exclude documentation files |
| `--verbose` / `-v` | - | Print detailed phase-by-phase logs |
| `--quiet` / `-q` | - | Suppress all output except the result |

**Examples:**

```bash
# Slim context for a quick chat
git-context generate https://github.com/tiangolo/fastapi \
  --budget 32000 --no-tests --no-docs --output fastapi.md

# Verbose mode to debug pipeline behaviour
git-context generate https://github.com/rust-lang/cargo --verbose

# Pipe straight into pbcopy / xclip
git-context generate https://github.com/pallets/flask -q | pbcopy
```

---

## Web UI

```bash
git-context serve              # default: http://localhost:3000
git-context serve --port 8080  # custom port
```

**URL shortcut** - works locally the same way it works on the hosted site:

```
github.com/owner/repo   →   localhost:3000/owner/repo
```

Navigate to any `/{owner}/{repo}` path and generation starts automatically. Browser history (back/forward) is fully supported.

---

## Composer (multi-repo workflow)

**Compose** is the multi-repository workflow in the web app: use the **Compose** control in the header or open [`/compose`](http://localhost:3000/compose) locally. In v1.1 it builds one shared-budget markdown context file from all attached repositories.

| Step | What happens |
|:---|:---|
| **Inspect repo** | `POST /api/composer/inspect` validates a GitHub URL and returns an estimated file count, primary language, and expected context load for the repository bag. |
| **Analyze repos** | `POST /api/composer/bundle` reuses the offline clone → filter → parse → assemble pipeline for each attached repo. Temp dirs are deleted after the request. |
| **Allocate budget** | Composer weights each repo by estimated context load and divides the requested token budget across the attached repos. |
| **Bundle output** | The final SSE `complete` event returns one markdown document plus aggregate stats and per-repo token usage. |

Composer v1.1 is local-only. LLM-backed features remain opt-in elsewhere in the app (see [Optional AI features](#optional-ai-features)).
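The "allocate budget" step can be illustrated with a small sketch (hypothetical names, assuming proportional weighting by the per-repo load estimates that `/api/composer/inspect` returns):

```python
def split_bundle_budget(repo_loads, token_budget):
    """Sketch of the Composer allocation step: weight each attached
    repo by its estimated context load and divide the shared token
    budget proportionally across the bag.

    repo_loads: mapping of repo_url -> estimated token load
    """
    total_load = sum(repo_loads.values())
    if total_load == 0:
        # Degenerate case: split the budget evenly.
        share = token_budget // max(len(repo_loads), 1)
        return {url: share for url in repo_loads}
    return {
        url: int(token_budget * load / total_load)
        for url, load in repo_loads.items()
    }
```

So a repo estimated at three times the load of its neighbor gets three quarters of the shared budget, and each repo then runs through the normal filter → parse → assemble pipeline against its own slice.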

---

## Optional AI features

The **clone → filter → parse → assemble** pipeline does **not** require any API keys or network calls to model providers. Optional features do:

| Feature | Endpoint(s) | Notes |
|:---|:---|:---|
| **Model-aware budgets** | `GET /api/models`, `GET /api/stream?...&model=` | Profiles include recommended token budgets aligned to common models; `/api/stream` can take a `model` query parameter so generation matches your downstream LLM. |
| **Chat about context** | `POST /api/chat/stream` | Streams answers over the compressed markdown (uses in-memory cache from a prior `/api/stream` run, or explicit `context` in the body). |
| **Architecture report** | `POST /api/architecture/stream` | Streams an architecture-style narrative from the same context. |
| **Provider status** | `GET /api/claude/status` | Quick key detection; add `?validate=true` for a live API probe. |

Set **`ANTHROPIC_API_KEY`**, **`MINIMAX_API_KEY`**, **`MINIMAX_TOKEN_PLAN_API_KEY`**, or **`NVIDIA_API_KEY`** in a `.env` file (see `.env.example`). Set **`GIT_CONTEXT_AI_PROVIDER=nvidia`** or **`GIT_CONTEXT_WORKER_PROVIDER=nvidia`** to use NVIDIA Build/NIM for orchestration or swarm workers. Optional **`OPENAI_API_BASE`** still supports OpenAI-compatible gateways. Incoming content can be **scanned for secrets** before model calls, with findings emitted as SSE events where applicable.

---

## MCP server (Claude Code & Desktop)

For [Model Context Protocol](https://modelcontextprotocol.io) clients (Claude Code, Claude Desktop, and others):

```bash
pip install -e ".[mcp]"
git-context mcp
# or: python -m git_context.mcp_server
```

Exposes tools such as **`analyze_repo`**, **`ask_codebase`**, **`get_repo_stats`**, and **`list_repos`**, plus resources like **`codebase://context/{owner}/{repo}`** and **`codebase://tree/{owner}/{repo}`**. Example for Claude Code: `claude mcp add git-context -- git-context mcp`.
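For Claude Desktop, the server can be registered by hand in `claude_desktop_config.json` instead. A sketch, assuming the standard MCP `mcpServers` layout and that `git-context` is on your PATH:

```json
{
  "mcpServers": {
    "git-context": {
      "command": "git-context",
      "args": ["mcp"]
    }
  }
}
```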

---

## Web UI: themes and layout

- **Context vs Compose** - **Context** is the single-repo URL flow (`/{owner}/{repo}`); **Compose** is the multi-repo composer at `/compose`.
- **Themes** - Four themes (Terminal Dark/Light and Claude-inspired light/dark) via the header control; CSS variables (`--gc-*`) keep the UI consistent.
- **Ethics** - Shield icon opens an in-app **Ethics** page (privacy and responsible use).
- **Responsive** - Layout, model selector, and controls work on smaller viewports and touch targets.
- **Landing** - Two-column editorial layout on large screens; footer includes copyright and team contact on the main generator view.

---

## API

The server exposes the main generation stream, the optional AI and Composer streams (Server-Sent Events where noted), model metadata, and a health check.

### `GET /api/stream`

Streams pipeline progress and the final result as [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events).

**Query parameters**

| Parameter | Type | Required | Description |
|:---|:---:|:---:|:---|
| `repo_url` | string | ✓ | Full GitHub HTTPS URL |
| `model` | string | - | Registered profile key (e.g. `gpt-4o`, `claude-sonnet`). When set and not `custom`, **recommended_budget** from that profile overrides `budget` so output fits the target LLM. |
| `budget` | integer | - | Token budget (default `128000`; ignored when `model` selects a fixed profile) |
| `include_tests` | boolean | - | Include test files (default `true`) |
| `include_docs` | boolean | - | Include docs (default `true`) |

**Event stream**

```
event: phase
data: {"phase": "clone|filter|parse|assemble", "message": "...", "progress": 0.35}

event: complete
data: {
  "content": "# Repository Context: ...",
  "stats": {
    "total_files": 412,
    "filtered_files": 89,
    "actual_tokens": 118432,
    "compression_ratio": 8.3,
    "processing_time_ms": 4201
  }
}

event: error
data: {"message": "Repository too large (> 500 MB)"}
```

**curl example**

```bash
curl -N --header "Accept: text/event-stream" \
  "http://localhost:3000/api/stream?repo_url=https://github.com/expressjs/express&budget=64000"
```
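If you'd rather consume the stream from Python than curl, a minimal parser for the event format above might look like this. This is a sketch for illustration: `parse_sse` is a hypothetical helper, and a real client would read the HTTP response incrementally (e.g. with `httpx`) rather than parse one buffered string.

```python
import json

def parse_sse(raw: str):
    """Parse a buffered SSE payload in the /api/stream format.

    Yields (event_name, parsed_json) pairs, one per event block.
    """
    # SSE events are separated by blank lines.
    for block in raw.strip().split("\n\n"):
        event, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            yield event, json.loads("\n".join(data_lines))
```

A typical loop would render `phase` events as progress, write the `content` field of the `complete` event to disk, and surface `error` events to the user.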

### `GET /api/models`

Returns registered LLM **profiles** (context window, recommended budget, provider), **category** groupings for the UI, and the **default** model key.

### Composer (`/api/composer/*`)

All Composer routes use **Server-Sent Events** unless noted.
GitHub sign-in is required for Composer endpoints. The production quota is
**5 compose sessions per account every 48 hours**.

| Method | Path | Description |
|:---|:---|:---|
| `GET` | `/api/composer/status` | Composer feature availability |
| `POST` | `/api/composer/inspect` | Body: `{ "repo_url": "https://github.com/...", "token_budget": 128000 }` - lightweight repo summary for the repository bag |
| `POST` | `/api/composer/bundle` | Body: `{ "repo_urls": [...], "token_budget": 128000, "include_tests": true, "include_docs": true }` - streams the multi-repo bundle run |

### Optional AI (SSE)

Requires API keys. See [Optional AI features](#optional-ai-features).
GitHub sign-in is required. The production quotas are **5 Ask Context Engine
calls per account every 24 hours** and **5 Architecture calls per account
every 24 hours**.

| Method | Path | Description |
|:---|:---|:---|
| `GET` | `/api/claude/status` | Provider readiness; `?validate=true` runs a live check |
| `POST` | `/api/chat/stream` | Chat about generated context (JSON body: `repo_url`, `messages`, optional `context`) |
| `POST` | `/api/architecture/stream` | Stream an architecture report (JSON body: `repo_url`, optional `context`) |

### `GET /api/health`

```json
{ "status": "ok", "version": "1.2.0" }
```

---

## Language Support

<table>
<thead>
<tr><th>Language</th><th>Extensions</th><th>Extractor</th></tr>
</thead>
<tbody>
<tr><td><strong>Python</strong></td><td><code>.py</code> <code>.pyi</code></td><td>tree-sitter - classes, functions, decorators, FastAPI/Flask routes</td></tr>
<tr><td><strong>JavaScript</strong></td><td><code>.js</code> <code>.jsx</code> <code>.mjs</code> <code>.cjs</code></td><td>tree-sitter - functions, classes, named exports</td></tr>
<tr><td><strong>TypeScript</strong></td><td><code>.ts</code> <code>.tsx</code> <code>.mts</code> <code>.cts</code></td><td>tree-sitter - interfaces, type aliases, classes</td></tr>
<tr><td><strong>Rust</strong></td><td><code>.rs</code></td><td>tree-sitter - structs, impls, fns, traits, pub items</td></tr>
<tr><td><strong>Go</strong></td><td><code>.go</code></td><td>tree-sitter - funcs, types, interfaces, methods</td></tr>
<tr><td><strong>Java</strong></td><td><code>.java</code></td><td>tree-sitter - classes, methods, annotations</td></tr>
<tr><td>Ruby, C, C++, C#, Swift, Kotlin, PHP, Scala, Lua, Shell</td><td>various</td><td>Regex signature extraction</td></tr>
<tr><td>YAML, TOML, JSON, SQL, GraphQL, HCL, Terraform, Markdown</td><td>various</td><td>Full text, budget-capped</td></tr>
</tbody>
</table>

---

## Token Budget Allocation

Every run distributes the total budget across sections. The most critical content always wins space first.

```
T0  Entry Points & Config    ████████░░░░░░░░░░░░  10%  verbatim source
T1  Core Source              ██████████████████░░  55%  AST signatures + bodies
T2  Supporting Modules       ██████░░░░░░░░░░░░░░  15%  public API only
T3  Test Coverage            ██░░░░░░░░░░░░░░░░░░   5%  names + docstrings
T4  Documentation            ██░░░░░░░░░░░░░░░░░░   5%  heading outline
    Meta + Tree + Dep Map    ░░░░░░░░░░░░░░░░░░░░   5%  always included
    Buffer                   ░░░░░░░░░░░░░░░░░░░░   5%  headroom

If a tier has zero files → its allocation rolls into T1.
Files within each tier are sorted by import centrality (hottest first).
```
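The allocation rule above, including the roll-into-T1 behaviour for empty tiers, can be sketched as a pure function (hypothetical names; the real implementation lives in `pipeline/assemble.py`):

```python
# Share of the total budget per tier; meta/tree/dep-map (5%) and
# buffer (5%) take the remaining 10%.
TIER_SHARES = {"T0": 0.10, "T1": 0.55, "T2": 0.15, "T3": 0.05, "T4": 0.05}

def allocate_budget(total_budget, files_per_tier):
    """Sketch of the per-tier allocation: give each tier its fixed
    share of the budget, then roll any empty tier's allocation
    into T1 (core source)."""
    alloc = {t: int(total_budget * share) for t, share in TIER_SHARES.items()}
    for tier in ("T0", "T2", "T3", "T4"):
        if files_per_tier.get(tier, 0) == 0:
            # Empty tier: its tokens go to core source instead.
            alloc["T1"] += alloc.pop(tier)
    return alloc
```

For a 128k budget with no tests and no docs, T1 absorbs both 5% slices and ends up with 65% of the budget, which is why `--no-tests --no-docs` noticeably deepens core-source coverage.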

---

## Project Structure

```
git-context/
├── git_context/               Python package
│   ├── cli.py                 Click CLI: generate · serve · version
│   ├── server.py              FastAPI app: stream, models, optional AI, static SPA
│   ├── config.py              Language map, LLM profiles, budget presets, limits
│   ├── ai_service.py          Optional Anthropic / MiniMax / NVIDIA / OpenAI helpers
│   ├── mcp_server.py          Optional MCP entry (`pip install ".[mcp]"`)
│   │
│   ├── composer/              Multi-repo composer (stateless API + orchestration)
│   │   ├── routes.py          /api/composer/* - analyze, upload-analyze, chat, compose
│   │   ├── catalog.py         Workspace / repo catalogs
│   │   ├── chat.py            Intent + chat streaming
│   │   └── …                  orchestrator, merge, conflicts, selector, session helpers
│   │
│   ├── pipeline/
│   │   ├── orchestrator.py    Wires all phases → SSE event generator
│   │   ├── clone.py           Phase 1: async shallow git clone + file scan
│   │   ├── filter.py          Phase 2: exclusion rules + tier classification
│   │   ├── graph.py           Phase 3b: import graph + centrality scoring
│   │   ├── assemble.py        Phase 4: token-budgeted markdown assembly
│   │   ├── models.py          FileInfo · FileTier · AssemblyResult · …
│   │   └── errors.py          GitContextError
│   │
│   ├── extractors/
│   │   ├── base.py            BaseExtractor ABC + tier-dispatch logic
│   │   ├── python_ext.py      tree-sitter Python
│   │   ├── javascript_ext.py  tree-sitter JS / TS
│   │   ├── rust_ext.py        tree-sitter Rust
│   │   ├── go_ext.py          tree-sitter Go
│   │   ├── java_ext.py        tree-sitter Java
│   │   └── generic_ext.py     Regex fallback (all other languages)
│   │
│   ├── utils/
│   │   ├── tokens.py          tiktoken cl100k_base counter (cached)
│   │   ├── tree.py            ASCII file-tree renderer (max depth 3)
│   │   ├── secrets.py         Secret scanning for optional AI paths
│   │   └── cleanup.py         Temp directory lifecycle management
│   │
│   └── web/static/            Compiled React SPA - served by FastAPI
│
├── web/                       React + Vite frontend source
│   ├── src/
│   │   ├── App.tsx            URL-path detection + popstate routing
│   │   ├── stores/
│   │   │   └── useGeneratorStore.ts   Zustand + EventSource SSE client
│   │   └── components/
│   │       ├── generator/     URLInput · BudgetSelector · OptionsPanel
│   │       │                  GenerateButton · ProgressStream · StatsCard
│   │       │                  OutputPreview (tabs: Preview / Raw)
│   │       └── ui/            shadcn/ui - Button · Tabs · Progress · …
│   ├── vite.config.ts         Builds → ../git_context/web/static/
│   └── package.json
│
├── tests/
│   ├── test_filter.py         Tier classification unit tests
│   ├── test_extractors.py     Per-language AST extraction tests
│   ├── test_assemble.py       Budget allocation + assembly tests
│   ├── test_clone.py          Clone validation tests
│   └── …                      composer, catalog, conflicts, secrets, semantic, etc.
│
└── pyproject.toml             hatchling build · deps · entry point
```

---

## Development

```bash
# Install with dev extras
pip install -e ".[dev]"

# Run the test suite
pytest

# Lint and auto-fix
ruff check . --fix
ruff format .
```

**Frontend dev loop** (hot-reload UI against live backend):

```bash
# Terminal 1 - Python backend on :3000
git-context serve

# Terminal 2 - Vite dev server (proxies /api → :3000)
cd web && npm run dev     # → http://localhost:5173
```

**Rebuild frontend for production:**

```bash
cd web && npm run build   # outputs to ../git_context/web/static/
```

---

## Safety Limits

| Guard | Value |
|:---|---:|
| Max repository size | 500 MB |
| Max single file size | 1 MB |
| Max files after filtering | 5,000 |
| Git clone timeout | 60 s |
| Full pipeline timeout | 300 s |
| Temp directory lifetime | Deleted immediately on completion |

---

## Roadmap

- [ ] Private repository support via GitHub PAT
- [ ] GitLab / Bitbucket / any public git remote
- [ ] `--branch` / `--tag` flag to target non-default refs
- [ ] Incremental updates - diff-based re-generation
- [ ] Output format adapters: XML (Claude), plain text, JSON
- [ ] Cloud-hosted at `git-context.com`

---

## Contributing

PRs are welcome. For significant changes please open an issue first to discuss.

```bash
git checkout -b feat/your-feature
# ... make changes, add tests ...
pytest
# open a pull request
```

---

## The Rant That Started This

> *Three engineers. One shared grievance. Zero patience for copy-pasting files into an LLM one at a time.*

It started with a very specific kind of frustration - the kind where you know *exactly* what you want to ask an AI, you know the answer is buried somewhere in a 300-file repository, and your only realistic options are:

**Option A:** Manually open each file, copy-paste the relevant bits into the chat window, pray you didn't miss the one import that makes everything make sense, spend 25 minutes curating context that the AI will half-understand anyway because you forgot `config.py` existed.

**Option B:** Give up and read the docs like a person from 2019.

Neither was acceptable.

---

### The Provocation - Ramachandra Kulkarni

Rama was the first to articulate the problem as a *missing URL pattern*. The observation went something like: *"Why isn't there a tool where you change one word in a GitHub URL and just get the whole repo, intelligently compressed, ready for an LLM?"*

Annoyingly simple. Nobody had built it. That was the provocation.

The idea that `git-context.com/expressjs/express` should just *work* - no forms, no accounts, no five-step setup - was the design constraint that everything else had to fit inside. If it wasn't that frictionless, it wasn't worth building.

---

### The Constraint - Arun Basavaraj Alur

Arun immediately pointed out that any *naive* solution would be worse than useless.

A 300-file repo dumped verbatim into a context window doesn't help anyone. The LLM drowns in `node_modules`, auto-generated migrations, minified bundles, lock files, and test fixtures - and never finds `main.py`. Token budget exhausted. Answer quality: garbage.

The tool couldn't just *concatenate files*. It had to be **opinionated** about what matters. That constraint shaped the entire four-phase pipeline: AST parsing to understand structure, import graph centrality to rank files by how much the rest of the codebase actually depends on them, tiered extraction strategies so entry points get full source while utilities get signatures only.

The compression had to be *intelligent* or it was just `cat *.py` with extra steps.

---

### The Line That Couldn't Be Crossed - Harin Kumar Mallela

Harin made the case - early and firmly - that AI features had to be **opt-in or the whole thing was a liability**.

Any tool that automatically ships your codebase to a third-party AI API isn't a developer tool. It's a data pipeline you didn't agree to run. It doesn't matter how good the AI output is; if you can't look a developer in the eye and say *"nothing left your machine without you explicitly clicking something"*, you've built something you shouldn't ship.

That conversation produced the design principle that everything else in the project works around:

> **Privacy by default. AI by choice.**

The core pipeline - clone, filter, parse, assemble - has zero AI dependencies. No API calls. No external services. It's pure static analysis that runs entirely on your machine or our server. AI features exist as an opt-in layer on top, clearly labelled, with secrets scanned and redacted before every single call.

Not because it was easier to build that way. Because it was the only way to build it and still be able to look at it.

---

### The Result

A tool that has **strong opinions** about which files matter (AST + import centrality scoring), **strong opinions** about what AI should never see without explicit permission (secrets, opt-in everything), and **zero opinions** about which LLM you use - because the output is just Markdown and it works everywhere.

Built by three people who were tired of Option A and Option B.

---

<div align="center">
<br/>
<sub>Built with <a href="https://tree-sitter.github.io">tree-sitter</a> · <a href="https://fastapi.tiangolo.com">FastAPI</a> · <a href="https://react.dev">React</a> · <a href="https://github.com/openai/tiktoken">tiktoken</a></sub>
<br/><br/>
<sub><strong>Ramachandra Kulkarni · Arun Basavaraj Alur · Harin Kumar Mallela</strong></sub>
</div>
