Metadata-Version: 2.4
Name: kompact
Version: 0.3.0
Summary: Multi-layer context optimization proxy for LLM agents
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: httpx>=0.28.0
Requires-Dist: opentelemetry-api>=1.20.0
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc>=1.20.0
Requires-Dist: opentelemetry-exporter-prometheus>=0.40b0
Requires-Dist: opentelemetry-sdk>=1.20.0
Requires-Dist: prometheus-client>=0.20.0
Requires-Dist: tiktoken>=0.8.0
Requires-Dist: uvicorn>=0.32.0
Provides-Extra: bench
Requires-Dist: context-bench>=0.1.0; extra == 'bench'
Requires-Dist: datasets>=4.5.0; extra == 'bench'
Requires-Dist: headroom-ai>=0.3.0; extra == 'bench'
Requires-Dist: llmlingua>=0.2.0; extra == 'bench'
Provides-Extra: code
Requires-Dist: tree-sitter-python>=0.23.0; extra == 'code'
Requires-Dist: tree-sitter>=0.23.0; extra == 'code'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-httpx>=0.34.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.8.0; extra == 'dev'
Provides-Extra: schema
Requires-Dist: sentence-transformers>=3.0.0; extra == 'schema'
Description-Content-Type: text/markdown

# Kompact

[![CI](https://github.com/npow/kompact/actions/workflows/ci.yml/badge.svg)](https://github.com/npow/kompact/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/kompact.svg)](https://pypi.org/project/kompact/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/) [![Docs](https://img.shields.io/badge/docs-mintlify-18a34a?style=flat-square)](https://mintlify.com/npow/kompact)

Context compression proxy for LLM agents. It sits between your agent and the LLM provider, compresses context on the fly, and cuts your token bill by 40-70%, with zero code changes.

## Save real money

For a team running 1,000 agentic requests/day with ~10K-token contexts (300M input tokens per 30-day month), with Kompact removing 55% of input tokens:

| Model (input price) | Without Kompact | With Kompact | Monthly Savings |
|-------|----------------:|-------------:|----------------:|
| Sonnet ($3/M) | $900/mo | $405/mo | **$495/mo** |
| Opus ($15/M) | $4,500/mo | $2,025/mo | **$2,475/mo** |
| GPT-4o ($2.50/M) | $750/mo | $338/mo | **$412/mo** |

Savings scale linearly. 10K requests/day = 10x the numbers above.
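The table is plain arithmetic, and the sketch below reproduces it so you can plug in your own volume and prices. The inputs are read off the table itself: a 30-day month, input tokens only, and a flat 55% reduction ($405 is 45% of $900). Real compression varies by content type (see the compression table below), so treat `REDUCTION` as an adjustable assumption rather than a guarantee.

```python
# Reproduces the "Save real money" table. Assumptions taken from the table:
# 30-day month, input tokens only, flat 55% token reduction.
REQUESTS_PER_DAY = 1_000
TOKENS_PER_REQUEST = 10_000
DAYS_PER_MONTH = 30
REDUCTION = 0.55  # fraction of input tokens removed by compression


def monthly_cost(price_per_million: float,
                 requests_per_day: int = REQUESTS_PER_DAY,
                 reduction: float = 0.0) -> float:
    """Monthly input-token cost in dollars after removing `reduction` of tokens."""
    tokens = requests_per_day * TOKENS_PER_REQUEST * DAYS_PER_MONTH
    return tokens * (1 - reduction) / 1_000_000 * price_per_million


for model, price in [("Sonnet", 3.00), ("Opus", 15.00), ("GPT-4o", 2.50)]:
    before = monthly_cost(price)
    after = monthly_cost(price, reduction=REDUCTION)
    print(f"{model}: ${before:,.2f} -> ${after:,.2f} (saves ${before - after:,.2f})")
```

The GPT-4o row in the table shows $338 and $412 because the exact values ($337.50 and $412.50) are rounded to whole dollars.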

## Get started in 30 seconds

```bash
pip install kompact   # or: uv add kompact
kompact proxy --port 7878
```

```bash
export ANTHROPIC_BASE_URL=http://localhost:7878
# That's it. Your agent now uses fewer tokens.
```

No SDK changes. No prompt rewriting. Just point your base URL at the proxy.
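The official SDKs read `ANTHROPIC_BASE_URL` on their own. To see that only the host changes, here is a stdlib-only sketch that builds a raw Messages API request against the proxy. It assumes the proxy exposes the same `/v1/messages` path as the Anthropic API; the helper name `build_messages_request` is hypothetical and not part of Kompact.

```python
import json
import os
import urllib.request

# Fall back to the proxy address from the quick start if the variable is unset.
BASE_URL = os.environ.get("ANTHROPIC_BASE_URL", "http://localhost:7878")


def build_messages_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build an Anthropic Messages API request aimed at BASE_URL.

    Path, headers, and body are what you would send to api.anthropic.com;
    only the host differs (assumes the proxy serves /v1/messages unchanged).
    """
    return urllib.request.Request(
        url=BASE_URL.rstrip("/") + "/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` sends it through the proxy; nothing else in the request has to change.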

## Quality stays intact

Evaluated on [BFCL](https://gorilla.cs.berkeley.edu/) (1,431 real API schemas), a standard benchmark for tool-calling agents. Requests run end-to-end through Claude and are scored with [context-bench](https://pypi.org/project/context-bench/).

Quality impact vs no compression (closer to 0% = better):

| Model | Kompact | [Headroom](https://github.com/headroom-ai/headroom) | [LLMLingua-2](https://github.com/microsoft/LLMLingua) |
|-------|--------:|--------:|---------:|
| Haiku | **-2.6%** | -3.0% | -23.4% |
| Sonnet | **-3.9%** | -3.5% | -20.6% |
| Opus | **-0.5%** | -0.5% | -27.3% |

Kompact and Headroom both stay within ~4% of baseline. LLMLingua-2 destroys tool schemas regardless of model (-20 to -27%).

## Compression across content types

Compression measured offline (no LLM calls) on 12,795 examples across 3 datasets; higher means more of the context was removed:

| Dataset | Examples | Kompact | Headroom | LLMLingua-2 |
|---------|----------|--------:|---------:|------------:|
| BFCL (tool schemas) | 1,431 | **55.3%** | ~0% | 55.4% |
| Glaive (tool calling) | 3,959 | **56.6%** | ~0% | ~50% |
| HotpotQA (prose QA) | 7,405 | 17.9% | ~0% | 49.9% |

Headroom's SmartCrusher doesn't compress JSON — it's designed for prose. LLMLingua-2 compresses aggressively but destroys information (see quality table above).
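The percentages above read as the fraction of the context removed. A minimal sketch of that metric, assuming it is computed as `1 - compressed_tokens / original_tokens` (see `benchmarks/README.md` for the exact definition used). The whitespace token counter is a stand-in chosen so the example runs without extra dependencies; `compression_ratio` is a hypothetical helper name.

```python
from typing import Callable


def whitespace_tokens(text: str) -> int:
    """Crude stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def compression_ratio(original: str, compressed: str,
                      count_tokens: Callable[[str], int] = whitespace_tokens) -> float:
    """Fraction of tokens removed: 1 - compressed / original (0.0 for empty input)."""
    before = count_tokens(original)
    if before == 0:
        return 0.0
    return 1 - count_tokens(compressed) / before
```

To count BPE tokens instead, pass something like `lambda s: len(enc.encode(s))` with `enc = tiktoken.get_encoding("cl100k_base")`; which encoding the benchmarks use is not stated here.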

## How it works

Kompact is a transparent HTTP proxy. It intercepts LLM API requests, compresses the context, then forwards to the provider.

```
        ┌──────────────────────────────────────────────┐
        │           Kompact Proxy (:7878)              │
        │                                              │
Agent ─>│  1. Schema Optimizer    (TF-IDF selection)   │─> LLM Provider
        │  2. Content Compressors (TOON, JSON, code)   │
        │  3. Extractive Compress (TF-IDF sentences)   │
        │  4. Observation Masker  (history mgmt)       │
        │  5. Cache Aligner       (prefix caching)     │
        │                                              │
        └──────────────────────────────────────────────┘
```

The five stages above are implemented by 8 transforms, each targeting a different content type. The pipeline adapts automatically: short contexts get light compression, long contexts get aggressive optimization. The added overhead is sub-millisecond.
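To make the stage-by-stage flow concrete, here is a minimal sketch of an ordered transform pipeline with per-request opt-outs. It is not Kompact's implementation: `run_pipeline`, `collapse_whitespace`, the message shape, and the fixed `min_chars` threshold (a crude stand-in for the adaptive behavior described above) are all assumptions for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical shapes: a transform maps a message list to a (smaller) one.
Messages = List[Dict]
Transform = Callable[[Messages], Messages]


def run_pipeline(messages: Messages,
                 transforms: List[Tuple[str, Transform]],
                 disabled: FrozenSet[str] = frozenset(),
                 min_chars: int = 2_000) -> Messages:
    """Apply transforms in order, skipping any whose name is disabled.

    Contexts shorter than min_chars pass through untouched.
    """
    size = sum(len(str(m.get("content", ""))) for m in messages)
    if size < min_chars:
        return messages
    for name, transform in transforms:
        if name not in disabled:
            messages = transform(messages)
    return messages


def collapse_whitespace(messages: Messages) -> Messages:
    """Toy transform: squeeze runs of whitespace in each message's content."""
    return [{**m, "content": " ".join(str(m.get("content", "")).split())}
            for m in messages]
```

Keeping each transform a pure function over the message list is what makes per-request disabling cheap: skipping a stage is just not calling it.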

### Per-request control

Use the `X-Kompact-Disable` header to disable transforms for a single request without affecting other clients:

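```python
# Both clients must already point at the Kompact proxy (e.g. via base_url).
# `...` stands for your usual request arguments (model, messages, etc.).
headers = {"X-Kompact-Disable": "toon,code_compressor"}

# Anthropic SDK
client.messages.create(..., extra_headers=headers)

# OpenAI SDK
client.chat.completions.create(..., extra_headers=headers)
```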

Comma-separated transform names: `toon`, `json_crusher`, `code_compressor`, `log_compressor`, `content_compressor`, `observation_masker`, `cache_aligner`, `schema_optimizer`.
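A sketch of how such a header value can be turned into a set of disabled transforms. The list of names comes from above; rejecting unknown names with `ValueError` is an assumption made for this example (the proxy may ignore them instead), and `parse_disable_header` is a hypothetical name.

```python
from typing import FrozenSet, Optional

KNOWN_TRANSFORMS = frozenset({
    "toon", "json_crusher", "code_compressor", "log_compressor",
    "content_compressor", "observation_masker", "cache_aligner", "schema_optimizer",
})


def parse_disable_header(value: Optional[str]) -> FrozenSet[str]:
    """Split a comma-separated X-Kompact-Disable value into transform names."""
    if not value:
        return frozenset()
    names = {part.strip() for part in value.split(",") if part.strip()}
    unknown = names - KNOWN_TRANSFORMS
    if unknown:
        # Assumption: typos are surfaced rather than silently ignored.
        raise ValueError(f"unknown transform(s): {', '.join(sorted(unknown))}")
    return frozenset(names)
```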

## Monitoring

Kompact exports OpenTelemetry metrics (on by default, disable with `--no-otel`). A Prometheus + Grafana stack is included:

```bash
cd monitoring
docker compose up -d
```

- **Grafana dashboard**: http://localhost:9473 (pre-built "Kompact" dashboard)
- **Prometheus**: http://localhost:9090
- **Metrics endpoint**: http://localhost:9464/metrics

The dashboard shows request rate, token savings, compression ratio, pipeline latency percentiles, and per-transform breakdowns.
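To spot-check the raw metrics without Grafana, the endpoint can be read directly; it serves the Prometheus text exposition format. Below is a minimal stdlib parser, assuming the exporter listens at the address listed above. The function names are hypothetical, and the metric names you will see depend on Kompact's exporter, so none are assumed here.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text-format samples into {"name{labels}": value}."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        if "{" in line:
            end = line.rindex("}") + 1
            key, fields = line[:end], line[end:].split()
        else:
            key, *fields = line.split()
        samples[key] = float(fields[0])  # an optional timestamp is ignored
    return samples


def fetch_metrics(url: str = "http://localhost:9464/metrics") -> Dict[str, float]:
    """Fetch and parse the metrics endpoint (requires a running proxy)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

The parser is deliberately simple and does not handle label values containing `}`; for anything beyond a quick check, `prometheus_client` (already a Kompact dependency) ships a full parser.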

## Running benchmarks

```bash
# Offline compression (no LLM calls, measures compression + needle preservation)
uv run python benchmarks/run_dataset_eval.py --dataset bfcl

# End-to-end quality (sends through proxy chain, measures LLM answer quality)
# Requires: claude-relay running on :8084, kompact on :7878
uv run python benchmarks/run_e2e_eval.py --dataset bfcl --model haiku --workers 20
```

See [`benchmarks/README.md`](benchmarks/README.md) for full methodology.
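The end-to-end run needs two services already up. A small preflight check like the sketch below can fail fast before launching workers; it only tests that something accepts TCP connections on the ports named above, not that it is actually claude-relay or Kompact. `is_listening` is a hypothetical helper.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in [("claude-relay", 8084), ("kompact", 7878)]:
        status = "up" if is_listening("localhost", port) else "DOWN"
        print(f"{name} on :{port}: {status}")
```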

## Development

```bash
uv sync --extra dev
uv run pytest          # 48 tests
uv run ruff check src/ tests/
```

## License

MIT
