Metadata-Version: 2.4
Name: knowledge2
Version: 0.7.0
Summary: Python SDK for the Knowledge² retrieval platform
Author-email: Knowledge2 <contact@knowledge2.ai>
License: MIT
Project-URL: Homepage, https://knowledge2.ai
Project-URL: Documentation, https://knowledge2.ai/docs
Project-URL: Repository, https://github.com/knowledge2-ai/knowledge2-python-sdk
Project-URL: Changelog, https://github.com/knowledge2-ai/knowledge2-python-sdk/blob/main/CHANGELOG.md
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic<3,>=2
Provides-Extra: config
Requires-Dist: pydantic-settings>=2.0; extra == "config"
Provides-Extra: pydantic
Requires-Dist: pydantic<3,>=2; extra == "pydantic"
Provides-Extra: yaml
Requires-Dist: pyyaml>=6.0; extra == "yaml"

# Knowledge² Python SDK

[![PyPI version](https://img.shields.io/pypi/v/knowledge2.svg)](https://pypi.org/project/knowledge2/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Official Python client for the Knowledge² retrieval platform. The supported customer journey is:

`create corpus -> ingest documents -> build indexes -> search -> optimize retrieval`

## Installation

From PyPI:

```bash
pip install knowledge2
pip install "knowledge2[config]"
pip install "knowledge2[yaml]"
```

From source:

```bash
pip install -e .
pip install -e ".[config]"
pip install -e ".[yaml]"
```

A base `pip install knowledge2` includes the typed response model dependency
(`pydantic`) out of the box. Install `knowledge2[config]` only if you want
`K2Config` environment/file loading via `pydantic-settings`.

## Before You Start

- Use a normal org-scoped API key for the standard retrieval workflow:
  projects, corpora, documents, indexes, search, and optimize.
- `optimize_indexes()` and some enterprise/preview surfaces can return
  feature-flag or quota errors (`403`, `409`, `429`) even when the payload is
  correct. Check environment entitlements early.

## Surface Categories

| Category | Surface |
|---|---|
| Core retrieval workflow | orgs, auth, projects, corpora, documents, indexes, search, jobs, metadata, onboarding, audit, usage, console, generation models |
| Enterprise capabilities | agents, feeds, pipelines, A2A |

The main docs and examples below focus on the core retrieval workflow.

## Quick Start

```python
from sdk import Knowledge2

client = Knowledge2(api_key="k2_...")

project = client.create_project("My Project")
corpus = client.create_corpus(project["id"], "My Corpus")

batch = client.upload_documents_batch_and_wait(
    corpus["id"],
    [
        {
            "source_uri": "doc://overview",
            "raw_text": "Knowledge² builds dense and sparse indexes for hybrid retrieval.",
            "metadata": {"topic": "overview"},
        },
        {
            "source_uri": "doc://search",
            "raw_text": "Hybrid retrieval combines semantic similarity with exact keyword matching.",
            "metadata": {"topic": "search"},
        },
    ],
    auto_index=False,
)
client.sync_indexes(corpus["id"], wait=True)

results = client.search(
    corpus["id"],
    "what is hybrid retrieval",
    top_k=3,
    return_config={"include_text": True, "include_scores": True},
)

for hit in results["results"]:
    print(hit["score"], hit.get("text", "")[:80])
```

`upload_documents_batch_and_wait(...)` is the canonical onboarding helper for
raw-text JSON batch ingestion. It blocks until the batch finishes and returns
the final batch payload, including `doc_ids`.

If you want enqueue-first control instead, pass `wait=False` and then
resolve the batch with `wait_for_document_batch(...)`:

```python
docs = [
    {
        "source_uri": "doc://overview",
        "raw_text": "Knowledge² builds dense and sparse indexes for hybrid retrieval.",
    },
]

enqueue = client.upload_documents_batch(corpus["id"], docs, wait=False)
batch = client.wait_for_document_batch(corpus["id"], enqueue["batch_id"])
print(batch["status"], batch["doc_ids"])
```

For large in-flight imports, `get_document_batch(...)` and
`wait_for_document_batch(...)` are the canonical batch APIs. Once the batch is
visible they return stable `doc_ids`, terminal resolution, and live batch
counters that track admitted documents as processing advances. For broader
operational context during a large import, you can still pair them with
`get_corpus_status(...)`, `get_job(...)`, or document-level status checks.
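
The enqueue-then-wait pattern behind `wait_for_document_batch(...)` can be pictured as a simple polling loop. This is a standalone sketch; the `fetch_status` callable and the set of terminal states are assumptions for illustration, not the SDK's internals:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}  # assumed terminal states

def wait_for(fetch_status, poll_interval=0.0, timeout=30.0):
    """Poll `fetch_status()` until it reports a terminal state or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        batch = fetch_status()
        if batch["status"] in TERMINAL:
            return batch
        time.sleep(poll_interval)
    raise TimeoutError("batch did not reach a terminal state in time")

# Simulated batch that completes on the third poll:
states = iter(["queued", "processing", "completed"])
result = wait_for(lambda: {"status": next(states), "doc_ids": ["d1", "d2"]})
```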

## Improve Retrieval Quality

```python
profile = client.get_query_profile(corpus["id"])
print(profile["example_queries"])

job = client.optimize_indexes(
    corpus["id"],
    example_queries=[
        "how does hybrid retrieval work",
        "what is bm25 tuning",
        "how does rrf combine dense and sparse search",
    ],
    query_count=25,
    top_k=10,
    metric="ndcg",
    wait=True,
)
print(job["job_id"], job["job_type"])
```
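
Reciprocal rank fusion (RRF), referenced in the example queries above, combines dense and sparse result lists using ranks alone. A standalone sketch of the standard formula (`k=60` is the conventional constant; this illustrates the idea, not the platform's exact fusion):

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # semantic-similarity ranking
sparse = ["doc_b", "doc_c", "doc_a"]  # keyword (e.g. BM25) ranking
fused = rrf([dense, sparse])
```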

## Examples

- `sdk/examples/retrieval_quickstart.py`: minimal happy path from empty corpus to working hybrid search
- `sdk/examples/e2e_lifecycle.py`: full retrieval-quality workflow with query profile inspection and `indexes:optimize`

Run either example with:

```bash
export K2_BASE_URL=https://api.knowledge2.ai
export K2_API_KEY=<api-key>
python sdk/examples/retrieval_quickstart.py
python sdk/examples/e2e_lifecycle.py
```

## Authentication

| Method | Header | Typical use |
|---|---|---|
| API key | `X-API-Key` | primary programmatic access for retrieval workflows |
| Bearer token | `Authorization: Bearer <token>` | console / Auth0 session |

```python
client = Knowledge2(api_key="k2_...")
client = Knowledge2.from_env()
client = Knowledge2(bearer_token="...")
```
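
`Knowledge2.from_env()` presumably reads the same variables the example runner uses (`K2_API_KEY`, `K2_BASE_URL`). A minimal standalone sketch of that pattern, assuming those names and treating the key as required:

```python
import os

def config_from_env(env=os.environ):
    """Read client settings from the environment; the API key is required."""
    api_key = env.get("K2_API_KEY")
    if not api_key:
        raise RuntimeError("K2_API_KEY is not set")
    return {
        "api_key": api_key,
        "api_host": env.get("K2_BASE_URL", "https://api.knowledge2.ai"),
    }

cfg = config_from_env({"K2_API_KEY": "k2_test"})
```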

## Configuration

Important constructor knobs:

- `api_host`: defaults to `https://api.knowledge2.ai`
- `api_key`: API key for programmatic access
- `org_id`: auto-detected from `GET /v1/auth/whoami` when omitted
- `timeout`: float or `ClientTimeouts`
- `limits`: connection-pool settings via `ClientLimits`
- `max_retries`: transient retry budget
- `validate_responses`: enable Pydantic response validation
- `http_client`: bring your own `httpx.Client`

```python
from sdk import ClientTimeouts, Knowledge2

client = Knowledge2(
    api_key="k2_...",
    timeout=ClientTimeouts(connect=5, read=120, write=30, pool=10),
)
```
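
`max_retries` is a transient retry budget, which clients typically pair with exponential backoff. A standalone sketch of that schedule (the base delay and cap are illustrative, not the SDK's actual policy):

```python
def backoff_delays(max_retries, base=0.5, cap=8.0):
    """Exponential backoff schedule: base * 2**attempt, capped at `cap` seconds."""
    return [min(base * (2 ** attempt), cap) for attempt in range(max_retries)]

delays = backoff_delays(5)
```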

## Namespaces

The flat client API is canonical. The sync client also exposes namespace helpers
that group the same methods without changing behavior:

- `client.documents.*`
- `client.documents.upload_batch_and_wait(...)`
- `client.documents.wait_for_batch(...)`
- `client.corpora.*`
- `client.search_ns.*`
- `client.jobs.*`
- `client.auth.*`

`AsyncKnowledge2` currently stays flat-only.
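
The namespace helpers can be pictured as thin views that delegate to the flat methods. A toy sketch of the pattern (illustrative only, not the SDK's internals):

```python
class _DocumentsNamespace:
    """Groups document methods; delegates to the flat client API."""
    def __init__(self, client):
        self._client = client

    def upload_batch_and_wait(self, corpus_id, docs):
        return self._client.upload_documents_batch_and_wait(corpus_id, docs)

class ToyClient:
    """Stand-in client: flat method plus a namespace view over it."""
    def __init__(self):
        self.documents = _DocumentsNamespace(self)

    def upload_documents_batch_and_wait(self, corpus_id, docs):
        return {"corpus_id": corpus_id, "doc_count": len(docs)}

client = ToyClient()
batch = client.documents.upload_batch_and_wait("c1", [{"raw_text": "hi"}])
```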

## Framework Integrations

The SDK ships LangChain and LlamaIndex integration modules in-package. Install the framework dependency separately, then import the adapter:

```python
from sdk.integrations.langchain import K2LangChainRetriever
from sdk.integrations.llamaindex import K2LlamaIndexRetriever
```

## Enterprise Capabilities

Agents, feeds, pipelines, and A2A are available for enterprise deployments; the primary examples in this README stay focused on the core retrieval flow.

### Subscription Modes (Preview)

Agent-feed subscriptions support three authoring modes on `create_subscription`, gated behind the `knowledge_agents_enabled` feature flag:

| Mode | Use | Required fields |
|------|-----|-----------------|
| `always` | Route every envelope from the feed | `feed_id`, `role` |
| `explicit` | Evaluate a predicate DSL against the envelope | `feed_id`, `role`, `match_spec` |
| `nl_semantic` | Describe the match in plain English; compiled server-side into a `semantic_like` predicate against `content` | `feed_id`, `role`, `match_spec_description` (10-500 chars); optional `threshold` (default 0.75) |
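
The `nl_semantic` threshold acts as a similarity cutoff. As a toy illustration only, with token overlap standing in for the server-side semantic model (every name and the similarity function here are hypothetical):

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity: a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def matches(description: str, content: str, threshold: float = 0.75) -> bool:
    """Route the envelope only when similarity clears the threshold."""
    return jaccard(description, content) >= threshold

hit = matches("security incident report", "security incident report")
miss = matches("security incident report", "quarterly revenue summary")
```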

The create response echoes the compiled `match_spec` and the raw `match_spec_description`, so no separate `/preview` endpoint is required:

```python
sub = client.create_subscription(
    agent_id,
    feed_id=feed_id,
    role="input",
    mode="nl_semantic",
    match_spec_description="documents about security incidents",
)
print(sub["match_spec"])            # compiled semantic_like predicate
print(sub["match_spec_description"])  # raw NL description (echoed)
```

### Feed Drafts, Subscriptions, and Feedback (Preview)

In addition to CRUD and `run_feed`, the `Knowledge2` client exposes the full
editing and feedback surface of the Feeds API as flat methods on `client`
(the same mixin-based pattern used by every other resource).

| Method | Endpoint | Notes |
|--------|----------|-------|
| `create_feed_draft(feed_id)` | `POST /v1/feeds/{id}/draft` | Returns a draft feed with `parent_feed_id` set |
| `get_feed_draft(feed_id)` | `GET /v1/feeds/{id}/draft` | 404 when no draft exists |
| `activate_feed_draft(feed_id)` | `POST /v1/feeds/{id}/draft/activate` | Returns the updated **parent** feed (draft is deleted) |
| `discard_feed_draft(feed_id)` | `DELETE /v1/feeds/{id}/draft` | Returns `None` |
| `list_feed_subscriptions(feed_id)` | Read-only view | Returns subscriptions embedded on the feed record; use `create_subscription` on the Agents mixin to attach new ones |
| `submit_feed_feedback(feed_id, *, rating, chunk_id, feed_run_id)` | `POST /v1/feeds/{id}/feedback` | `rating` is `1` (thumbs up) or `0` (thumbs down) |
| `get_feed_feedback_stats(feed_id, *, feed_run_id=None)` | `GET /v1/feeds/{id}/feedback` | Optional `feed_run_id` scopes stats to a single run |

```python
draft = client.create_feed_draft(feed_id)
client.update_feed(draft["id"], name="new name")
client.activate_feed_draft(feed_id)  # applies the draft; returns the parent

run = client.run_feed(feed_id, return_results=True)
# `results` is only populated for non-persistent feeds run with
# `return_results=True`, so check it before indexing in.
if run.get("results"):
    client.submit_feed_feedback(
        feed_id,
        rating=1,
        chunk_id=run["results"][0]["chunk_id"],
        feed_run_id=run["feed_run_id"],
    )
stats = client.get_feed_feedback_stats(feed_id)  # org-wide for this feed
```
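
The stats endpoint presumably aggregates the thumbs ratings; a toy aggregation over `1`/`0` ratings (the field names in the returned dict are assumptions, not the documented response shape):

```python
def feedback_stats(ratings):
    """Aggregate thumbs ratings (1 = up, 0 = down) into simple counts."""
    up = sum(1 for r in ratings if r == 1)
    return {"total": len(ratings), "up": up, "down": len(ratings) - up}

stats = feedback_stats([1, 1, 0, 1])
```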

All three areas are fully mirrored on `AsyncKnowledge2` under the same names.

## Error Handling

All SDK exceptions inherit from `Knowledge2Error`.

```python
from sdk.errors import Knowledge2Error, NotFoundError, RateLimitError

try:
    client.get_corpus("missing")
except NotFoundError:
    ...
except RateLimitError as exc:
    print(exc.retry_after)
except Knowledge2Error as exc:
    print(exc)
```
