Metadata-Version: 2.4
Name: langchain-plainid
Version: 1.0.2
Summary: LangChain integration for PlainID authorization
Author: PlainID
License-Expression: MIT
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: <=3.12,>=3.10
Requires-Dist: core-plainid<2.0.0,>=1.1.0
Requires-Dist: langchain-community<1.0.0,>=0.4.1
Requires-Dist: langchain-core<2.0.0,>=1.2.18
Description-Content-Type: text/markdown

# langchain-plainid

[PlainID](https://www.plainid.com/) authorization integration for [LangChain](https://www.langchain.com/). Provides LangChain Runnables for prompt categorization, text anonymization, and policy-based document retrieval across multiple vector stores.

This library depends on [core-plainid](https://pypi.org/project/core-plainid/) for the underlying authorization components (permissions provider, categorizer, anonymizer, PlainID clients, exceptions, etc.). Please refer to the **core-plainid README** for details on setting up those components.

All components fully support both **synchronous** and **asynchronous** execution.

## Installation

```bash
pip install langchain-plainid
```

`core-plainid` is installed automatically as a dependency.

## Passing Request Context

All runnables receive the `RequestContext` through LangChain's `configurable` mechanism. You pass it once and it flows through the entire chain. Alternatively, `request_context` can be provided at construction time to the underlying components (e.g. `PlainIDPermissionsProvider`, `FilterDirectiveProvider`) — see the core-plainid README for details.

```python
from core_plainid.models.context.request_context import AdditionalIdentity, RequestContext

request_context = RequestContext(
    entity_id="your_entity_id",
    entity_type_id="your_entity_type",
    additional_identities=[
        AdditionalIdentity(
            entity_id="your_additional_entity_id",
            entity_type_id="your_additional_entity_type",
        ),
    ],
)

config = {"configurable": {"request_context": request_context}}

# `runnable` is any runnable from this library; call `ainvoke` from inside an async function
result = await runnable.ainvoke("your input", config=config)
```
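Inside a runnable, the context arrives as part of the standard LangChain config dictionary. The sketch below uses a hypothetical helper, `get_request_context`, to show how a component might read it back out and fail fast when it is missing. It is not this library's internal code, and it uses a plain dictionary as a stand-in for the real `RequestContext`.

```python
from typing import Any, Mapping, Optional


def get_request_context(config: Optional[Mapping[str, Any]]) -> Any:
    """Hypothetical helper: pull request_context out of a LangChain-style config dict."""
    configurable = (config or {}).get("configurable") or {}
    request_context = configurable.get("request_context")
    if request_context is None:
        # Without an identity there is nothing to authorize against, so fail fast.
        raise ValueError("config['configurable']['request_context'] is required")
    return request_context


# Stand-in for core_plainid's RequestContext, used only for illustration.
sample_context = {"entity_id": "your_entity_id", "entity_type_id": "your_entity_type"}
sample_config = {"configurable": {"request_context": sample_context}}

resolved = get_request_context(sample_config)
```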

Multiple identities (e.g. a User and an AI Agent) are supported for agentic scenarios through the `additional_identities` field in `RequestContext`. Identity can also be resolved via HTTP headers that are matched against configured values in PlainID — see the **Identity Context** section in the core-plainid README for details.

All core-plainid components constructed in this library (e.g. `PlainIDPermissionsProvider`, `FilterDirectiveProvider`) support three authentication modes: client credentials, per-request JWT token, and automatic IDP token management via `IdpAuthProvider` — see the **Authentication** section in the core-plainid README.

## Categorization Runnable

The `CategorizationRunnable` wraps the core-plainid `Categorizer` as a LangChain `Runnable[str, str]`. It classifies the input prompt against PlainID policies and passes the prompt through unchanged if its categories are allowed; otherwise it raises a `PlainIDCategorizerException`.

For setting up the categorizer, classifier providers, and the PlainID `Prompt_Control` ruleset, see the **Category Filtering** section in the core-plainid README.

```python
from core_plainid.categorization.categorizer import Categorizer
from core_plainid.utils.plainid_permissions_provider import PlainIDPermissionsProvider
from langchain_plainid.categorization.categorization_runnable import CategorizationRunnable

permissions_provider = PlainIDPermissionsProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

categorizer = Categorizer(
    classifier_provider=classifier,  # configure a classifier provider as described in the core-plainid README
    permissions_provider=permissions_provider,
    all_categories=["contract", "HR", "finance"],
)

categorization_runnable = CategorizationRunnable(categorizer=categorizer)

result = await categorization_runnable.ainvoke(
    "I'd like to know the weather forecast for today",
    config=config,  # see "Passing Request Context" above
)
```
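The pass-through behavior can be pictured as a set check. In this sketch the detected categories are supplied directly (in the real `Categorizer` they come from the classifier provider, and the allowed set comes from PlainID), and `PermissionDenied` is a hypothetical stand-in for `PlainIDCategorizerException`. The assumption that every detected category must be allowed is ours; consult the core-plainid README for the exact rule.

```python
class PermissionDenied(Exception):
    """Hypothetical stand-in for PlainIDCategorizerException."""


def gate_prompt(prompt: str, detected: set[str], allowed: set[str]) -> str:
    """Return the prompt unchanged if every detected category is allowed."""
    denied = detected - allowed
    if denied:
        raise PermissionDenied(f"categories not allowed: {sorted(denied)}")
    return prompt


allowed_categories = {"contract", "finance"}  # assumed result of a PlainID policy lookup
passed = gate_prompt("What is the status of contract 42?", {"contract"}, allowed_categories)
```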

## Anonymization Runnable

The `AnonymizationRunnable` wraps the core-plainid `PresidioAnonymizer` as a LangChain `Runnable[str, str]`. It detects and anonymizes PII in the input text based on PlainID policies.

For setting up the anonymizer, encryption key, AHDS, and the PlainID `Output_Control` ruleset, see the **Anonymization** section in the core-plainid README.

```python
from core_plainid.anonymization.presidio_anonymizer import PresidioAnonymizer
from core_plainid.utils.plainid_permissions_provider import PlainIDPermissionsProvider
from langchain_plainid.anonymization.anonymization_runnable import AnonymizationRunnable

permissions_provider = PlainIDPermissionsProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

anonymizer = PresidioAnonymizer(
    permissions_provider=permissions_provider,
    encrypt_key="your_16_char_key",  # exactly 16 characters
)

anonymization_runnable = AnonymizationRunnable(anonymizer=anonymizer)

result = await anonymization_runnable.ainvoke(
    "John Smith lives in New York",
    config=config,
)
print(result)  # e.g. "*** lives in ***"; the exact output depends on your PlainID policies
```
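Conceptually, masking replaces each detected entity span with a placeholder. The sketch below assumes the PII spans have already been found (in the real component Presidio's analyzer does this, and PlainID policy decides how each entity type is treated) and that the spans do not overlap. `mask_spans` is a hypothetical helper, not part of core-plainid or Presidio.

```python
def mask_spans(text: str, spans: list[tuple[int, int]], mask: str = "***") -> str:
    """Replace each [start, end) span with the mask, working right to left so
    earlier offsets stay valid after each substitution."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask + text[end:]
    return text


sentence = "John Smith lives in New York"
# Hypothetical detector output: a PERSON at [0, 10) and a LOCATION at [20, 28).
detected_spans = [(0, 10), (20, 28)]
masked = mask_spans(sentence, detected_spans)
```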

## Retrieval

The retrieval system enforces PlainID authorization policies on document retrieval from vector stores. It supports **multiple vector stores** simultaneously, where each PlainID resource type maps to a single vector store collection (e.g. a ChromaDB collection or a FAISS index).

### PlainID Setup

Configure rulesets in PlainID using a custom template name (one resource type per vector store collection). For example, if you have a `customer` collection with `country` and `age` metadata:

```rego
# METADATA
# custom:
#   plainid:
#     kind: Ruleset
#     name: rs1
ruleset(asset, identity, requestParams, action) if {
    asset.template == "customer"
    asset["country"] == "Sweden"
    asset["country"] != "Russia"
    asset["age"] >= 5
}

# METADATA
# custom:
#   plainid:
#     kind: Ruleset
#     name: rs2
ruleset(asset, identity, requestParams, action) if {
    asset.template == "customer"
    asset["country"] == "Norway"
    asset["age"] <= 100
}
```

Each document in the vector store must carry the `country` and `age` fields in its metadata, because PlainID builds the retrieval filters from these metadata fields.
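In Rego, the conditions inside one `ruleset` body are combined with AND, and the separate `ruleset` definitions act as alternatives (OR). The plain-Python predicate below mirrors the two example rulesets to show which of the sample customer documents from the Usage section a user would be permitted to see. It illustrates the policy logic only; it is not how PlainID or `FilterDirectiveProvider` actually builds filters, and it assumes every document has both metadata keys.

```python
def rs1(meta: dict) -> bool:
    # Every condition inside one ruleset body must hold (logical AND).
    return meta["country"] == "Sweden" and meta["country"] != "Russia" and meta["age"] >= 5


def rs2(meta: dict) -> bool:
    return meta["country"] == "Norway" and meta["age"] <= 100


def is_permitted(meta: dict) -> bool:
    # Separate ruleset definitions are alternatives (logical OR).
    return rs1(meta) or rs2(meta)


# Metadata of the sample customer documents used in the Usage section.
customer_metadata = [
    {"country": "Sweden", "age": 5},
    {"country": "Norway", "age": 5},
    {"country": "Finland", "age": 5},
]
visible = [m["country"] for m in customer_metadata if is_permitted(m)]
```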

### Usage

```python
from langchain_chroma import Chroma  # requires: pip install langchain-chroma
from langchain_core.documents import Document
from core_plainid.models.context.request_context import RequestContext
from langchain_plainid.retrieval.filter_directive_provider import FilterDirectiveProvider
from langchain_plainid.retrieval.multi_store_retriever import MultiStoreRetriever
from langchain_plainid.retrieval.retrieval_runnable import RetrievalRunnable

filter_provider = FilterDirectiveProvider(
    base_url="https://platform-product.us1.plainid.io",
    client_id="your_client_id",
    client_secret="your_client_secret",
)

customer_docs = [
    Document(page_content="Stockholm is the capital of Sweden.", metadata={"country": "Sweden", "age": 5}),
    Document(page_content="Oslo is the capital of Norway.", metadata={"country": "Norway", "age": 5}),
    Document(page_content="Helsinki is the capital of Finland.", metadata={"country": "Finland", "age": 5}),
]

product_docs = [
    Document(page_content="Widget A is available in Europe.", metadata={"region": "Europe", "price": 10}),
    Document(page_content="Widget B is available in Asia.", metadata={"region": "Asia", "price": 20}),
]

# `embeddings` is any LangChain Embeddings instance (e.g. from your model provider)
customer_store = Chroma.from_documents(customer_docs, embeddings, collection_name="customers")
product_store = Chroma.from_documents(product_docs, embeddings, collection_name="products")

retriever = MultiStoreRetriever(
    filter_provider=filter_provider,
    resource_types=["customer", "product"],
    vector_stores=[customer_store, product_store],
    k=4,
)

request_context = RequestContext(
    entity_id="your_entity_id",
    entity_type_id="your_entity_type",
)

docs = await retriever.aretrieve("What is the capital of Sweden?", request_context=request_context)
```

### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `filter_provider` | `FilterDirectiveProvider` | Yes | Connects to PlainID and converts policies into LangChain filters |
| `resource_types` | `list[str]` | Yes | PlainID resource types — one per vector store collection/index |
| `vector_stores` | `list[VectorStore]` | Yes | LangChain vector stores, aligned by index with `resource_types` |
| `k` | `int` | No | Global max documents per store (default `4`) |
| `k_values` | `list[int]` | No | Per-store document limits, overrides `k` when provided |

The `resource_types` and `vector_stores` lists must be the same length — each resource type at index `i` maps to the vector store at the same index.
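The pairing rule and the `k` / `k_values` precedence can be summarized in a few lines. `resolve_store_limits` below is a hypothetical helper written for illustration, not the `MultiStoreRetriever` implementation. It also assumes that `k_values`, when given, has one entry per store, which the table implies but does not state.

```python
from typing import Optional


def resolve_store_limits(
    resource_types: list[str],
    vector_stores: list[object],
    k: int = 4,
    k_values: Optional[list[int]] = None,
) -> list[tuple[str, object, int]]:
    """Pair each resource type with its store and its per-store document limit."""
    if len(resource_types) != len(vector_stores):
        raise ValueError("resource_types and vector_stores must be the same length")
    if k_values is None:
        # No per-store limits: every store falls back to the global k.
        k_values = [k] * len(vector_stores)
    elif len(k_values) != len(vector_stores):
        raise ValueError("k_values must have one entry per vector store")
    return list(zip(resource_types, vector_stores, k_values))


# Strings stand in for real VectorStore objects.
plan = resolve_store_limits(
    ["customer", "product"], ["customer_store", "product_store"], k_values=[3, 1]
)
```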

### Retrieval Runnable

The `RetrievalRunnable` wraps the `MultiStoreRetriever` as a LangChain `Runnable[str, list[Document]]`, allowing it to be used in LangChain chains:

```python
retrieval_runnable = RetrievalRunnable(retriever=retriever)

config = {"configurable": {"request_context": request_context}}

docs = await retrieval_runnable.ainvoke("What is the capital of Sweden?", config=config)
```

### FilterDirectiveProvider

The `FilterDirectiveProvider` connects to PlainID and converts resolution data into LangChain `FilterDirective` objects that the retriever uses to filter documents. It is passed as a constructor dependency to `MultiStoreRetriever`.

### DefaultQueryTranslator

When a vector store does not have a built-in LangChain query translator, the `DefaultQueryTranslator` is used as a fallback. It translates `StructuredQuery` objects into generic filter dictionaries. In most cases this is handled automatically and does not require direct usage.
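To picture what a query translator does, the sketch below turns a small comparison tree into a filter dictionary in the `$eq` / `$gte` / `$and` operator style used by Chroma's `where` filters. The `Comparison` and `Operation` dataclasses are simplified stand-ins defined locally (LangChain's own classes live in `langchain_core.structured_query`), and the output format is an assumption; it is not necessarily what `DefaultQueryTranslator` emits.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Comparison:
    comparator: str  # e.g. "eq", "ne", "gte", "lte"
    attribute: str
    value: Any


@dataclass
class Operation:
    operator: str  # "and" or "or"
    arguments: list["Node"]


Node = Union[Comparison, Operation]


def translate(node: Node) -> dict:
    """Recursively convert the tree into a {"$op": ...}-style filter dict."""
    if isinstance(node, Comparison):
        return {node.attribute: {f"${node.comparator}": node.value}}
    return {f"${node.operator}": [translate(arg) for arg in node.arguments]}


# rs1 from the PlainID Setup example, expressed as a tree.
rs1_tree = Operation("and", [
    Comparison("eq", "country", "Sweden"),
    Comparison("ne", "country", "Russia"),
    Comparison("gte", "age", 5),
])
rs1_filter = translate(rs1_tree)
```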

## Chaining Runnables

One of the key benefits of wrapping PlainID components as LangChain Runnables is the ability to chain them using the `|` (pipe) operator. The `request_context` is passed once via `config` and flows through all runnables in the chain:

```python
chain = categorization_runnable | anonymization_runnable | retrieval_runnable

config = {"configurable": {"request_context": request_context}}

docs = await chain.ainvoke("What is John Smith's contract status?", config=config)
```

This chain will:

1. **Categorize** the prompt — verify it matches allowed categories in PlainID
2. **Anonymize** the prompt — detect and mask/encrypt PII before retrieval
3. **Retrieve** documents — query vector stores with PlainID-enforced filters

If any step fails authorization, a `PlainIDCategorizerException`, `PlainIDAnonymizerException`, or `PlainIDRetrieverException` is raised and the chain stops.
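The `|` operator builds a sequence in which each step's output becomes the next step's input and the same `config` is handed to every step. The plain-Python mimic below uses a hypothetical `Step` class (LangChain's real implementation is `RunnableSequence`) to show why an exception in an early step means the later steps never run. `PermissionError` stands in for the PlainID exceptions.

```python
from typing import Any, Callable


class Step:
    """Minimal stand-in for a LangChain Runnable supporting `|` composition."""

    def __init__(self, fn: Callable[[Any, dict], Any]):
        self.fn = fn

    def invoke(self, value: Any, config: dict) -> Any:
        return self.fn(value, config)

    def __or__(self, other: "Step") -> "Step":
        # Feed this step's output into the next one, passing the same config along.
        return Step(lambda value, config: other.invoke(self.invoke(value, config), config))


seen_users: list[str] = []  # records which request_context each step received


def categorize(prompt: str, config: dict) -> str:
    seen_users.append(config["configurable"]["request_context"])
    if "salary" in prompt:
        raise PermissionError("category not allowed")  # stands in for PlainIDCategorizerException
    return prompt


def anonymize(prompt: str, config: dict) -> str:
    seen_users.append(config["configurable"]["request_context"])
    return prompt.replace("John Smith", "***")


demo_chain = Step(categorize) | Step(anonymize)
demo_config = {"configurable": {"request_context": "user-1"}}
output = demo_chain.invoke("What is John Smith's contract status?", demo_config)
```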

Sync usage:

```python
docs = chain.invoke("What is John Smith's contract status?", config=config)
```

## Supported Vector Stores

Different vector stores have varying levels of filter operator support. Below are the tested vector stores and their limitations:

### Chroma

Does not support: `IN`, `NOT_IN`, `STARTSWITH`, `ENDSWITH`, `CONTAINS` operators.

### FAISS

Does not support: `STARTSWITH`, `ENDSWITH`, `CONTAINS` operators.
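Because a store cannot enforce a ruleset that relies on an operator it does not support, it can help to check policies against the lists above before deploying them. The lookup below is a hypothetical pre-flight check built only from those two lists; stores not listed here are treated as having no known gaps, which is an assumption.

```python
# Unsupported operators per store, as listed in this README.
UNSUPPORTED_OPERATORS: dict[str, set[str]] = {
    "chroma": {"IN", "NOT_IN", "STARTSWITH", "ENDSWITH", "CONTAINS"},
    "faiss": {"STARTSWITH", "ENDSWITH", "CONTAINS"},
}


def unsupported_in_policy(store: str, policy_operators: set[str]) -> set[str]:
    """Return the operators a policy uses that the given store cannot apply."""
    return policy_operators & UNSUPPORTED_OPERATORS.get(store.lower(), set())


# Operators a hypothetical ruleset relies on.
policy_ops = {"EQ", "IN", "CONTAINS"}
chroma_gaps = unsupported_in_policy("Chroma", policy_ops)
faiss_gaps = unsupported_in_policy("FAISS", policy_ops)
```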

## Exceptions

All exceptions are defined in the `core-plainid` library. See the **Exceptions** section in the core-plainid README for the full list.
