# ContextRouter: Agent Development Guide

ContextRouter is a modular, LangGraph-powered "shared brain" designed for high-performance agentic workflows and multi-source knowledge orchestration.

## Core Philosophy

- **Knowledge Agnostic**: While RAG is a core capability, the brain is designed to orchestrate data from any source (Web, SQL, Vector, RSS) through bidirectional pipelines (Read/Write).
- **Platform Agnostic**: The brain should not know about HTTP, Telegram, or Web. It consumes messages and emits events.
- **Strict Separation**: Logic (cortex) vs. Infrastructure (modules/providers) vs. Raw Data (modules/connectors).
- **Bisquit Protocol**: All data passing through the pipeline is wrapped in `BisquitEnvelope` for provenance and security.
- **Registry-First**: Components are registered via decorators and loaded lazily.

## Architecture Guidelines

### Cortex (The Brain)
- Located in `src/contextrouter/cortex/`.
- Owns the `StateGraph` definition and node orchestration.
- **Node vs Step**: Nodes (classes in `nodes/`) are wrappers for registries; Steps (functions in `steps/`) contain pure business logic.

### Agent wrapper contract (agent-mode)
- **Return type**: `BaseAgent.process(...)` MUST return a `dict` (partial state update).
- **Async steps**: If a wrapper calls an async step, it MUST `await` it. Returning a coroutine will crash LangGraph with `InvalidUpdateError`.
- **Guardrail**: Keep `tests/unit/test_agent_wrapper_contract.py` passing.

### Modules (The Capabilities)
- **Providers**: Database/Storage implementations (`IRead`, `IWrite`).
- **Connectors**: Raw data fetchers (Web, RSS, Files).
- **Models**: LLM and Embedding abstractions.
- **Protocols**: Mapping internal events to external formats (e.g., AG-UI).

## Engineering Principles

1. **No direct `os.environ`**: Use `core.config.Config` for all settings.
2. **Type Safety (Compromise Policy)**:
   - Use **TypedDict** for JSON-shaped contracts (ingestion `struct_data`, UI citation dict schemas).
   - Use **Pydantic** for runtime entities inside the brain (`cortex/models.py`: `RetrievedDoc`, `Citation`, etc.).
   - Avoid leaking `Any` across boundaries. If an external SDK returns loose objects, normalize at the boundary.
3. **Immutability**: Treat the LangGraph state as immutable; nodes return partial updates.
4. **Provenance**: Always use `envelope.add_trace("stage_name")` when transforming data.

## StructData (Ingestion + Retrieval Contracts)

### What `StructData` is
`StructData` is the canonical type for **JSON-serializable payloads** that are persisted or exported:
- In ingestion, `ShadowRecord.struct_data` (snake_case keys) and Vertex import JSONL `structData`.
- In retrieval, `RetrievedDoc.metadata` carries normalized, JSON-safe metadata from providers.

Types live in `src/contextrouter/core/types.py`:
- `StructDataPrimitive`, `StructDataValue`, `StructData`
- `coerce_struct_data(...)` for boundary normalization

### Rules
- **Use `source_type`** (snake_case) in `struct_data`. Do not introduce ad-hoc `"type"` keys.
- **Use TypedDict schemas** for per-source `struct_data` payloads:
  - `BookStructData`, `VideoStructData`, `QAStructData`, `WebStructData`, `KnowledgeStructData`
- **Boundary coercion only**: `coerce_struct_data(...)` must be called at integration boundaries (e.g., Vertex SDK parsing),
  not sprinkled through business logic.

## UI Citation Schema
The UI formatter returns camelCase dicts. Use the typed union `UICitation` (TypedDict union) from `core/types.py`
as the return type for the formatter in `modules/retrieval/formatting/`.

## Key Directories

- `core/`: Framework kernel (Config, Registry, Bisquit).
- `cortex/`: Graph orchestration and state nodes.
- `modules/retrieval/`: Search business logic (Orchestrator, Reranker, Formatter).
- `modules/protocols/agui/`: Unified UI event mapping.

## Coding Standards
- **Python**: PEP 8, 4-space indent, strict typing (mypy).
- **No Direct Imports**: `packages/contextrouter` must NOT import from external apps. It must remain a standalone kernel.
- **Documentation**: Keep `AGENTS.md` files laconic and directive.

## Agent Behavior Rules

### ⚠️ **CRITICAL: NEVER make automatic commits without explicit user permission**
- **NEVER** execute `git commit` commands on behalf of the user
- **ONLY** suggest changes and provide commands for the user to run manually
- **ALWAYS** ask for confirmation before making any commits
- **RESPECT** user's codebase ownership - commits are their decision, not yours

### Git Workflow
- **STAGE** files with `git add` only when requested
- **SUGGEST** commit messages but let user execute the commit
- **PROVIDE** commands but don't execute them automatically
- **WAIT** for explicit user approval before any git operations

### Documentation Updates
- **ALWAYS** update documentation in `contextrouter-docs/` when adding, changing, or removing functionality
- **KEEP** technical docs and user guides in sync with code changes
- **REVIEW** README files and API documentation for accuracy after code modifications
- **UPDATE** example scripts and tutorials when breaking changes occur
