High-level counters for the current proxy conversation. These update in real time as requests flow through the proxy.
Visualizes how much of the conversation history has been compacted. The bar fills as the compaction watermark advances through the message history. Green means healthy headroom; yellow and red indicate the engine is compacting aggressively to stay within the context window.
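A minimal sketch of how such a meter might map the watermark position to a fill fraction and a color; the threshold values and field names (`watermark_index`, `total_messages`) are assumptions for illustration, not the proxy's actual implementation:

```python
def compaction_meter(watermark_index: int, total_messages: int) -> tuple[float, str]:
    """Map the compaction watermark to a fill fraction and a status color.

    Assumed fields: `watermark_index` is the position in the message
    history up to which turns have been compacted; `total_messages`
    is the full history length. Thresholds are illustrative.
    """
    fill = watermark_index / total_messages if total_messages else 0.0
    if fill < 0.5:
        color = "green"   # plenty of headroom
    elif fill < 0.8:
        color = "yellow"  # compaction is keeping pace
    else:
        color = "red"     # compacting aggressively near the limit
    return fill, color
```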
Average latency breakdown for each stage of the request processing pipeline, computed across all requests in this conversation.
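A sketch of the per-stage aggregation this panel implies, assuming each request records a mapping of stage name to duration (the stage names here are illustrative):

```python
from collections import defaultdict

def average_stage_latency(requests: list[dict[str, float]]) -> dict[str, float]:
    """Average wall-clock time per pipeline stage across all requests.

    Each element of `requests` is assumed to map a stage name
    (e.g. "filter", "retrieve", "inject", "llm") to its duration in ms.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for timings in requests:
        for stage, ms in timings.items():
            totals[stage] += ms
            counts[stage] += 1
    return {stage: totals[stage] / counts[stage] for stage in totals}
```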
Compares total input tokens sent by virtual-context against a naive baseline that sends the full history each turn, compacting at a 30% ratio when it hits the context window. The baseline gets full credit for its own compaction, so the savings reflect the combined benefit of VC's filtering, selective retrieval, and enrichment over what a standard system would do. A baseline simulation sketch follows the table below.
| Model | $/MTok | Baseline | VC | Saved | % |
|---|---|---|---|---|---|
| awaiting first request | | | | | |
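A minimal simulation of the baseline described above, assuming per-turn token counts are known. Only the 30% ratio comes from the text; the names and the flat compact-to-30% model are assumptions:

```python
def baseline_input_tokens(turn_tokens: list[int], context_window: int,
                          compaction_ratio: float = 0.30) -> int:
    """Total input tokens a naive full-history client would send.

    Each turn resends the whole accumulated history. When the history
    exceeds the context window, it is assumed to compact down to
    `compaction_ratio` of its size (the baseline gets full credit
    for this, per the panel description).
    """
    total_sent = 0
    history = 0
    for tokens in turn_tokens:
        history += tokens
        if history > context_window:
            history = int(history * compaction_ratio)
        total_sent += history  # tokens sent for this turn
    return total_sent
```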
Breakdown of LLM API calls by internal component. Shows token usage, estimated cost (via ModelCatalog pricing), and wall-clock time for each pipeline stage. Cost is calculated from input/output token counts at the model's list price.
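The cost arithmetic is straightforward; a sketch under the assumption that ModelCatalog supplies per-million-token list prices (the exact lookup API is not shown and is assumed):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float,
                  output_price_per_mtok: float) -> float:
    """Estimated cost in dollars at the model's list price.

    Prices are $/MTok, as looked up from ModelCatalog; the lookup
    itself is elided here and its shape is an assumption.
    """
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000
```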
Each proxy run creates a conversation. When compaction fires, it stores segments tagged with the conversation ID. This panel shows all conversations that have stored segments, along with their compression stats and tag coverage. The current conversation is highlighted in green.
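A sketch of the per-segment record this panel implies; the field names are inferred from the stats it displays (conversation ID, compression, tags) and are not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class StoredSegment:
    """One compacted slice of history, as this panel implies it is stored."""
    conversation_id: str
    summary: str
    original_tokens: int
    compacted_tokens: int
    tags: list[str] = field(default_factory=list)

    @property
    def compression(self) -> float:
        """Compression ratio shown in the panel's stats."""
        return self.compacted_tokens / max(self.original_tokens, 1)
```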
Tags currently in the conversation's working set, based on the most recent turns (controlled by active_tag_lookback). Active tags represent the topics being discussed right now. During retrieval, these tags are skipped because their content is already present in the raw conversation history. The store tag count shows how many distinct tags exist across all stored segments.
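A sketch of how the working set might be derived. Only `active_tag_lookback` comes from the text; the per-turn tag lists and the helper names are illustrative:

```python
def active_tags(turns: list[list[str]], active_tag_lookback: int) -> set[str]:
    """Tags present in the most recent turns.

    `turns` is assumed to be a chronological list of per-turn tag
    lists. The returned tags are skipped during retrieval because
    their content is already in the raw conversation history.
    """
    recent = turns[-active_tag_lookback:] if active_tag_lookback else []
    return {tag for turn in recent for tag in turn}

def retrieval_candidates(store_tags: set[str], working_set: set[str]) -> set[str]:
    """Stored tags eligible for retrieval: everything not already active."""
    return store_tags - working_set
```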
Chronological log of every LLM request processed by the proxy (newest first, capped at 200 rows). Each row shows one conversation turn, from the moment the user message arrives through context injection. A minimal sketch of the bounded log follows the table.
| T# | Session | Inbound Tags | Response Tags | Message | Payload | Sent | Base | Injected | VC | LLM | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
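A minimal sketch of the bounded, newest-first buffer the panel describes; the 200-row cap comes from the text, while the row shape is elided and assumed to be a dict:

```python
from collections import deque

# Append new turns on the right; render by iterating in reverse.
# maxlen=200 matches the panel's stated cap, so older rows fall
# off the buffer automatically.
request_log: deque = deque(maxlen=200)

def log_turn(row: dict) -> None:
    request_log.append(row)

def newest_first() -> list[dict]:
    return list(reversed(request_log))
```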
History of compaction events (newest first). Each entry represents one compaction operation where the engine summarized older conversation turns to free context window space.
History of VC tool interceptions (newest first). When paging is enabled and the LLM calls vc_expand_topic or vc_collapse_topic, the proxy intercepts the tool call, executes it locally against the engine, and sends a continuation request to the upstream LLM. The client never sees these tool calls.
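A sketch of the interception loop described above. The tool names `vc_expand_topic` and `vc_collapse_topic` come from the text; `engine.execute(...)`, `upstream.complete(...)`, and `response.as_message()` are hypothetical interfaces standing in for the proxy's internals:

```python
VC_TOOLS = {"vc_expand_topic", "vc_collapse_topic"}

def run_turn(messages, engine, upstream):
    """Loop until the upstream LLM responds with no VC tool calls.

    Intercepted tool calls and their results are handled inside the
    proxy and never reach the client.
    """
    while True:
        response = upstream.complete(messages)
        vc_calls = [c for c in response.tool_calls if c.name in VC_TOOLS]
        if not vc_calls:
            return response  # forwarded to the client unchanged
        messages.append(response.as_message())  # assistant turn with the tool calls
        for call in vc_calls:
            result = engine.execute(call.name, call.args)  # executed locally
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": result})
        # loop: the continuation request carries the tool results upstream
```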