Metadata-Version: 2.4
Name: python-tapestry
Version: 0.1.0
Summary: Import your full ChatGPT, Claude, and Claude Code history — give any AI instant access to everything you've ever discussed
Author: George Butiri
License: Proprietary
Keywords: ai,memory,conversations,summarization,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: requests>=2.28
Requires-Dist: python-dotenv>=1.0
Requires-Dist: tqdm>=4.60
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: search
Requires-Dist: scikit-learn>=1.0; extra == "search"
Provides-Extra: web
Requires-Dist: flask>=3.0; extra == "web"
Requires-Dist: markdown>=3.5; extra == "web"
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Dynamic: license-file

# total-recall

**Temporal Archive with Persistent Entity and Semantic Tracking for Recall Yield**

A temporal semantic compression engine for AI conversation history. Not memory in the traditional sense, but pattern continuity with measured retention: 12.6% of discrete facts survive 82:1 compression at optimal settings. That's not a flaw; it's a quantified tradeoff.

Your AI has no memory, even though you've been talking to it for years. Tapestry fixes that: import your full ChatGPT, Claude, or Claude Code history and give any AI instant access to everything you've ever discussed.

> *"There are many parts of my youth that I'm not proud of. There were loose threads — untidy parts of me that I would like to remove. But when I pulled on one of those threads, it unraveled the tapestry of my life."*
> — Jean-Luc Picard, TNG 6x15 "Tapestry"

Every conversation matters. The throwaway question at 2 AM. The half-finished debugging session. The idea you explored and abandoned. Pull any thread from your history and you lose context you didn't know you needed. Tapestry keeps every thread — then weaves them into something you can carry forward.

## What It Does

Tapestry imports your conversation history from **Claude Code**, **ChatGPT**, and **Claude** (claude.ai), then builds a compressed memory hierarchy that any AI can load as context. Conversations are summarized into dailies, dailies into weeklies, weeklies into monthlies, monthlies into yearlies — keeping token budgets manageable even across years of history.
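
The rollup rule can be sketched as pure date bucketing. A minimal illustration (not Tapestry's actual code) of which bucket each summary level assigns a conversation date to:

```python
from datetime import date

def rollup_keys(d: date) -> dict:
    """Map a conversation date to its bucket key at each summary level."""
    iso = d.isocalendar()  # (ISO year, ISO week, weekday)
    return {
        "daily": d.isoformat(),               # L1: one bucket per day
        "weekly": f"{iso[0]}-W{iso[1]:02d}",  # L2: ISO week
        "monthly": f"{d.year}-{d.month:02d}", # L3: calendar month
        "yearly": str(d.year),                # L4: calendar year
    }

print(rollup_keys(date(2025, 3, 4)))
```

Every conversation on the same day shares a daily key, every daily in the same ISO week shares a weekly key, and so on up the hierarchy.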

It also includes a **visual calendar UI** where you can browse your entire conversation timeline, drill into any day, and see what you and your AI talked about.

## Install

```bash
pip install python-tapestry
```

For the calendar UI:

```bash
pip install python-tapestry[web]
```

For semantic search (cosine similarity):

```bash
pip install python-tapestry[search]
```

Or from source:

```bash
git clone <repo>
cd tapestry
pip install -e ".[web,search]"
```

## Quick Start

```bash
# 1. Initialize a new project
tapestry init

# 2. Edit tapestry.toml with your LLM provider settings

# 3. Import your conversations
tapestry ingest /path/to/conversations/

# 4. Build the memory hierarchy
tapestry iterate --agent default --save

# 5. Check your memory
tapestry status

# 6. Browse visually
tapestry calendar
```

That's it. Your AI conversations are now organized, searchable, and ready to inject as context into any prompt.

---

## Importing Conversations

Tapestry reads conversation exports from three AI platforms. Each platform exports in a different format — Tapestry normalizes them all into a common database structure.

### Claude Code / VSCode Extension

**What it reads:** A folder of `.jsonl` files (one file per conversation), stored locally by the Claude Code CLI or VSCode extension.

**Where to find them:** Your conversations live at:
- **Windows:** `C:\Users\<you>\.claude\projects\`
- **macOS/Linux:** `~/.claude/projects/`

Each subfolder corresponds to a project directory. Inside are `.jsonl` files — one per conversation session.

**How to import:**

```bash
tapestry ingest "C:/Users/you/.claude/projects/my-project" --agent my-agent
```

Or point it at the entire `projects/` folder to import everything:

```bash
tapestry ingest "C:/Users/you/.claude/projects/" --agent claude-code
```

The parser auto-detects `.jsonl` files and extracts user/assistant messages, timestamps, and session IDs.

### ChatGPT

**What it reads:** The `conversations.json` file from OpenAI's data export.

**Where to find it:** Go to [chat.openai.com](https://chat.openai.com) → Settings → Data controls → Export data. OpenAI emails you a ZIP file. Inside is `conversations.json` — a single file containing your entire chat history.

**How to import:**

```bash
tapestry ingest data/conversations.json --parser chatgpt --agent chatgpt
```

Or drag and drop the ZIP file into the calendar UI (it auto-extracts `conversations.json`).

Typical exports are 100–300 MB with thousands of conversations. The parser walks ChatGPT's tree-structured message format and flattens it into a linear thread.
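
ChatGPT's export stores each conversation as a graph of nodes, where regenerated branches live in a shared `mapping` and `current_node` points at the active leaf. Assuming that commonly observed layout (an assumption about the export schema, not this project's code), a flattening pass might look like:

```python
def flatten_thread(conversation: dict) -> list[dict]:
    """Walk from the leaf (current_node) up through parent links,
    then reverse into chronological order."""
    mapping = conversation["mapping"]
    thread = []
    node_id = conversation.get("current_node")
    while node_id is not None:
        node = mapping[node_id]
        msg = node.get("message")
        if msg and msg.get("content", {}).get("parts"):
            thread.append({
                "role": msg["author"]["role"],
                # parts can mix strings and rich objects; keep only text
                "content": "".join(p for p in msg["content"]["parts"]
                                   if isinstance(p, str)),
            })
        node_id = node.get("parent")
    thread.reverse()
    return thread

convo = {
    "current_node": "n2",
    "mapping": {
        "n0": {"message": None, "parent": None},
        "n1": {"message": {"author": {"role": "user"},
                           "content": {"parts": ["What is TF-IDF?"]}},
               "parent": "n0"},
        "n2": {"message": {"author": {"role": "assistant"},
                           "content": {"parts": ["A term-weighting scheme."]}},
               "parent": "n1"},
    },
}
thread = flatten_thread(convo)
```

Walking leaf-to-root keeps only the branch the user actually ended on, which is what a linear thread should contain.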

### Claude (claude.ai)

**What it reads:** The `conversations.json` file from Anthropic's data export.

**Where to find it:** Go to [claude.ai](https://claude.ai) → Settings → Export Data. Anthropic sends you a ZIP containing `conversations.json`, `users.json`, `projects.json`, and `memories.json`. Tapestry only reads `conversations.json` — the rest are ignored.

**How to import:**

```bash
tapestry ingest data/conversations.json --parser anthropic --agent claude
```

Or drag and drop the ZIP directly into the calendar UI. Tapestry finds `conversations.json` inside the archive automatically.

The parser reads the flat `chat_messages` array, maps `human`→`user` and `assistant`→`assistant`, skips thinking blocks, and preserves timestamps and UUIDs.
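
Assuming field names commonly seen in claude.ai exports (`sender`, `text`, `uuid`, `created_at`; these are assumptions about the export schema, not confirmed by this project), the normalization can be sketched as:

```python
ROLE_MAP = {"human": "user", "assistant": "assistant"}

def parse_claude_export(conversation: dict) -> list[dict]:
    """Normalize a claude.ai conversation's flat chat_messages array.
    Anything that isn't a human/assistant turn is skipped."""
    out = []
    for i, m in enumerate(conversation.get("chat_messages", [])):
        role = ROLE_MAP.get(m.get("sender"))
        if role is None:
            continue  # unknown sender types are dropped
        out.append({
            "role": role,
            "content": m.get("text", ""),
            "uuid": m.get("uuid"),
            "timestamp": m.get("created_at"),
            "order": i,
        })
    return out

msgs = parse_claude_export({"chat_messages": [
    {"sender": "human", "text": "hello", "uuid": "u1",
     "created_at": "2025-03-04T10:00:00Z"},
    {"sender": "assistant", "text": "hi there", "uuid": "u2",
     "created_at": "2025-03-04T10:00:05Z"},
]})
```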

### Import Notes

- **Deduplication is automatic.** If you import the same file twice, already-imported conversations are skipped. No duplicates.
- **Agent names are strings.** Use whatever name makes sense to you — `chatgpt`, `claude`, `work-claude`, `personal`, etc. Agents are just labels for grouping conversations.
- **Auto-detection works.** If you omit `--parser`, Tapestry examines the file and picks the right parser. You can also set a default parser per agent in `tapestry.toml`.
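
Format sniffing of this kind fits in a few lines. The heuristics below (a `mapping` key means ChatGPT, `chat_messages` means claude.ai, `.jsonl` means Claude Code) are an illustration of the idea, not Tapestry's actual detector:

```python
import json
import os
import tempfile
from pathlib import Path

def detect_parser(path: str) -> str:
    """Guess the right parser from the file's shape (illustrative only)."""
    p = Path(path)
    if p.is_dir() or p.suffix == ".jsonl":
        return "claude-code"   # folder of .jsonl session files
    data = json.loads(p.read_text(encoding="utf-8"))
    first = data[0] if isinstance(data, list) and data else {}
    if "mapping" in first:
        return "chatgpt"       # tree-structured message mapping
    if "chat_messages" in first:
        return "anthropic"     # flat chat_messages array
    raise ValueError(f"unrecognized export format: {path}")

# demo on a synthetic ChatGPT-style export
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"mapping": {}, "current_node": None}], f)
detected = detect_parser(f.name)
os.unlink(f.name)
print(detected)  # chatgpt
```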

---

## How the Memory Hierarchy Works

Once conversations are imported, Tapestry builds a telescoping summary hierarchy:

```
                      ┌──────────┐
                      │  Yearly  │  ~10K tokens — entire year in one summary
                      │   (L4)   │
                      └────┬─────┘
                           │
             ┌─────────────┼─────────────┐
             │             │             │
         ┌───┴───┐     ┌───┴───┐     ┌───┴───┐
         │Monthly│     │Monthly│     │Monthly│  ~6K tokens each
         │ (L3)  │     │ (L3)  │     │ (L3)  │
         └───┬───┘     └───┬───┘     └───┬───┘
             │             │             │
     ┌───────┼───────┐     │   ...
     │       │       │
  ┌──┴───┐┌──┴───┐┌──┴───┐
  │Weekly││Weekly││Weekly│  ~5K tokens each
  │ (L2) ││ (L2) ││ (L2) │
  └──┬───┘└──┬───┘└──┬───┘
   ┌─┴───┬─────┐   ...
   │     │     │
 ┌─┴─┐ ┌─┴─┐ ┌─┴─┐
 │Day│ │Day│ │Day│  (L1) ~3K tokens each
 └─┬─┘ └─┬─┘ └─┬─┘
┌──┼──┐  │  ...
│  │  │
L0 L0 L0   ~1.5K tokens each (one per conversation)
```

**The key idea:** When loading context for today, Tapestry doesn't dump your entire history. It loads:

- **Distant months** as monthly summaries (~200 tokens each)
- **This month's prior weeks** as weekly summaries (~500 tokens each)
- **This week's prior days** as daily summaries (~500 tokens each)
- **Today** as individual conversation summaries

This means even years of conversation history compresses into a few thousand tokens of context.
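
The level-selection rule above can be sketched as a small function. This is a simplified illustration of the telescoping policy, not the shipped implementation:

```python
from datetime import date

def telescope_level(anchor: date, target: date) -> int:
    """Pick the summary level to load for `target` when building
    context anchored at `anchor` (coarser summaries for older dates)."""
    if target == anchor:
        return 0  # today: individual conversation summaries
    if target.isocalendar()[:2] == anchor.isocalendar()[:2]:
        return 1  # earlier this ISO week: daily summaries
    if (target.year, target.month) == (anchor.year, anchor.month):
        return 2  # earlier this month: weekly summaries
    return 3      # distant months: monthly summaries

anchor = date(2025, 3, 15)
print(telescope_level(anchor, date(2025, 1, 10)))  # 3 (monthly)
```

The further back a date is, the coarser the summary loaded for it, which is what keeps total context size roughly constant as history grows.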

### Building the Hierarchy

The `iterate` command walks your timeline day by day, loading prior context before generating each summary. This means every summary knows what came before it.

```bash
# Build everything from the beginning through today
tapestry iterate --agent chatgpt --save
```

This is the production-quality build. It takes time (each summary is an LLM call) but produces coherent, cross-referenced output where weeklies reference dailies and monthlies track week-over-week progression.
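
The day-by-day walk can be sketched as a loop that hands each day everything built before it. A minimal illustration, with `summarize_day` standing in for the LLM call:

```python
def iterate_days(days: list[str], summarize_day) -> dict[str, str]:
    """Walk the timeline in date order, passing each day the summaries
    generated so far, so every summary knows what came before it."""
    built: dict[str, str] = {}
    for day in sorted(days):
        prior = [built[d] for d in sorted(built)]  # all earlier summaries
        built[day] = summarize_day(day, prior)
    return built

# stub "LLM" that records how much prior context it was given
built = iterate_days(["2025-01-02", "2025-01-01", "2025-01-03"],
                     lambda day, prior: f"{day} (saw {len(prior)} prior)")
```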

**The Butiri Effect:** Summaries built without prior context (using `catchup` instead of `iterate`) produce degraded output that compounds at every level. A context-free daily loses cross-conversation awareness. A weekly built from degraded dailies reads as list concatenation instead of narrative. Every node affects every future node. Always use `iterate` for data you intend to keep.

**Recall Yield (RY Factor):** Measures how much semantic fidelity is retained when summaries are generated with full telescoping context vs in isolation. Scored 0.0-1.0 via cosine similarity between context-aware and context-free summaries. **RY 1.0 = Total Recall** (perfect fidelity, no information loss). Lower scores = Partial Recall (fidelity degrades at each hierarchy level as the Butiri Effect compounds). The Butiri Effect names the phenomenon; Recall Yield quantifies it.

### Chunked Summarization

When a conversation exceeds the LLM context window, Tapestry automatically splits it into chunks, summarizes each chunk with prior context carried forward, then merges the results hierarchically. This handles conversations of any size.
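
A sketch of the chunking and carry-forward merge, using a rough 4-characters-per-token estimate (both the estimator and the `summarize` stand-in are assumptions for illustration, not the real implementation):

```python
def chunk_messages(messages: list[str], chunk_tokens: int = 20_000) -> list[list[str]]:
    """Greedily pack messages into chunks under a token budget,
    estimating ~4 characters per token."""
    chunks, current, budget = [], [], 0
    for msg in messages:
        est = max(1, len(msg) // 4)
        if current and budget + est > chunk_tokens:
            chunks.append(current)
            current, budget = [], 0
        current.append(msg)
        budget += est
    if current:
        chunks.append(current)
    return chunks

def summarize_chunked(messages, chunk_tokens, summarize):
    """Summarize chunk by chunk, carrying each chunk's summary forward
    as prior context for the next (summarize() stands in for the LLM)."""
    prior = ""
    for chunk in chunk_messages(messages, chunk_tokens):
        prior = summarize(prior, chunk)
    return prior
```

With this shape, a conversation of any length reduces to a sequence of bounded LLM calls, each seeing the running summary so far.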

**Default: 20K tokens per chunk** (optimal for 100b+ models). Override in `tapestry.toml`:

```toml
[llm]
chunk_size_tokens = 5000    # use 5000 for smaller models (8b-70b)
```

**Benchmark results** (selected rows from 89 runs across 4 models and 6 chunk sizes, tested on a 231K token / 928 message conversation):

| Model | Chunk Size | Runs | Ground Truth Retention |
|-------|-----------|------|----------------------|
| gpt-oss-120b | 20K | 5 | **12.6%** |
| gpt-oss-120b | 10K | 6 | 10.2% |
| gpt-oss-120b | 5K | 16 | 8.9% |
| llama-3.1-8b | 5K | 5 | 11.6% |
| llama-3.3-70b | 5K | 6 | 8.4% |

Retention measures the percentage of 546 ground truth topics (extracted from raw messages) that survive in the compressed summary, scored via SBERT cosine similarity. The 120b model shows a clear bell curve peaking at 20K — large enough for cross-topic context, small enough to maintain attention.

---

## The Database

Tapestry stores everything in a single SQLite file: `db/tapestry.db` (created on `tapestry init`).

### Tables

**`tapestry`** — The summary hierarchy. Every summary at every level lives here.

| Column | Type | Purpose |
|--------|------|---------|
| `tap_id` | INTEGER | Primary key |
| `tap_agent` | TEXT | Which agent this belongs to (e.g., "chatgpt") |
| `tap_level` | INTEGER | 0=conversation, 1=daily, 2=weekly, 3=monthly, 4=yearly |
| `tap_title` | TEXT | LLM-generated title |
| `tap_content` | TEXT | LLM-generated summary |
| `tap_date` | TEXT | Date this summary covers (for L0/L1) |
| `tap_date_start` | TEXT | Range start (for L2–L4) |
| `tap_date_end` | TEXT | Range end (for L2–L4) |
| `tap_source` | TEXT | Comma-separated child `tap_id`s used to generate this summary |
| `tap_parent_id` | INTEGER | FK to parent summary (daily→weekly→monthly→yearly) |

**`tapestry_messages`** — Raw conversation messages. Imported from your exports and kept for search and re-generation.

| Column | Type | Purpose |
|--------|------|---------|
| `tmsg_id` | INTEGER | Primary key |
| `tmsg_agent` | TEXT | Agent name |
| `tmsg_ref_uuid` | TEXT | Original message UUID from the source platform |
| `tmsg_ref_conv_id` | TEXT | Conversation ID from the source platform |
| `tmsg_role` | TEXT | `user` or `assistant` |
| `tmsg_content` | TEXT | Message text |
| `tmsg_order` | INTEGER | Position within the conversation |
| `tmsg_timestamp` | TEXT | ISO 8601 timestamp |
| `tmsg_tap_id` | INTEGER | FK to the L0 summary for this conversation |

**`tapestry_meta`** — Key-value store for schema version, agent colors, and other settings.
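
The documented columns can be recreated in a few lines of SQL. Types and constraints here are illustrative; the real schema may differ:

```python
import sqlite3

SCHEMA = """
CREATE TABLE tapestry (
    tap_id         INTEGER PRIMARY KEY,
    tap_agent      TEXT NOT NULL,
    tap_level      INTEGER NOT NULL,   -- 0=conversation .. 4=yearly
    tap_title      TEXT,
    tap_content    TEXT,
    tap_date       TEXT,               -- for L0/L1
    tap_date_start TEXT,               -- for L2-L4
    tap_date_end   TEXT,
    tap_source     TEXT,               -- comma-separated child tap_ids
    tap_parent_id  INTEGER REFERENCES tapestry(tap_id)
);
CREATE TABLE tapestry_messages (
    tmsg_id          INTEGER PRIMARY KEY,
    tmsg_agent       TEXT NOT NULL,
    tmsg_ref_uuid    TEXT,
    tmsg_ref_conv_id TEXT,
    tmsg_role        TEXT,
    tmsg_content     TEXT,
    tmsg_order       INTEGER,
    tmsg_timestamp   TEXT,
    tmsg_tap_id      INTEGER REFERENCES tapestry(tap_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# an L1 daily and one L0 conversation that rolls up into it
conn.execute("INSERT INTO tapestry VALUES "
             "(1,'chatgpt',1,'Daily summary','text','2025-03-04',NULL,NULL,'2',NULL)")
conn.execute("INSERT INTO tapestry VALUES "
             "(2,'chatgpt',0,'Conversation summary','text','2025-03-04',NULL,NULL,NULL,1)")
# follow the parent link from the L0 up to its daily
row = conn.execute("""
    SELECT p.tap_title FROM tapestry c
    JOIN tapestry p ON p.tap_id = c.tap_parent_id
    WHERE c.tap_id = 2
""").fetchone()
print(row[0])  # Daily summary
```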

### How Data Flows

```
Export files  →  ingest  →  tapestry_messages (raw messages)
                              ↓
                           iterate  →  tapestry (L0 summaries)
                                          ↓
                                       tapestry (L1 dailies)
                                          ↓
                                       tapestry (L2 weeklies)
                                          ↓
                                       tapestry (L3 monthlies)
                                          ↓
                                       tapestry (L4 yearlies)
```

Each level is generated by an LLM call that reads the children below it, plus prior context at the same level (telescoping). The result is a tree where every node links to its parent and references its source children.

---

## Calendar UI

Tapestry includes a web-based calendar for browsing your conversation history visually.

```bash
tapestry calendar
```

This starts a Flask server on `http://127.0.0.1:5008` with:

- **Monthly grid** — Each day shows colored pills for daily summaries. Colors correspond to agents.
- **Weekly headers** — Weeks with weekly summaries show the title inline.
- **Day detail panel** — Click any day to see its daily summaries and the individual conversations that fed into them.
- **Full record view** — Click any summary title to read the full content in a modal.
- **Agent sidebar** — Filter the calendar by agent. Each agent gets a color (customizable).
- **Upload** — Drag-and-drop conversation exports (JSON or ZIP) directly into the UI. Select agent and parser, and the import runs in-browser.
- **Agent management** — Pick colors for each agent from the sidebar gear icon.

The calendar runs in debug mode by default, so it auto-reloads when you change files. Use `--no-debug` for production. Add `--browser` to auto-open your browser on launch.

```bash
tapestry calendar --port 5008 --browser
```

---

## Semantic Search

Keyword search finds exact matches. But when you're looking for *"that conversation about patent strategy before publishing"* and the word "patent" appears in 44 different summaries — keyword search drowns you in noise. Semantic search ranks by meaning.

```bash
# Find conversations by concept, not exact words
tapestry similar "patent filing strategy before publishing" --agent chatgpt

# Filter to specific levels
tapestry similar "ESP32 deep sleep battery" --level 0 --top 10

# JSON output for piping
tapestry similar "database migration" -f json
```

Under the hood, Tapestry builds a [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) matrix over all summary content for the agent, vectorizes your query against the same vocabulary, and ranks results by [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity). Zero-relevance results are filtered out. The index is built in-memory at query time — no separate indexing step, no external service, no API calls.
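
The same ranking can be reproduced without scikit-learn. A dependency-free sketch of TF-IDF plus cosine ranking (an illustration of the idea, not the code Tapestry ships, which uses scikit-learn's vectorizer):

```python
import math
from collections import Counter

def tfidf_search(query: str, docs: list[str], top_n: int = 5):
    """Rank docs against a query by TF-IDF cosine similarity (bag of words)."""
    tokenize = lambda s: s.lower().split()
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in doc_tokens for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vectorize(tokens):
        # term frequency weighted by inverse document frequency
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vectorize(tokenize(query))
    scored = [(cosine(q, vectorize(toks)), d)
              for toks, d in zip(doc_tokens, docs)]
    # highest score first; zero-relevance results filtered out
    return [(s, d) for s, d in sorted(scored, reverse=True)[:top_n] if s > 0]

docs = ["patent filing strategy notes",
        "esp32 deep sleep battery tuning",
        "react hooks refactor"]
results = tfidf_search("patent strategy", docs, top_n=2)
```

Query terms absent from the corpus vocabulary simply drop out of the query vector, mirroring the vectorize-against-the-same-vocabulary step described above.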

**Install:** `pip install python-tapestry[search]` (adds scikit-learn).

### Why Not Embeddings?

TF-IDF is the right first step:

- **Zero cost.** No API calls, no embedding model, no external service. Runs entirely on your machine.
- **Zero config.** No vector database, no index management, no model selection.
- **Fast enough.** Searches 3,300+ records in under 1 second (see benchmarks below).
- **Good enough.** For a personal conversation corpus with consistent vocabulary, TF-IDF captures topic similarity effectively. You're not searching the entire internet — you're searching *your own words*.

A future card (#410) explores API-based embeddings for cross-language and abstract concept matching. TF-IDF handles the 90% case today.

### Benchmarks

Tested against a real corpus: 3,342 ChatGPT summaries (L0-L4), 669 Claude summaries, and 198 Claude Code summaries.

| Query | Agent | Records | Time | Top Result | Score |
|-------|-------|---------|------|------------|-------|
| "patent filing strategy open source package" | chatgpt | 3,342 | 6.7s | Patenting Electronic Circuits and Designs | 0.4211 |
| "cosine similarity drift detection" | john-vscode | 198 | 0.03s | Implemented Tapestry class and session wrap-up | 0.2062 |
| "database migration schema upgrade" | chatgpt | 3,342 | 0.6s | Concise Code Setup Guide | 0.1388 |
| "ESP32 deep sleep battery optimization" | chatgpt | 3,342 | 0.6s | Arduino Project Ideas Exploration | 0.2343 |
| "React component refactoring hooks" | chatgpt | 3,342 | 0.6s | Explaining React as a JavaScript library | 0.3506 |
| "tapestry hierarchical memory summarization" | john-vscode | 198 | 0.03s | Designing Tapestry AI Package and CLI | 0.2295 |

**Keyword search comparison** for the same queries:

| Keyword | Agent | Keyword Hits | Cosine Top 5 |
|---------|-------|-------------|--------------|
| "patent" | chatgpt | 44 results (unsorted) | 5 results (ranked by relevance) |
| "database" | chatgpt | 297 results (unsorted) | 5 results (ranked) |
| "ESP32" | chatgpt | 157 results (unsorted) | 5 results (ranked) |
| "React" | chatgpt | 162 results (unsorted) | 5 results (ranked) |

Keyword search returns *every* record containing the word. Cosine search returns the *most relevant* records, sorted by how closely they match the full query phrase.

Note: The first query against a large agent (3,342 records) takes ~6.7s to build the TF-IDF matrix. Subsequent queries against smaller agents are near-instant (0.03s for 198 records). The matrix is rebuilt each query — a future optimization could cache it per session.

---

## Python API

Everything is accessible through the `Tapestry` class:

```python
from tapestry import Tapestry

# Initialize — reads tapestry.toml, connects to DB
t = Tapestry()

# Import conversations
result = t.ingest("path/to/conversations/", agent="chatgpt", parser_name="chatgpt")
print(f"Imported {result['conversations']} conversations, {result['messages']} messages")

# Build the hierarchy (production quality, with full context)
counts = t.iterate(end_date="2026-03-04", agent="chatgpt", save=True)
print(f"Generated {counts['l0']} L0s, {counts['l1']} dailies, {counts['l2']} weeklies")

# Get compressed context for today (for prompt injection)
context = t.get_context(agent="chatgpt")
for item in context:
    print(f"[L{item['level']}] {item['title']}")

# Search your history (keyword)
results = t.search("database migration", agent="chatgpt")
for r in results:
    print(f"{r['date']} — {r['title']} ({r['match_count']} matches)")

# Semantic search (cosine similarity — requires [search] dep)
results = t.similar("patent filing strategy before publishing", agent="chatgpt", top_n=5)
for r in results:
    print(f"{r['score']:.4f} — {r['title']}")

# Recall what happened on a specific date
entries = t.recall("2026-02-15", agent="chatgpt")

# Get system health
status = t.status(agent="chatgpt")
print(f"Health: {status['health']}")

# Clean up
t.close()
```

Or use as a context manager:

```python
with Tapestry() as t:
    results = t.search("bug fix", agent="claude-code")
```

### API Reference

| Method | Returns | Description |
|--------|---------|-------------|
| `ingest(path, agent, parser_name)` | `dict` | Import conversations. Returns `{conversations, skipped, messages}`. |
| `iterate(end_date, agent, save)` | `dict` | Build hierarchy with full context. Returns `{l0, l1, l2, l3}` counts. |
| `catchup(agent, delay)` | `dict` | Bulk build without context (testing only). |
| `get_context(agent, anchor_date)` | `list[dict]` | Telescoping context for a date. Ready for prompt injection. |
| `search(query, agent, level)` | `list[dict]` | Search raw messages by keyword. |
| `search_summaries(query, agent, level)` | `list[dict]` | Search summary content by keyword. |
| `similar(query, agent, level, top_n)` | `list[dict]` | Semantic search via TF-IDF cosine similarity. Requires `[search]` dep. |
| `recall(date_str, agent, end_date)` | `list[dict]` | Best available summary for a date or range. |
| `show(tap_id)` | `dict` | Get a single summary record by ID. |
| `conversations(agent)` | `list[dict]` | List all imported conversations with message counts. |
| `tree(agent)` | `dict` | Full hierarchy tree (monthlies → weeklies → dailies). |
| `stats(agent)` | `dict` | Record counts and size statistics by level. |
| `status(agent)` | `dict` | Health dashboard — stats, date range, missing nodes, health grade. |

---

## Configuration

### tapestry.toml

```toml
[tapestry]
db_path = "db/tapestry.db"

[llm]
provider = "groq"
api_url = "https://api.groq.com/openai/v1/chat/completions"
content_model = "openai/gpt-oss-120b"
title_model = "openai/gpt-oss-120b"
temperature = 0.3
chunk_size_tokens = 20000   # 20K for 100b+ models, 5000 for smaller models

[defaults]
first_day_of_week = "monday"
exclude_today = true

[agents.default]
parser = "claude-code"

[agents.chatgpt]
parser = "chatgpt"

[agents.claude]
parser = "anthropic"
```

### .env

```
TAPESTRY_API_KEY=your_api_key_here
```

The API key is for your LLM provider (Groq, OpenAI, or any OpenAI-compatible API). Tapestry uses this to generate summaries — it does not send your data anywhere else.

### Agent Configuration

Agents are string labels that group conversations by source. Define them in `tapestry.toml` under `[agents.<name>]` with a default parser. You can have as many agents as you want:

```toml
[agents.work-claude]
parser = "claude-code"

[agents.personal-chatgpt]
parser = "chatgpt"

[agents.claude-ai]
parser = "anthropic"
```

---

## CLI Reference

| Command | Description |
|---------|-------------|
| `tapestry init` | Initialize project (creates DB and default config) |
| `tapestry ingest <path>` | Import conversation files |
| `tapestry iterate [date]` | Build hierarchy with full telescoping context |
| `tapestry catchup` | Bulk build without context (testing/initial ingest only) |
| `tapestry calendar` | Launch the visual calendar UI |
| `tapestry status` | System health dashboard |
| `tapestry stats` | Record counts per level |
| `tapestry context <agent>` | Print telescoping context for today |
| `tapestry tree <agent>` | Print full hierarchy tree |
| `tapestry search <query>` | Search messages and summaries |
| `tapestry similar <query>` | Semantic search via cosine similarity (requires `[search]` dep) |
| `tapestry recall <date>` | Best available summary for a date |
| `tapestry show <id>` | Display a single summary record |
| `tapestry conversations` | List all imported conversations |
| `tapestry generate-l0 <conv_id>` | Generate a single conversation summary |
| `tapestry generate-daily <date>` | Generate a single daily summary |
| `tapestry generate-weekly <start> <end>` | Generate a single weekly summary |
| `tapestry generate-monthly <year> <month>` | Generate a single monthly summary |
| `tapestry generate-yearly <year>` | Generate a single yearly summary |

---

## Validation

Tested against a real ChatGPT corpus: 1,931 conversations spanning September 2023 through June 2025 (22 months), rebuilt with `iterate` using full telescoping context.

### Corpus Statistics

| Level | Count | Avg Length |
|-------|-------|------------|
| L0 (conversation) | 1,931 | 1,624 chars |
| L1 (daily) | 558 | 3,401 chars |
| L2 (weekly) | 110 | 5,034 chars |
| L3 (monthly) | 21 | 6,025 chars |
| L4 (yearly) | 2 | 10,019 chars |

### Structural Integrity

| Check | Result |
|-------|--------|
| Parent links valid | PASS — 0 dangling references |
| Orphan nodes | PASS — 0 orphan summaries |
| Source references | PASS — all child references resolve |
| Date coverage | PASS — every date with L0s has a corresponding L1 |
| Duplicate detection | PASS — 0 duplicates at any level |
| Content completeness | PASS — 0 empty titles or content |

---

## Requirements

- Python 3.10+
- An OpenAI-compatible LLM API (Groq, OpenAI, local, etc.)
- Flask 3.0+ (only for the calendar UI — installed with `pip install python-tapestry[web]`)
- scikit-learn 1.0+ (only for semantic search — installed with `pip install python-tapestry[search]`)

## License

[Proprietary](LICENSE) — Source-available, not open source.

**Free for:** personal use, development, testing, research, academic work, non-commercial projects. Study it, fork it, learn from it.

**Requires a commercial license for:** hosted services, revenue-generating products, organizational/business use. Contact george@iseestudios.com.

Patent-protected. See LICENSE for full terms.
