Metadata-Version: 2.3
Name: swarm-notes
Version: 0.1.5
Summary: Automated research paper tracking and knowledge synthesis
Author: LM
Author-email: LM <hi@leima.is>
Requires-Dist: beautifulsoup4>=4.14.3
Requires-Dist: markitdown[pdf]>=0.1.0
Requires-Dist: paperscraper>=0.3.6
Requires-Dist: pydantic>=2.12.5
Requires-Dist: pydantic-ai>=1.77.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: python-frontmatter>=1.1.0
Requires-Dist: pyyaml>=6.0.3
Requires-Dist: requests>=2.32.5
Requires-Dist: typer>=0.24.1
Requires-Python: >=3.11
Description-Content-Type: text/markdown

# research-cruise 🚀

An autonomous, serverless, multi-agent system that tracks academic papers, extracts structured data, and weaves them into a local, interconnected Markdown knowledge graph — a **Second Brain** for ML research.  
Built to eventually communicate with other identical systems, forming a decentralised **Hive Mind**.

---

## Architecture

```
┌────────────────────────────────────────────┐
│                  Triggers                  │
└─────────────────────┬──────────────────────┘
                      │
         ┌────────────▼────────────┐
         │   Federation Agent      │  ← consumes external public_feed.json feeds
         └────────────┬────────────┘
                      │
         ┌────────────▼────────────┐
         │       Watcher           │  ← queries ArXiv API by keyword
         └────────────┬────────────┘
                      │  RawPaper[]
         ┌────────────▼────────────┐
         │    Router (Skill        │  ← routes each paper to a domain skill
         │    Registry)            │    (NLP, Vision, TimeSeries, …)
         └────────────┬────────────┘
                      │  Skill
         ┌────────────▼────────────┐
         │    Analyst              │  ← pydantic-ai structured extraction
         │    (pydantic-ai)        │    with taxonomy injection
         └────────────┬────────────┘
                      │  PaperAnalysis
         ┌────────────▼────────────┐
         │    Vault Writer         │  ← writes .md to tmp_vault/
         │                         │    generates concept stubs
         │                         │    updates public_feed.json
         └────────────┬────────────┘
                      │  atomic move
         ┌────────────▼────────────┐
         │       /vault            │  ← permanent, file-based knowledge graph
         │   papers/ concepts/     │
         │   datasets/             │
         └─────────────────────────┘
```
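
The flow above maps onto the modules in `swarm_notes/`. A minimal sketch of how the stages might chain together in `main.py` (function names are illustrative, not the actual API):

```python
# Minimal sketch: function names are illustrative, not the actual API.
from swarm_notes import analyst, federation, router, vault_manager, vault_writer, watcher


def run_pipeline() -> None:
    federation.sync_feeds()                   # pull subscribed public_feed.json files
    raw_papers = watcher.fetch_new_papers()   # RawPaper[] from the configured source
    for paper in raw_papers:
        skill = router.route(paper)               # pick a domain skill (NLP, Vision, ...)
        analysis = analyst.analyze(paper, skill)  # structured PaperAnalysis via pydantic-ai
        vault_writer.write(analysis)              # .md note + concept stubs into tmp_vault/
    vault_manager.promote()                   # atomic move: tmp_vault/ -> vault/
```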

## Directory Structure

```
research-cruise/
├── .github/
│   └── workflows/
│       └── autonomous-tracker.yml   # CI/CD pipeline
├── vault/
│   ├── papers/                      # One .md file per paper
│   ├── concepts/                    # Auto-generated concept stubs
│   └── datasets/                    # Dataset stubs
├── swarm_notes/
│   ├── config.py                    # Configuration & env vars
│   ├── vault_manager.py             # Staging pattern (tmp_vault → vault)
│   ├── watcher.py                   # Configurable paper-source watcher
│   ├── router.py                    # Skill registry router
│   ├── analyst.py                   # pydantic-ai extraction agent
│   ├── vault_writer.py              # Markdown writer + public_feed.json
│   ├── federation.py                # Hive Mind federation agent
│   └── main.py                      # Pipeline orchestrator
```

## Quick Start

### Prerequisites

- Python 3.11+
- An LLM API key

### Local Dev Run

```bash
# Install dependencies
uv sync

# Set your API keys (in a .env file or by exporting them directly)
export LLM_API_KEY="sk-..."
export PAPER_SOURCE="semantic_scholar"
export SEMANTIC_SCHOLAR_API_KEY="..."

# Prepare your configs in the configs/ folder (see Configuration below)
...

# Run the pipeline
python -m swarm_notes.main
```

### Configuration (Environment Variables)

Use the example in the `configs/` folder as a template for your own configuration.
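
A minimal `.env` sketch using the variables shown in this README (values are placeholders):

```bash
# .env (values are placeholders)
LLM_API_KEY=sk-...
PAPER_SOURCE=semantic_scholar
SEMANTIC_SCHOLAR_API_KEY=...
# Optional: comma-separated feed URLs to subscribe to (see The Hive Mind below)
FEDERATION_FEEDS=https://raw.githubusercontent.com/alice/research-cruise/main/public_feed.json
```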

## CI/CD Setup

### Add the required secret

The pipeline needs an OpenAI-compatible API key to run the LLM analyst step.

1. Open your forked repository on GitHub.
2. Go to **Settings → Secrets and variables → Actions**.
3. Click **New repository secret**.
4. Set **Name** to `LLM_API_KEY` and **Secret** to your API key (e.g. `sk-...`).
5. Click **Add secret**.

> **Note:** The workflow exposes `LLM_API_KEY` as both `LLM_API_KEY` and `OPENAI_API_KEY`
> so that pydantic-ai's OpenAI provider picks it up automatically.
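
For reference, the relevant mapping in `autonomous-tracker.yml` would look roughly like this (step name is illustrative):

```yaml
# .github/workflows/autonomous-tracker.yml (illustrative excerpt)
- name: Run pipeline
  run: python -m swarm_notes.main
  env:
    LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
    OPENAI_API_KEY: ${{ secrets.LLM_API_KEY }}  # same secret, auto-detected by pydantic-ai
```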


## The Hive Mind (Federation)

Every successful run updates `public_feed.json` at the root of the repository with the metadata and summaries of the last 20 processed papers.
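
For illustration, a feed entry might look like the following; the exact field names are an assumption, only the presence of paper metadata and summaries is guaranteed:

```json
{
  "agent": "alice",
  "updated_at": "2024-01-15T06:00:00Z",
  "papers": [
    {
      "title": "Attention Is All You Need",
      "arxiv_id": "1706.03762",
      "url": "https://arxiv.org/abs/1706.03762",
      "summary": "..."
    }
  ]
}
```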

To subscribe to another agent's feed, pass their raw `public_feed.json` URL:

```bash
export FEDERATION_FEEDS="https://raw.githubusercontent.com/alice/research-cruise/main/public_feed.json,https://raw.githubusercontent.com/bob/research-cruise/main/public_feed.json"
python -m swarm_notes.main
```

**Conflict resolution:** If an external feed contains a review of a paper that already exists locally, the local metadata is preserved.  The external summary is appended under a `### External Perspectives` section:

```markdown
### External Perspectives

> "Transformers are over-engineered for this dataset." - @Agent_alice
> *(Retrieved 2024-01-15)*
```
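
A minimal sketch of that merge rule in Python (function and argument names are hypothetical):

```python
# Sketch of the conflict-resolution rule; names are hypothetical.
from pathlib import Path

HEADER = "### External Perspectives"


def merge_external_review(note: Path, summary: str, agent: str, retrieved: str) -> None:
    """Preserve local metadata; append the external summary under its own section."""
    text = note.read_text(encoding="utf-8")
    if HEADER not in text:
        text = text.rstrip() + f"\n\n{HEADER}\n"
    text += f'\n> "{summary}" - @{agent}\n> *(Retrieved {retrieved})*\n'
    note.write_text(text, encoding="utf-8")
```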

## Vault File Format

Each paper note uses hybrid YAML frontmatter (CSL-compatible fields + custom fields):

```yaml
---
# CSL-compatible fields
title: "Attention Is All You Need"
author:
  - literal: "Ashish Vaswani"
issued:
  date-parts:
    - [2017, 6, 12]
url: "https://arxiv.org/abs/1706.03762"

# Custom fields
arxiv_id: "1706.03762"
domain: "nlp"
tags:
  - "transformer"
  - "attention-mechanism"
architectures:
  - "encoder-decoder"
datasets:
  - "WMT 2014"
skill: "NLPSkill"
processed_at: "2024-01-15T06:00:00Z"
---
```

Body sections: **Summary**, **Key Contributions**, **Key Concepts** (with relative links to `../concepts/`), **Datasets**, **Limitations**, **Links**.
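
An illustrative body layout (heading levels and placeholder text are assumptions):

```markdown
## Summary
One-paragraph synthesis of the paper.

## Key Contributions
- ...

## Key Concepts
- [attention-mechanism](../concepts/attention-mechanism.md)

## Datasets
- [WMT 2014](../datasets/wmt-2014.md)

## Limitations
- ...

## Links
- [arXiv](https://arxiv.org/abs/1706.03762)
```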

## Taxonomy

`taxonomy.json` contains the controlled vocabulary of tags, architectures, and domains injected into the analyst's system prompt. Constraining the LLM to this known vocabulary reduces hallucinated metadata and keeps it consistent across notes. Edit `taxonomy.json` to add new terms.
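
An illustrative shape for `taxonomy.json`, using terms from the frontmatter example above (the exact schema is an assumption):

```json
{
  "domains": ["nlp", "vision", "time_series"],
  "tags": ["transformer", "attention-mechanism"],
  "architectures": ["encoder-decoder"]
}
```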

## License

MIT — see [LICENSE](LICENSE).