Metadata-Version: 2.4
Name: continualcode
Version: 0.1.0
Summary: Human-in-the-loop coding agent with online learning via prompt distillation
Project-URL: Homepage, https://github.com/sdan/continualcode
Project-URL: Repository, https://github.com/sdan/continualcode
Author-email: Surya Dantuluri <surya@prava.dev>
License-Expression: MIT
License-File: LICENSE
Keywords: claude-code,coding-agent,context-distillation,fine-tuning,llm,prompt-distillation,rlhf,tui
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Requires-Python: >=3.10
Requires-Dist: textual>=0.40.0
Requires-Dist: tinker-cookbook>=0.1.0
Requires-Dist: torch>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# ContinualCode

Human-in-the-loop coding agent with online learning via prompt distillation.

A Claude Code-style TUI that learns from your approvals. When you approve or edit a tool call, the model trains on that feedback in real time.

## Features

- **Claude Code-style approval UI** - 4-option selection with keyboard navigation
- **Online prompt distillation** - The model internalizes your policy file so it no longer needs the policy prompt at inference time
- **Permission memory** - "Don't ask again" for trusted directories
- **Correction feedback** - Type what the model should have done (for DPO training)
- **Checkpoint save/load** - Ctrl+S to save, resume later with `--checkpoint`

## Install

```bash
pip install continualcode
```

## Quick Start

```bash
# Run the TUI
continualcode

# With a custom policy file
continualcode --policy ./my_rules.md

# Resume from checkpoint
continualcode --checkpoint ./checkpoints/step_000100

# Inference only (no training)
continualcode --no-training
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL` | `Qwen/Qwen3-4B-Instruct-2507` | Base model |
| `TINKER_URL` | `None` | Tinker service URL |
| `TINKER_API_KEY` | `None` | Tinker API key |
| `ENABLE_TRAINING` | `1` | Enable LoRA training |
| `LEARNING_RATE` | `1e-5` | Adam learning rate |
| `LORA_RANK` | `32` | LoRA rank |
| `POLICY_PATH` | `./policy_memory.md` | Policy file injected into teacher prefix |
| `DISTILL_MODE` | `on_policy` | `on_policy` or `off_policy` |
| `CHECKPOINT_DIR` | `./checkpoints` | Where Ctrl+S saves checkpoints |
| `LOAD_CHECKPOINT` | `""` | Checkpoint to load on startup |
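
For example, to point the agent at a Tinker service and train with a smaller adapter (all values below are illustrative placeholders, not working credentials):

```bash
# Illustrative values; adjust for your setup
export TINKER_URL="https://tinker.example.com"
export TINKER_API_KEY="sk-..."          # your real key here
export LORA_RANK=16
export DISTILL_MODE=off_policy
continualcode
```

Variables left unset fall back to the defaults in the table above.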

## How It Works

**Prompt distillation** trains a model to behave *as if* it had access to a long policy prompt, without actually needing it at inference time:

1. **Generation** uses a "teacher" prefix that includes your policy file
2. **Training** uses a "student" prefix without the policy
3. The model learns to produce the same (approved) outputs without the policy context
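
The teacher/student split can be sketched in a few lines. This is illustrative pseudocode for the idea, not the actual `continualcode` internals; the prompt templates and field names are assumptions:

```python
# Hedged sketch of the teacher/student prefix split (names are illustrative).
POLICY = "Always run tests before committing.\n"

def teacher_prompt(user_msg: str) -> str:
    # Generation sees the full policy file.
    return f"{POLICY}\nUser: {user_msg}\nAssistant:"

def student_prompt(user_msg: str) -> str:
    # Training targets the same completion without the policy context.
    return f"User: {user_msg}\nAssistant:"

msg = "Commit my changes"
approved = "Running the test suite first, then committing."  # output you approved

# Off-policy SFT pair: student prompt paired with the teacher's approved output,
# so the student learns to reproduce policy-following behavior policy-free.
example = {"prompt": student_prompt(msg), "completion": approved}
```

Because the completion was generated (and approved) under the teacher prefix but trained against the student prefix, the policy's effect gets baked into the weights.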

Two distillation modes:
- `off_policy`: Standard supervised fine-tuning (SFT) on approved outputs
- `on_policy`: Sample from student, score with teacher logprobs, update via importance sampling
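
The `on_policy` update can be illustrated with a toy importance weight. This is a minimal sketch of the weighting idea under stated assumptions (sequence-level weight, NLL loss); the real training loop and its exact loss are not shown here:

```python
import math

# Per-token logprobs for one student-sampled sequence (toy numbers):
student_logps = [-1.2, -0.8, -2.0]  # from the student (the sampling policy)
teacher_logps = [-1.0, -0.9, -1.5]  # same tokens scored under the teacher

# Sequence-level importance weight w = p_teacher(y|x) / p_student(y|x),
# computed in log space for numerical stability.
log_w = sum(teacher_logps) - sum(student_logps)
weight = math.exp(log_w)

# The student's NLL on its own sample, reweighted toward the teacher:
student_nll = -sum(student_logps)
weighted_loss = weight * student_nll
```

Sequences the teacher finds more likely than the student get upweighted, pulling the student toward teacher behavior without ever showing it the policy text.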

## Keybindings

| Key | Action |
|-----|--------|
| `Enter` | Send message |
| `Ctrl+S` | Save checkpoint |
| `Ctrl+L` | Clear conversation |
| `Ctrl+C` | Quit |
| `?` | Show help |

**Approval UI:**
| Key | Action |
|-----|--------|
| `1` / `y` | Approve |
| `2` / `Shift+Tab` | Approve + remember |
| `3` / `n` / `Esc` | Deny |
| `4` / `Tab` | Type correction |
| `e` | Edit args in $EDITOR |
| Arrow keys | Navigate options |

## License

MIT
