Metadata-Version: 2.4
Name: getinvoke
Version: 0.6.2
Summary: Voice-to-prompt for developers. Hold a button, speak, release — structured prompts on your clipboard in under a second.
Author-email: Zach Roth <zach@scrn.co>
License-Expression: MIT
Project-URL: Homepage, https://getinvoke.dev
Project-URL: Repository, https://github.com/zmroth/invoke
Keywords: voice,dictation,whisper,coding,prompt,speech-to-text,developer-tools,claude,cursor,copilot
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: faster-whisper>=1.1.0
Requires-Dist: httpx>=0.28.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pynput>=1.8.0
Requires-Dist: sounddevice>=0.5.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.11.0
Requires-Dist: pyperclip>=1.9.0
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=13.0.0
Provides-Extra: gui
Requires-Dist: pystray>=0.19.0; extra == "gui"
Requires-Dist: Pillow>=10.0.0; extra == "gui"
Provides-Extra: cuda
Requires-Dist: nvidia-cublas-cu12; extra == "cuda"
Requires-Dist: nvidia-cuda-runtime-cu12; extra == "cuda"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pyinstaller; extra == "dev"
Dynamic: license-file

# Invoke

[![PyPI](https://img.shields.io/pypi/v/getinvoke)](https://pypi.org/project/getinvoke/)
[![Python](https://img.shields.io/pypi/pyversions/getinvoke)](https://pypi.org/project/getinvoke/)
[![License](https://img.shields.io/pypi/l/getinvoke)](https://github.com/zmroth/invoke/blob/main/LICENSE)

**Voice-to-prompt for developers.**

Hold a button, speak, release. Your words hit the clipboard (or auto-paste at your cursor) in under a second. GPU-accelerated local transcription via Whisper — no cloud round-trip for speech-to-text.

Optionally run your transcription through an AI reformatter to clean it up into structured prompts for Claude Code, Cursor, Windsurf, Copilot, or raw text.

## Install

```bash
# CLI only (headless, any platform)
pip install getinvoke

# With GUI (system tray + overlay)
pip install getinvoke[gui]

# Or use pipx for isolated install
pipx install getinvoke
```

> **Note:** The CLI command is `dictate`, not `invoke`, to avoid conflict with the pyinvoke task runner.

## Quick Start

```bash
# Check your setup (GPU, mic, settings)
dictate health

# Run the setup wizard
dictate setup

# Start dictating (headless mode)
dictate run

# Start with GUI (requires [gui] extras)
dictate run  # auto-detects GUI availability
```

**Default workflow:** Hold **Mouse5** (forward side button) → speak → release → text on clipboard.

## How It Works

```
 YOU SPEAK into your mic
     │
     ▼
 ┌─────────────────────────────┐
 │  WHISPER (runs locally)     │
 │  Model: distil-large-v3     │
 │  GPU (CUDA) or CPU          │
 │  No internet. No API key.   │
 │                              │
 │  Output: raw transcription   │
 │  "uh so like fix the um     │
 │   auth middleware thing"     │
 └──────────────┬──────────────┘
                │
                ▼
 ┌─────────────────────────────┐
 │  AI REFORMATTER (optional)  │
 │                              │
 │  Takes raw speech and       │
 │  reshapes it based on the   │
 │  selected MODE:             │
 │                              │
 │  Claude Code → conversational│
 │  Cursor → @file references  │
 │  Commit → git commit msg    │
 │  Raw → just clean up speech │
 │  none → SKIP (no AI call)   │
 │                              │
 │  Backend options:            │
 │  • openrouter (cloud API)   │
 │  • claude-cli (local CC)    │
 │  • ollama (fully local)     │
 │  • none (skip AI entirely)  │
 └──────────────┬──────────────┘
                │
                ▼
 ┌─────────────────────────────┐
 │  CLIPBOARD + AUTO-PASTE     │
 │  "Fix the auth middleware   │
 │   to validate JWT tokens    │
 │   before checking roles"    │
 └─────────────────────────────┘
```

## Platform Support

| Platform | CLI (headless) | GUI (tray + overlay) | Installer |
|---|---|---|---|
| Windows | `pip install getinvoke` | `pip install getinvoke[gui]` | `.exe` via GitHub Releases |
| Linux | `pip install getinvoke` | `pip install getinvoke[gui]` | — |
| macOS | `pip install getinvoke` | Planned | — |

## Comparison

| Feature | Invoke | whisper.cpp CLI | nerd-dictation | macOS Dictation |
|---|---|---|---|---|
| Local Whisper | Yes | Yes | Yes | No |
| AI reformatting | Yes (4 backends) | No | No | No |
| Code-aware modes | 7 modes | No | No | No |
| Project context | Auto-detected | No | No | No |
| GPU acceleration | CUDA | CUDA/Metal | No | — |
| Cross-platform | Win/Linux/Mac | Win/Linux/Mac | Linux | Mac only |
| pip installable | Yes | No | No | — |

## Requirements

- Python 3.11+
- NVIDIA GPU recommended (CUDA) — works on CPU but slower
- Windows, Linux, or macOS

## CLI Commands

| Command | Description |
|---|---|
| `dictate run` | Start Invoke (auto-detects GUI vs headless) |
| `dictate run --headless` | Force headless mode |
| `dictate run --no-beep` | Start without audio feedback |
| `dictate health` | Check GPU, model, mic, API keys, config |
| `dictate devices` | List audio input devices |
| `dictate modes` | List available target modes |
| `dictate history` | View dictation history |
| `dictate setup` | Run setup wizard |
| `dictate setup --cli` | Force CLI wizard (no GUI) |
| `dictate config` | View all configuration |
| `dictate config mode cursor` | Set a config value |
| `dictate license status` | Check license status |
| `dictate license activate <key>` | Activate a license |
| `dictate install` | Create Windows Startup shortcut |
| `dictate transcribe <file>` | One-shot WAV transcription |
| `dictate logs` | View log file |

## Configuration

### Key Settings

| Setting | Default | Options |
|---|---|---|
| `reformatter_backend` | `none` | `openrouter`, `claude-cli`, `ollama`, `none` |
| `mode` | `claude-code` | `claude-code`, `cursor`, `windsurf`, `copilot`, `raw`, `commit`, `pr` |
| `whisper_model` | `distil-large-v3` | Any [faster-whisper model](https://huggingface.co/Systran) |
| `whisper_device` | `auto` | `auto`, `cuda`, `cpu` |
| `hotkey` | `mouse5` | `mouse5`, `mouse4`, `ctrl+shift+d`, or any key combo |
| `auto_paste` | `false` | `true` = insta-dump at cursor |
| `audio_device` | system default | Device index from `dictate devices` |
| `beep_enabled` | `true` | `true` / `false` |
| `sound_preset` | `default` | `default`, `subtle`, `arcade`, `minimal`, `icq`, `aim`, `dial-up` |
| `language` | `en` | `en`, `auto`, or any Whisper language code |

Set values quickly from the terminal:
```bash
dictate config mode cursor
dictate config reformatter_backend claude-cli
dictate config auto_paste true
```

### AI Reformatter Backends

| Backend | Where it runs | What you need | Cost |
|---|---|---|---|
| `openrouter` | Cloud (OpenRouter API) | `OPENROUTER_API_KEY` | ~$0.001/dictation |
| `claude-cli` | Local (your Claude Code) | Claude Code installed | Part of your CC plan |
| `ollama` | Local (your machine) | [Ollama](https://ollama.com/) running on localhost:11434 | Free, no internet |
| `none` | Skipped entirely | Nothing | Free — raw Whisper output only |

### Target Modes

| Mode | What the AI does with your speech |
|---|---|
| `claude-code` | Formats as a conversational prompt for Claude Code |
| `cursor` | Adds `@file` references, structured for Cursor's chat |
| `windsurf` | Cascade-style prompt for Windsurf |
| `copilot` | GitHub Copilot Chat format |
| `raw` | Cleans up filler/grammar only — no prompt structure |
| `commit` | Turns your description into a git commit message |
| `pr` | Turns your description into a PR title + body |

### Custom Hotkeys

The settings GUI has a **Record...** button to capture any key combo. Or set it from the CLI:

```bash
dictate config hotkey "ctrl+shift+r"
dictate config hotkey mouse5
```

Supports modifier combos (`ctrl+shift+d`, `alt+f2`), mouse buttons (`mouse4`, `mouse5`), and single keys.

### Ollama Setup

```bash
# Install Ollama (https://ollama.com)
ollama pull llama3.2

# Configure Invoke to use it
dictate config reformatter_backend ollama
dictate config ollama_model llama3.2
# Default URL: http://localhost:11434
```

### Custom Vocabulary

Boost Whisper recognition for domain-specific terms:

```bash
dictate config custom_vocab "KeyGrip,OpenRouter,pyproject,FastAPI,pynput"
```

### History

```bash
dictate history            # Recent entries
dictate history -n 20      # Last 20
dictate history -s "fix"   # Search
```

History retention defaults to 30 days. Change with `dictate config history_retention_days 90`.

## Audio Feedback

- **Short high tick** — recording started (hotkey pressed)
- **Rising chirp** — done, text is on clipboard / pasted
- Disable with `dictate run --no-beep` or `dictate config beep_enabled false`

### Sound Presets

Choose from 7 sound themes:

| Preset | Vibe |
|---|---|
| `default` | Original Invoke sounds |
| `subtle` | Softer, lower tones |
| `arcade` | Retro game-style chirps |
| `minimal` | Single short ticks |
| `icq` | Classic ICQ vibes |
| `aim` | AIM buddy list chirps |
| `dial-up` | Modem handshake tones |

```bash
dictate config sound_preset icq
```

Preview all sounds in **Settings > General > Sound theme > Preview**.

## Linux Notes

- **Auto-paste** requires `xdotool`: `sudo apt install xdotool`
- **Mouse side buttons** (mouse4/mouse5) work via X11 — may not work on pure Wayland
- Audio stream stays open to avoid ALSA/PipeWire latency on every hotkey press
- If using `pip install` on modern Debian/Ubuntu, use `pipx install getinvoke` instead (PEP 668)

## System Tray (GUI mode)

When installed with `pip install getinvoke[gui]`, Invoke runs in the system tray with:
- State indicator (idle / recording / processing / error)
- Current AI backend + model displayed
- Mode switcher (right-click menu)
- License management
- Settings editor
- Open logs
- Quit

Use `dictate-gui` to launch without a console window.

## Building the Windows Installer

See the [installer README](installer/) for details on building the `.exe` installer with PyInstaller + Inno Setup.

## Development

```bash
pip install -e ".[dev,gui]"
ruff check src/ tests/
ruff format src/ tests/
pytest
```

## Architecture

```
src/getinvoke/
  engine.py       # Core pipeline engine (FrontendProtocol + InvokeEngine)
  app.py          # GUI frontend wrapper (GuiFrontend + InvokeApp)
  terminal.py     # Rich terminal frontend (TerminalFrontend)
  cli.py          # Click CLI commands
  setup_cli.py    # Terminal-based setup wizard
  clipboard.py    # Copy + auto-paste via pynput
  config.py       # Settings from config.json / env vars / .env
  context.py      # Project context loader (18 file types + git)
  feedback.py     # Audio cues (beeps/chirps)
  history.py      # SQLite dictation history
  hotkey.py       # Global hotkey listener (mouse/keyboard)
  license.py      # Lemon Squeezy license validation (7-day trial)
  license_gui.py  # License activation dialog (tkinter, GUI only)
  modes/          # 7 target mode system prompts
  overlay.py      # Recording/processing overlay (tkinter, GUI only)
  recorder.py     # Push-to-talk audio capture (sounddevice)
  reformatter.py  # LLM reformatting (OpenRouter/Claude CLI/Ollama/none)
  settings_gui.py # Settings editor (tkinter, GUI only)
  transcriber.py  # Whisper model loading + transcription
  tray.py         # System tray icon + menu (pystray, GUI only)
  updater.py      # Auto-updater (GitHub releases + pip-aware)
  window.py       # Active window/file detection
  wizard.py       # First-run setup wizard (tkinter, GUI only)

installer/
  invoke.spec     # PyInstaller build config
  invoke.iss      # Inno Setup installer script
  build.ps1       # Local build script
```

## License

MIT — see [THIRD-PARTY-LICENSES.txt](THIRD-PARTY-LICENSES.txt) for bundled dependency licenses.
