Metadata-Version: 2.3
Name: taskclf
Version: 0.4.4
Summary: Local-first task classifier that infers your work type from computer activity signals
Author: fruitiecutiepie
Author-email: fruitiecutiepie <104437268+fruitiecutiepie@users.noreply.github.com>
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Office/Business
Classifier: Topic :: System :: Monitoring
Requires-Dist: duckdb>=1.4.4
Requires-Dist: fastapi>=0.115.0
Requires-Dist: python-multipart>=0.0.18
Requires-Dist: lightgbm>=4.6.0
Requires-Dist: matplotlib>=3.10.8
Requires-Dist: pandas>=3.0.0
Requires-Dist: pyarrow>=23.0.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: rich>=14.3.2
Requires-Dist: scipy>=1.15.0
Requires-Dist: scikit-learn>=1.8.0
Requires-Dist: pywebview>=5.0
Requires-Dist: typer>=0.23.1
Requires-Dist: uvicorn[standard]>=0.34.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: pystray>=0.19.5
Requires-Dist: pillow>=11.0.0
Requires-Dist: tomli-w>=1.2.0
Requires-Python: >=3.14
Project-URL: Homepage, https://github.com/fruitiecutiepie/taskclf
Project-URL: Documentation, https://fruitiecutiepie.github.io/taskclf
Project-URL: Repository, https://github.com/fruitiecutiepie/taskclf
Description-Content-Type: text/markdown

# taskclf — Task Type Classifier from Local Activity Signals

[![PyPI version](https://img.shields.io/pypi/v/taskclf)](https://pypi.org/project/taskclf/)
[![Python](https://img.shields.io/pypi/pyversions/taskclf)](https://pypi.org/project/taskclf/)

Train and run a personal task-type classifier (e.g. coding / writing / meetings) using privacy-preserving computer activity signals such as foreground app/window metadata and aggregated input statistics (counts/rates only).

This project is intentionally scoped as a **personalized classifier** (single-user first). The architecture keeps:
- **Collectors** (platform/tool dependent) isolated behind adapters
- **Features** as a versioned, validated contract
- **Models** as bundled artifacts with schema checks
- **Inference** as a small, stable loop that emits task segments and daily summaries
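
The collector boundary described above can be sketched as a structural protocol. This is an illustrative sketch, not the project's actual interface; the `Event` fields and `Collector` method name are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Protocol


@dataclass(frozen=True)
class Event:
    """Privacy-safe event: an app identifier plus a hashed title, never raw text."""
    ts: datetime
    app: str
    title_hash: str


class Collector(Protocol):
    """Anything platform- or tool-specific stays behind this boundary."""

    def events(self, start: datetime, end: datetime) -> Iterator[Event]: ...
```

New collectors then only need to satisfy the protocol; downstream feature code never imports platform-specific modules.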

## Goals
- Fast iteration: first useful model in < 1 week of data
- Privacy: no raw keystrokes, no raw window titles persisted
- Stability: feature schema versioning + schema hash gates
- Extensibility: add new collectors and models without breaking consumers

## Non-Goals
- Universal (multi-user) generalization out of the box
- Storing or analyzing raw typed content
- "Perfect" labeling UI (start minimal, iterate later)

---

## Labels (v1)

Eight core labels defined in `schema/labels_v1.json`:

| ID | Label | Description |
|----|-------|-------------|
| 0 | `Build` | Writing or implementing structured content in editor/terminal |
| 1 | `Debug` | Investigating issues, terminal-heavy troubleshooting |
| 2 | `Review` | Reviewing technical material or diffs with light edits |
| 3 | `Write` | Writing structured non-code content |
| 4 | `ReadResearch` | Consuming information with minimal production |
| 5 | `Communicate` | Asynchronous coordination (chat/email) |
| 6 | `Meet` | Synchronous meetings or calls |
| 7 | `BreakIdle` | Idle or break period |

Labels are stored as **time spans** (not per-keystroke events). Users can remap
core labels to personal categories via a **taxonomy config**
(see `configs/user_taxonomy_example.yaml`).
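
A minimal sketch of what taxonomy remapping does, assuming a grouping of core labels into personal categories (the group names below are invented for illustration; the real mapping lives in the YAML config):

```python
# Hypothetical personal taxonomy: core label -> user-defined group.
taxonomy = {
    "Deep Work": ["Build", "Debug", "Write"],
    "Collaboration": ["Review", "Communicate", "Meet"],
    "Input": ["ReadResearch"],
    "Rest": ["BreakIdle"],
}


def remap(core_label: str, taxonomy: dict[str, list[str]]) -> str:
    """Map a core label to its user-defined group (identity if unmapped)."""
    for group, members in taxonomy.items():
        if core_label in members:
            return group
    return core_label
```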

---

## Data Flow Overview

### Structures (pipelines)
* ETL pipeline: raw events → features Parquet
* Training pipeline: features + labels → model bundle
* Inference pipeline: new events → predictions + segments

### Batch (repeatable)
1. **Ingest**: pull ActivityWatch export → `data/raw/aw/`
2. **Feature build**: events → per-minute features → `data/processed/features_v1/`
3. **Label import**: label spans → `data/processed/labels_v1/`
4. **Build dataset**: join features + labels, split by time → training arrays
5. **Train**: fit model → `models/<run_id>/`
6. **Evaluate**: metrics, acceptance checks, calibration
7. **Report**: daily summaries → `artifacts/`
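
Step 2 (events → per-minute features) amounts to bucketing timestamped events into minute windows and aggregating. A toy sketch, with feature names that are assumptions rather than the project's real schema:

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_minute(events: list[tuple[datetime, str]]) -> dict[datetime, dict]:
    """Aggregate (timestamp, app) events into one feature row per minute."""
    buckets: dict[datetime, list[str]] = defaultdict(list)
    for ts, app in events:
        buckets[ts.replace(second=0, microsecond=0)].append(app)
    return {
        minute: {
            "n_events": len(apps),                        # activity volume
            "n_apps": len(set(apps)),                     # context switching
            "top_app": max(set(apps), key=apps.count),    # dominant app
        }
        for minute, apps in buckets.items()
    }
```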

### Online (real-time)
Every N seconds:
- read the last minute(s) of events
- compute the latest feature bucket
- predict + smooth (with optional calibration and taxonomy mapping)
- append predictions → `artifacts/`

At end-of-day:
- produce report

---

## Privacy & Safety
This repo enforces the following:
- **No raw keystrokes** are stored (only aggregate counts/rates).
- **No raw window titles** are stored by default.
  - Titles are hashed or locally tokenized; you can keep a local mapping if you choose.
- Dataset artifacts stay **local-first**.
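
The title-hashing guarantee can be illustrated with a one-way salted hash; this is a sketch of the approach, and the project's actual hashing scheme may differ:

```python
import hashlib


def hash_title(title: str, salt: str = "taskclf-default-salt") -> str:
    """One-way salted hash of a window title; the raw text is never stored.
    Changing the salt changes every hash, so mappings don't transfer."""
    return hashlib.sha256((salt + title).encode("utf-8")).hexdigest()[:16]
```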

---

## Quick Start

### Requirements
- Python >= 3.14
- For the recommended CLI install: [uv](https://docs.astral.sh/uv/)

### Install

**Command-line (PyPI)** — install the `taskclf` CLI:

```bash
uv tool install taskclf
```

Or with pip only:

```bash
pip install taskclf
```

Then run `taskclf --help`.

**Desktop app (optional)** — a small Electron shell is built for Windows, Linux,
and macOS. Download the latest **launcher** installers from
[GitHub Releases](https://github.com/fruitiecutiepie/taskclf/releases). Choose
the file for your OS:

| OS | File |
|----|------|
| Windows | `*.exe` (NSIS installer) |
| Linux | `*.AppImage` |
| macOS | `*.dmg` (open and drag **taskclf** to Applications) |

These assets are published on GitHub releases whose tags start with
`launcher-v`.

**Development** (from a git checkout):

```bash
uv sync
uv run taskclf --help
```

### Ingest (ActivityWatch)

```bash
uv run taskclf ingest aw --input /path/to/activitywatch-export.json
```

This parses an ActivityWatch JSON export, normalizes app names to reverse-domain
identifiers, hashes window titles (never storing raw text), and writes
privacy-safe events to `data/raw/aw/<YYYY-MM-DD>/events.parquet` partitioned by
date.

Options:
- `--out-dir` — output directory (default: `data/raw/aw`)
- `--title-salt` — salt for hashing window titles (default: `taskclf-default-salt`)

### Build features

```bash
uv run taskclf features build --date 2026-02-16
```

### Import labels

```bash
uv run taskclf labels import --file labels.csv
```

Or add individual label blocks:

```bash
uv run taskclf labels add-block \
  --start 2026-02-16T09:00:00 --end 2026-02-16T10:00:00 --label Build
```

Or label what you're doing right now (no timestamps needed):

```bash
uv run taskclf labels label-now --minutes 10 --label Build
```

This queries ActivityWatch for a live summary of apps used in the last N minutes
and creates the label span automatically.

### Export labels to CSV

```bash
uv run taskclf labels export --out my_labels.csv
```

### Train

```bash
uv run taskclf train lgbm --from 2026-02-01 --to 2026-02-16
```

### Run batch inference

```bash
uv run taskclf infer batch --model-dir models/<run_id> --from 2026-02-01 --to 2026-02-16
```

### Run online inference

```bash
uv run taskclf infer online --model-dir models/<run_id>
```

Starts a polling loop that queries a running ActivityWatch server, builds
feature rows from live window events, predicts task types using a trained model,
smooths predictions, and writes running outputs to `artifacts/`. Press Ctrl+C
to stop; a final daily report is generated on shutdown.

Options:
- `--poll-seconds` — seconds between polls (default: 60)
- `--aw-host` — ActivityWatch server URL (default: `http://localhost:5600`)
- `--smooth-window` — rolling majority window size (default: 3)
- `--title-salt` — salt for hashing window titles (default: `taskclf-default-salt`)
- `--out-dir` — output directory (default: `artifacts`)
- `--label-queue` / `--no-label-queue` — auto-enqueue low-confidence predictions for manual labeling
- `--label-confidence` — confidence threshold for auto-enqueue (default: 0.55)
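
The `--smooth-window` option refers to a rolling majority vote over recent predictions. A minimal sketch of that behavior (tie-breaking here favors the newest label, which is an assumption, not necessarily what the project does):

```python
from collections import Counter, deque


def smooth(predictions: list[str], window: int = 3) -> list[str]:
    """Rolling majority vote over the last `window` predictions."""
    recent: deque[str] = deque(maxlen=window)
    out = []
    for label in predictions:
        recent.append(label)
        counts = Counter(recent)
        top = counts.most_common(1)[0][1]
        # among labels tied for the top count, prefer the most recent one
        out.append(next(l for l in reversed(recent) if counts[l] == top))
    return out
```

A one-minute flicker to a different label is suppressed as long as the surrounding minutes agree.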

### Run baseline (no model needed)

```bash
uv run taskclf infer baseline --from 2026-02-01 --to 2026-02-16
```

Rule-based classifier useful for day-1 bootstrapping before you have a trained model.
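
A toy version of what a rule-based baseline looks like: map each minute's dominant app to a label, with idle minutes going to `BreakIdle`. The app identifiers and rules below are illustrative assumptions, not the project's actual rule set:

```python
# Hypothetical app -> label rules (reverse-domain identifiers).
RULES = {
    "com.microsoft.vscode": "Build",
    "org.mozilla.firefox": "ReadResearch",
    "us.zoom.xos": "Meet",
    "com.tinyspeck.slackmacgap": "Communicate",
}


def baseline_label(top_app: str, n_events: int) -> str:
    """Label one minute from its dominant app and activity count."""
    if n_events == 0:
        return "BreakIdle"
    return RULES.get(top_app, "ReadResearch")  # unknown apps default to reading
```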

### Produce report

```bash
uv run taskclf report daily --segments-file artifacts/segments.json
```

---

## CLI Reference

All commands: `uv run taskclf --help`

| Group | Commands | Purpose |
|-------|----------|---------|
| `ingest` | `aw` | Import ActivityWatch exports |
| `features` | `build` | Build per-minute feature rows |
| `labels` | `import`, `add-block`, `label-now`, `show-queue`, `project` | Manage label spans and labeling queue |
| `train` | `build-dataset`, `lgbm`, `evaluate`, `tune-reject`, `calibrate`, `retrain`, `check-retrain` | Training, evaluation, and retraining pipeline |
| `taxonomy` | `validate`, `show`, `init` | User-defined label groupings |
| `infer` | `batch`, `online`, `baseline`, `compare` | Prediction (ML, rule-based, comparison) |
| `report` | `daily` | Daily summaries (JSON/CSV/Parquet) |
| `monitor` | `drift-check`, `telemetry`, `show` | Feature drift and telemetry tracking |
| *(top-level)* | `tray` | System tray labeling app with activity transition detection |
| *(top-level)* | `ui` | Web UI for labeling, queue, and live prediction streaming |

Full CLI docs: `docs/api/cli/main.md`

---

## Repo Layout

* `src/taskclf/` — application code (adapters, core, features, labels, train, infer, report, ui)
* `schema/` — versioned JSON schemas for features and labels
* `configs/` — configuration files (model params, retrain policy, taxonomy examples)
* `docs/` — API reference and guides (served via `make docs-serve`)
* `data/` — raw and processed datasets (local, gitignored)
* `models/` — trained model bundles (one folder per run)
* `artifacts/` — predictions, segments, reports, evaluation outputs
* `tests/` — test suite

---

## Model Artifact Contract

Every saved model bundle (`models/<run_id>/`) contains:

* the model file
* `metadata.json`: feature schema version + hash, label set, training date range, params, dataset hash
* `metrics.json`: macro/weighted F1, per-class metrics
* `confusion_matrix.csv`
* categorical encoders (if applicable)

Inference refuses to run if the live feature schema hash does not match the one recorded in the model bundle.
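
The schema-hash gate can be sketched as hashing a canonical JSON serialization of the schema and comparing it against `metadata.json`. The `feature_schema_hash` key name is an assumption for illustration:

```python
import hashlib
import json


def schema_hash(schema: dict) -> str:
    """Stable hash of a feature schema via canonical (sorted-key) JSON."""
    canon = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


def check_bundle(metadata: dict, current_schema: dict) -> None:
    """Refuse inference when the live schema no longer matches the bundle."""
    if metadata["feature_schema_hash"] != schema_hash(current_schema):
        raise RuntimeError("feature schema hash mismatch: rebuild features or retrain")
```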

---

## Development

Common tasks are in the `Makefile`:

```bash
make lint        # ruff check .
make test        # pytest
make typecheck   # mypy src
make docs-serve  # local preview at http://127.0.0.1:8000
make docs-build  # static site in site/
```

**Electron backend payload** (for packaged app downloads): after `make ui-build`,
build the PyInstaller one-folder sidecar zip attached to GitHub releases (`v*` tags):

```bash
uv sync --group bundle
make build-payload   # writes build/payload-<triple>.zip
```

See `docs/api/scripts/payload_build.md` for details.

---

## License

MIT (per the package metadata; local-first personal project by default).
