Metadata-Version: 2.3
Name: cr_proc
Version: 0.2.5
Summary: A tool for processing BYU CS code recording files.
Author: Ethan Dye
Author-email: Ethan Dye <mrtops03@gmail.com>
Requires-Python: >=3.11
Description-Content-Type: text/markdown

# `code_recorder_processor`

`code_recorder_processor` processes `*.recording.jsonl.gz` files produced by
the current `jetbrains-recorder` and `vscode-recorder` implementations. It
reconstructs the edited document, compares that reconstruction to a template,
and reports suspicious activity such as large external pastes, rapid AI-style
paste bursts, and time-limit violations.

## Scope

The processor is designed around the current recorder implementations, not
around the historical examples in this repository.

Current schema expectations:

- Modern edit events use `type: "edit"`.
- Status events use typed records such as `type: "focusStatus"`.
- Events include `timestamp`, `document`, `offset`, `oldFragment`, and
  `newFragment`.

Compatibility behavior:

- Older recordings that omit `type` on edit events are still accepted.
- If a mixed recording contains both modern typed edits and later stale legacy
  untyped edits, the processor prefers the typed stream.
- Example recordings in `recordings/` are fixtures, not the schema source of
  truth.

## Installation

For development inside this repository:

```bash
uv sync --dev
```

For running commands in the repo without a global install, prefer:

```bash
uv run cr_proc --help
```

To install the CLI globally from a local checkout:

```bash
uv tool install .
```

After that, the `cr_proc` command is available directly:

```bash
cr_proc --help
```

If you want the global command to track local source changes while developing:

```bash
uv tool install --editable .
```

## Quick Start

The simplest invocation is to pass only recordings. When `--template` is
omitted, the processor looks for a matching template file next to each
recording.

Single recording:

```bash
uv run cr_proc path/to/student.recording.jsonl.gz
```

Multiple recordings:

```bash
uv run cr_proc recordings/*.recording.jsonl.gz
```

Explicit template file:

```bash
uv run cr_proc student.recording.jsonl.gz --template template.py
```

Template directory:

```bash
uv run cr_proc recordings/*.recording.jsonl.gz --template templates/
```

Write reconstructed output:

```bash
uv run cr_proc student.recording.jsonl.gz --write reconstructed.py
uv run cr_proc recordings/*.recording.jsonl.gz --write output/
```

Compare to submitted files:

```bash
uv run cr_proc student.recording.jsonl.gz --submitted submitted.py
uv run cr_proc recordings/*.recording.jsonl.gz --submitted submissions/
```

Write JSON results:

```bash
uv run cr_proc recordings/*.recording.jsonl.gz --output-json results.json
```

Playback mode:

```bash
uv run cr_proc student.recording.jsonl.gz --playback
```

This opens a windowed viewer. Use the left/right arrow keys to step through
edits, `Space` to play or pause, and `Home`/`End` to jump to the beginning or
final state. The viewer is generated as a local HTML page and opened in your
default browser.

Select a specific document from a multi-document recording:

```bash
uv run cr_proc multi-file.recording.jsonl.gz --document src/main.py
```

## CLI Reference

Core inputs:

- `inputs`: One or more recording files or glob patterns.
- `--template PATH`: Optional template file or template directory.
- `--document NAME`: Optional override for which document inside the recording
  should be processed. This matches the recorded document path or filename and
  is not another local file input.

Outputs:

- `--write PATH`: Write reconstructed code. In single-file mode this can be a
  file or a directory. In batch mode it must be a directory.
- `--output-json PATH`: Write structured JSON results.
- `--submitted PATH`: Compare reconstructed code to a submitted file or a
  directory of submitted files.

Verification and filtering:

- `--time-limit MINUTES`: Flag recordings whose active editing time exceeds the
  limit.
- `--filter-file FILE`: Exclude recordings matching a path, filename, or base
  filename.
- `--filter-function-generation`: Suppress suspicious autocomplete findings
  that are recognized as IDE-generated boilerplate function stubs.

Playback:

- `--playback`: Open a browser-based windowed playback viewer.
- `--playback-speed FLOAT`: Playback speed multiplier.
- `--playback-start-event N`: Start playback from a later applied-event index.

Compatibility aliases:

- Legacy positional-template usage still works.
- `--template-dir`, `--output-file`, `--output-dir`, `--submitted-file`, and
  `--submitted-dir` are still accepted as compatibility aliases.

## Template Resolution

When the processor needs a template, it resolves it in this order:

1. `--template <file>` uses that exact file.
2. `--template <directory>` searches that directory for the best filename or
   stem match to the recorded document.
3. If `--template` is omitted, the processor searches the recording's parent
   directory.
4. Legacy positional-template mode treats the last positional argument as a
   template file when it does not look like a recording path.

`--document` affects this process by telling the processor which recorded
document to treat as the target before template matching happens. It only
selects data already present in the recording.

If no matching template is found, processing still continues by falling back to
the recording snapshot as the reconstruction seed.

## Output Behavior

Normal user-facing output goes to `stderr`:

- time summaries
- suspicious-event summaries
- template mismatch diffs
- submitted-file comparison summaries
- warnings

Reconstructed code is written only when `--write` is used.

JSON output is written only when `--output-json` is used.

## Suspicious Activity Detection

The processor currently reports:

- large multi-line external pastes
- rapid clusters of pasted lines within one second as an AI indicator
- time-limit violations for single recordings and combined batch activity

These checks are heuristic. They are intended to surface recordings for review,
not to act as a standalone disciplinary decision engine.

## Development

Run tests:

```bash
uv run pytest -q
```

Run the bundled example recording:

```bash
uv run cr_proc recordings/cs111-homework0/cs111-homework0-ISC.recording.jsonl.gz
```

## CI and Release

GitHub Actions uses `uv`, not Poetry.

- CI installs dependencies with `uv sync --locked --dev`.
- CI currently runs on Python `3.11` and `3.14`.
- The publish workflow builds distributions with `uv build`.

## Repository Fixtures

The bundled recordings are documented in
[`recordings/README.md`](/Volumes/Developer/cs111/code_recorder_processor/recordings/README.md).
Those files are useful for regression tests and examples, but some were created
with older recorder versions and intentionally exercise compatibility paths.
