Metadata-Version: 2.4
Name: seq-explorer
Version: 0.1.0
Summary: Visualize hidden state evolution in sequence models
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: numpy>=2.4.2
Requires-Dist: pandas>=2.3.3
Requires-Dist: plotly>=6.5.2
Requires-Dist: polars>=1.38.1
Requires-Dist: streamlit>=1.54.0
Requires-Dist: torch>=2.10.0
Provides-Extra: dev
Requires-Dist: isort>=7.0.0; extra == "dev"
Requires-Dist: mkdocs-material>=9.7.1; extra == "dev"
Requires-Dist: mkdocs-table-reader-plugin>=3.1.0; extra == "dev"
Requires-Dist: mkdocstrings-python>=2.0.2; extra == "dev"
Requires-Dist: pre-commit>=4.5.1; extra == "dev"
Requires-Dist: pymdown-extensions>=10.20.1; extra == "dev"
Requires-Dist: ruff>=0.15.1; extra == "dev"
Dynamic: license-file

# Sequence Explorer

Interactive Streamlit dashboard for visualizing how a sequence model's hidden state evolves over transaction sequences. Works with **any PyTorch RNN** (GRU, LSTM, RNN) with **any number of layers**.

## Two Ways to Use

### As a Package (pip install)

```bash
pip install seq-explorer
```

**In Python/Notebooks:**
```python
from seq_explorer import SequenceTrace

trace = SequenceTrace.from_arrays(...)
```

**Run dashboard (path shown is relative to a repository checkout):**
```bash
streamlit run src/seq_explorer/app.py
```

### As a Project (clone & run)

```bash
git clone https://github.com/chris-santiago/seq-explorer
cd seq-explorer
uv sync
uv run python src/seq_explorer/build_cache.py dataframe your_data.csv -o cache.parquet
uv run streamlit run src/seq_explorer/app.py
```

## What it shows

- **Model score timeline** — running P(fraud) at every timestep, color-coded green → red
- **Hidden state heatmap** — per-neuron activations across the sequence (any number of layers)
- **Hidden state norms** — L2 norm over time for all layers, plus rate-of-change bars
- **Top-k neuron drill-down** — neurons most correlated with the fraud score, traced over time
- **Layer similarity** — cosine similarity between consecutive hidden state layers
- **Raw features table** — the actual transaction data, highlighted at the selected timestep
- **Metadata overlays** — visualize categorical/numeric metadata on timelines (e.g., risk tiers, channels)
- **Timestep scrubber** — linked across all panels for synchronized inspection
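
The derived quantities behind several of these panels are standard array operations. The sketch below shows one plausible way to compute them with NumPy on synthetic data. The function names (`l2_norms`, `rate_of_change`, `layer_cosine_similarity`, `top_k_neurons`) are illustrative and not part of the `seq_explorer` API, and the package's own definitions may differ, for example in how rate of change or correlation is measured.

```python
import numpy as np


def l2_norms(hidden: np.ndarray) -> np.ndarray:
    """L2 norm of the hidden state at each timestep; hidden is (seq_len, hidden_dim)."""
    return np.linalg.norm(hidden, axis=1)


def rate_of_change(norms: np.ndarray) -> np.ndarray:
    """Absolute change in norm between consecutive timesteps (0 at the first step)."""
    return np.abs(np.diff(norms, prepend=norms[0]))


def layer_cosine_similarity(h_a: np.ndarray, h_b: np.ndarray) -> np.ndarray:
    """Per-timestep cosine similarity between two layers with equal hidden_dim."""
    dots = np.sum(h_a * h_b, axis=1)
    denom = np.linalg.norm(h_a, axis=1) * np.linalg.norm(h_b, axis=1)
    return dots / np.maximum(denom, 1e-12)  # guard against all-zero states


def top_k_neurons(hidden: np.ndarray, scores: np.ndarray, k: int = 3):
    """Indices and Pearson correlations of the k neurons most correlated with scores."""
    h = hidden - hidden.mean(axis=0)
    s = scores - scores.mean()
    denom = np.linalg.norm(h, axis=0) * np.linalg.norm(s)
    corr = (h * s[:, None]).sum(axis=0) / np.maximum(denom, 1e-12)
    idx = np.argsort(-np.abs(corr))[:k]  # rank by absolute correlation
    return idx, corr[idx]


# Synthetic stand-ins for a real trace: 20 timesteps, 2 layers of 8 neurons.
rng = np.random.default_rng(0)
h0 = rng.normal(size=(20, 8))
h1 = rng.normal(size=(20, 8))
scores = 1.0 / (1.0 + np.exp(-h0[:, 2]))  # score driven by neuron 2 of layer 0

norms = l2_norms(h0)
deltas = rate_of_change(norms)
similarity = layer_cosine_similarity(h0, h1)
top_idx, top_corr = top_k_neurons(h0, scores, k=3)
```

Because the synthetic score is a monotonic function of neuron 2, that neuron ranks first in `top_idx`, which is the pattern the drill-down panel is meant to surface on real traces.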

## Quick Start

```bash
# Install dependencies
uv sync

# Build cache from CSV/Parquet (auto-detects schema)
uv run python src/seq_explorer/build_cache.py dataframe your_data.csv -o cache.parquet

# Launch dashboard
uv run streamlit run src/seq_explorer/app.py
```

The dashboard auto-detects hidden state columns: use a consistent prefix for each layer, such as `h0_*`, `h1_*` or `encoder_*`, `decoder_*`.
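
To make the prefix convention concrete, here is a minimal sketch of how columns could be grouped into layers by prefix. It assumes each hidden state column is named `<prefix>_<integer index>`; the helper `group_hidden_columns` and its `min_width` parameter are hypothetical, and the dashboard's actual detection rules may differ.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<prefix>_<integer index>", e.g. "h0_12".
_HIDDEN_COL = re.compile(r"^(?P<prefix>.+)_(?P<index>\d+)$")


def group_hidden_columns(columns, min_width=2):
    """Group column names by prefix, keeping prefixes with >= min_width columns."""
    groups = defaultdict(list)
    for name in columns:
        match = _HIDDEN_COL.match(name)
        if match:
            groups[match["prefix"]].append((int(match["index"]), name))
    # Sort numerically by neuron index so "h0_10" comes after "h0_2".
    return {
        prefix: [name for _, name in sorted(cols)]
        for prefix, cols in groups.items()
        if len(cols) >= min_width
    }


columns = [
    "sequence_id", "timestep", "amount",
    "h0_0", "h0_1", "h0_10", "h0_2",
    "h1_0", "h1_1",
]
layers = group_hidden_columns(columns)
```

Here `layers` maps `"h0"` and `"h1"` to their ordered column lists, while `sequence_id`, `timestep`, and `amount` are ignored because they do not end in a numeric index.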

## Usage Options

### Option 1: Construct + Plot Directly in Jupyter (Simplest!)

No need to save files or run Streamlit. Just use the plotting functions:

```python
from seq_explorer import (
    SequenceTrace,
    fraud_score_timeline,
    hidden_state_heatmap,
    hidden_norm_plot,
    top_neuron_traces,
    layer_similarity_plot,
    raw_feature_heatmap,
    feature_fraud_correlation,
    metadata_timeline_overlay,
)

trace = SequenceTrace.from_arrays(
    sequence_id=0,
    label=1,
    raw_features=my_features,        # (seq_len, n_features)
    feature_names=['amount', ...],
    hidden_states=[h0, h1],         # list of (seq_len, hidden_dim)
    running_fraud_scores=scores,    # (seq_len,)
)

# All plotting functions return Plotly figures - show them inline!
fraud_score_timeline(trace.running_fraud_scores).show()
hidden_state_heatmap(trace.hidden_states[0]).show()
hidden_norm_plot(trace.hidden_norms).show()
top_neuron_traces(
    trace.hidden_states[0],
    trace.top_neuron_indices[0],
    trace.top_neuron_correlations[0]
).show()
```

### Option 2: Construct + Save + Dashboard

```python
# Save to Parquet
df = SequenceTrace.to_dataframe({0: trace})
df.write_parquet('cache.parquet')

# Then launch the dashboard from a shell (not from Python):
#   streamlit run src/seq_explorer/app.py
```

### Option 3: From DataFrame

```bash
python src/seq_explorer/build_cache.py dataframe data.csv -o cache.parquet
```

### Option 4: From Model

```bash
python src/seq_explorer/build_cache.py model \
    --checkpoint model.ckpt \
    --data transactions.pt \
    --auto-select \
    -o cache.parquet
```

## Model Support

Works with any PyTorch recurrent sequence model:

- **GRU** - any number of layers
- **LSTM** - any number of layers
- **RNN** - any number of layers
- Custom architectures with different attribute names (e.g., `encoder`, `rnn_module`)
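
For context on what per-layer traces involve: a multi-layer `nn.GRU` returns per-timestep outputs only for its last layer, plus the final hidden state of every layer. One way to recover every layer at every timestep is to feed the sequence one step at a time and record the hidden state after each step. The sketch below illustrates this for a unidirectional GRU; `trace_gru_hidden_states` is a hypothetical helper, not necessarily how `extractor.py` works, and an LSTM would need adapting because its state is an `(h, c)` tuple.

```python
import torch
from torch import nn


def trace_gru_hidden_states(gru: nn.GRU, x: torch.Tensor) -> list[torch.Tensor]:
    """Return one (seq_len, hidden_size) tensor per layer for a single sequence.

    x has shape (seq_len, input_size). Assumes a unidirectional GRU.
    """
    if gru.bidirectional:
        raise ValueError("this sketch only handles unidirectional GRUs")
    steps = []
    h = None  # nn.GRU starts from a zero hidden state when hx is None
    with torch.no_grad():
        for t in range(x.shape[0]):
            # One timestep with batch size 1; the (1, 1, input_size) shape is
            # valid whether or not the module uses batch_first.
            step = x[t].view(1, 1, -1)
            _, h = gru(step, h)          # h: (num_layers, 1, hidden_size)
            steps.append(h[:, 0, :])     # (num_layers, hidden_size)
    stacked = torch.stack(steps)         # (seq_len, num_layers, hidden_size)
    return [stacked[:, layer, :] for layer in range(gru.num_layers)]


torch.manual_seed(0)
gru = nn.GRU(input_size=5, hidden_size=8, num_layers=2)
x = torch.randn(12, 5)
per_layer = trace_gru_hidden_states(gru, x)
```

Each tensor in `per_layer` can be converted with `.numpy()` to a `(seq_len, hidden_dim)` array, the shape that `hidden_states` expects in `SequenceTrace.from_arrays`.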

## Project structure

```
seq-explorer/
├── src/
│   └── seq_explorer/      # Package + CLI
│       ├── __init__.py
│       ├── app.py         # Streamlit dashboard
│       ├── plots.py       # Plotly figure builders
│       ├── extractor.py   # Model trace extraction
│       ├── trace.py       # Data models
│       └── build_cache.py # Cache builder CLI
├── docs/                  # Documentation
├── demo/                  # Demo notebooks
└── README.md
```

## Documentation

See the `docs/` folder for full documentation:

- [Quick Start](docs/quickstart.md)
- [Dashboard Guide](docs/dashboard.md)
- [Cache Format](docs/cache-format.md)
- [Architecture](docs/architecture.md)
