Metadata-Version: 2.4
Name: vcti-array-view
Version: 1.2.0
Summary: Pipeline-style view over NumPy arrays — shape columns, labels, and rows, then render as text, DataFrame, or HTML
Author: Visual Collaboration Technologies Inc.
Requires-Python: <3.15,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=2.0
Requires-Dist: vcti-nputils>=1.0.0
Provides-Extra: pandas
Requires-Dist: pandas>=2.2; extra == "pandas"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: pandas>=2.2; extra == "test"
Requires-Dist: hypothesis>=6.100; extra == "test"
Provides-Extra: lint
Requires-Dist: ruff; extra == "lint"
Provides-Extra: typecheck
Requires-Dist: mypy; extra == "typecheck"
Requires-Dist: pandas-stubs; extra == "typecheck"
Provides-Extra: docs
Requires-Dist: sphinx>=8; extra == "docs"
Requires-Dist: myst-parser>=4; extra == "docs"
Requires-Dist: furo; extra == "docs"
Dynamic: license-file

# Array View

[![CI](https://github.com/vctmohan/vcti-python-array-view/actions/workflows/ci.yml/badge.svg)](https://github.com/vctmohan/vcti-python-array-view/actions/workflows/ci.yml)
[![PyPI version](https://img.shields.io/pypi/v/vcti-array-view.svg)](https://pypi.org/project/vcti-array-view/)
[![Python versions](https://img.shields.io/pypi/pyversions/vcti-array-view.svg)](https://pypi.org/project/vcti-array-view/)
[![License](https://img.shields.io/badge/license-Proprietary-lightgrey.svg)](LICENSE)
[![Checked with mypy](https://img.shields.io/badge/mypy-strict-2a6db2.svg)](https://mypy.readthedocs.io/)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

A configurable view over a NumPy array — inspect, export, and display
without writing boilerplate.

## Overview

`ArrayView` wraps a NumPy array (structured, 1-D, or 2-D) without
copying the underlying data and lets you describe how it should look
— which columns, which rows, what labels — then render it as text, a
pandas DataFrame, or HTML.  The original array remains directly accessible as
`view.array`, so numpy-level work isn't locked out.

It's intended for large arrays (CAE-scale, millions of rows) where
printing everything isn't useful.  Inspecting such an array usually
means transforming something first:

- **Row selection** — head/tail windows or boolean filters
- **Column selection** — drop internal padding or bookkeeping fields
- **Column transformation** — flatten `position (3,)` → `position_x / y / z`
- **Value transformation** — resolve integer codes to human names
  (`element_type=1 → "QUAD"`)

All of these apply lazily to the rendered rows only — a 10M-row array
with a 25-row window does 25 rows of work.

```python
import numpy as np
from vcti.arrayview import ArrayView

arr = np.array(
    [(1, 10.5), (2, 20.1), (3, 30.7)],
    dtype=[("id", "i4"), ("value", "f8")],
)

view = ArrayView(arr).set_index("id")

# Render as text.
print(view.to_table())

# The view is a live configuration, not a throwaway.  Reconfigure and
# render again — each render reads the current state.
view.set_column_labels({"value": "measurement"}).slice(head=2)
df = view.to_dataframe()
```

Configuration methods (`set_column_labels`, `set_index`, `slice`, etc.)
return `self`, so if you prefer the one-expression style the same
steps read top-to-bottom:

```python
view = (ArrayView(arr)
        .set_column_labels({"value": "measurement"})
        .set_index("id")
        .slice(head=10))
text = view.to_table()
```

## Installation

```bash
pip install "vcti-array-view>=1.2.0"
```

With pandas support (optional — enables `to_dataframe()` and `to_html()`):

```bash
pip install "vcti-array-view[pandas]>=1.2.0"
```

---

## Quick Start

```python
import numpy as np
from vcti.arrayview import ArrayView, FILLER_COLUMNS, LENGTH_COLUMNS

dt = np.dtype([
    ("node_id",      "i4"),
    ("f0",           "V4"),             # C++ alignment padding
    ("element_type", "i4"),             # enum ID
    ("label",        "U20"),
    ("label_len",    "i4"),             # string-length bookkeeping
    ("position",     "f8", (3,)),       # vector field
])
arr = np.zeros(1_000_000, dtype=dt)     # imagine this is full of real data

view = (ArrayView(arr)
        .exclude_patterns([FILLER_COLUMNS, LENGTH_COLUMNS])
        .set_component_names("position", ["x", "y", "z"])
        .add_enum_columns({
            "element_type_name": ("element_type", {1: "QUAD", 2: "HEX"}),
        })
        .set_index("node_id")
        .slice(head=10))

print(view.to_table())      # text table, first 10 rows
df = view.to_dataframe()    # pandas DataFrame, node_id as pd.Index
html = view.to_html()       # HTML string for dashboards
```

---

## Input shapes

`ArrayView(array, *, dtype=None, names=None)` accepts three input
shapes.  Structured arrays are used as-is; plain 1-D and 2-D arrays are
reinterpreted as structured via `np.ndarray.view()` (zero-copy for
C-contiguous input).

```python
# Structured array — fields already named
dt = np.dtype([("id", "i4"), ("value", "f8")])
arr = np.array([(1, 10.0), (2, 20.0)], dtype=dt)
view = ArrayView(arr)         # view_columns == ["id", "value"]

# Plain 1-D — names= assigns the field name
view = ArrayView(np.array([1.5, 2.0, 3.5]), names=["measurement"])
# view_columns == ["measurement"]

# Plain 2-D — names= avoids the col_0, col_1, ... defaults
view = ArrayView(np.random.rand(1000, 3), names=["x", "y", "z"])
# view_columns == ["x", "y", "z"]

# dtype= reinterprets bytes.  It accepts a string, list-of-tuples,
# np.dtype, or a callable (np.ndarray) -> np.dtype.  Here raw_arr stands
# for an existing array whose bytes match the target layout.
view = ArrayView(raw_arr, dtype=[("node_id", "i4"), ("value", "f8")])
```

`dtype` and `names` are mutually exclusive.  Non-contiguous 2-D arrays
raise `ValueError` — call `np.ascontiguousarray()` first if you accept
the copy cost.  `ndim > 2` is rejected; reshape first.
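
The zero-copy claim can be checked with plain NumPy.  The sketch below
is not `ArrayView`'s own code; it only illustrates the
`np.ndarray.view()` technique named above.  The field names and the
`.reshape(-1)` step are assumptions made for illustration.

```python
import numpy as np

# A plain, C-contiguous 2-D array: 2 rows, 3 float64 columns.
plain = np.arange(6, dtype="f8").reshape(2, 3)

# One structured record per row; its 24-byte item size matches one row.
row_dtype = np.dtype([("x", "f8"), ("y", "f8"), ("z", "f8")])

# Reinterpret the same bytes, then drop the leftover length-1 axis.
structured = plain.view(row_dtype).reshape(-1)

shares = np.shares_memory(plain, structured)   # True: no copy was made
x_column = structured["x"].tolist()            # first column, by name

# A transposed (non-contiguous) array cannot be reinterpreted this way.
try:
    plain.T.view(row_dtype)
    view_failed = False
except ValueError:
    view_failed = True
```

This is why non-contiguous input has to go through
`np.ascontiguousarray()` first: the reinterpretation needs each row's
bytes to sit next to each other in memory.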

---

## Operations

Operations fall into four kinds: three axes of display shaping
(columns, labels, rows) and the output methods.  The shaping methods
return `self` and generally compose in any order, so you can call them
any number of times and reconfigure freely between renders.  Output
methods and `compute_slice()` return their results instead.

### Column shaping — *which columns appear and in what order*

All operations refer to columns by their **dtype name**, not their
display label.

| Method | Effect |
|---|---|
| `exclude_patterns(patterns)` | Drop columns matching any pattern (exact name, regex, or `(name, dtype) -> bool`). Pre-built: `FILLER_COLUMNS`, `LENGTH_COLUMNS`, `VOID_COLUMNS`. |
| `set_view_columns(columns=None)` | Replace the visible list with a whitelist (or reset to all dtype columns when called without an argument). For pattern-based filtering, compose with `exclude_patterns`. |
| `include_view_columns(columns)` | Append columns to the visible list. |
| `drop_view_columns(columns)` | Remove columns from the visible list. |
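
To make the three pattern kinds concrete, here is a minimal NumPy-only
sketch of how such matching could work.  It is not the library's
implementation: `visible_columns`, `is_raw_void`, and the `_len$` regex
are hypothetical, and treating a compiled `re.Pattern` as "regex" and a
plain `str` as "exact name" is an assumption.

```python
import re
import numpy as np


def visible_columns(dtype, patterns):
    """Return the field names of *dtype* that match none of *patterns*."""

    def matches(pattern, name, field_dtype):
        if callable(pattern):                  # (name, dtype) -> bool
            return bool(pattern(name, field_dtype))
        if isinstance(pattern, re.Pattern):    # compiled regex
            return pattern.search(name) is not None
        return pattern == name                 # exact name

    return [
        name for name in dtype.names
        if not any(matches(p, name, dtype[name]) for p in patterns)
    ]


def is_raw_void(name, field_dtype):
    """Hypothetical predicate: raw padding bytes, not a vector or record."""
    return (field_dtype.kind == "V"
            and field_dtype.names is None
            and field_dtype.subdtype is None)


dt = np.dtype([
    ("node_id",      "i4"),
    ("f0",           "V4"),
    ("element_type", "i4"),
    ("label",        "U20"),
    ("label_len",    "i4"),
    ("position",     "f8", (3,)),
])

kept = visible_columns(dt, [re.compile(r"_len$"), is_raw_void, "scratch"])
```

With these patterns, `kept` holds `node_id`, `element_type`, `label`,
and `position`; the padding field and the length field are dropped.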

### Label / value shaping — *how columns and cells are displayed*

Labels affect rendering only.  The dtype identity of a column never
changes, so other operations keep working after you relabel.

| Method | Effect |
|---|---|
| `set_column_labels(mapping)` | Set display labels for top-level dtype fields.  For a scalar field, the label is the column header; for a multi-component field, it's the level-1 group header. |
| `set_component_names(field, names)` | Set level-2 labels for the flattened components of a multi-component field (e.g. `["x", "y", "z"]`). Defaults to `"0", "1", ...` when unset. |
| `add_column_group(name, columns)` | Attach a level-1 header label spanning *columns*. The chosen `name` is also the displayed label. Usable as an index source. |
| `remove_column_group(name)` | Remove a group.  Clears the index if the group was the current source. |
| `add_enum_columns({name: (id_col, {int: str})})` | Attach a virtual string column that maps integer codes in *id_col* to human names (e.g., `{1: "QUAD", 2: "HEX"}`).  The new column takes *id_col*'s place in the view.  Mapping is applied only to rendered rows — a 10M-row array with a 20-row window does 20 lookups. |
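
The "rendered rows only" behavior of enum columns can be sketched with
plain NumPy.  This is an illustration, not the library's code: the
`resolve` helper is hypothetical, and falling back to the raw code for
unmapped values is an assumption.

```python
import numpy as np

# One million integer codes; only the first few are interesting here.
codes = np.zeros(1_000_000, dtype="i4")
codes[:4] = [1, 2, 1, 7]

names = {1: "QUAD", 2: "HEX"}


def resolve(codes, names, row_indices):
    """Map codes to names for the selected rows only."""
    window = codes[row_indices]            # touches len(row_indices) rows
    # Assumption: unmapped codes are shown as their integer value.
    return [names.get(int(c), str(int(c))) for c in window]


rendered = resolve(codes, names, np.arange(4))
```

The dictionary lookup runs four times here, once per selected row,
regardless of how long `codes` is.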

### Row shaping — *which rows appear and how they are identified*

| Method | Effect |
|---|---|
| `set_index(source)` | Designate row identity.  Use a **`str`** — a dtype field name for a single-column index, or a column-group name for a multi-column index using the group's members.  Use an **`np.ndarray`** of matching length for an external identifier (useful when several arrays share the same row identity without duplicating the id into each); a structured ndarray's fields each become a separate index level.  Becomes the pandas `Index` / `MultiIndex`. |
| `clear_index()` | Remove the row identifier. |
| `slice(head=..., tail=..., mask=..., indices=...)` | Configure a bounded window.  Output methods render only within this window. |
| `clear_slice()` | Remove the slice configuration — full range. |
| `compute_slice()` | Return the integer index array for the current slice (ndarray of `np.intp`). |
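
A head/tail window reduces to a short integer index array.  The sketch
below shows one way to compute it with NumPy; `head_tail_indices` is a
hypothetical helper, and trimming the tail when the two windows overlap
is an assumption, not documented library behavior.

```python
import numpy as np


def head_tail_indices(n_rows, head=0, tail=0):
    """Return row indices for the first *head* and last *tail* rows."""
    head = min(head, n_rows)
    # Assumption: if the windows would overlap, trim the tail so that
    # no row is listed twice.
    tail = min(tail, n_rows - head)
    return np.concatenate([
        np.arange(head, dtype=np.intp),
        np.arange(n_rows - tail, n_rows, dtype=np.intp),
    ])


window = head_tail_indices(10_000_000, head=20, tail=5)
small = head_tail_indices(3, head=2, tail=5)
```

`window` has 25 entries for a 10M-row array, so indexing with it reads
25 rows and nothing else.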

### Output methods — *materialize the configured view*

| Method | Returns |
|---|---|
| `to_table(...)` | Text table as `str` (pure numpy, no dependencies). |
| `to_dataframe(...)` | pandas DataFrame. |
| `to_html()` | HTML string (pandas under the hood). |

Output methods return the rendered result; they do not consume state.
Reconfigure and call again as often as you like.

---

## Immutability

Once a view is fully configured, `freeze()` locks its configuration.
Slicing and rendering continue to work; any call that would modify
shaping, labels, groups, index, enums, or joins raises
`FrozenViewError` (subclass of `RuntimeError`).

| Method / Property | Effect |
|---|---|
| `freeze()` | Lock the view's configuration.  Permanent for this instance.  Returns `self`. |
| `is_frozen` | `True` if this view has been frozen. |
| `copy()` | Branch a modifiable view — the copy is always unfrozen. |

```python
from vcti.arrayview import ArrayView, FrozenViewError

view = ArrayView(arr).set_index("id").freeze()

view.slice(head=50)                 # OK — slicing allowed
print(view.to_table())              # OK — rendering allowed
try:
    view.drop_view_columns(["x"])   # raises FrozenViewError
except FrozenViewError as exc:
    ...

branch = view.copy()                # branch is unfrozen
branch.drop_view_columns(["x"])     # OK on the branch
```

There is no `unfreeze()` by design — use `copy()` to produce a
modifiable branch.
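
For readers curious how a freeze-and-branch contract like this can be
built, here is a minimal pure-Python sketch.  `MiniView` and its
methods are hypothetical, and the locally defined `FrozenViewError` is
a stand-in for the library's exception; none of this is taken from the
library's source.

```python
import copy


class FrozenViewError(RuntimeError):
    """Local stand-in: raised when a frozen view is asked to change."""


class MiniView:
    """Toy view: shared data plus a private list of visible columns."""

    def __init__(self, data, columns):
        self.data = data                  # shared, never copied
        self.columns = list(columns)      # per-view configuration
        self._frozen = False

    @property
    def is_frozen(self):
        return self._frozen

    def freeze(self):
        self._frozen = True
        return self

    def drop_columns(self, names):
        if self._frozen:
            raise FrozenViewError("view is frozen; use copy() to branch")
        self.columns = [c for c in self.columns if c not in names]
        return self

    def copy(self):
        # Configuration is deep-copied; the data object is shared and
        # the new view always starts unfrozen.
        return MiniView(self.data, copy.deepcopy(self.columns))


data = [1, 2, 3]
view = MiniView(data, ["id", "x"]).freeze()

try:
    view.drop_columns(["x"])
    raised = False
except FrozenViewError:
    raised = True

branch = view.copy().drop_columns(["x"])
```

The one-way flag keeps the contract simple: code holding a frozen view
can rely on its configuration never changing underneath it.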

---

## Why laziness matters

`ArrayView` is built to keep large arrays tractable — cost scales with
what you render, not with what you hold:

- **Bounded slicing** — `.slice(head=20, tail=5)` renders 25 rows
  regardless of array size.  `compute_slice()` returns a tiny integer
  array, not a copy of the data.
- **Lazy enum resolution** — `add_enum_columns({...})` stores a recipe.
  The integer → string mapping is applied only to rows actually
  rendered.  A 10M-row array with a 10-row slice performs 10 lookups.
- **Reference semantics** — the underlying NumPy array is never copied.
  Multiple `ArrayView` instances can share the same array.
- **Zero-copy dtype reinterpretation** — vector/matrix field flattening
  uses `np.ndarray.view()`, an O(1) dtype change, no memory movement.
- **pandas categorical columns** — enum names materialize as
  `pd.Categorical`, ~100× smaller than string columns.
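
The zero-copy flattening point can be illustrated with NumPy alone.
The sketch below reinterprets a vector field as three scalar fields by
viewing the array through a dtype with the same byte layout.  The
flattened field names are chosen for illustration, and this is not
necessarily how the library builds its flattened dtype.

```python
import numpy as np

# Original layout: one int32 plus a 3-component float64 vector.
nested = np.dtype([("node_id", "i4"), ("position", "f8", (3,))])

# Same bytes, described as four scalar fields (identical item size).
flat = np.dtype([
    ("node_id",    "i4"),
    ("position_x", "f8"),
    ("position_y", "f8"),
    ("position_z", "f8"),
])

arr = np.zeros(2, dtype=nested)
arr["position"] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

flat_view = arr.view(flat)                   # dtype change only
y_values = flat_view["position_y"].tolist()  # second component per row
shares = np.shares_memory(arr, flat_view)

# Writes through the view land in the original array.
flat_view["position_x"][0] = 9.0
first_x = float(arr["position"][0, 0])
```

Because both dtypes describe the same 28 bytes per record, the view
costs the same for 2 rows as for 10 million.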

---

## Branching views with `copy()`

`copy()` creates an independent `ArrayView` over the same underlying
array.  Configuration is deep-copied so branches don't affect each
other.  Use this when passing a view to code that might reconfigure
it, or when you need two renders with different slices side-by-side.

```python
export_view = view.copy().drop_view_columns(["label"])
view.array is export_view.array   # True — underlying array is shared
```

---

## Row index

`set_index(source)` designates what each row represents.  Two ways to
specify the source:

**1. A column (or group) already in the array** — pass a string.  For
a single-column index, pass a dtype field name:

```python
view = ArrayView(arr).set_index("node_id")
```

For a multi-column index, register a group first and pass its name:

```python
view = (ArrayView(arr)
        .add_column_group("face_id", ["element_idx", "face_num"])
        .set_index("face_id"))
```

**2. An external identifier array** — pass an `np.ndarray` of matching
length.  Useful when several arrays share the same row identity and
you don't want to duplicate the id into each:

```python
face_id = np.array(
    [(1, 0), (1, 1), (2, 0)],
    dtype=[("element_idx", "i4"), ("face_num", "i4")],
)
stress = ArrayView(stress_arr).set_index(face_id)
strain = ArrayView(strain_arr).set_index(face_id)
```

A structured ndarray produces a pandas `MultiIndex`; a plain 1-D array
becomes a single `"id"` level.  See [docs/patterns.md](docs/patterns.md)
for more recipes.
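
The mapping from an external identifier array to index levels can be
sketched with NumPy alone.  `index_levels` is a hypothetical helper
that mirrors the rules described above (one level per structured
field, a single `"id"` level otherwise, and a length check); it is not
the library's code and it stops short of building a pandas index.

```python
import numpy as np


def index_levels(source, n_rows):
    """Split an identifier array into named index levels."""
    if source.ndim != 1 or len(source) != n_rows:
        raise ValueError("index source must be 1-D with one entry per row")
    if source.dtype.names is not None:
        # Structured array: each field becomes its own level.
        return {name: source[name] for name in source.dtype.names}
    # Plain 1-D array: a single level named "id".
    return {"id": source}


face_id = np.array(
    [(1, 0), (1, 1), (2, 0)],
    dtype=[("element_idx", "i4"), ("face_num", "i4")],
)

levels = index_levels(face_id, n_rows=3)
plain = index_levels(np.array([10, 20, 30]), n_rows=3)
```

Each level is a field view into `face_id`, so sharing one identifier
array across several views does not duplicate the ids.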

---

## Dependencies

- [numpy](https://numpy.org/) (>=2.0) — required
- [vcti-nputils](https://pypi.org/project/vcti-nputils/) (>=1.0.0) — required
- [pandas](https://pandas.pydata.org/) (>=2.2) — optional, for
  `to_dataframe()`, `to_html()`, and Jupyter display

---

## Further reading

- [docs/design.md](docs/design.md) — rationale behind the pipeline
- [docs/api.md](docs/api.md) — full API reference with signatures
- [docs/patterns.md](docs/patterns.md) — common usage recipes
- [docs/performance.md](docs/performance.md) — complexity analysis and
  large-array behavior
- [docs/troubleshooting.md](docs/troubleshooting.md) — errors, fixes,
  and gotchas
- [docs/extending.md](docs/extending.md) — extending the library
- [examples/full_pipeline.py](examples/full_pipeline.py) — end-to-end
  builder-style walkthrough
