Metadata-Version: 2.4
Name: valarray
Version: 0.4
Summary: Library for validating numpy arrays.
Project-URL: Homepage, https://codeberg.org/jfranek/valarray
Author-email: "J. Franek" <franek.j27@email.cz>
License-Expression: MIT
Keywords: numpy, validation, array, typing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.7
Requires-Dist: numpy>=2
Description-Content-Type: text/markdown

![](assets/images/valarray_logo.svg)

In short: a library for validating numpy arrays that also helps with static analysis and documentation. For the long version, see [Library rationale](#library-rationale).

Currently intended primarily as a personal/hobby project (see [Caveats](#caveats)).

I have gotten away with using it in a professional setting, but YMMV. 

# Quick start <!-- omit from toc -->
Install ***valarray*** via pip:
```shell
pip install valarray
```

Define a validated array class:
```python
import numpy as np
from valarray.numpy import ValidatedNumpyArray
from valarray.core.errors_exceptions import ValidationException

class ExampleValidatedNumpyArray(ValidatedNumpyArray[np.float32]):
    dtype = "float32"
    schema = ('n', 3)
    
    ge=0
```

Validate a numpy array: 
```python
try:
    v_arr = ExampleValidatedNumpyArray(np.array([[1,-2,3,0], [4,5,-6,0]]))
except ValidationException as v_exc:
    print(v_exc)


>>> 'ExampleValidatedNumpyArray' validation failed:
>>> Incorrect axis sizes: (2, *4*), expected (any, 3).
>>> Invalid Array Values (>= 0):
>>>         [-2.0, -6.0]
```

# Table of contents
- [Table of contents](#table-of-contents)
- [Library rationale](#library-rationale)
  - [1) Invalid values causing unintended behaviour](#1-invalid-values-causing-unintended-behaviour)
    - [Problem](#problem)
    - [Solution](#solution)
  - [2) Limited support for static analysis](#2-limited-support-for-static-analysis)
    - [Problem](#problem-1)
    - [Solution](#solution-1)
  - [3) Need for explicit documentation](#3-need-for-explicit-documentation)
    - [Problem](#problem-2)
    - [Solution](#solution-2)
- [Validated Array](#validated-array)
  - [Defining a validated array](#defining-a-validated-array)
  - [Creating a validated array instance](#creating-a-validated-array-instance)
  - [Accessing array values](#accessing-array-values)
- [Validation functions](#validation-functions)
- [Array schema](#array-schema)
  - [Field](#field)
  - [Array schema examples](#array-schema-examples)
    - [rectangles](#rectangles)
- [Validators](#validators)
  - [Defining a validator](#defining-a-validator)
    - [ValidationResult](#validationresult)
    - [Example Validator](#example-validator)
- [Catching exceptions](#catching-exceptions)
  - [Special exceptions and errors](#special-exceptions-and-errors)
  - [Generic Errors](#generic-errors)
- [Caveats](#caveats)

# Library rationale
This library aims to help with 3 issues encountered when working with numpy arrays:

## 1) Invalid values causing unintended behaviour
### Problem
Invalid values can cause crashes, or worse, cause silent failures.

For example, the following code fails silently when attempting to cut patches from an image using bounding boxes with invalid coordinates.
```python
import numpy as np
import numpy.typing as npt


def cut_patches(
    img: npt.NDArray[np.uint8], boxes: npt.NDArray[np.int64]
) -> list[npt.NDArray[np.uint8]]:
    patches = []
    for box in boxes:
        patch = img[box[0] : box[2], box[1] : box[3], :]
        patches.append(patch)

    return patches

# scale to 0-255 before casting, otherwise every pixel is truncated to 0
img_random = (np.random.random((400, 400, 3)) * 255).astype(np.uint8)

boxes_xyxy_invalid = np.array(
    [[-10, 100, 200, 200], [150, 50, 200, 250]], dtype=int
)

patches = cut_patches(img_random, boxes_xyxy_invalid)

for patch in patches:
    print(patch.shape)

>>> (0, 100, 3) # empty image patch
>>> (50, 200, 3)
```
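
Why the first patch comes out empty can be checked without numpy: a negative slice start counts from the end of the axis, so `-10:200` on an axis of length 400 resolves to `390:200`, which selects nothing. Below is a minimal sketch using Python's built-in `slice.indices`; the helper name `resolved_length` is made up for this illustration.

```python
def resolved_length(start: int, stop: int, axis_len: int) -> int:
    """Number of elements selected by start:stop on an axis of the given length."""
    # slice.indices() applies the same wrap-around and clipping rules
    # that basic slicing uses for sequences and numpy arrays.
    lo, hi, step = slice(start, stop).indices(axis_len)
    return len(range(lo, hi, step))


# First box from the example: first axis -10:200, second axis 100:200 (image is 400x400).
print(resolved_length(-10, 200, 400))  # 0   -> empty patch, no exception
print(resolved_length(100, 200, 400))  # 100
```

No exception is raised at any point, which is why the problem only shows up later as an empty patch.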

### Solution
Validate the boxes array first. If errors are encountered, print descriptive error messages.

```python
import numpy as np
import numpy.typing as npt

from valarray.core.errors_exceptions import ValidationException
from valarray.numpy import Field
from valarray.numpy.array import ValidatedNumpyArray

class BoxesXYXY(ValidatedNumpyArray[np.int64]):
    dtype = int
    schema = (
        "n",
        (
            Field(ge=0),
            Field(ge=0),
            Field(ge=0),
            Field(ge=0),
        ),
    )

def cut_patches(
    img: npt.NDArray[np.uint8], boxes: npt.NDArray[np.int64]
) -> list[npt.NDArray[np.uint8]]:
    patches = []
    for box in boxes:
        patch = img[box[0] : box[2], box[1] : box[3], :]
        patches.append(patch)

    return patches

try:
    # scale to 0-255 before casting, otherwise every pixel is truncated to 0
    img_random = (np.random.random((400, 400, 3)) * 255).astype(np.uint8)

    boxes_xyxy_invalid = BoxesXYXY.validate(
        np.array([[-10, 100, 200, 200], [150, 50, 200, 250]], dtype=int)
    )

    patches = cut_patches(img_random, boxes_xyxy_invalid.array)

    for patch in patches:
        print(patch.shape)

except ValidationException as exc:
    for err in exc.errs:
        print(err.msg)

>>> Invalid Field Values (< 0):
>>>         Axis < 1 >: '_sized_4'
>>>                 Field < 0 >: [-10]
```

## 2) Limited support for static analysis
### Problem
Support for static analysis is limited. Tools can only check whether the datatype is correct, but not shape, values or what those values actually represent.

For example, the function to crop patches needs the boxes to be defined by `xmin, ymin, xmax, ymax`, but it doesn't throw an error if the input boxes are defined by `x_center, y_center, width, height`. Static analysis tools cannot detect this error using built-in numpy types.
```python
import numpy as np
import numpy.typing as npt

def cut_patches(
    img: npt.NDArray[np.uint8], boxes: npt.NDArray[np.int64]
) -> list[npt.NDArray[np.uint8]]:
    patches = []
    for box in boxes:
        patch = img[box[0] : box[2], box[1] : box[3], :]
        patches.append(patch)

    return patches

# scale to 0-255 before casting, otherwise every pixel is truncated to 0
img_random = (np.random.random((400, 400, 3)) * 255).astype(np.uint8)

boxes_xyxy = np.array([[0, 100, 200, 200], [150, 50, 200, 250]], dtype=int)

boxes_xywh = np.array([[100, 150, 200, 100], [175, 50, 50, 250]], dtype=int)

patches = cut_patches(img_random, boxes_xyxy)  # type checker does not complain

print("Valid")
for patch in patches:
    print(patch.shape)

patches_inv = cut_patches(img_random, boxes_xywh)  # type checker still does not complain

print("Invalid")
for patch in patches_inv:
    print(patch.shape)

>>> Valid
>>> (200, 100, 3)
>>> (50, 200, 3)
>>> Invalid
>>> (100, 0, 3)
>>> (0, 200, 3)
```

### Solution
`ValidatedNumpyArray` subclasses can represent these two types of boxes arrays, and can be used instead of bare numpy arrays in function/method signatures and such.
```python
import numpy as np
import numpy.typing as npt

from valarray.numpy import Field
from valarray.numpy.array import ValidatedNumpyArray

class BoxesXYXY(ValidatedNumpyArray[np.int64]):
    dtype = int
    schema = (
        "n",
        (
            Field(ge=0),
            Field(ge=0),
            Field(ge=0),
            Field(ge=0),
        ),
    )

class BoxesXYWH(ValidatedNumpyArray[np.int64]):
    dtype = int
    schema = (
        "n",
        (
            Field(ge=0),
            Field(ge=0),
            Field(gt=0),
            Field(gt=0),
        ),
    )

def cut_patches(
    img: npt.NDArray[np.uint8], boxes: BoxesXYXY
) -> list[npt.NDArray[np.uint8]]:
    patches = []
    for box in boxes.array:
        patch = img[box[0] : box[2], box[1] : box[3], :]
        patches.append(patch)

    return patches

# scale to 0-255 before casting, otherwise every pixel is truncated to 0
img_random = (np.random.random((400, 400, 3)) * 255).astype(np.uint8)

boxes_xyxy = BoxesXYXY.wrap(
    np.array([[0, 100, 200, 200], [150, 50, 200, 250]], dtype=int)
)
boxes_xywh = BoxesXYWH.wrap(
    np.array([[100, 150, 200, 100], [175, 50, 50, 250]], dtype=int)
)

patches = cut_patches(img_random, boxes_xyxy)  # type checker does not complain

print("Valid")
for patch in patches:
    print(patch.shape)

patches_inv = cut_patches(
    img_random, boxes_xywh  # type checker reports wrong argument type
)

print("Invalid")
for patch in patches_inv:
    print(patch.shape)
```

## 3) Need for explicit documentation
### Problem
Built-in numpy types document only the data type. Shape, value constraints and what the array represents need to be documented explicitly, either in comments or in docstrings.

If the same kind of array is used in multiple places or functions, this documentation ends up duplicated.

```python
import numpy as np
import numpy.typing as npt

def cut_patches(
    img: npt.NDArray[np.uint8], boxes: npt.NDArray[np.int64]
) -> list[npt.NDArray[np.uint8]]:
    """Cuts patches from an image.

    Args:
        img (npt.NDArray[np.uint8]): Source image
        boxes (npt.NDArray[np.int64]): Array of N boxes `xmin, ymin, xmax, ymax` in pixels.

    Returns:
        list[npt.NDArray[np.uint8]]: List of patches.
    """
    patches = []
    for box in boxes:
        patch = img[box[0] : box[2], box[1] : box[3], :]
        patches.append(patch)

    return patches
```

### Solution
Defining data type, schema and constraints on a `ValidatedNumpyArray` subclass already implicitly documents them.

This can be complemented by adding additional (or summary) documentation in the class docstring.

This implicit/explicit documentation can then be accessed from multiple functions via the parameter type.
```python
import numpy as np
import numpy.typing as npt

from valarray.numpy import Field
from valarray.numpy.array import ValidatedNumpyArray

class BoxesXYXY(ValidatedNumpyArray[np.int64]):
    """Array of N `xyxy` boxes in pixels."""

    dtype = int
    schema = (
        "n",
        (
            Field("xmin_px", ge=0),
            Field("ymin_px", ge=0),
            Field("xmax_px", ge=0),
            Field("ymax_px", ge=0),
        ),
    )

def cut_patches(
    img: npt.NDArray[np.uint8], boxes: BoxesXYXY
) -> list[npt.NDArray[np.uint8]]:
    """Cuts patches from an image.

    Args:
        img (npt.NDArray[np.uint8]): Source image
        boxes (BoxesXYXY): Boxes to cut patches with.

    Returns:
        list[npt.NDArray[np.uint8]]: List of patches.
    """
    patches = []
    for box in boxes.array:
        patch = img[box[0] : box[2], box[1] : box[3], :]
        patches.append(patch)

    return patches
```


# Validated Array
## Defining a validated array
Subclass `ValidatedNumpyArray` and define:
- `dtype` - expected data type specification (such as `float`, `"float64"`, `np.float64`).
    If not specified, data type is not validated.
    For the full list of accepted values, see:
    https://numpy.org/doc/stable/reference/arrays.dtypes.html#specifying-and-constructing-data-types
- `schema` - expected shape specification (of type `valarray.numpy.axes_and_fields.AxesTuple`). For details, see [Array schema](#array-schema).
    If not specified, shape is not validated (and no field validators are applied).
- `lt`/`le`/`ge`/`gt`/`eq` - basic array value constraints: less than, less than or equal to, greater than or equal to, greater than, equal to.
- `validators` - optional list of validators applied to the whole array. For details, see [Validators](#validators).

```python
import numpy as np
from valarray.numpy import ValidatedNumpyArray

class ExampleValidatedNumpyArray(ValidatedNumpyArray):
  dtype = np.float32
  schema = ('batch_size', 3, 5)
```
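
As a rough mental model (a plain-numpy sketch, not the library's implementation), a constraint such as `ge=0` amounts to building a boolean mask and collecting the values that violate it. The helper `values_violating_ge` is hypothetical and exists only for this illustration.

```python
import numpy as np


def values_violating_ge(arr: np.ndarray, bound: float) -> np.ndarray:
    """Return the values of `arr` that are NOT >= bound (hypothetical helper)."""
    valid = arr >= bound
    return arr[~valid]


arr = np.array([[1, -2, 3], [4, 5, -6]], dtype=np.float32)
print(values_violating_ge(arr, 0).tolist())  # [-2.0, -6.0]
```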

## Creating a validated array instance
There are 4 ways to create a validated array instance:
- validate an existing array
```python
# using .validate()
v_arr = ExampleValidatedNumpyArray.validate(np.array([1,2],dtype=np.float32))
# using .__init__()
v_arr = ExampleValidatedNumpyArray(np.array([1,2],dtype=np.float32), validate=True)
``` 
- from an existing array without validation (to be used as a type hint)
```python
# using .wrap()
v_arr = ExampleValidatedNumpyArray.wrap(np.array([1,2],dtype=np.float32))
# using .__init__()
v_arr = ExampleValidatedNumpyArray(np.array([1,2],dtype=np.float32), validate=False)

# NOTE: validation can be performed at a later stage using:
v_arr.validate_array()
```
- from an arbitrary object passed to the `np.array` constructor 
  (the data type of the resulting array is taken from the validated array class definition; 
  if no data type is defined, the most appropriate type is chosen by `np.asarray()`)
```python
v_arr = ExampleValidatedNumpyArray([1,2])
```
- create an empty array 
  - ***shape*** inferred from schema or empty if not defined
  - ***dtype*** from validated array class definition or default if not defined
```python
# using .empty()
v_arr = ExampleValidatedNumpyArray.empty()
# or __init__
v_arr = ExampleValidatedNumpyArray(None)
```

If the instance is created from an existing array, you can optionally try to coerce the array to the expected data type.
`CoerceDTypeException` is raised if this fails.
```python
# (only) using .__init__() 
v_arr = ExampleValidatedNumpyArray(np.array([1,2],dtype=np.float32), coerce_dtype=True, validate=True)
v_arr = ExampleValidatedNumpyArray(np.array([1,2],dtype=np.float32), coerce_dtype=True, validate=False)
``` 
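
What coercion does inside the library is not spelled out here. In plain numpy the closest equivalent is `ndarray.astype`, which either returns a converted copy or raises. The sketch below works under that assumption; `try_coerce` is a hypothetical helper, not part of valarray.

```python
import numpy as np


def try_coerce(arr: np.ndarray, dtype) -> "np.ndarray | None":
    """Return `arr` converted to `dtype`, or None if numpy cannot convert it."""
    try:
        return arr.astype(dtype)
    except (ValueError, TypeError):
        return None


ints = np.array([1, 2])
print(try_coerce(ints, np.float32).dtype)            # float32
print(try_coerce(np.array(["a", "b"]), np.float32))  # None
```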

## Accessing array values
You can access the underlying array using the `.array`/`.a_` property:
```python
arr = v_arr.array
# or
arr = v_arr.a_
```
It is recommended to make a copy before performing operations that could invalidate the array:
```python
arr = v_arr.array.copy()
```
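
The reason for copying is ordinary numpy aliasing: in-place operations on the returned array also change the data held by the wrapper, without validation being re-run. The `Holder` class below is a stand-in written for this illustration; that valarray's `.array` hands out the wrapped object itself is an assumption here.

```python
import numpy as np


class Holder:
    """Stand-in for a validated wrapper that exposes its array directly."""

    def __init__(self, arr: np.ndarray) -> None:
        self._arr = arr

    @property
    def array(self) -> np.ndarray:
        return self._arr


held = Holder(np.array([1.0, 2.0, 3.0]))

safe = held.array.copy()
safe -= 10                  # only the copy changes
print(held.array.tolist())  # [1.0, 2.0, 3.0]

alias = held.array
alias -= 10                 # mutates the wrapped data in place
print(held.array.tolist())  # [-9.0, -8.0, -7.0]
```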

It is also recommended to specify array data type when subclassing `ValidatedNumpyArray` to ensure correct type hint:
```python
class UnspecifiedDataTypeArray(ValidatedNumpyArray): ...

arr = UnspecifiedDataTypeArray(...).array # np.typing.NDArray[Unknown]

class SpecifiedDataTypeArray(ValidatedNumpyArray[np.float32]): ...
    
arr = SpecifiedDataTypeArray(...).array # np.typing.NDArray[np.float32]
```

# Validation functions
Array validation is designed to be modular and composable, and the validation functions can be used on their own if only runtime validation is required.
Each validation function returns a list of errors, from which a `ValidationException` can be raised. For details, see [Catching exceptions](#catching-exceptions).
```python
# NOTE: `validate_*` is a placeholder for any of the validation functions listed below
from valarray.numpy.validation_functions import validate_*

# validate array and get a list of errors (empty if no errors)
errs = validate_*(arr, ...)

# validate array and raise exception if errors are returned
validate_*(arr, ...).raise_()
```

There are 4 validation functions:
- **validate_dtype**
  - Checks that array has the expected datatype.
  - *returns* `NumpyIncorrectDTypeError`
- **validate_shape**
  - Checks that array has the right number of axes, and that the axes have expected sizes.
  - *returns* `IncorrectAxNumberError` and/or `IncorrectAxSizesError`
- **validate_array_values**
  - Performs an arbitrary check on the values of the whole array using a [Validator](#validators).
  - *returns* `NumpyInvalidArrayValuesError`
- **validate_field_values**
  - Performs an arbitrary check on the values of selected fields using a [Validator](#validators) defined in [Array Schema](#array-schema).
  - By default expects array to be in the correct shape. If this is not guaranteed, set parameter `check_shape=True`.
  - *returns* `NumpyInvalidFieldValuesError` (and possibly `IncorrectAxNumberError`/`IncorrectAxSizesError` if `check_shape=True`)

and a "composite" validation function:
- **validate_array**
  - performs validation in the following order:
    - `validate_dtype()`
    - `validate_shape()`
    - *returns* `NumpyIncorrectDTypeError`/`IncorrectAxNumberError`/`IncorrectAxSizesError` if any.
    - `validate_array_values()`
    - `validate_field_values()`
    - *returns* `NumpyInvalidArrayValuesError`/`NumpyInvalidFieldValuesError` or no errors.
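
The ordering matters because value checks on an array with the wrong dtype or shape are rarely meaningful. Below is a plain-numpy sketch of that control flow. The check functions and string messages are made up for illustration and do not mirror the library's signatures or error classes.

```python
import numpy as np


def check_dtype(arr, dtype):
    expected = np.dtype(dtype)
    return [] if arr.dtype == expected else [f"dtype {arr.dtype} != {expected}"]


def check_shape(arr, shape):
    # None in `shape` means "any size" for that axis.
    if arr.ndim != len(shape):
        return [f"expected {len(shape)} axes, got {arr.ndim}"]
    bad = [
        i for i, (got, exp) in enumerate(zip(arr.shape, shape))
        if exp is not None and got != exp
    ]
    return [f"wrong size on axes {bad}"] if bad else []


def check_values_ge(arr, bound):
    return [] if bool(np.all(arr >= bound)) else [f"values below {bound}"]


def validate_all(arr, dtype, shape, ge):
    # Stage 1: structural checks; stop here if any of them fail.
    errs = check_dtype(arr, dtype) + check_shape(arr, shape)
    if errs:
        return errs
    # Stage 2: value checks, only run on structurally valid arrays.
    return check_values_ge(arr, ge)


# Integer array with 3 columns checked against float32 and (any, 4): stage 1 fails.
errs_structural = validate_all(np.array([[1, -2, 3]]), "float32", (None, 4), 0)
# Structurally valid array with a negative value: stage 2 fails.
errs_values = validate_all(
    np.array([[1, -2, 3]], dtype="float32"), "float32", (None, 3), 0
)
print(errs_structural)
print(errs_values)
```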


# Array schema
Schema defines the expected axes and, for each axis, its fields and optional constraints on the field values.

Axes can be defined with:
  - integer size (`6`)
  - name string (`"axis_name"`)
  - tuple of fields

Fields can be defined with:
  - name string (`"field_name"`)
  - instance of `valarray.numpy.Field`

``` python
from valarray.numpy import Field

schema = (
  "axis_0",
  3,
  ("field_a", Field())
)
```
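
One way to read a schema is as a shape template: a named axis accepts any size, an integer fixes the size, and a tuple of fields fixes the size to the number of fields. The function below is an illustrative sketch of that reading (not the library's code), with plain strings standing in for fields.

```python
def expected_shape(schema: tuple) -> tuple:
    """Map a schema tuple to expected axis sizes; None means 'any size'."""
    sizes = []
    for axis in schema:
        if isinstance(axis, int):
            sizes.append(axis)       # fixed size
        elif isinstance(axis, str):
            sizes.append(None)       # named axis, any size
        elif isinstance(axis, tuple):
            sizes.append(len(axis))  # one slot per field
        else:
            raise TypeError(f"unsupported axis spec: {axis!r}")
    return tuple(sizes)


print(expected_shape(("axis_0", 3, ("field_a", "field_b"))))  # (None, 3, 2)
```
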
## Field
Defines an (optional) name and value constraints for an array field. More specifically:
- `name` - descriptive name used in error messages (if missing, the field index is used instead)
- `lt`/`le`/`ge`/`gt`/`eq` - basic value constraints: less than, less than or equal to, greater than or equal to, greater than, equal to
- `validators` - other validators of field values. For details, see [Validators](#validators).

```python
from typing import Any

from valarray.numpy import Field, NumpyValidator

class ExampleNumpyValidator(NumpyValidator[Any]):
    def validate(self, arr):
        return True

f1 = Field("example_named_field", ge=0)
f2 = Field(gt=10, validators=(ExampleNumpyValidator(),))
```

## Array schema examples
### rectangles
An array of an arbitrary number of rectangles defined by min and max coordinates, with two axes: *n_rects* and *rect*. 
Axis *rect* has 4 fields: *x_min*, *y_min*, *x_max*, *y_max*, whose values must be greater than or equal to zero.

``` python
import numpy as np

from valarray.numpy import ValidatedNumpyArray
from valarray.numpy.axes_and_fields import Field

# validated array with schema
class Rect(ValidatedNumpyArray):
    schema = (
        "n_rects",
        (
            Field("x_min", ge=0),
            Field("y_min", ge=0),
            Field("x_max", ge=0),
            Field("y_max", ge=0),
        ),
    )

# example array
arr = np.array(
    [
        [10, 20, 30, 40],
        [15, 25, 35, 45],
    ],
)

Rect.validate(arr)
```

# Validators
Validators are objects that perform arbitrary, user-defined validation of array or field values.

## Defining a validator
Validators must subclass the `valarray.numpy.NumpyValidator` abstract base class 
and implement the `.validate()` method, which takes an array as input and signals success or failure in one of these ways:

- **on success**:
  - *returns* `valarray.core.ValidationResult(status="OK")`
  - *returns* `True`
  - *returns* `None`
- **on failure**:
  - *returns* `valarray.core.ValidationResult(status="FAIL")`
  - *returns* `False`
  - *raises* `ValueError`
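
Since a validator can signal its outcome in several ways, the caller has to fold them into a single status. How valarray does this internally is not shown here. The following is a hypothetical sketch in which `Result` stands in for `ValidationResult` and `run_validator` is an invented name.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    """Stand-in for ValidationResult, reduced to a status string."""

    status: str  # "OK" or "FAIL"


def run_validator(validate: Callable[[object], object], arr: object) -> str:
    """Call `validate(arr)` and reduce its outcome to "OK" or "FAIL"."""
    try:
        outcome = validate(arr)
    except ValueError:
        return "FAIL"          # failure signalled by raising
    if isinstance(outcome, Result):
        return outcome.status  # explicit result object
    if outcome is None or outcome is True:
        return "OK"            # success via None / True
    return "FAIL"              # False


def raises(_arr):
    raise ValueError("bad values")


print(run_validator(lambda a: True, [2, 4]))            # OK
print(run_validator(lambda a: None, [2, 4]))            # OK
print(run_validator(lambda a: Result("FAIL"), [1, 2]))  # FAIL
print(run_validator(raises, [1, 2]))                    # FAIL
```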

### ValidationResult
Contains result status of validation `status="OK"`/`status="FAIL"`

Can also optionally contain:
- message to be added to validation error 
- indices of invalid values

Indices use [advanced numpy indexing](https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing) and can be either:
- a boolean array
- a tuple of integer arrays with length equal to the number of array axes

**!** If used for validating field values, it is recommended that validators return ValidationResult with indices. 
Error messages can then properly show which values of which fields caused the validation to fail.

```python
import numpy as np
from valarray.core import ValidationResult

# Option 1: boolean array (here for a 2D array of shape (3, 3))
indices = np.array(
    [
        [False, False, False],
        [True, False, False],
        [False, False, True],
    ]
)

# Option 2: tuple of integer arrays, one array per axis
indices = (np.array([0, 1, 1]), np.array([1, 0, 1]))

res = ValidationResult(status="FAIL", indices_invalid=indices, msg="Optional error message.")
```
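
The two index forms are interchangeable in plain numpy: `np.nonzero` turns a boolean mask into the equivalent tuple of integer arrays, and both select the same elements. A short check, independent of valarray:

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

mask = np.array(
    [
        [False, False, False],
        [True, False, False],
        [False, False, True],
    ]
)
# Equivalent tuple of integer arrays: (row indices, column indices)
rows_cols = np.nonzero(mask)

print(arr[mask].tolist())       # [3, 8]
print(arr[rows_cols].tolist())  # [3, 8]
```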

### Example Validator
```python
from dataclasses import dataclass
from typing import Literal

import numpy as np

from valarray.core import ValidationResult
from valarray.numpy import NumpyValidator

@dataclass
class ExampleIsEvenValidator(NumpyValidator[np.uint8]):
    method: Literal["boolean", "raise", "result"] = "boolean"

    def validate(self, arr):
        even = arr % 2 == 0

        all_even = np.all(even)

        match self.method:
            case "boolean":
                if all_even:
                    return True

                return False
            case "raise":
                if all_even:
                    return None

                raise ValueError()
            case "result":
                if all_even:
                    return ValidationResult("OK")

                return ValidationResult("FAIL", indices_invalid=~even)
```

# Catching exceptions
Failed validation raises `valarray.core.errors_exceptions.ValidationException`, which contains the list of errors responsible (and the name of the array class, if available).

Main error types are:
- `IncorrectDTypeError` - Wrong data type. **\***
- `IncorrectAxNumberError` - Wrong number of axes.
- `IncorrectAxSizesError` - One or more axes have the wrong size.
- `InvalidArrayValuesError` - Validator applied to the whole array failed. **\***
- `InvalidFieldValuesError` - Validator applied to field(s) failed. **\***

**\*** These errors have special variants that ensure proper type hints. See [Generic Errors](#generic-errors).

The error list is a special list type that, in addition to integer indexing and slicing, can be filtered by error type(s).
```python
try:
    ...
except ValidationException as exc:
    err = exc.errs[0]

    sliced_errs = exc.errs[:2]
    
    dtype_errs = exc.errs[NumpyIncorrectDTypeError]

    axis_errs = exc.errs[(IncorrectAxSizesError, IncorrectAxNumberError)]
```
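
Type-based filtering of this kind can be built on a `list` subclass that overrides `__getitem__`. The sketch below illustrates the idea only: the class and error names are invented, and this is not necessarily how valarray implements its error list.

```python
class ShapeError(Exception): ...
class DTypeError(Exception): ...


class FilterableList(list):
    """List that also accepts a type (or tuple of types) as an index."""

    def __getitem__(self, key):
        if isinstance(key, type) or (
            isinstance(key, tuple) and all(isinstance(k, type) for k in key)
        ):
            # isinstance() accepts both a single type and a tuple of types.
            return FilterableList(e for e in self if isinstance(e, key))
        return super().__getitem__(key)  # int or slice: normal list behaviour


errs = FilterableList([DTypeError("d"), ShapeError("s1"), ShapeError("s2")])

print(len(errs[ShapeError]))                # 2
print(len(errs[(ShapeError, DTypeError)]))  # 3
print(type(errs[0]).__name__)               # DTypeError
```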

## Special exceptions and errors
There are two special subclasses of `ValidationException` with associated validation errors raised during [instantiation](#creating-a-validated-array-instance):
- `CreateArrayException` -> `CannotCreateArrayError` - Array cannot be created from supplied object.
- `CoerceDTypeException` -> `CannotCoerceDTypeError` - Array data type cannot be coerced when creating an array with `ValidatedArray(coerce_dtype=True)`.


## Generic Errors
These error types have subclasses ensuring proper type hints:
- `IncorrectDTypeError` -> `NumpyIncorrectDTypeError`
- `CannotCoerceDTypeError` -> `NumpyCannotCoerceDTypeError`
- `InvalidArrayValuesError` -> `NumpyInvalidArrayValuesError`
- `InvalidFieldValuesError` -> `NumpyInvalidFieldValuesError`


# Caveats
- I cannot guarantee that the test suite is foolproof ATM as I'm currently the only one testing this library.
- Library has so far only been tested with `python==3.12` and `numpy==2.4.0`
- Library isn't tested for performance; use in production only if the primary bottleneck is brain and not hardware.