Metadata-Version: 2.4
Name: mini-vision
Version: 0.5.0
Summary: YOLO segmentation utilities for video streams
Author: Deivid Manfre
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: ultralytics
Requires-Dist: opencv-python
Requires-Dist: numpy
Requires-Dist: websockets

# mini_vision

A simple Python library for object segmentation in video streams using YOLO models.

The goal of this library is to provide a modular pipeline to:

* consume video streams
* perform object segmentation
* render contours or masks

The library follows data-oriented design and low-coupling principles, making it easy to swap in different computer vision models.

---

## Installation

~~~bash
pip install mini-vision
~~~

## Usage

The following example shows how to use `mini_vision` to consume a video stream, run YOLO segmentation, and render object contours.

### Import the library

~~~python
from mini_vision import (
    YoloSegmenter,
    SegmentationRenderer,
    WSFrameClient
)
~~~

### Components

| Component              | Description                                                         |
| ---------------------- | ------------------------------------------------------------------- |
| `YoloSegmenter`        | Runs object segmentation using one or more YOLO segmentation models |
| `SegmentationRenderer` | Draws segmentation contours or masks on the frame                  |
| `WSFrameClient`        | Connects to a WebSocket video stream and yields frames             |

---

## Device (CPU / GPU)

`YoloSegmenter` supports explicit device selection for inference.

By default, it runs on CPU, but you can manually choose the execution device.

| Device   | Description                           |
| -------- | ------------------------------------- |
| `"cpu"`  | Runs inference on CPU (default)       |
| `"cuda"` | Runs inference on NVIDIA GPU (faster) |

### Example using CPU

~~~python
segmenter = YoloSegmenter("yolov8n-seg.pt", device="cpu")
~~~

### Example using GPU (CUDA)

~~~python
segmenter = YoloSegmenter("yolov8n-seg.pt", device="cuda")
~~~

> Note: CUDA requires a compatible NVIDIA GPU and properly installed drivers.
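
A common pattern is to pick the device at runtime instead of hardcoding it. The helper below is not part of `mini_vision`; it is a sketch that assumes `torch` is importable (it is installed as a dependency of `ultralytics`):

~~~python
def pick_device(prefer: str = "cuda") -> str:
    """Return "cuda" when requested and available, otherwise fall back to "cpu"."""
    if prefer == "cuda":
        try:
            import torch  # pulled in by ultralytics
            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass
    return "cpu"

device = pick_device()  # "cuda" on a machine with a working NVIDIA setup, else "cpu"
~~~

This keeps the same script usable on both GPU and CPU-only machines without edits.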

---

## Multiple Models

`YoloSegmenter` supports loading more than one YOLO model at the same time.

You can pass:

- a single `.pt` model file
- or a directory containing multiple `.pt` model files

When a directory is provided, the library loads all supported model files found inside it and combines detections from all loaded models into a single output.

### Example using a single model

~~~python
segmenter = YoloSegmenter("models/yolov8n-seg.pt", device="cpu")
detections = segmenter.segment(frame)
~~~

### Example using a directory with multiple models

~~~python
segmenter = YoloSegmenter("models/", device="cuda")
detections = segmenter.segment(frame)
~~~

Example directory structure:

~~~text
models/
├── car-seg.pt
├── person-seg.pt
└── animal-seg.pt
~~~

This is useful when you want to combine specialized models in the same segmentation pipeline.

---

## YoloSegmenter

Runs object segmentation using one or more YOLO segmentation models.

~~~python
segmenter = YoloSegmenter("yolov8n-seg.pt", device="cpu")
detections = segmenter.segment(frame)
~~~

You can also load multiple models by passing a directory:

~~~python
segmenter = YoloSegmenter("models/", device="cuda")
detections = segmenter.segment(frame)
~~~

### Tracking during segmentation

To populate object tracking IDs, enable tracking when calling `segment(...)`.

~~~python
detections = segmenter.segment(frame, track=True)
~~~

This allows each detection to carry a `track_id`, which can be used by the renderer, JSON output, or TOON output.
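
Because track IDs persist across frames, they can be aggregated downstream. A minimal sketch, assuming detections are serialized to dicts with a `track_id` field (matching the JSON output shown later in this document):

~~~python
def count_unique_tracks(per_frame_detections):
    """Count distinct tracked objects seen across a sequence of frames."""
    seen = set()
    for detections in per_frame_detections:
        for det in detections:
            track_id = det.get("track_id")
            if track_id is not None:  # untracked detections carry no ID
                seen.add(track_id)
    return len(seen)

frames = [
    [{"label": "car", "track_id": 3}],
    [{"label": "car", "track_id": 3}, {"label": "person", "track_id": 5}],
]
count_unique_tracks(frames)  # -> 2: the same car is not counted twice
~~~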

---

## SegmentationRenderer

Responsible for rendering segmentation contours or masks on frames.

~~~python
renderer = SegmentationRenderer()
frame = renderer.draw(
    frame,
    detections,
    mode="contour",
    detect_object=["car", "person", "dog"],
    detect_color=["gold", "neon_pink", "lime"],
    thickness=2
)
~~~

### Tracking by ID

The renderer also supports object tracking visualization by ID.

To enable tracking visualization, pass `track=True` in `renderer.draw(...)`.

~~~python
renderer = SegmentationRenderer()
frame = renderer.draw(
    frame,
    detections,
    mode="contour",
    detect_object=["car", "person", "dog"],
    detect_color=["gold", "neon_pink", "lime"],
    thickness=2,
    track=True
)
~~~

When tracking is enabled, the rendered label can include the tracked object ID together with the class label and confidence score.

Example output on frame:

~~~text
car #3 0.87
person #1 0.92
dog #5 0.81
~~~

> Note: to have `track_id` populated in rendered labels, JSON output, or TOON output, tracking must be enabled during segmentation:
>
> ~~~python
> detections = segmenter.segment(frame, track=True)
> ~~~

---

## JSON Output (optional)

`SegmentationRenderer` can optionally return structured detection data in JSON format.

This allows integration with logging systems, APIs, analytics pipelines, or other downstream processing tools.

To enable this feature, pass `return_json=True`.

~~~python
frame, data = renderer.draw(
    frame,
    detections,
    mode="contour",
    detect_object=["car", "person", "dog"],
    detect_color=["gold", "neon_pink", "lime"],
    thickness=2,
    return_json=True
)
~~~

Example JSON output:

~~~json
{
  "detections": [
    {
      "label": "person",
      "score": 0.92,
      "track_id": 3,
      "bbox": [120, 80, 240, 300]
    },
    {
      "label": "car",
      "score": 0.88,
      "track_id": 7,
      "bbox": [400, 210, 560, 350]
    }
  ]
}
~~~

If `return_json` is not enabled, the renderer returns only the processed frame.
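
Once you have the JSON payload, it is plain Python data. A sketch of downstream filtering, using the example payload above (the structure is an assumption based on that sample):

~~~python
data = {
    "detections": [
        {"label": "person", "score": 0.92, "track_id": 3, "bbox": [120, 80, 240, 300]},
        {"label": "car", "score": 0.88, "track_id": 7, "bbox": [400, 210, 560, 350]},
    ]
}

# Keep only high-confidence detections before forwarding them downstream.
confident = [d for d in data["detections"] if d["score"] >= 0.9]
labels = [d["label"] for d in confident]  # -> ["person"]
~~~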

---

## TOON Output

The renderer can also return detections in TOON format, a lightweight text representation designed for agent pipelines, logging, and token-efficient LLM processing.

Enable it using `return_toon=True`.

~~~python
frame, toon = renderer.draw(
    frame,
    detections,
    return_toon=True
)
~~~

Example TOON output:

~~~text
frame_width=1280 frame_height=720
label=person track_id=3 x=412 y=210 w=120 h=260 cx=472 cy=340 area=31200 score=0.91
label=car track_id=7 x=102 y=320 w=180 h=90 cx=192 cy=365 area=16200 score=0.88
~~~

If tracking is enabled, TOON output can include the tracked object ID through the `track_id` field.
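
Since each TOON line is a flat sequence of `key=value` pairs, parsing it back into structured data is straightforward. A sketch of one way to do it (this parser is not part of the library):

~~~python
def parse_toon_line(line: str) -> dict:
    """Parse one 'key=value key=value ...' TOON line into a dict.

    Integer and float values are converted; anything else stays a string.
    """
    parsed = {}
    for pair in line.split():
        key, _, raw = pair.partition("=")
        try:
            parsed[key] = int(raw)
        except ValueError:
            try:
                parsed[key] = float(raw)
            except ValueError:
                parsed[key] = raw
    return parsed

det = parse_toon_line("label=person track_id=3 x=412 y=210 w=120 h=260 score=0.91")
# det["label"] -> "person", det["track_id"] -> 3, det["score"] -> 0.91
~~~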

---

## WSFrameClient

Connects to a WebSocket video stream and yields frames.

~~~python
import asyncio

client = WSFrameClient("ws://127.0.0.1:8000/ws/frames")

async def run() -> None:
    async for frame in client.frames():
        detections = segmenter.segment(frame, track=True)

        frame = renderer.draw(
            frame,
            detections,
            mode="contour",
            detect_object=["car", "person", "dog"],
            detect_color=["gold", "neon_pink", "lime"],
            thickness=2,
            track=True
        )

asyncio.run(run())
~~~

---

## Full Example

~~~python
import asyncio

from mini_vision import (
    YoloSegmenter,
    SegmentationRenderer,
    WSFrameClient
)

segmenter = YoloSegmenter("models/", device="cuda")
renderer = SegmentationRenderer()
client = WSFrameClient("ws://127.0.0.1:8000/ws/frames")

async def main() -> None:
    async for frame in client.frames():
        detections = segmenter.segment(frame, track=True)

        frame = renderer.draw(
            frame,
            detections,
            mode="contour",
            detect_object=["car", "person", "dog"],
            detect_color=["gold", "neon_pink", "lime"],
            thickness=2,
            track=True
        )

asyncio.run(main())
~~~
