DictSample

DictSample(_data=None, **kwargs)

Dynamic sample type providing dict-like access to raw msgpack data.

This class is the default sample type for datasets when no explicit type is specified. It stores the raw unpacked msgpack data and provides both attribute-style (sample.field) and dict-style (sample["field"]) access to fields.
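The dual access pattern described above can be sketched with a small stand-in class. This is not the library's implementation: `MiniDictSample` is a hypothetical name, and merging `_data` with keyword arguments in the constructor is an assumption made for the illustration. It only shows how attribute-style and dict-style lookups can share one underlying dictionary.

```python
class MiniDictSample:
    """Hypothetical stand-in for illustration; not the library's DictSample."""

    def __init__(self, _data=None, **kwargs):
        # Assumption: fields may arrive as a dict, as keyword arguments, or both.
        self._data = {**(_data or {}), **kwargs}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so real attributes win.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key):
        # Dict-style access reads from the same underlying dictionary.
        return self._data[key]

    def get(self, key, default=None):
        # Return the default instead of raising when the field is absent.
        return self._data.get(key, default)

    def keys(self):
        # Field names as a list, in insertion order.
        return list(self._data)


sample = MiniDictSample({"label": "cat"}, score=0.9)
print(sample.label)              # attribute-style access
print(sample["score"])           # dict-style access
print(sample.keys())             # inspect available fields
print(sample.get("missing", 0))  # fall back to a default
```

Routing both access styles through a single dictionary keeps the sample schema-free: no field has to be declared before it is read.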

DictSample is useful for:

- Exploring datasets without defining a schema first
- Working with datasets that have variable schemas
- Prototyping before committing to a typed schema

To convert to a typed schema, use Dataset.as_type() with a @packable-decorated class. Every @packable class automatically registers a lens from DictSample, so the conversion requires no additional setup.

Examples

>>> ds = load_dataset("path/to/data.tar")  # Returns Dataset[DictSample]
>>> for sample in ds.ordered():
...     print(sample.some_field)      # Attribute access
...     print(sample["other_field"])  # Dict access
...     print(sample.keys())          # Inspect available fields
...
>>> # Convert to typed schema
>>> typed_ds = ds.as_type(MyTypedSample)

Note

NDArray fields are stored as raw bytes in DictSample. They are only converted to numpy arrays when accessed through a typed sample class.

Attributes

| Name | Description |
| --- | --- |
| as_wds | Serialize for writing to WebDataset (__key__ + msgpack). |
| packed | Serialize to msgpack bytes. |

Methods

| Name | Description |
| --- | --- |
| from_bytes | Create a DictSample from raw msgpack bytes. |
| from_data | Create a DictSample from unpacked msgpack data. |
| get | Get a field value, returning default if missing. |
| keys | Return list of field names. |
| to_dict | Return a copy of the underlying data dictionary. |

from_bytes

DictSample.from_bytes(bs)

Create a DictSample from raw msgpack bytes.

from_data

DictSample.from_data(data)

Create a DictSample from unpacked msgpack data.

get

DictSample.get(key, default=None)

Get a field value, returning default if missing.

keys

DictSample.keys()

Return list of field names.

to_dict

DictSample.to_dict()

Return a copy of the underlying data dictionary.
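Because to_dict returns a copy, reassigning a key on the result should leave the sample unchanged. The sketch below illustrates that behavior with a hypothetical stand-in class (`MiniDictSample` is not the library's code). It assumes a shallow copy; whether the real method copies nested values is not stated here.

```python
class MiniDictSample:
    """Hypothetical stand-in for illustration; not the library's DictSample."""

    def __init__(self, data):
        # Keep a private dictionary of field values.
        self._data = dict(data)

    def to_dict(self):
        # Assumption: a shallow copy, so top-level edits do not reach the sample.
        return dict(self._data)


sample = MiniDictSample({"label": "cat", "score": 0.9})

snapshot = sample.to_dict()
snapshot["label"] = "dog"      # modify the copy only

print(snapshot["label"])
print(sample.to_dict()["label"])
```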