DictSample
DictSample(_data=None, **kwargs)
Dynamic sample type providing dict-like access to raw msgpack data.
This class is the default sample type for datasets when no explicit type is specified. It stores the raw unpacked msgpack data and provides both attribute-style (sample.field) and dict-style (sample["field"]) access to fields.
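The dual access pattern described above can be illustrated with a minimal sketch. `MiniDictSample` is a hypothetical stand-in written for this example; it is not the library's implementation, and it only assumes that the fields live in a plain dictionary.

```python
class MiniDictSample:
    """Hypothetical stand-in illustrating dual access; not the real DictSample."""

    def __init__(self, data=None, **kwargs):
        # Merge an optional mapping with keyword fields into one dictionary.
        self._data = dict(data or {}, **kwargs)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Private names are never treated as fields, which avoids recursion
        # if _data itself has not been set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key):
        # Dict-style access reads from the same underlying dictionary.
        return self._data[key]


sample = MiniDictSample({"label": 3}, caption="a cat")
print(sample.label)       # attribute-style access
print(sample["caption"])  # dict-style access
```

Routing both access styles through one dictionary keeps them consistent: a field visible through one style is visible through the other.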
DictSample is useful for:
- Exploring datasets without defining a schema first
- Working with datasets that have variable schemas
- Prototyping before committing to a typed schema
To convert to a typed schema, use Dataset.as_type() with a @packable-decorated class. Every @packable class automatically registers a lens from DictSample, making this conversion seamless.
Examples
>>> ds = load_dataset("path/to/data.tar") # Returns Dataset[DictSample]
>>> for sample in ds.ordered():
... print(sample.some_field) # Attribute access
... print(sample["other_field"]) # Dict access
... print(sample.keys()) # Inspect available fields
...
>>> # Convert to typed schema
>>> typed_ds = ds.as_type(MyTypedSample)
Note
NDArray fields are stored as raw bytes in DictSample. They are only converted to numpy arrays when accessed through a typed sample class.
Attributes
| Name | Description |
|---|---|
| as_wds | Serialize for writing to WebDataset (__key__ + msgpack). |
| packed | Serialize to msgpack bytes. |
Methods
| Name | Description |
|---|---|
| from_bytes | Create a DictSample from raw msgpack bytes. |
| from_data | Create a DictSample from unpacked msgpack data. |
| get | Get a field value, returning default if missing. |
| keys | Return list of field names. |
| to_dict | Return a copy of the underlying data dictionary. |
from_bytes
DictSample.from_bytes(bs)
Create a DictSample from raw msgpack bytes.
from_data
DictSample.from_data(data)
Create a DictSample from unpacked msgpack data.
get
DictSample.get(key, default=None)
Get a field value, returning default if missing.
keys
DictSample.keys()
Return list of field names.
to_dict
DictSample.to_dict()
Return a copy of the underlying data dictionary.
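The semantics documented for get, keys, and to_dict can be sketched with a small stand-in. `FieldView` is a hypothetical class written for this example, not the library's code. The reference does not say whether to_dict copies shallowly or deeply; the sketch assumes a shallow copy.

```python
class FieldView:
    """Hypothetical stand-in mirroring the documented method semantics."""

    def __init__(self, data):
        self._data = dict(data)

    def get(self, key, default=None):
        # Return the field value, or the default when the field is missing.
        return self._data.get(key, default)

    def keys(self):
        # Return the field names as a list rather than a dict view.
        return list(self._data)

    def to_dict(self):
        # Assumption: a shallow copy, so top-level edits do not leak back.
        return dict(self._data)


view = FieldView({"id": 7, "tags": ["a"]})
exported = view.to_dict()
exported["id"] = 99            # modifies the copy only

print(view.get("id"))          # original value is unchanged
print(view.get("missing", 0))  # falls back to the default
print(view.keys())
```

Under the shallow-copy assumption, nested values such as the `tags` list are still shared between the copy and the original, so mutating them in place would affect both.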