AbstractIndex

AbstractIndex()

Protocol for index operations, implemented by Index and AtmosphereIndex.

An index manages dataset metadata: it publishes and retrieves schemas, and it inserts and lists datasets. A single index can hold datasets of many sample types; each dataset entry records its sample type through a schema reference.

Examples

>>> def publish_and_list(index: AbstractIndex) -> None:
...     index.publish_schema(ImageSample, version="1.0.0")
...     index.insert_dataset(image_ds, name="images")
...     for entry in index.list_datasets():
...         print(f"{entry.name} -> {entry.schema_ref}")
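
Because AbstractIndex is a protocol, calling code can be typed against it without depending on a concrete backend. The sketch below shows the general mechanism with Python's typing.Protocol. IndexLike and ToyIndex are hypothetical names invented for this illustration, the reference format is made up, and the real AbstractIndex has more methods than the two shown here.

```python
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class IndexLike(Protocol):
    """Hypothetical two-method subset of an index protocol (illustration only)."""

    def publish_schema(self, sample_type: type, *, version: str = "1.0.0") -> str: ...

    def list_datasets(self) -> Iterator[str]: ...


class ToyIndex:
    """Toy in-memory class; satisfies IndexLike structurally, without inheriting."""

    def __init__(self) -> None:
        self._schemas: dict = {}
        self._names: list = []

    def publish_schema(self, sample_type: type, *, version: str = "1.0.0") -> str:
        # The reference format below is made up for this sketch.
        ref = f"local://schemas/{sample_type.__name__}@{version}"
        self._schemas[ref] = sample_type
        return ref

    def list_datasets(self) -> Iterator[str]:
        return iter(self._names)


def count_datasets(index: IndexLike) -> int:
    """Works with any object that provides the protocol's methods."""
    return sum(1 for _ in index.list_datasets())


toy = ToyIndex()
ref = toy.publish_schema(int)
print(isinstance(toy, IndexLike), ref, count_datasets(toy))
```

A function annotated with the protocol type, like count_datasets here, accepts any backend that provides the same methods.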

Attributes

| Name | Description |
| --- | --- |
| data_store | Optional data store for reading/writing shards. |

Methods

| Name | Description |
| --- | --- |
| decode_schema | Reconstruct a Packable type from a stored schema. |
| get_dataset | Get a dataset entry by name or reference. |
| get_schema | Get a schema record by reference. |
| insert_dataset | Register an existing dataset in the index. |
| publish_schema | Publish a schema for a sample type. |
| write | Write samples and create an index entry in one step. |

decode_schema

AbstractIndex.decode_schema(ref)

Reconstruct a Packable type from a stored schema.

Raises

| Name | Type | Description |
| --- | --- | --- |
| | KeyError | If schema not found. |
| | ValueError | If schema has unsupported field types. |
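
The two failure modes above can be illustrated with a small standalone sketch that rebuilds a dataclass from a stored record. The record layout, the SUPPORTED_TYPES table, and the decode_record helper are all assumptions made for this example; the library's actual schema format and decoding logic may differ.

```python
from dataclasses import fields, make_dataclass

# Hypothetical stored-schema layout:
#   {"name": <type name>, "fields": [{"name": ..., "type": ...}, ...]}
# The layout and the set of supported type names are assumptions.
SUPPORTED_TYPES = {"int": int, "float": float, "str": str, "bytes": bytes}


def decode_record(schemas: dict, ref: str) -> type:
    """Rebuild a dataclass from the record stored under ref."""
    record = schemas[ref]  # raises KeyError when ref is unknown
    specs = []
    for spec in record["fields"]:
        type_name = spec["type"]
        if type_name not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported field type: {type_name!r}")
        specs.append((spec["name"], SUPPORTED_TYPES[type_name]))
    return make_dataclass(record["name"], specs)


schemas = {
    "local://schemas/Point@1.0.0": {
        "name": "Point",
        "fields": [
            {"name": "x", "type": "float"},
            {"name": "label", "type": "str"},
        ],
    }
}

Point = decode_record(schemas, "local://schemas/Point@1.0.0")
print([f.name for f in fields(Point)])
print(Point(x=1.5, label="a"))
```

An unknown reference surfaces as KeyError from the dictionary lookup, while a known reference with an unrecognized field type surfaces as ValueError, matching the table above.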

Examples

>>> SampleType = index.decode_schema(entry.schema_ref)
>>> ds = Dataset[SampleType](entry.data_urls[0])

get_dataset

AbstractIndex.get_dataset(ref)

Get a dataset entry by name or reference.

Raises

| Name | Type | Description |
| --- | --- | --- |
| | KeyError | If dataset not found. |
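
Since a missing dataset raises KeyError rather than returning None, callers that treat absence as a normal case may want a small wrapper. The sketch below is one way to write it; find_dataset and StubIndex are hypothetical names, and the stub only stands in for a real index so the example runs on its own.

```python
from typing import Any, Optional


def find_dataset(index: Any, ref: str) -> Optional[Any]:
    """Return the entry for ref, or None if the index raises KeyError."""
    try:
        return index.get_dataset(ref)
    except KeyError:
        return None


class StubIndex:
    """Stand-in for a real index; entries are plain dicts in this sketch."""

    def __init__(self, entries: dict) -> None:
        self._entries = dict(entries)

    def get_dataset(self, ref: str) -> Any:
        return self._entries[ref]  # KeyError when ref is unknown


stub = StubIndex({"images": {"name": "images"}})
print(find_dataset(stub, "images"))
print(find_dataset(stub, "missing"))
```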

get_schema

AbstractIndex.get_schema(ref)

Get a schema record by reference.

Raises

| Name | Type | Description |
| --- | --- | --- |
| | KeyError | If schema not found. |

insert_dataset

AbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)

Register an existing dataset in the index.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| ds | Dataset | The Dataset to register. | required |
| name | str | Human-readable name. | required |
| schema_ref | Optional[str] | Explicit schema ref; auto-published if None. | None |
| **kwargs | | Backend-specific options. | {} |
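
The schema_ref default can be read as "publish the schema for me if I do not name one". The toy index below sketches that branching. AutoPublishIndex, ToyDataset, ToyEntry, the sample_type attribute, the reference format, and the returned entry are all assumptions made for illustration, not the library's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyDataset:
    sample_type: type  # hypothetical attribute naming the dataset's sample type


@dataclass
class ToyEntry:
    name: str
    schema_ref: str


class AutoPublishIndex:
    """Toy in-memory index that mimics the schema_ref=None behaviour."""

    def __init__(self) -> None:
        self.schemas: dict = {}
        self.entries: dict = {}

    def publish_schema(self, sample_type: type, *, version: str = "1.0.0") -> str:
        ref = f"local://schemas/{sample_type.__name__}@{version}"  # made-up format
        self.schemas[ref] = sample_type
        return ref

    def insert_dataset(
        self, ds: ToyDataset, *, name: str, schema_ref: Optional[str] = None
    ) -> ToyEntry:
        if schema_ref is None:
            # No explicit ref: publish the dataset's sample type first.
            schema_ref = self.publish_schema(ds.sample_type)
        entry = ToyEntry(name=name, schema_ref=schema_ref)
        self.entries[name] = entry
        return entry


class ImageSample:
    pass


index = AutoPublishIndex()
auto = index.insert_dataset(ToyDataset(ImageSample), name="images")
explicit = index.insert_dataset(
    ToyDataset(ImageSample),
    name="images-v2",
    schema_ref="local://schemas/ImageSample@2.0.0",
)
print(auto.schema_ref)
print(explicit.schema_ref)
```

In this sketch an explicit schema_ref is stored as given and nothing is published; whether a real backend validates an explicit reference is not stated in this page.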

publish_schema

AbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)

Publish a schema for a sample type.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sample_type | type | A Packable type (@packable-decorated or subclass). | required |
| version | str | Semantic version string. | '1.0.0' |
| **kwargs | | Backend-specific options. | {} |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | str | Schema reference string (local://... or at://...). |
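
The returned reference carries its backend in the URI scheme, so a caller can branch on it with plain string handling. A minimal sketch follows; ref_scheme is a hypothetical helper, and it assumes only the two schemes named above are valid.

```python
KNOWN_SCHEMES = ("local", "at")  # the two schemes named in the Returns table


def ref_scheme(schema_ref: str) -> str:
    """Return the scheme of a schema reference, or raise ValueError."""
    scheme, sep, rest = schema_ref.partition("://")
    if not sep or not rest or scheme not in KNOWN_SCHEMES:
        raise ValueError(f"not a recognised schema reference: {schema_ref!r}")
    return scheme


print(ref_scheme("local://example"))
print(ref_scheme("at://example"))
```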

write

AbstractIndex.write(samples, *, name, schema_ref=None, **kwargs)

Write samples and create an index entry in one step.

Serializes samples to WebDataset tar files, stores them via the appropriate backend, and creates an index entry.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| samples | Iterable | Iterable of Packable samples. Must be non-empty. | required |
| name | str | Dataset name, optionally prefixed with target backend. | required |
| schema_ref | Optional[str] | Optional schema reference. | None |
| **kwargs | | Backend-specific options (maxcount, description, etc.). | {} |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | IndexEntry | IndexEntry for the created dataset. |
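
To make the serialization step more concrete, the sketch below packs samples into WebDataset-style tar shards using only the standard library. It is not the library's writer: build_shards is a hypothetical helper, samples are plain dicts encoded as JSON rather than Packable objects, maxcount is assumed to mean "maximum samples per shard" (as in WebDataset's ShardWriter), and raising ValueError on empty input is a choice made for this example.

```python
import io
import json
import tarfile
from itertools import islice
from typing import Iterable, List


def _pack_shard(batch: list, first_key: int) -> bytes:
    """Write one batch of samples into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for offset, sample in enumerate(batch):
            payload = json.dumps(sample).encode("utf-8")
            # WebDataset groups tar members by basename (the sample key);
            # the extension names the component, here a single .json file.
            info = tarfile.TarInfo(name=f"{first_key + offset:06d}.json")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def build_shards(samples: Iterable[dict], *, maxcount: int = 1000) -> List[bytes]:
    """Split samples into shards of at most maxcount samples each."""
    if maxcount < 1:
        raise ValueError("maxcount must be at least 1")
    it = iter(samples)
    shards: List[bytes] = []
    written = 0
    while True:
        batch = list(islice(it, maxcount))
        if not batch:
            break
        shards.append(_pack_shard(batch, written))
        written += len(batch)
    if not shards:
        raise ValueError("samples must be non-empty")
    return shards


shards = build_shards([{"x": i} for i in range(5)], maxcount=2)
print(len(shards))
```

A real write call would additionally hand the shards to a data store and create the index entry; those steps are backend-specific and are left out here.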