Unified index for tracking datasets across multiple repositories.
Implements the AbstractIndex protocol. Maintains a registry of dataset entries across named repositories (always including a built-in "local" repository) and an optional atmosphere (ATProto) backend.
The "local" repository is always present and uses the storage backend determined by the provider argument. When no provider is given, defaults to SQLite (zero external dependencies). Pass a redis connection or Redis **kwargs for backwards-compatible Redis behaviour.
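The selection rule can be sketched as follows (the function name and string labels are illustrative, not the library's internals):

```python
def pick_backend(provider=None, **redis_kwargs):
    """Sketch of the provider-selection rule described above."""
    if provider is None and not redis_kwargs:
        return "sqlite"   # zero-dependency default
    if redis_kwargs:
        return "redis"    # backwards-compatible Redis path
    return provider       # an explicit provider is used as-is
```

For example, `pick_backend()` selects the SQLite default, while `pick_backend(host="localhost", port=6379)` takes the Redis path.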
Additional named repositories can be mounted via the repos parameter, each pairing an IndexProvider with an optional data store.
An atmosphere (ATProto) backend is enabled by default for anonymous, read-only resolution of "@handle/dataset" paths. Pass an authenticated client for write operations, or atmosphere=None to disable it.
Number of stub files removed, or 0 if auto_stubs is disabled.
decode_schema
local.Index.decode_schema(ref)
Reconstruct a Python PackableSample type from a stored schema.
This method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.
If auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.
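The dynamic-generation step can be sketched with the standard library; the schema dict shape below is an assumption for illustration, and the real index reconstructs a PackableSample subclass rather than a plain dataclass:

```python
from dataclasses import make_dataclass

def decode_schema_sketch(schema):
    # Assumed shape: {"name": str, "fields": {field_name: field_type}}.
    # Builds a class whose fields match the stored schema definition.
    return make_dataclass(schema["name"], list(schema["fields"].items()))

MySample = decode_schema_sketch(
    {"name": "MySample", "fields": {"text": str, "value": int}}
)
sample = MySample(text="hello", value=42)
```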
Decode a schema with explicit type hint for IDE support.
This is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.
The decoded type, cast to match the type_hint for IDE support.
Examples
>>> # After enabling auto_stubs and configuring IDE extraPaths:
>>> from local.MySample_1_0_0 import MySample
>>>
>>> # This gives full IDE autocomplete:
>>> DecodedType = index.decode_schema_as(ref, MySample)
>>> sample = DecodedType(text="hello", value=42)  # IDE knows the signature!
Note
The type_hint is used only for static type checking; at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.
get_dataset
local.Index.get_dataset(ref)
Get a dataset entry by name or prefixed reference.
Supports repository-prefixed lookups (e.g. "lab/mnist"), atmosphere paths ("@handle/dataset"), AT URIs, and bare names (which default to the "local" repository).
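The dispatch between these reference forms can be sketched as (labels are illustrative, mirroring the lookup forms listed above):

```python
def classify_ref(ref):
    """Sketch of the reference forms get_dataset accepts."""
    if ref.startswith("at://"):
        return "at-uri"
    if ref.startswith("@"):
        return "atmosphere"   # "@handle/dataset"
    if "/" in ref:
        return "repo"         # "lab/mnist"
    return "local"            # bare name
```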
>>> index = Index(auto_stubs=True)
>>> ref = index.publish_schema(MySample, version="1.0.0")
>>> index.load_schema(ref)
>>> print(index.get_import_path(ref))
local.MySample_1_0_0
>>> # Then in your code:
>>> # from local.MySample_1_0_0 import MySample
get_schema
local.Index.get_schema(ref)
Get a schema record by reference (AbstractIndex protocol).
Insert a dataset into the index (AbstractIndex protocol).
The target repository is determined by a prefix in the name argument (e.g. "lab/mnist"). If no prefix is given, or the prefix is "local", the built-in local repository is used.
If the target repository has a data_store, shards are written to storage first, then indexed. Otherwise, the dataset’s existing URL is indexed directly.
Optional repository filter. If None, aggregates entries from "local" and all named repositories. Use "local" to restrict results to the built-in repository, a named repo key for that repository only, or "_atmosphere" for atmosphere entries.
Load a schema and make it available in the types namespace.
This method decodes the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:`types` namespace for easy access.
>>> # Load and use immediately
>>> MyType = index.load_schema("atdata://local/schema/MySample@1.0.0")
>>> sample = MyType(field1="hello", field2=42)
>>>
>>> # Or access later via namespace
>>> index.load_schema("atdata://local/schema/OtherType@1.0.0")
>>> other = index.types.OtherType(data="test")
Promote a locally-indexed dataset to the atmosphere.
Looks up the entry by name in the local index, resolves its schema, and publishes both schema and dataset record to ATProto via the index’s atmosphere backend.
Semantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.
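The auto-increment rule can be sketched as (the helper name is illustrative):

```python
def next_version(latest=None):
    """Patch-bump rule described above: no previous version yields
    "1.0.0"; otherwise the patch component is incremented."""
    if latest is None:
        return "1.0.0"
    major, minor, patch = (int(part) for part in latest.split("."))
    return f"{major}.{minor}.{patch + 1}"
```

So a dataset last published at "2.3.9" would next be published as "2.3.10" when no explicit version is given.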
Write samples and create an index entry in one step.
This is the primary method for publishing data. It serializes samples to WebDataset tar files, stores them via the appropriate backend, and creates an index entry.
The target backend is determined by the name prefix:
Bare name (e.g., "mnist"): writes to the local repository.
"@handle/name": writes and publishes to the atmosphere.
"repo/name": writes to a named repository.
When the local backend has no data_store configured, a LocalDiskStore is created automatically at ~/.atdata/data/ so that samples have persistent storage.
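The routing and default-store rules above can be sketched as follows (function names and tuple labels are illustrative, not the library's internals):

```python
from pathlib import Path

def route_write(name):
    """Mirrors the prefix rules above for write_samples targets."""
    if name.startswith("@"):
        return "atmosphere", name[1:]
    if "/" in name:
        repo, _, bare = name.partition("/")
        return repo, bare
    return "local", name

def default_local_store(configured=None):
    # Without a configured data_store, ~/.atdata/data/ is used.
    return configured or Path.home() / ".atdata" / "data"
```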
.. note::

   This method is synchronous. Samples are written to a temporary
   location first, then copied to permanent storage by the backend.
   Avoid passing lazily-evaluated iterators that depend on external
   state that may change during the call.