```python
import numpy as np
from numpy.typing import NDArray

import atdata
from atdata.atmosphere import Atmosphere
```

# Promotion Workflow
This tutorial demonstrates the workflow for migrating datasets from local Index-managed storage to the federated ATProto atmosphere network. Promotion is the bridge between Layer 2 (managed storage) and Layer 3 (federation).
## Why Promotion?
A common pattern in data science:
- Start private: Develop and validate datasets within your team
- Go public: Share successful datasets with the broader community
Promotion handles this transition without re-processing your data. Instead of creating a new dataset from scratch, you’re lifting an existing local dataset entry into the federated atmosphere.
The workflow handles several complexities automatically:
- Schema deduplication: If you’ve already published the same schema type and version, promotion reuses it
- URL preservation: Data stays in place (unless you explicitly want to copy it)
- CID consistency: Content identifiers remain valid across the transition
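The CID guarantee follows from content addressing: an identifier is derived from the bytes themselves, not from where they live, so relocating or re-linking a shard cannot invalidate it. A minimal illustration using a plain SHA-256 digest (atdata's actual CID scheme may differ):

```python
import hashlib

shard = b"packed sample bytes"  # stand-in for a dataset shard

# Digest computed while the shard lives in local storage...
local_id = hashlib.sha256(shard).hexdigest()

# ...and again after the shard is referenced from the atmosphere.
# Only the bytes matter, so the identifier is unchanged.
promoted_id = hashlib.sha256(shard).hexdigest()

assert local_id == promoted_id
```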
## Overview
The promotion workflow moves datasets from local storage to the atmosphere:
```
LOCAL                               ATMOSPHERE
-----                               ----------
Index (SQLite/Redis)                ATProto PDS
LocalDiskStore / S3         -->     (same storage or new location)
atdata://local/schema/...           at://did:plc:.../schema/...
```
Key features:
- Schema deduplication: Won’t republish identical schemas
- Flexible data handling: Keep existing URLs or copy to new storage
- Metadata preservation: Local metadata carries over to atmosphere
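The deduplication behavior can be pictured as publish-once bookkeeping keyed by schema type and version. A toy model of that idea (illustrative only, not atdata's implementation; the URI format is made up):

```python
class PublishOnceRegistry:
    """Toy model of schema deduplication: one URI per (type, version)."""

    def __init__(self):
        self._published = {}

    def publish(self, type_name, version):
        key = (type_name, version)
        if key not in self._published:
            # First time this schema is seen: mint a new record URI.
            self._published[key] = f"at://did:plc:example/schema/{type_name}.{version}"
        # Every later call with the same key returns the existing URI.
        return self._published[key]

reg = PublishOnceRegistry()
first = reg.publish("ExperimentSample", "1.0")
second = reg.publish("ExperimentSample", "1.0")
assert first == second  # same type + version: reused, not republished
```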
## Setup

### Prepare a Local Dataset

First, set up a dataset in local storage using `index.write()`:
```python
# 1. Define sample type
@atdata.packable
class ExperimentSample:
    """A sample from a scientific experiment."""

    measurement: NDArray
    timestamp: float
    sensor_id: str

# 2. Create samples
samples = [
    ExperimentSample(
        measurement=np.random.randn(64).astype(np.float32),
        timestamp=float(i),
        sensor_id=f"sensor_{i % 4}",
    )
    for i in range(1000)
]

# 3. Write through index (handles sharding, schema, and storage)
index = atdata.Index(data_store=atdata.LocalDiskStore())
local_entry = index.write(samples, name="experiment-2024-001", maxcount=500)

print(f"Local entry name: {local_entry.name}")
print(f"Local entry CID: {local_entry.cid}")
print(f"Data URLs: {local_entry.data_urls}")
```

## Basic Promotion
Promote the dataset to ATProto using `index.promote_entry()`:

```python
# Connect to atmosphere and attach to the index
client = Atmosphere.login("myhandle.bsky.social", "app-password")
index = atdata.Index(atmosphere=client, data_store=atdata.LocalDiskStore())

# Promote by entry name
at_uri = index.promote_entry("experiment-2024-001")
print(f"Published: {at_uri}")
```

## Promotion with Metadata
Add description, tags, and license:

```python
at_uri = index.promote_entry(
    "experiment-2024-001",
    name="experiment-2024-001-v2",  # Override name
    description="Sensor measurements from Lab 302",
    tags=["experiment", "physics", "2024"],
    license="CC-BY-4.0",
)
print(f"Published with metadata: {at_uri}")
```

## Schema Deduplication
The promotion workflow automatically checks for existing schemas on the atmosphere. When you promote multiple datasets with the same sample type, the schema is only published once:
```python
# First promotion: publishes schema to atmosphere
uri1 = index.promote_entry("experiment-batch-1")

# Second promotion with same schema type + version: reuses existing schema
uri2 = index.promote_entry("experiment-batch-2")
```

## Data Migration Options
By default, promotion keeps the original data URLs:
```python
# Data stays in original storage location
at_uri = index.promote_entry("experiment-2024-001")
```

Benefits:
- Fastest option, no data copying
- Dataset record points to existing URLs
- Requires original storage to remain accessible
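Since the promoted record will point at the existing URLs, it is worth sanity-checking that they can resolve for consumers outside your machine before choosing this option. A heuristic sketch (not part of atdata's API):

```python
from urllib.parse import urlparse

def url_publicly_reachable(url):
    """Heuristic check: file:// and localhost URLs won't resolve for
    other consumers, so keeping them in place only works locally."""
    parts = urlparse(url)
    if parts.scheme == "file":
        return False
    return parts.hostname not in (None, "localhost", "127.0.0.1")

assert not url_publicly_reachable("file:///data/shard-000.tar")
assert url_publicly_reachable("https://bucket.example.com/shard-000.tar")
```

If any of an entry's `data_urls` fail this kind of check, copying the data to public storage (below) is the safer path.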
To copy data to a different storage location, use `promote_dataset()` with a `Dataset` loaded from the entry's URLs:

```python
# Load the dataset and promote directly
entry = index.get_entry_by_name("experiment-2024-001")
ds = atdata.Dataset[ExperimentSample](entry.data_urls[0])

at_uri = index.promote_dataset(
    ds,
    name="experiment-2024-001",
    description="Sensor measurements from Lab 302",
)
```

Benefits:
- Data is copied to new bucket
- Good for moving from private to public storage
- Original storage can be retired
## Verify on Atmosphere
After promotion, verify the dataset is accessible:
```python
entry = index.get_dataset(at_uri)
print(f"Name: {entry.name}")
print(f"Schema: {entry.schema_ref}")
print(f"URLs: {entry.data_urls}")

# Load and iterate — schema auto-resolved
ds = atdata.load_dataset(at_uri, split="train")
for batch in ds.ordered(batch_size=32):
    print(f"Measurement shape: {batch.measurement.shape}")
    break
```

## Error Handling
```python
try:
    at_uri = index.promote_entry("experiment-2024-001")
except KeyError as e:
    # Entry or schema not found in index
    print(f"Not found: {e}")
except ValueError as e:
    # Entry has no data URLs or atmosphere not available
    print(f"Invalid state: {e}")
```

## Requirements Checklist
Before promotion, confirm that:

- The entry exists in the local index (otherwise `promote_entry()` raises `KeyError`)
- The entry has data URLs (otherwise `ValueError`)
- The `Index` was created with an attached `Atmosphere` client
- If keeping the original URLs, the storage location will remain accessible to consumers
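The failure modes from the error-handling section can be checked up front. A hedged sketch, not part of atdata's API; it assumes `entry` is whatever `index.get_entry_by_name()` returned (or `None`) and `atmosphere` is the client attached to the `Index` (or `None`):

```python
def preflight_problems(entry, atmosphere):
    """Collect reasons a promotion would fail, mirroring the
    KeyError/ValueError conditions that promote_entry() can raise.
    Illustrative helper only."""
    problems = []
    if entry is None:
        problems.append("entry not found in index")
    elif not getattr(entry, "data_urls", None):
        problems.append("entry has no data URLs")
    if atmosphere is None:
        problems.append("no atmosphere client attached")
    return problems
```

An empty result means the promotion should not hit either exception path.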
## Complete Workflow
```python
# Complete local-to-atmosphere workflow
import numpy as np
from numpy.typing import NDArray

import atdata
from atdata.atmosphere import Atmosphere

# 1. Define sample type
@atdata.packable
class FeatureSample:
    features: NDArray
    label: int

# 2. Create samples
samples = [
    FeatureSample(
        features=np.random.randn(128).astype(np.float32),
        label=i % 10,
    )
    for i in range(1000)
]

# 3. Write through index (schema persisted automatically)
index = atdata.Index(data_store=atdata.LocalDiskStore())
entry = index.write(samples, name="feature-vectors-v1", maxcount=500)

# 4. Promote to atmosphere
client = Atmosphere.login("myhandle.bsky.social", "app-password")
index = atdata.Index(atmosphere=client, data_store=atdata.LocalDiskStore())
at_uri = index.promote_entry(
    "feature-vectors-v1",
    description="Feature vectors for classification",
    tags=["features", "embeddings"],
    license="MIT",
)
print(f"Dataset published: {at_uri}")

# 5. Others can now discover and load
# ds = atdata.load_dataset("@myhandle.bsky.social/feature-vectors-v1", split="train")
```

## What You've Learned
You now understand the promotion workflow:
| Concept | Purpose |
|---|---|
| `index.promote_entry()` | Lift local entries to the federated network by name |
| `index.promote_dataset()` | Promote a `Dataset` object directly |
| Schema deduplication | Avoid publishing duplicate schemas |
| Data URL preservation | Keep data in place or copy to new storage |
| Metadata enrichment | Add description, tags, license during promotion |
Promotion completes atdata’s three-layer story: you can now move seamlessly from local experimentation to team collaboration to public sharing, all with the same typed sample definitions.
## The Complete Journey

```
┌──────────────────┐  index.write   ┌──────────────────┐    promote     ┌──────────────────┐
│   Local Files    │ ─────────────→ │ Managed Storage  │ ─────────────→ │    Federation    │
│                  │                │                  │                │                  │
│ write_samples()  │                │ Index (SQLite)   │                │ Index +          │
│ Dataset[T]       │                │ LocalDiskStore   │                │ Atmosphere       │
└──────────────────┘                └──────────────────┘                └──────────────────┘
```
## Next Steps
- Atmosphere Reference - Complete atmosphere API
- Protocols - Abstract interfaces
- Local Storage - Local storage reference