Metadata-Version: 2.4
Name: distributed-sqlite
Version: 0.3.0
Summary: Distributed SQLite-compatible storage engine backed by S3
Requires-Python: >=3.11
Requires-Dist: boto3>=1.34
Requires-Dist: msgpack>=1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: sqlalchemy>=2.0
Provides-Extra: dev
Requires-Dist: alembic>=1.13; extra == 'dev'
Requires-Dist: moto[s3]>=5.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.3; extra == 'dev'
Requires-Dist: pytest-xdist>=3.5; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# distributed-sqlite

A distributed SQLite-compatible storage engine backed solely by AWS S3.

## Overview

`distributed-sqlite` provides a standard SQLAlchemy/DBAPI2 interface over an
append-only, segment-based storage model on S3. It supports:

- **Snapshot isolation** — each transaction reads from a consistent snapshot
- **Optimistic concurrency** — CAS-based manifest commits with automatic retry
- **Conflict detection** — write-set intersection check; raises `ConflictError` on true conflicts
- **Exponential backoff with jitter** — full jitter retry up to 10 attempts
- **WAL-like semantics** — immutable segments + versioned manifests, never mutates committed data
- **Crash recovery** — orphaned segments (written but not committed) are detected and safely ignored
- **Alembic migrations** — Alembic sees a standard SQLite interface; all DDL and migration ops work unchanged
- **Local caching** — LRU disk cache for segments, in-memory snapshot cache
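
The retry behavior described above (full jitter, capped delay, bounded attempts) can be sketched as follows. The defaults mirror the `DISTRIBUTED_SQLITE_RETRY_*` settings documented below, but the function itself is an illustrative reimplementation, not the library's internal code:

```python
import random

def full_jitter_delay(attempt: int, base: float = 0.05, cap: float = 30.0) -> float:
    """Full-jitter backoff: a uniform draw from [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Example: delays for 10 commit attempts (attempt 0 through 9).
delays = [full_jitter_delay(a) for a in range(10)]
```

Full jitter spreads competing committers uniformly across the backoff window, which keeps CAS contention on the manifest low even when many writers collide at once.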

## Storage Layout

```
{bucket}/{prefix}/
  manifests/v{N:020d}.json   # Immutable manifest per version
  segments/{uuid}.seg        # Immutable append-only segments (msgpack)
  root.json                  # Eventually-consistent version hint
```
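
The zero-padded manifest naming means lexicographic key order equals numeric version order, so a paginated listing of `manifests/` yields versions in sequence. A small sketch of the key scheme (the helper name is ours, not the library's):

```python
def manifest_key(prefix: str, version: int) -> str:
    # v{N:020d} zero-pads to 20 digits so string order == numeric order
    return f"{prefix}/manifests/v{version:020d}.json"

print(manifest_key("mydb", 3))
# mydb/manifests/v00000000000000000003.json
```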

## Connection URL

```
distributed_sqlite+distributed_sqlite:///<bucket>/<prefix>
```

## Quick Start

```python
from distributed_sqlite.engine import bootstrap, open_connection, create_engine

# Initialize the store (idempotent)
bootstrap("my-bucket", "mydb")

# Raw DBAPI2 connection
with open_connection("my-bucket", "mydb") as conn:
    cur = conn.cursor()
    cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("INSERT INTO users VALUES (1, 'Alice')")
    conn.commit()

# SQLAlchemy engine
engine = create_engine("distributed_sqlite+distributed_sqlite:///my-bucket/mydb")
```

### boto3 sessions (STS, LocalStack, custom credential chains)

Pass a [`boto3.Session`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html) so credentials come from the session rather than from mutated process environment variables (`AWS_ACCESS_KEY_ID`, etc.). The S3 client is built as `session.client("s3", endpoint_url=...)`.

```python
import boto3
from distributed_sqlite.engine import bootstrap, create_engine

session = boto3.Session(
    aws_access_key_id="...",
    aws_secret_access_key="...",
    aws_session_token="...",  # from STS
    region_name="us-east-1",
)
bootstrap("my-bucket", "mydb", boto3_session=session)
engine = create_engine(
    "distributed_sqlite+distributed_sqlite:///my-bucket/mydb",
    endpoint_url=None,  # or your LocalStack / MinIO URL
    boto3_session=session,
)
```

The same `boto3_session=` argument works on `open_connection()`, `bootstrap()`, `recovery_scan()`, and `S3Backend(...)`.

If you use `sqlalchemy.create_engine` directly, pass the session in `connect_args`:

```python
import sqlalchemy as sa

sa.create_engine(
    "distributed_sqlite+distributed_sqlite:///my-bucket/mydb",
    connect_args={"boto3_session": session, "endpoint_url": "http://localhost:4566"},
)
```

**Long-lived processes:** botocore refreshes credentials automatically when the session uses a refreshable provider (e.g. `AssumeRole`). If you instead hold temporary static keys (from `GetSessionToken`), obtain a new session before they expire and open a new engine/connection with it; cached clients on an existing `S3Backend` do not pick up swapped credentials.
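
For the `GetSessionToken` case, one pattern is to track the credential expiry and rebuild the session (and engine) a safety margin before it lapses. The helper below is a self-contained sketch of that check; the names are ours, not the library's:

```python
from datetime import datetime, timedelta, timezone

def needs_refresh(expiry: datetime, margin: timedelta = timedelta(minutes=5)) -> bool:
    """True once we are within `margin` of the credential expiry time."""
    return datetime.now(timezone.utc) >= expiry - margin

# Usage sketch: when needs_refresh(...) returns True, call
# sts.get_session_token() again, build a fresh boto3.Session,
# and open a new engine/connection with it.
```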

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `AWS_ACCESS_KEY_ID` | — | AWS credentials |
| `AWS_SECRET_ACCESS_KEY` | — | AWS credentials |
| `AWS_DEFAULT_REGION` | `us-east-1` | AWS region |
| `AWS_ENDPOINT_URL` | — | Custom endpoint (LocalStack, MinIO) |
| `DISTRIBUTED_SQLITE_CACHE_DIR` | `~/.distributed_sqlite/cache` | Local cache directory |
| `DISTRIBUTED_SQLITE_CHECKPOINT_INTERVAL` | `50` | Delta segments between checkpoints |
| `DISTRIBUTED_SQLITE_MAX_RETRIES` | `10` | Max commit retry attempts |
| `DISTRIBUTED_SQLITE_RETRY_BASE_SECONDS` | `0.05` | Backoff base delay |
| `DISTRIBUTED_SQLITE_RETRY_MAX_SECONDS` | `30.0` | Max backoff delay |
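
These knobs are presumably read once at startup; a minimal sketch of consuming them with the documented defaults (the helper is illustrative, not the library's config loader):

```python
import os

def env_setting(name: str, default: str) -> str:
    """Read a tuning knob from the environment, falling back to its documented default."""
    return os.environ.get(name, default)

max_retries = int(env_setting("DISTRIBUTED_SQLITE_MAX_RETRIES", "10"))
retry_base = float(env_setting("DISTRIBUTED_SQLITE_RETRY_BASE_SECONDS", "0.05"))
```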

## Architecture

See [docs/architecture.md](docs/architecture.md) for the full design narrative.

## Development

```bash
cp .env.example .env   # fill in your AWS credentials
uv sync
uv run pytest tests/ -v
```
