Metadata-Version: 2.1
Name: recap-core
Version: 0.9.4
Summary: Recap reads and writes schemas from web services, databases, and schema registries in a standard format
Keywords: avro data data catalog data discovery data engineering data governance data infrastructure data integration data pipelines data quality devops devtools etl infrastructure json schema metadata protobuf
Home-page: https://recap.build
Author-Email: Chris Riccomini <criccomini@apache.org>
License: MIT
Project-URL: Documentation, https://recap.build
Project-URL: Homepage, https://recap.build
Project-URL: Repository, https://github.com/recap-cloud/recap
Requires-Python: <=3.11,>=3.10
Provides-Extra: kafka
Provides-Extra: proto
Provides-Extra: hive
Provides-Extra: bigquery
Provides-Extra: json
Provides-Extra: app
Provides-Extra: all
Requires-Dist: confluent-kafka[schema-registry]>=2.1.1; extra == "kafka"
Requires-Dist: proto-schema-parser>=0.2.0; extra == "proto"
Requires-Dist: pymetastore>=0.2.0; extra == "hive"
Requires-Dist: google-cloud-bigquery>=3.11.3; extra == "bigquery"
Requires-Dist: referencing>=0.30.0; extra == "json"
Requires-Dist: httpx>=0.24.1; extra == "json"
Requires-Dist: fastapi>=0.103.0; extra == "app"
Requires-Dist: pydantic>=2.3.0; extra == "app"
Requires-Dist: pydantic-settings>=2.0.3; extra == "app"
Requires-Dist: typer>=0.9.0; extra == "app"
Requires-Dist: uvicorn>=0.23.2; extra == "app"
Requires-Dist: rich>=13.5.2; extra == "app"
Requires-Dist: python-dotenv>=1.0.0; extra == "app"
Requires-Dist: fsspec>=2023.9.2; extra == "app"
Requires-Dist: recap-core[app,bigquery,hive,json,kafka,proto]; extra == "all"
Description-Content-Type: text/markdown

<div align="center">
  <img src="https://github.com/recap-build/recap/blob/main/static/recap-logo.png?raw=true" alt="recap">
</div>

## What is Recap?

Recap reads and writes schemas from web services, databases, and schema registries in a standard format.

⭐️ _If you like this project, please give it a star! It helps the project get more visibility._

## Table of Contents

* [What is Recap?](#what-is-recap)
* [Supported Formats](#supported-formats)
* [Install](#install)
* [Usage](#usage)
   * [CLI](#cli)
   * [Gateway](#gateway)
   * [Registry](#registry)
   * [API](#api)
   * [Docker](#docker)
* [Schema](#schema)
* [Documentation](#documentation)

## Supported Formats

| Format      | Read | Write |
| :---------- | :-: | :-: |
| [Avro](https://recap.build/docs/converters/avro/) | ✅ | ✅ |
| [Protobuf](https://recap.build/docs/converters/protobuf/) | ✅ | ✅ |
| [JSON Schema](https://recap.build/docs/converters/json-schema/) | ✅ | ✅ |
| [Snowflake](https://recap.build/docs/readers/snowflake/) | ✅ |  |
| [PostgreSQL](https://recap.build/docs/readers/postgresql/) | ✅ |  |
| [MySQL](https://recap.build/docs/readers/mysql/) | ✅ |  |
| [BigQuery](https://recap.build/docs/readers/bigquery/) | ✅ |  |
| [Confluent Schema Registry](https://recap.build/docs/readers/confluent-schema-registry/) | ✅ |  |
| [Hive Metastore](https://recap.build/docs/readers/hive-metastore/) | ✅ |  |

## Install

Install Recap and all of its optional dependencies:

```bash
pip install 'recap-core[all]'
```

You can also select specific dependencies:

```bash
pip install 'recap-core[kafka,proto]'
```

See `pyproject.toml` for a list of optional dependencies.
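
If Recap is already installed, the extras it declares can also be read from the installed package metadata. The following is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `list_extras` is hypothetical and not part of Recap.

```python
from importlib.metadata import PackageNotFoundError, metadata


def list_extras(distribution: str) -> list[str]:
    """Return the extras declared by an installed distribution, sorted by name."""
    try:
        meta = metadata(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    # get_all() returns None when no Provides-Extra header is present.
    return sorted(meta.get_all("Provides-Extra") or [])


# Prints an empty list if recap-core is not installed.
print(list_extras("recap-core"))
```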

## Usage

### CLI

Recap comes with a command line interface that can list and read schemas from external systems.

List the children of a URL:

```bash
recap ls postgresql://user:pass@host:port/testdb
```

```json
[
  "pg_toast",
  "pg_catalog",
  "public",
  "information_schema"
]
```

Keep drilling down:

```bash
recap ls postgresql://user:pass@host:port/testdb/public
```

```json
[
  "test_types"
]
```

Read the schema for the `test_types` table as a Recap struct:

```bash
recap schema postgresql://user:pass@host:port/testdb/public/test_types
```

```json
{
  "type": "struct",
  "fields": [
    {
      "type": "int64",
      "name": "test_bigint",
      "optional": true
    }
  ]
}
```
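
Because the CLI prints plain JSON, its output can be consumed by any script. The sketch below parses the output shown above and summarizes each field. The helper `describe_fields` is hypothetical, it only handles the flat struct form shown in this example, and treating a missing `optional` key as `False` is an assumption of this sketch.

```python
import json

# Output of `recap schema ...`, copied from the example above.
raw = """
{
  "type": "struct",
  "fields": [
    {"type": "int64", "name": "test_bigint", "optional": true}
  ]
}
"""


def describe_fields(schema: dict) -> list[tuple[str, str, bool]]:
    """Return (name, type, optional) for each field of a struct schema."""
    if schema.get("type") != "struct":
        raise ValueError("expected a struct schema")
    return [
        # Assumption: a missing "optional" key is treated as False here.
        (field.get("name", ""), field["type"], field.get("optional", False))
        for field in schema.get("fields", [])
    ]


fields = describe_fields(json.loads(raw))
print(fields)  # [('test_bigint', 'int64', True)]
```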

### Gateway

Recap comes with a stateless HTTP/JSON gateway that can list and read schemas.

Start the server at [http://localhost:8000](http://localhost:8000):

```bash
recap serve
```

List the schemas in a PostgreSQL database:

```bash
curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
```

```json
["pg_toast","pg_catalog","public","information_schema"]
```

And read a schema:

```bash
curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
```

```json
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}
```

The gateway fetches schemas from external systems in real time and returns them as Recap schemas.

An OpenAPI schema is available at [http://localhost:8000/docs](http://localhost:8000/docs).
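
The `curl` calls above follow one URL pattern: the gateway base, then `/gateway/ls/` or `/gateway/schema/`, then the URL of the external system. A minimal sketch of the same calls from Python using only the standard library; `gateway_url` and `fetch_json` are hypothetical helpers, and the base URL assumes the default `recap serve` address shown above.

```python
import json
from urllib.request import urlopen

# Assumption: the gateway runs at the default address used in this README.
GATEWAY = "http://localhost:8000"


def gateway_url(action: str, system_url: str, base: str = GATEWAY) -> str:
    """Build a gateway URL for the `ls` or `schema` endpoints shown above."""
    if action not in ("ls", "schema"):
        raise ValueError(f"unsupported gateway action: {action}")
    return f"{base}/gateway/{action}/{system_url}"


def fetch_json(url: str):
    """GET a URL and decode the JSON body (requires a running gateway)."""
    with urlopen(url) as response:
        return json.load(response)


ls_url = gateway_url("ls", "postgresql://user:pass@host:port/testdb")
schema_url = gateway_url(
    "schema", "postgresql://user:pass@host:port/testdb/public/test_types"
)
print(ls_url)
print(schema_url)
# With a gateway running: fetch_json(ls_url)
```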

### Registry

You can store schemas in Recap's schema registry.

Start the server at [http://localhost:8000](http://localhost:8000):

```bash
recap serve
```

Put a schema in the registry:

```bash
curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
    http://localhost:8000/registry/some_schema
```

Get the schema (and version) from the registry:

```bash
curl http://localhost:8000/registry/some_schema
```

```json
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
```

Put a new version of the schema in the registry:

```bash
curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
    http://localhost:8000/registry/some_schema
```

List schema versions:

```bash
curl http://localhost:8000/registry/some_schema/versions
```

```json
[1,2]
```

Get a specific version of the schema:

```bash
curl http://localhost:8000/registry/some_schema/versions/1
```

```json
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
```

The registry uses [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) to store schemas in a variety of filesystems like S3, GCS, ABS, and the local filesystem. See the [registry](https://recap.build/docs/registry/) docs for more details.

An OpenAPI schema is available at [http://localhost:8000/docs](http://localhost:8000/docs).
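
The registry calls above can be scripted the same way. The sketch below builds the `POST` request with the `application/x-recap+json` content type and unpacks the `[schema, version]` pair that the `GET` examples return. The helper names are hypothetical, nothing is sent over the network, and the response shape is taken only from the example output above.

```python
import json
from urllib.request import Request

# Assumption: the registry runs at the default address used in this README.
REGISTRY = "http://localhost:8000/registry"


def build_put_request(name: str, schema: dict, base: str = REGISTRY) -> Request:
    """Build (but do not send) a POST request that stores a schema."""
    body = json.dumps(schema, separators=(",", ":")).encode("utf-8")
    return Request(
        f"{base}/{name}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-recap+json"},
    )


def parse_get_response(payload: str) -> tuple[dict, int]:
    """Split a registry GET response into its schema and version."""
    schema, version = json.loads(payload)
    return schema, version


schema = {
    "type": "struct",
    "fields": [{"type": "int64", "name": "test_bigint", "optional": True}],
}
request = build_put_request("some_schema", schema)
print(request.get_method(), request.full_url)

# Response body copied from the GET example above.
stored, version = parse_get_response(
    '[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]'
)
print(version, stored == schema)
```

To send the request against a running registry, pass it to `urllib.request.urlopen`.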

### API

Recap has `recap.converters` and `recap.clients` packages.

- Converters convert schemas to and from Recap schemas.
- Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.

Read a schema from PostgreSQL:

```python
from recap.clients import create_client

with create_client("postgresql://user:pass@host:port/testdb") as c:
    # Keep the returned Recap struct so it can be converted below
    struct = c.schema("testdb", "public", "test_types")
```

Convert the schema to Avro, Protobuf, and JSON schemas:

```python
from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter

# `struct` is a Recap struct, such as the one returned by `c.schema(...)` above
avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)
```

Transpile schemas from one format to another:

```python
from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter

json_schema = """
{
    "type": "object",
    "$id": "https://recap.build/person.schema.json",
    "properties": {
        "name": {"type": "string"}
    }
}
"""

# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)
```

Store schemas in Recap's schema registry:

```python
from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType

storage = RegistryStorage("file:///tmp/recap-registry-storage")
version = storage.put(
    "postgresql://localhost:5432/testdb/public/test_table",
    StructType(fields=[IntType(32)])
)
storage.get("postgresql://localhost:5432/testdb/public/test_table")

# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")

# List all schemas in the registry
schemas = storage.ls()
```

### Docker

Recap's gateway and registry are also available as a Docker image:

```bash
docker run \
    -p 8000:8000 \
    -e 'RECAP_URLS=["postgresql://user:pass@localhost:5432/testdb"]' \
    ghcr.io/recap-build/recap:latest
```

See [Recap's Docker documentation](https://recap.build/docs/gateway/docker) for more details.

## Schema

See [Recap's type spec](https://recap.build/specs/type) for details on Recap's type system.

## Documentation

Recap's documentation is available at [recap.build](https://recap.build).
