Metadata-Version: 2.3
Name: bizon
Version: 0.1.2
Summary: Extract and load your data reliably from API Clients with native fault-tolerant and checkpointing mechanism.
Author: Antoine Balliet
Author-email: antoine.balliet@gmail.com
Requires-Python: >=3.9,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: bigquery
Provides-Extra: datadog
Provides-Extra: gsheets
Provides-Extra: kafka
Provides-Extra: postgres
Provides-Extra: rabbitmq
Requires-Dist: avro (>=1.12.0,<2.0.0) ; extra == "kafka"
Requires-Dist: backoff (>=2.2.1,<3.0.0)
Requires-Dist: click (>=8.1.7,<9.0.0)
Requires-Dist: confluent-kafka (>=2.6.0,<3.0.0) ; extra == "kafka"
Requires-Dist: datadog (>=0.50.2,<0.51.0) ; extra == "datadog"
Requires-Dist: ddtrace (>=3.10.0,<4.0.0) ; extra == "datadog"
Requires-Dist: dpath (>=2.2.0,<3.0.0)
Requires-Dist: fastavro (>=1.9.7,<2.0.0) ; extra == "kafka"
Requires-Dist: google-cloud-bigquery (>=3.25.0,<4.0.0) ; extra == "bigquery"
Requires-Dist: google-cloud-bigquery-storage (>=2.25.0,<3.0.0) ; extra == "bigquery"
Requires-Dist: google-cloud-storage (>=2.17.0,<3.0.0)
Requires-Dist: gspread (>=6.1.2,<7.0.0) ; extra == "gsheets"
Requires-Dist: kafka-python (>=2.0.2,<3.0.0) ; extra == "kafka"
Requires-Dist: loguru (>=0.7.2,<0.8.0)
Requires-Dist: orjson (>=3.10.16,<4.0.0)
Requires-Dist: pendulum (>=3.0.0,<4.0.0)
Requires-Dist: pika (>=1.3.2,<2.0.0) ; extra == "rabbitmq"
Requires-Dist: polars (>=1.16.0,<2.0.0)
Requires-Dist: protobuf (>=4.24.0,<5.0.0) ; extra == "bigquery"
Requires-Dist: psycopg2-binary (>=2.9.9,<3.0.0) ; extra == "postgres"
Requires-Dist: pyarrow (>=16.1.0,<17.0.0)
Requires-Dist: pydantic (>=2.8.2,<3.0.0)
Requires-Dist: pydantic-extra-types (>=2.9.0,<3.0.0)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: pytz (>=2024.2,<2025.0)
Requires-Dist: pyyaml (>=6.0.1,<7.0.0)
Requires-Dist: requests (>=2.28.2,<3.0.0)
Requires-Dist: simplejson (>=3.20.1,<4.0.0)
Requires-Dist: sqlalchemy (>=2.0.32,<3.0.0)
Requires-Dist: sqlalchemy-bigquery (>=1.11.0,<2.0.0) ; extra == "bigquery"
Requires-Dist: tenacity (>=9.0.0,<10.0.0)
Description-Content-Type: text/markdown

# bizon ⚡️
Extract and load your largest data streams with a framework you can trust for billion records.

## Features
- **Natively fault-tolerant**: Bizon uses a checkpointing mechanism to keep track of the progress and recover from the last checkpoint.

- **High throughput**: Bizon is designed to handle high throughput and can process billions of records.

- **Queue system agnostic**: Bizon is agnostic of the queuing system, you can use any queuing system among Python Queue, RabbitMQ, Kafka or Redpanda. Thanks to the `bizon.engine.queue.Queue` interface, adapters can be written for any queuing system.

- **Pipeline metrics**: Bizon provides exhaustive pipeline metrics and implement Datadog & OpenTelemetry for tracing. You can monitor:
    - ETAs for completion
    - Number of records processed
    - Completion percentage
    - Latency Source <> Destination

- **Lightweight & lean**: Bizon is lightweight, minimal codebase and only uses few dependencies:
    - `requests` for HTTP requests
    - `pyyaml` for configuration
    - `sqlalchemy` for database / warehouse connections
    - `polars` for memory efficient data buffering and vectorized processing
    - `pyarrow` for Parquet file format

## Installation
```bash
pip install bizon
```

## Usage

### List available sources and streams
```bash
bizon source list
bizon stream list <source_name>
```

### Create a pipeline

Create a file named `config.yml` in your working directory with the following content:

```yaml
name: demo-creatures-pipeline

source:
  name: dummy
  stream: creatures
  authentication:
    type: api_key
    params:
      token: dummy_key

destination:
  name: logger
  config:
    dummy: dummy
```

Run the pipeline with the following command:

```bash
bizon run config.yml
```
## Backend configuration

Backend is the interface used by Bizon to store its state. It can be configured in the `backend` section of the configuration file. The following backends are supported:
- `sqlite`: In-memory SQLite database, useful for testing and development.
- `bigquery`: Google BigQuery backend, perfect for light setup & production.
- `postgres`: PostgreSQL backend, for production use and frequent cursor updates.

## Queue configuration

Queue is the interface used by Bizon to exchange data between `Source` and `Destination`. It can be configured in the `queue` section of the configuration file. The following queues are supported:
- `python_queue`: Python Queue, useful for testing and development.
- `rabbitmq`: RabbitMQ, for production use and high throughput.
- `kafka`: Apache Kafka, for production use and high throughput and strong persistence.

## Runner configuration

Runner is the interface used by Bizon to run the pipeline. It can be configured in the `runner` section of the configuration file. The following runners are supported:
- `thread` (asynchronous)
- `process` (asynchronous)
- `stream` (synchronous)

## Start syncing your data 🚀

### Quick setup without any dependencies ✌️

Queue configuration can be set to `python_queue` and backend configuration to `sqlite`.
This will allow you to test the pipeline without any external dependencies.


### Local Kafka setup

To test the pipeline with Kafka, you can use `docker compose` to setup Kafka or Redpanda locally.

**Kafka**
```bash
docker compose --file ./scripts/kafka-compose.yml up # Kafka
docker compose --file ./scripts/redpanda-compose.yml up # Redpanda
```

In your YAML configuration, set the `queue` configuration to Kafka under `engine`:
```yaml
engine:
  queue:
    type: kafka
    config:
      queue:
        bootstrap_server: localhost:9092 # Kafka:9092 & Redpanda: 19092
```

**RabbitMQ**
```bash
docker compose --file ./scripts/rabbitmq-compose.yml up
```

In your YAML configuration, set the `queue` configuration to Kafka under `engine`:

```yaml
engine:
  queue:
    type: rabbitmq
    config:
      queue:
        host: localhost
        queue_name: bizon
```

