Metadata-Version: 2.4
Name: celine-utils
Version: 1.7.0
Summary: CELINE utils
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: bcrypt>=4.3.0
Requires-Dist: openlineage-python>=1.37.0
Requires-Dist: paho-mqtt>=2.1.0
Requires-Dist: pandas>=2.3.2
Requires-Dist: psycopg2-binary>=2.9.10
Requires-Dist: pydantic>=2.11.7
Requires-Dist: pydantic-settings>=2.10.1
Requires-Dist: python-dotenv>=1.1.1
Requires-Dist: python-keycloak>=5.8.1
Requires-Dist: requests>=2.32.5
Requires-Dist: sqlalchemy>=2.0.43
Requires-Dist: typer>=0.16.1
Requires-Dist: dbt-core>=1.10.10
Requires-Dist: dbt-postgres>=1.9.1
Requires-Dist: prefect-dbt>=0.7.6
Requires-Dist: meltano>=3.9.1
Requires-Dist: prefect>=3.4.19
Requires-Dist: celine-sdk>=1.1.0
Dynamic: license-file

# CELINE Utils

**CELINE Utils** is a collection of shared utilities, libraries, and command-line tools that form the technical backbone of the **CELINE data platform**.

It provides reusable building blocks for data pipelines, governance, lineage, metadata management, and platform integrations. The repository is designed to be embedded into CELINE applications and executed within orchestrated environments using Meltano, dbt, Prefect, and OpenLineage.

---

## Scope and Goals

The goals of this repository are to:

- Centralize **cross-cutting platform logic** used by multiple CELINE projects
- Provide **opinionated but extensible** tooling for data pipelines
- Enforce **consistent governance and lineage semantics**
- Reduce duplication across pipeline applications
- Act as a stable foundation for CELINE-compatible services and workflows

This is not an end-user application; it is a **platform utility layer**.

---

## Key Capabilities

### Command Line Interface (CLI)

A unified CLI built with Typer exposes administrative, governance, and pipeline utilities:

```text
celine-utils
 ├── governance
 │    └── generate
 └── pipeline
      ├── init
      └── run
```

---

### Pipeline Orchestration

CELINE Utils provides a structured execution layer for:

- **Meltano** ingestion pipelines
- **dbt** transformations and tests
- **Prefect**-based Python flows

The `PipelineRunner` coordinates execution, logging, error handling, and lineage emission in a consistent way across tools.

See the [pipeline tutorial](docs/pipeline-tutorial.md) to learn how to set up and deploy a new pipeline.
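
To make the coordination idea concrete, here is a minimal, dependency-free sketch of a runner that brackets each step with lifecycle events and turns exceptions into a failure event. `SketchRunner`, its methods, and the job names are hypothetical and only illustrate the pattern; they are not the actual `PipelineRunner` API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchRunner:
    """Hypothetical stand-in for a pipeline runner (not the real PipelineRunner)."""

    events: list[tuple[str, str]] = field(default_factory=list)

    def emit(self, event_type: str, job: str) -> None:
        # A real runner would send this to a lineage backend; here it is only recorded.
        self.events.append((event_type, job))

    def run(self, job: str, step: Callable[[], None]) -> bool:
        """Run one step, bracketing it with START and then COMPLETE or FAIL."""
        self.emit("START", job)
        try:
            step()
        except Exception:
            self.emit("FAIL", job)
            return False
        self.emit("COMPLETE", job)
        return True


def failing_step() -> None:
    # Simulates a tool invocation that exits with an error.
    raise RuntimeError("simulated transformation failure")


runner = SketchRunner()
runner.run("meltano:ingest", lambda: None)  # assumed job name, succeeds
runner.run("dbt:run", failing_step)         # assumed job name, fails
print(runner.events)
```

Keeping event emission inside a single `run` method means every tool, whatever it is, reports its outcome through the same path.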

---

### OpenLineage Integration

First-class OpenLineage support includes:

- Automatic emission of START, COMPLETE, FAIL, and ABORT events
- Dataset-level schema facets
- Data quality assertions from dbt tests
- Custom CELINE governance facets

---

### Governance Framework

A declarative `governance.yaml` specification allows you to define:

- Dataset ownership
- License and access level
- Classification and retention
- Tags and documentation links

Governance rules are resolved using pattern matching and injected into lineage events.
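
As an illustration of pattern-based resolution, the sketch below matches a dataset name against glob-style patterns and merges the matching rules. The rule keys, the dataset names, and the "longer pattern wins" precedence are all assumptions made for this example; the actual `governance.yaml` schema and resolution order may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical rules, shaped like what a governance.yaml might parse into.
RULES: dict[str, dict[str, str]] = {
    "raw.*": {"owner": "ingestion-team", "access_level": "internal"},
    "raw.customers": {"classification": "pii", "access_level": "restricted"},
    "marts.*": {"owner": "analytics-team", "access_level": "open"},
}


def resolve(dataset: str, rules: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge every matching rule; longer patterns are applied last and override."""
    resolved: dict[str, str] = {}
    for pattern in sorted(rules, key=len):
        if fnmatchcase(dataset, pattern):
            resolved.update(rules[pattern])
    return resolved


print(resolve("raw.customers", RULES))
print(resolve("marts.sales", RULES))
```

In this sketch `raw.customers` inherits its owner from the broad `raw.*` rule while the exact-name rule tightens its access level, which is one common way to layer defaults and overrides.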

---

### Dataset Tooling

The `DatasetClient` enables:

- Schema and table introspection
- Column metadata inspection
- Safe query construction
- Export to Pandas
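
The sketch below shows one way safe query construction and column inspection can work: identifiers are validated before being interpolated, and values are always bound as parameters. It uses the standard-library `sqlite3` module as a stand-in database so it runs anywhere; the helper names and the `meters` table are hypothetical, and none of this is the `DatasetClient` API.

```python
import re
import sqlite3

# Only plain identifiers are accepted; anything else is rejected outright.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Validate and double-quote a table or column name."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def select_where(conn: sqlite3.Connection, table: str, column: str, value: object) -> list[tuple]:
    """Build a SELECT with validated identifiers and a bound value."""
    sql = f"SELECT * FROM {quote_identifier(table)} WHERE {quote_identifier(column)} = ?"
    return conn.execute(sql, (value,)).fetchall()


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    """List column names using SQLite's table_info pragma (index 1 is the name)."""
    rows = conn.execute(f"PRAGMA table_info({quote_identifier(table)})").fetchall()
    return [row[1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meters (id INTEGER, site TEXT)")
conn.executemany("INSERT INTO meters VALUES (?, ?)", [(1, "north"), (2, "south")])

rows = select_where(conn, "meters", "site", "north")
columns = column_names(conn, "meters")
print(columns, rows)
```

Against PostgreSQL the same split applies: identifiers cannot be bound as parameters, so they need validation or driver-level quoting, while values should go through the driver's parameter binding.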

---

### Platform Integrations

Built-in integrations include:

- **Keycloak** for identity and access management
- **Apache Superset** for analytics platform integration
- **MQTT** for lightweight messaging

---

## Repository Structure

```text
celine/
  admin/
  cli/
  common/
  datasets/
  pipelines/
schemas/
tests/
```

---

## Configuration

Configuration is environment-driven using `pydantic-settings`:

- Environment variables first
- Optional `.env` files
- Typed validation
- Container-friendly defaults
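
The precedence described above can be sketched without any dependencies. This is a plain-Python illustration of the layering (defaults, then `.env` values, then environment variables, followed by type coercion), not `pydantic-settings` itself; the `DB_HOST` and `DB_PORT` variable names and the `SketchSettings` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


@dataclass(frozen=True)
class SketchSettings:
    db_host: str
    db_port: int


def load_settings(environ: Mapping[str, str], dotenv_text: str = "") -> SketchSettings:
    """Layer defaults < .env < environment, then coerce to typed fields."""
    merged = {"DB_HOST": "localhost", "DB_PORT": "5432"}  # container-friendly defaults
    merged.update(parse_dotenv(dotenv_text))
    merged.update({k: v for k, v in environ.items() if k in merged})
    # int() raises ValueError on a malformed port, which acts as basic validation.
    return SketchSettings(db_host=merged["DB_HOST"], db_port=int(merged["DB_PORT"]))


settings = load_settings(
    {"DB_PORT": "6543"},                       # stands in for os.environ
    "DB_HOST=db.internal\nDB_PORT=5433\n",     # stands in for a .env file
)
print(settings)
```

Here the host comes from the `.env` text while the port comes from the environment mapping, since environment variables take priority. In real code, passing `os.environ` as the first argument gives the same layering.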

---

## Installation

```bash
pip install celine-utils
```

---

## Intended Audience

CELINE Utils is intended for:

- Data engineers
- Platform engineers
- CELINE application developers

It is not a general-purpose data tooling library.

---

## License

Copyright © 2025  
Spindox Labs

Licensed under the Apache License, Version 2.0.
