Metadata-Version: 2.1
Name: fabricks
Version: 0.9.7.1
Summary: 
Author: BMS DWH Team
Author-email: bi_support@bmsuisse.ch
Requires-Python: >=3.9,<4
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: azure-data-tables (>=12.5.0,<13.0.0)
Requires-Dist: azure-identity (>=1.10.0)
Requires-Dist: azure-storage-blob (>=12.14.1)
Requires-Dist: azure-storage-queue (>=12.10.0,<13.0.0)
Requires-Dist: databricks-sdk (>=0.20.0)
Requires-Dist: jinja2 (>=2.11.3)
Requires-Dist: python-dotenv (>=1.0.1)
Requires-Dist: pyyaml (>=6.0.0)
Requires-Dist: sqlglot (>=22.1.1)
Description-Content-Type: text/markdown

# 🧱 Fabricks: Simplifying **Databricks** Data Pipelines

**Fabricks** (**F**ramework for **Databricks**) is a Python framework designed to streamline the creation of lakehouses in **Databricks**. It offers a standardized approach to defining and managing data processing workflows, making it easier to build and maintain robust data pipelines.

## 🌟 Key Features

- 📄 **YAML Configuration**: Easy-to-modify workflow definitions
- 🔍 **SQL-Based Business Logic**: Familiar and powerful data processing
- 🔄 **Version Control**: Track changes and roll back when needed
- 🔌 **Seamless Data Source Integration**: Effortlessly add new sources
- 📊 **Change Data Capture (CDC)**: Track and handle data changes over time
- 🔧 **Flexible Schema Management**: Drop and create tables as needed

## 🚀 Getting Started

### 📦 Installation

1. Navigate to your **Databricks** workspace
2. Select your target cluster
3. Click on the `Libraries` tab
4. Choose `Install New`
5. Select `PyPI` as the library source
6. Enter `fabricks` in the package text box
7. Click `Install`

Once installed, import **Fabricks** in your notebooks or scripts:

```python
import fabricks
```

## 🏗️ Project Configuration

### 🔧 Runtime Configuration

Define your **Fabricks** runtime in a YAML file. Here's a basic structure:

```yaml
name: MyFabricksProject
options:
  secret_scope: my_secret_scope
  timeout: 3600
  workers: 4
path_options:
  storage: /mnt/data
spark_options:
  sql:
    option1: value1
    option2: value2

# Pipeline Stages
bronze:
  name: Bronze Stage
  path_options:
    storage: /mnt/bronze
  options:
    option1: value1

silver:
  name: Silver Stage
  path_options:
    storage: /mnt/silver
  options:
    option1: value1

gold:
  name: Gold Stage
  path_options:
    storage: /mnt/gold
  options:
    option1: value1
```
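Fabricks parses this runtime file itself, so you normally never load it by hand. As a rough illustration of how the YAML maps onto plain Python data (using `pyyaml`, one of the declared dependencies), the keys above simply become nested dictionaries:

```python
import yaml

# A trimmed version of the runtime configuration shown above.
runtime_yaml = """
name: MyFabricksProject
options:
  secret_scope: my_secret_scope
  timeout: 3600
  workers: 4
path_options:
  storage: /mnt/data
"""

config = yaml.safe_load(runtime_yaml)
print(config["name"])                 # MyFabricksProject
print(config["options"]["timeout"])   # 3600
print(config["path_options"]["storage"])  # /mnt/data
```

This is only a sketch of the file's shape, not of how Fabricks consumes it internally.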

## 🥉 Bronze Step

The initial stage for raw data ingestion:

```yaml
- job:
    step: bronze
    topic: sales_data
    item: daily_transactions
    tags: [raw, sales]
    options:
      mode: append
      uri: abfss://fabricks@$datahub/raw/sales
      parser: parquet
      keys: [transaction_id]
      source: pos_system
```

## 🥈 Silver Step

The intermediate stage for data processing:

```yaml
- job:
    step: silver
    topic: sales_analytics
    item: daily_summary
    tags: [processed, sales]
    options:
      mode: update
      change_data_capture: scd1
      parents: [bronze.daily_transactions]
      extender: sales_extender
      check_options:
        max_rows: 1000000
```
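Here `change_data_capture: scd1` refers to Type 1 slowly changing dimensions: when a row with a known key arrives again, the existing row is overwritten and no history is kept. Fabricks applies this on Delta tables for you; the sketch below (plain Python, hypothetical helper name) only illustrates the semantics:

```python
def scd1_upsert(target, updates, key):
    """Type 1 SCD: the latest version of each key wins; no history is retained."""
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = row  # overwrite in place
    return list(merged.values())

current = [{"transaction_id": 1, "amount": 10.0}]
incoming = [{"transaction_id": 1, "amount": 12.5},   # updated row
            {"transaction_id": 2, "amount": 7.0}]    # new row
print(scd1_upsert(current, incoming, "transaction_id"))
# [{'transaction_id': 1, 'amount': 12.5}, {'transaction_id': 2, 'amount': 7.0}]
```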

## 🥇 Gold Step

The final stage for data consumption:

```yaml
- job:
    step: gold
    topic: sales_reports
    item: monthly_summary
    tags: [report, sales]
    options:
      mode: complete
      change_data_capture: scd2
```
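By contrast, `change_data_capture: scd2` means Type 2 slowly changing dimensions: instead of overwriting, the current row is closed out and a new version is appended, preserving full history. Again, Fabricks handles this on the table level; the sketch below (plain Python, hypothetical helper and column names) only demonstrates the idea:

```python
def scd2_upsert(target, updates, key, today):
    """Type 2 SCD: close the open row for each key, then append a new version."""
    result = [dict(row) for row in target]
    for row in updates:
        for existing in result:
            if existing[key] == row[key] and existing["valid_to"] is None:
                existing["valid_to"] = today  # close out the previous version
        result.append({**row, "valid_from": today, "valid_to": None})
    return result

history = [{"transaction_id": 1, "amount": 10.0,
            "valid_from": "2024-01-01", "valid_to": None}]
updated = scd2_upsert(history, [{"transaction_id": 1, "amount": 12.5}],
                      "transaction_id", "2024-02-01")
# Two rows now: the old one closed on 2024-02-01, the new one open-ended.
```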

## 📚 Usage

_Detailed usage instructions to be added._

## 📄 License

This project is licensed under the MIT License.

---

For more information, visit [**Fabricks** Documentation](https://fabricks.readthedocs.io) 📚

