Metadata-Version: 2.4
Name: Flowfile
Version: 0.6.3
Summary: Project combining flowfile core (backend) and flowfile_worker (compute offloader) and flowfile_frame (api)
License-File: LICENSE
Author: Edward van Eechoud
Author-email: evaneechoud@gmail.com
Requires-Python: >=3.10,<3.14
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: XlsxWriter (>=3.2.0,<3.3.0)
Requires-Dist: aiofiles (>=24.1.0,<25.0.0)
Requires-Dist: bcrypt (>=4.3.0,<5.0.0)
Requires-Dist: boto3 (>=1.38.40,<1.38.47)
Requires-Dist: connectorx (>=0.4.2,<0.5.0)
Requires-Dist: cryptography (>=45.0.5,<46.0.0)
Requires-Dist: databases (>=0.9.0,<0.10.0)
Requires-Dist: deltalake (>=1.1.0,<2.0.0)
Requires-Dist: faker (>=23.1.0,<23.2.0)
Requires-Dist: fastapi (>=0.115.2,<0.116.0)
Requires-Dist: fastexcel (>=0.12.0,<0.13.0)
Requires-Dist: httpx (>=0.28.1,<0.29.0)
Requires-Dist: loky (>=3.4.1,<3.5.0)
Requires-Dist: methodtools (>=0.4.7,<0.5.0)
Requires-Dist: openpyxl (>=3.1.2,<3.2.0)
Requires-Dist: passlib (>=1.7.4,<1.8.0)
Requires-Dist: pendulum (==2.1.2) ; python_version < "3.12"
Requires-Dist: pl-fuzzy-frame-match (>=0.4.0)
Requires-Dist: polars (>=1.8.2,<1.32) ; sys_platform != "win32"
Requires-Dist: polars (>=1.8.2,<=1.25.2) ; sys_platform == "win32"
Requires-Dist: polars-distance (>=0.4.3,<0.5.0)
Requires-Dist: polars-ds (>=0.6.0)
Requires-Dist: polars-expr-transformer (>=0.5.0)
Requires-Dist: polars-grouper (>=0.3.0,<0.4.0)
Requires-Dist: polars_simed (>=0.3.4,<0.4.0)
Requires-Dist: pyarrow (>=18.0.0,<19.0.0)
Requires-Dist: pydantic (>=2.9.2,<2.10.0)
Requires-Dist: pyiceberg[hadoop] (>=0.9.1,<0.10.0)
Requires-Dist: python-jose (>=3.4.0,<4.0.0)
Requires-Dist: python-multipart (>=0.0.12,<0.1.0)
Requires-Dist: pyyaml (>=6.0.3,<7.0.0)
Requires-Dist: s3fs (>=2025.7.0,<2026.0.0)
Requires-Dist: tqdm (>=4.67.1,<5.0.0)
Requires-Dist: uvicorn (>=0.32.0,<0.33.0)
Requires-Dist: websockets (>=12.0)
Description-Content-Type: text/markdown

<h1 align="center">
  <img src="https://raw.githubusercontent.com/Edwardvaneechoud/Flowfile/main/.github/images/logo.png" alt="Flowfile Logo" width="100">
  <br>
  Flowfile
</h1>

<p align="center">
  <b>Main Repository</b>: <a href="https://github.com/Edwardvaneechoud/Flowfile">Edwardvaneechoud/Flowfile</a><br>
  <b>Documentation</b>: 
  <a href="https://edwardvaneechoud.github.io/Flowfile/">Website</a> - 
  <a href="https://github.com/Edwardvaneechoud/Flowfile/blob/main/flowfile_core/README.md">Core</a> - 
  <a href="https://github.com/Edwardvaneechoud/Flowfile/blob/main/flowfile_worker/README.md">Worker</a> - 
  <a href="https://github.com/Edwardvaneechoud/Flowfile/blob/main/flowfile_frontend/README.md">Frontend</a> - 
  <a href="https://dev.to/edwardvaneechoud/building-flowfile-architecting-a-visual-etl-tool-with-polars-576c">Technical Architecture</a>
</p>

<p>
Flowfile is a visual ETL tool and Python library suite that combines drag-and-drop workflow building with the speed of Polars dataframes. Build data pipelines visually, transform data using powerful nodes, or define data flows programmatically with Python and analyze results - all with high-performance data processing. Export your visual flows as standalone Python/Polars code for production deployment.
</p>

## 🚀 Getting Started

### Installation

Install Flowfile directly from PyPI:

```bash
pip install Flowfile
```

### Quick Start: Web UI

The easiest way to get started is by launching the web-based UI:

```bash
# Start the Flowfile web UI with integrated services
flowfile run ui
```

This will:
- Start the combined core and worker services
- Launch a web interface in your browser
- Provide access to the full visual ETL capabilities

**Options:**
```bash
# Customize host
flowfile run ui --host 0.0.0.0

# Start without opening a browser
flowfile run ui --no-browser
```

You can also start the web UI programmatically:

```python
import flowfile

# Start with default settings
flowfile.start_web_ui()

# Or customize
flowfile.start_web_ui(open_browser=False)
```

### Using the FlowFrame API

Flowfile provides a Polars-like API for defining data pipelines programmatically:

```python
import flowfile as ff
from flowfile import col, open_graph_in_editor

# Create a data pipeline
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

# Process the data
result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the graph in the web UI (starts the server if needed)
open_graph_in_editor(result.flow_graph)
```

## 📦 Package Components

The `Flowfile` PyPI package includes:

- **Core Service (`flowfile_core`)**: The main ETL engine using Polars
- **Worker Service (`flowfile_worker`)**: Handles computation-intensive tasks
- **Web UI**: Browser-based visual ETL interface
- **FlowFrame API (`flowfile_frame`)**: Polars-like API for Python coding

## ✨ Key Features

### Visual ETL with Web UI

- **No Installation Required**: Launch directly from the pip package
- **Drag-and-Drop Interface**: Build data pipelines visually
- **Integrated Services**: Combined core and worker services
- **Browser-Based**: Access from any device on your network
- **Code Generation**: Export visual flows as Python/Polars scripts

### FlowFrame API

- **Familiar Syntax**: Polars-like API makes it easy to learn
- **ETL Graph Generation**: Automatically builds visual workflows
- **Lazy Evaluation**: Operations are not executed until needed
- **Interoperability**: Move between code and visual interfaces

### Data Operations

- **Data Cleaning & Transformation**: Complex joins, filtering, etc.
- **High Performance**: Built on Polars for efficient processing
- **Data Integration**: Handle various file formats
- **ETL Pipeline Building**: Create reusable workflows

## 🔄 Common FlowFrame Operations

```python

import flowfile as ff
from flowfile import col, when, lit

# Read data
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})
# df_parquet = ff.read_parquet("data.parquet")
# df_csv = ff.read_csv("data.csv")

other_df = ff.from_dict({
    "product_id": [1, 2, 3, 4, 6],
    "product_name": ["WidgetA", "WidgetB", "WidgetC", "WidgetD", "WidgetE"],
    "supplier": ["SupplierX", "SupplierY", "SupplierX", "SupplierZ", "SupplierY"]
}, flow_graph=df.flow_graph  # Assign the data to the same graph
)

# Filter
filtered = df.filter(col("value") > 150)

# Transform
result = df.select(
    col("id"),
    (col("value") * 2).alias("double_value")
)

# Conditional logic
with_status = df.with_columns([
    when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
])

# Group and aggregate
by_category = df.group_by("category").agg([
    col("value").sum().alias("total"),
    col("value").mean().alias("average")
])

# Join data
joined = df.join(other_df, left_on="id", right_on="product_id")

joined.flow_graph.flow_settings.execution_location = "auto"
joined.flow_graph.flow_settings.execution_mode = "Development"
ff.open_graph_in_editor(joined.flow_graph)  # opens the graph in the UI!

```

## 📝 Code Generation

Export your visual flows as standalone Python/Polars code for production use:

![Code Generation](https://raw.githubusercontent.com/Edwardvaneechoud/Flowfile/refs/heads/main/.github/images/generated_code.png)

Simply click the "Generate code" button in the visual editor to:
- Generate clean, readable Python/Polars code
- Export flows without Flowfile dependencies
- Deploy workflows in any Python environment
- Share ETL logic with team members

## 🧰 Command-Line Interface

```bash
# Show help and version info
flowfile

# Start the web UI
flowfile run ui [options]

# Run individual services
flowfile run core --host 0.0.0.0 --port 63578
flowfile run worker --host 0.0.0.0 --port 63579
```

## 📚 Resources

- **[Main Repository](https://github.com/Edwardvaneechoud/Flowfile)**: Latest code and examples
- **[Documentation](https://edwardvaneechoud.github.io/Flowfile/)**: Comprehensive guides
- **[Technical Architecture](https://dev.to/edwardvaneechoud/building-flowfile-architecting-a-visual-etl-tool-with-polars-576c)**: Design overview

## 🖥️ Full Application Options

For the complete visual ETL experience, you have additional options:

- **Desktop Application**: Download from the [main repository](https://github.com/Edwardvaneechoud/Flowfile#-getting-started)
- **Docker Setup**: Run with Docker Compose
- **Manual Setup**: For development environments

## 📋 Development Roadmap

See the [main repository](https://github.com/Edwardvaneechoud/Flowfile#-todo) for the latest development roadmap and TODO list.
