Metadata-Version: 2.4
Name: iceframe
Version: 0.6.0
Summary: A DataFrame-like library for working with Apache Iceberg tables
Author-email: Alex Merced <alex@alexmerced.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/alexmerced/iceframe
Project-URL: Documentation, https://github.com/alexmerced/iceframe/tree/main/docs
Project-URL: Repository, https://github.com/alexmerced/iceframe
Project-URL: Issues, https://github.com/alexmerced/iceframe/issues
Keywords: iceberg,dataframe,data engineering,apache iceberg
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pyiceberg>=0.6.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: polars>=0.19.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pandas>=2.0.0
Provides-Extra: aws
Requires-Dist: s3fs>=2023.1.0; extra == "aws"
Requires-Dist: boto3>=1.28.0; extra == "aws"
Provides-Extra: gcs
Requires-Dist: gcsfs>=2023.1.0; extra == "gcs"
Provides-Extra: azure
Requires-Dist: adlfs>=2023.1.0; extra == "azure"
Provides-Extra: all
Requires-Dist: iceframe[aws,azure,cli,gcs]; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.0.270; extra == "dev"
Requires-Dist: mypy>=1.3.0; extra == "dev"
Requires-Dist: python-dotenv>=1.0.0; extra == "dev"
Provides-Extra: pdf
Requires-Dist: fpdf2>=2.7.0; extra == "pdf"
Requires-Dist: markdown-it-py>=3.0.0; extra == "pdf"
Provides-Extra: delta
Requires-Dist: deltalake>=0.15.0; extra == "delta"
Provides-Extra: lance
Requires-Dist: pylance>=0.9.0; extra == "lance"
Provides-Extra: vortex
Requires-Dist: vortex-data>=0.1.0; extra == "vortex"
Provides-Extra: excel
Requires-Dist: fastexcel>=0.9.0; extra == "excel"
Provides-Extra: gsheets
Requires-Dist: gspread>=6.0.0; extra == "gsheets"
Provides-Extra: hudi
Requires-Dist: getdaft>=0.2.0; extra == "hudi"
Provides-Extra: mcp
Requires-Dist: mcp>=0.1.0; extra == "mcp"
Provides-Extra: ingestion
Requires-Dist: iceframe[delta,excel,gsheets,hudi,lance,vortex]; extra == "ingestion"
Provides-Extra: cli
Requires-Dist: typer>=0.9.0; extra == "cli"
Requires-Dist: rich>=13.0.0; extra == "cli"
Provides-Extra: agent
Requires-Dist: openai>=1.0.0; extra == "agent"
Requires-Dist: anthropic>=0.18.0; extra == "agent"
Requires-Dist: google-generativeai>=0.3.0; extra == "agent"
Requires-Dist: rich>=13.0.0; extra == "agent"
Provides-Extra: cache
Requires-Dist: diskcache>=5.6.0; extra == "cache"
Provides-Extra: streaming
Requires-Dist: kafka-python>=2.0.0; extra == "streaming"
Provides-Extra: monitoring
Requires-Dist: psutil>=5.9.0; extra == "monitoring"
Requires-Dist: prometheus-client>=0.19.0; extra == "monitoring"
Provides-Extra: notebook
Requires-Dist: ipython>=8.0.0; extra == "notebook"
Requires-Dist: ipywidgets>=8.0.0; extra == "notebook"
Provides-Extra: pydantic
Requires-Dist: pydantic>=2.0.0; extra == "pydantic"
Provides-Extra: sql
Requires-Dist: connectorx>=0.3.0; extra == "sql"
Requires-Dist: sqlalchemy>=2.0.0; extra == "sql"
Provides-Extra: xml
Requires-Dist: lxml>=4.9.0; extra == "xml"
Provides-Extra: stats
Requires-Dist: pyreadstat>=1.2.0; extra == "stats"

# IceFrame (Alpha)

A DataFrame-like library for working with Apache Iceberg tables using REST catalogs with local execution.

IceFrame provides a simple, intuitive API for creating, reading, updating, and deleting Iceberg tables, as well as performing maintenance operations and exporting data.

## Features

- **DataFrame API**: Familiar interface for working with tables
- **Local Execution**: Uses PyIceberg, PyArrow, and Polars for efficient local processing
- **Catalog Support**: Works with REST catalogs (including Dremio, Tabular, etc.) and supports credential vending
- **CRUD Operations**: Create, Read, Update, Delete tables and data
- **Maintenance**: Expire snapshots, remove orphan files, compact data files
- **Export**: Write table data out to Parquet, CSV, and JSON

## Installation

```bash
pip install iceframe
```

For cloud storage support (the `all` extra bundles these plus the CLI):

```bash
pip install "iceframe[aws]"   # AWS S3
pip install "iceframe[gcs]"   # Google Cloud Storage
pip install "iceframe[azure]" # Azure Data Lake Storage
pip install "iceframe[all]"   # All of the above plus the CLI
```

## Quick Start

1. Create a `.env` file with your catalog credentials (see `.env.example`):

```env
ICEBERG_CATALOG_URI=https://catalog.dremio.cloud/api/iceberg
ICEBERG_TOKEN=your_token
ICEBERG_WAREHOUSE=your_warehouse
ICEBERG_CATALOG_TYPE=rest
```
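For reference, a minimal stdlib sketch of what `load_catalog_config_from_env` presumably does with these variables — the config key names below are illustrative assumptions; the actual mapping lives in `iceframe.utils`:

```python
import os

def load_catalog_config(environ=os.environ):
    """Map ICEBERG_* environment variables into a catalog config dict.

    The left-hand key names are assumptions for illustration; the
    environment variable names match the .env example above.
    """
    mapping = {
        "uri": "ICEBERG_CATALOG_URI",
        "token": "ICEBERG_TOKEN",
        "warehouse": "ICEBERG_WAREHOUSE",
        "type": "ICEBERG_CATALOG_TYPE",
    }
    # Only include variables that are actually set
    return {key: environ[var] for key, var in mapping.items() if var in environ}

# Example with an explicit environment instead of the real os.environ:
config = load_catalog_config({
    "ICEBERG_CATALOG_URI": "https://catalog.dremio.cloud/api/iceberg",
    "ICEBERG_TOKEN": "your_token",
    "ICEBERG_WAREHOUSE": "your_warehouse",
    "ICEBERG_CATALOG_TYPE": "rest",
})
```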

2. Use IceFrame in your code:

```python
from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env
import polars as pl
from datetime import datetime

# Initialize
config = load_catalog_config_from_env()
ice = IceFrame(config)

# Create a table
schema = {
    "id": "long",
    "name": "string",
    "created_at": "timestamp"
}
ice.create_table("my_table", schema)

# Append data
data = pl.DataFrame({
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "created_at": [datetime(2024, 1, 1), datetime(2024, 1, 2)]
})
ice.append_to_table("my_table", data)

# Read data
df = ice.read_table("my_table")
print(df)

# Query Builder API
from iceframe.expressions import col
from iceframe.functions import sum

df = (ice.query("my_table")
      .select("name", sum(col("id")).alias("total_id"))
      .group_by("name")
      .execute())
print(df)
```
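The query-builder call above groups rows by `name` and sums `id`. On the two rows appended earlier, the same aggregation in plain Python looks like this — shown only to illustrate the semantics; IceFrame executes it via Polars:

```python
from collections import defaultdict

# The rows appended in the Quick Start example
rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]

# Equivalent of .select("name", sum(col("id")).alias("total_id")).group_by("name")
totals = defaultdict(int)
for row in rows:
    totals[row["name"]] += row["id"]

print(dict(totals))  # {'Alice': 1, 'Bob': 2}
```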


## Feature Comparison: IceFrame vs PyIceberg

IceFrame builds on top of PyIceberg, adding high-level abstractions and filling gaps in its feature set.

| Feature | PyIceberg (Native) | IceFrame (Enhanced) |
| :--- | :--- | :--- |
| **Table CRUD** | Low-level API | Simplified `create_table`, `drop_table` |
| **Data Writing** | Arrow/Pandas integration | Polars integration, Auto-schema inference |
| **Branching** | Basic support (WIP) | `create_branch`, `fast_forward`, WAP Pattern |
| **Compaction** | `rewrite_data_files` (limited) | `bin_pack`, `sort` strategies (Polars-based) |
| **Views** | Catalog-dependent | Unified `ViewManager` abstraction |
| **Maintenance** | `expire_snapshots` | `GarbageCollector`, **Native** `remove_orphan_files` |
| **SQL Support** | None | Fluent Query Builder (`select`, `filter`, `join`) |
| **Ingestion** | `add_files` | `add_files` wrapper + Incremental Ingestion recipes |
| **Rollback** | `manage_snapshots` | `rollback_to_snapshot`, `rollback_to_timestamp` |
| **Async** | None | `AsyncIceFrame` for non-blocking I/O |

## Documentation

- [Architecture](architecture.md)
- [Creating Tables](docs/creating_tables.md)
- [Reading Tables](docs/reading_tables.md)
- [Updating Tables](docs/updating_tables.md)
- [Deleting Tables](docs/deleting_tables.md)
- [Query Builder API](docs/query_builder.md)
- [Namespace Management](docs/namespaces.md)
- [Schema Evolution](docs/schema_evolution.md)
- [Partition Management](docs/partitioning.md)
- [Data Quality](docs/data_quality.md)
- [Table Maintenance](docs/maintenance.md)
- [Exporting Data](docs/export.md)
- [CLI Usage](docs/cli.md)
- [Dependencies](docs/dependencies.md)

### Advanced Features
- [Incremental Processing](docs/incremental.md)
- [Table Statistics](docs/statistics.md)
- [Scalability Features](docs/scalability.md)
- [Advanced Iceberg Features](docs/advanced_features.md)
- [JOIN Support](docs/joins.md)
- [Branching & Tagging](docs/branching.md)
- [Rollback & History](docs/rollback.md)
- [Bulk Ingestion](docs/ingestion.md)
- [Catalog Operations](docs/catalog_ops.md)
- [Native Maintenance](docs/native_maintenance.md)
- [Async Operations](docs/async.md)
- [AI Agent](docs/ai_agent.md)
- [MCP Server](docs/mcp.md)
- [Pydantic Integration](docs/pydantic.md)
- [Notebook Integration](docs/notebooks.md)
- [Data Ingestion](docs/ingest.md)
- [Native File Ingestion](docs/ingest_native.md)
- [Optional File Ingestion](docs/ingest_optional.md)
- [Advanced File Ingestion](docs/ingest_advanced.md)

### Recipes & Patterns
- [ETL Pipeline](docs/recipes/etl_pipeline.md) - Simple Extract-Transform-Load workflow
- [SCD Type 2](docs/recipes/scd_type_2.md) - Handling slowly changing dimensions
- [Incremental Ingestion](docs/recipes/incremental_ingestion.md) - Processing only new data
- [Data Quality Gate](docs/recipes/data_quality_gate.md) - Write-Audit-Publish pattern
