Metadata-Version: 2.3
Name: pyquery-polars
Version: 4.1.1
Summary: Enterprise-grade Headless ETL Engine with Interactive UI
Keywords: PyQuery,Polars,ETL,Big Data,Excel,Power BI,Automation,Analytics,Audit
Author: Shan
Author-email: Shan <tksudharshan@gmail.com>
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Natural Language :: English
Requires-Dist: polars>=1.0.0
Requires-Dist: streamlit>=1.30.0
Requires-Dist: fastapi>=0.109.0
Requires-Dist: uvicorn>=0.25.0
Requires-Dist: pydantic>=2.5.0
Requires-Dist: rich>=13.7.0
Requires-Dist: questionary>=2.0.0
Requires-Dist: xlsxwriter>=3.1.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: connectorx>=0.3.3
Requires-Dist: fastexcel>=0.16.0
Requires-Dist: python-multipart>=0.0.20
Requires-Dist: matplotlib>=3.9.4
Requires-Dist: seaborn>=0.13.2
Requires-Dist: plotly>=6.5.0
Requires-Dist: statsmodels>=0.14.6
Requires-Dist: scikit-learn>=1.6.1
Requires-Dist: scipy>=1.13.1
Requires-Dist: chardet>=5.2.0
Requires-Dist: sqlalchemy>=2.0.45
Requires-Python: >=3.9
Project-URL: Changelog, https://github.com/tks18/pyquery/releases
Project-URL: Documentation, https://github.com/tks18/pyquery#readme
Project-URL: Homepage, https://github.com/tks18/pyquery
Project-URL: Issues, https://github.com/tks18/pyquery/issues
Project-URL: Repository, https://github.com/tks18/pyquery
Description-Content-Type: text/markdown

<div align="center">

# ⚡ PyQuery: The Main Character of Data Stacks 💫

### _ETL. EDA. ML. SQL. IDE._

<p>
  <a href="#"><img src="https://img.shields.io/badge/Execution-Lazy_Execution_Enabled-6A0DAD?style=for-the-badge" alt="Execution"></a>
  <a href="#"><img src="https://img.shields.io/badge/Mode-Power_User_Ready-8B0000?style=for-the-badge" alt="Mode"></a>
  <a href="#"><img src="https://img.shields.io/badge/Privacy-Local_First_No_Cloud-2E8B57?style=for-the-badge" alt="Privacy"></a>
  <a href="#"><img src="https://img.shields.io/badge/Design-Premium_UX-%238A2BE2?style=for-the-badge" alt="Design"></a>
  <a href="#"><img src="https://img.shields.io/badge/Stack-Full_IDE_Included_💻-007ACC?style=for-the-badge" alt="Stack"></a>
</p>

<p>
  <a href="https://pypi.org/project/pyquery-polars/"><img src="https://img.shields.io/pypi/v/pyquery-polars.svg?color=4CAF50&logo=python&logoColor=white" alt="PyPI Version"></a>
  <a href="https://pypi.org/project/pyquery-polars/"><img src="https://img.shields.io/pypi/pyversions/pyquery-polars.svg?color=blue" alt="Python Versions"></a>
  <a href="LICENSE"><img src="https://img.shields.io/github/license/tks18/pyquery.svg?color=orange" alt="License"></a>
</p>

![Rows of Data](https://i.giphy.com/sRFEa8lbeC7zbcIZZR.webp)

## 🚩 Stop letting Pandas hold you back. <br> The single-threaded era is over.

**PyQuery** is a local-first data operating system that **auto-heals broken CSVs**, includes a **native Code Editor**, and processes **100GB+ files** without breaking a sweat. ⚡

[Feature Request](https://github.com/tks18/pyquery/issues) · [Report Bug](https://github.com/tks18/pyquery/issues)

</div>

---

## 🎮 The Ecosystem (Choose Your Path)

We built a suite of tools so perfect it hurts.

| Path | Vibe | Description | Link |
| :--- | :--- | :--- | :--- |
| **CLI** | 🏎️ **Speedrun** | The **Headless Beast**. Run data pipelines in your sleep. | [**CLI Manual**](docs/CLI.md) |
| **UI** | 🎨 **Creative** | The **Visual Studio**. Drag, drop, analyze, visualize. | [**UI Guide**](docs/UI.md) |
| **API** | 📡 **Backbone** | The **Server**. Build your own apps on our engine. | [**API Docs**](docs/API.md) |
| **SDK** | 🐍 **Sorcery** | The **Python Library**. For the code wizards. | [**SDK Guide**](docs/SDK.md) |

---

## 🧠 TL;DR (For the goldfish attention spans)

> **✨ New Drop: Headless Ghost Mode 👻**
> PyQuery now supports total **Headless Automation**. Run massive pipelines in CI/CD, schedule tasks, and bypass the UI entirely with the re-architected `run` command.

1.  **Install it:** `pip install pyquery-polars` (Don't be basic).
2.  **Run it:** `pyquery ui` (Visuals) or `pyquery run` (Speedrun/Headless).
3.  **The Flex:** It's a local-first, privacy-focused engine that eats Excel sheets and CSVs for breakfast using Rust.

---

## ⛩️ The Awakening (Lore)

Long ago, the Data World was **mid**. Analysts lived in fear of the `MemoryError`. They bowed before the single-threaded tyranny of the Old Gods (Pandas). They accepted their fate of freezing screens, crashing kernels, and waiting 4 hours for a simple `groupby`.

**But I refused.**

From the depths of the Rusty abyss, **PyQuery** has awakened. I am not just an ETL tool anymore. I am the entire war room. I am here to **obliterate** your bottlenecks and **ratio** your old benchmarks.

### The Core Philosophy (Our Ninja Way) 🥷

- **Lazy Execution**: Nothing computes until you say "Export". This optimizes memory and speed so your hardware doesn't scream.
- **Zero-Copy**: Data is processed efficiently without redundant copies. We don't waste bits.
- **Strict & Clean**: Enforces strict typing and argument validation. No ambiguous magic, just pure logic.
- **Automation First**: While the UI is gorgeous, PyQuery is built to run alone in the dark.

### **Welcome to your Villain Arc.** 👹

---

## 🧾 PyQuery vs. Power Query: The Roast

We don't usually punch down, but you handed us the gloves.

| Feature          | ⚡ **PyQuery** (The Chad)                                      | 🐢 **Power Query** (The Virgin)                                             |
| :--------------- | :------------------------------------------------------------- | :-------------------------------------------------------------------------- |
| **Speed**        | **Rust-Powered.** Processes millions of rows before you blink. | **Single-Threaded.** Spends 20 mins saying "Loading Data..." just to crash. |
| **Language**     | **Python/SQL/Polars.** The languages of gods.                  | **M-Code.** A language invented to punish humanity.                         |
| **AI/ML**        | **Built-in.** Random Forests, Clustering, & Monte Carlo Sims.  | **Non-existent.** You need a generic "AI Plugin" that costs extra.          |
| **Vibe**         | **Dark Mode CLI & Streamlit.** Cyberpunk aesthetic.            | **Corporate Grey.** It sucks the soul out of your body.                     |
| **Price**        | **Free & Open Source.**                                        | **Requires an Office 365 License** (Subscription L).                        |
| **Boot XP**      | **Cinematic CLI with Themes & Logs**                           | **Static Spinner of Doom**                                                  |
| **Broken CSVs**  | **Auto-healed at ingest**                                      | **Crashes silently**                                                        |
| **One Bad File** | **Isolated & corrected**                                       | **Pipeline dead**                                                           |
| **Headless**     | **Full CLI Automation.** Designed for CI/CD pipelines.         | **UI Dependent.** Good luck automating that in a Linux shell.               |

---

## 🖥️ The Main Character CLI (The Experience)

This is not a command line.
This is a **startup ritual**.

Every time PyQuery boots, it behaves like a data OS coming online.

### ⚡ Adaptive Theme Engine

The CLI dynamically switches **color gradients, borders, and mood** based on your selected boot mode. Each theme announces itself during startup. You _feel_ it before you run anything.

- **Cyberpunk:** (Default) Neon main-character energy.
- **Rustacean:** Pure Polars lore.
- **Matrix:** Hacker-core, green text supremacy.
- **Villain Arc:** Purple & gold. No mercy.

### 👻 Headless Revamp: The `run` Command

The CLI has been completely re-architected for **Automation Supremacy**. The `run` command is your primary entry point for headless operations.

```bash
# Basic Speedrun
pyquery run --source data.csv --output results.parquet

# Project Mode (Load the whole squad)
pyquery run --project daily_report.pyquery --output dist/
```

#### 🛠️ Execution Modes

- **Source Mode (`--source`):** Quick ad-hoc processing of single files, SQL queries, or APIs.
- **Project Mode (`--project`):** Load a predefined `.pyquery` project file containing multiple datasets and recipes.

**Note:** `--source` and `--project` are mutually exclusive. Choose your path.
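
For the curious, here is a minimal sketch of how that kind of mutual exclusivity could be enforced with Python's `argparse`. It is not PyQuery's actual CLI code: `build_parser`, `parse_mode`, and the `pyquery-run-sketch` program name are hypothetical, and only the `--source`/`-s`, `--project`, and `--output`/`-o` flags come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; mirrors only the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="pyquery-run-sketch")
    # Exactly one input mode must be given: --source or --project.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--source", "-s", help="single file to process")
    mode.add_argument("--project", help="path to a .pyquery project file")
    parser.add_argument("--output", "-o", required=True, help="output path")
    return parser


def parse_mode(argv: list[str]) -> tuple[str, str, str]:
    """Return (mode, input_path, output_path) for a list of CLI arguments."""
    args = build_parser().parse_args(argv)
    if args.source is not None:
        return ("source", args.source, args.output)
    return ("project", args.project, args.output)
```

Passing both `--source` and `--project` makes `argparse` exit with a usage error before any data is touched.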

### 📟 Sequential Boot Logs

Real-time kernel-style logs with cinematic pacing. It doesn’t say "loading"... It **declares intent**.

- Timestamped steps.
- Module icons (`⚡ Engine`, `💾 IO`, `🧠 Planner`).
- Your terminal doesn’t just start PyQuery. It **witnesses it**.

### 🧩 Focused UI (Modal Upgrade)

Sidebars are for tourists. PyQuery loads data through **dedicated modal dialogs**—because loading data is a moment, not a side quest.

- **Blazing-Fast & Optimistic:** The dialog opens **instantly**.
- **Lazy Preview:** We scan 100k+ files without freezing the UI.
- **Recent Paths:** We remember so you don't have to.
- **Preview Before Commit:** See matched files and sheets before you import. You don't guess anymore; you **confirm with intent**.

---

## 💪 The Flex (Capabilities)

We built an empire so you can rule yours. This isn't just software; it's a lifestyle.

### 🎯 EDA: The Crystal Ball (Expanded)

> _"Most tools describe the past. PyQuery predicts the future."_

EDA is no longer just "looking at data". It's **hunting**.

#### 1. 🧬 Dataset DNA & Health Check

We scan your data's soul.

- **Missing Cells:** We don't just count nulls; we judge them. (<1% is excellence, >10% is sloppy).
- **Cardinality Checks:** Instantly know if a column is categorical or continuous.
- **Duplicate Detection:** We find the clones and eliminate them.
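
The three checks above can be sketched in plain Python. This is an illustration, not PyQuery's implementation: `grade_missing` and `health_check` are hypothetical names, rows are assumed to be a list of dicts with hashable values, and the "acceptable" middle band is an assumption because only the <1% and >10% thresholds are stated.

```python
from collections import Counter


def grade_missing(ratio: float) -> str:
    """Grade a missing-cell ratio using the two thresholds quoted above."""
    if ratio < 0.01:
        return "excellent"
    if ratio > 0.10:
        return "sloppy"
    return "acceptable"  # assumption: the middle band is not named in the README


def health_check(rows: list[dict]) -> dict:
    """Summarise missing cells, per-column cardinality, and duplicate rows."""
    columns = list(rows[0].keys()) if rows else []
    total_cells = len(rows) * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    ratio = missing / total_cells if total_cells else 0.0
    # Cardinality: number of distinct non-null values per column.
    cardinality = {
        col: len({row[col] for row in rows if row.get(col) is not None})
        for col in columns
    }
    # A duplicate is any row whose full tuple of values was already seen.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {
        "missing_ratio": ratio,
        "missing_grade": grade_missing(ratio),
        "cardinality": cardinality,
        "duplicate_rows": duplicates,
    }
```

A low cardinality relative to the row count hints at a categorical column; a cardinality close to the row count hints at a continuous or identifier column.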

#### 2. 🚀 The Action Engine (ML Strategist)

- **Strategic Brief:** A "Top 3 Insights" card that ranks every signal in your data. It whispers: _"The money is here."_
- **Automated Drivers:** It finds the hidden variables controlling your target.
  - _"Why is Churn high? It's not Price. It's Customer Support Wait Time > 5m."_ -> **Boom. Solved.**
- **Correlation Matrix:** Pearson, Cramér’s V, and F-tests are calculated automatically. We know the relationships better than you know your own situationship.

#### 3. 🧪 ML Laboratory (The Brain)

- **Auto-Pilot Mode:** Trains an army of models (Random Forest, Lasso, Ridge) to find the best fit. You sit back and look busy.
- **Clustering (Unsupervised Rizz):** Elbow-plot and silhouette-score optimization. We even name the segments for you ("Cluster 1 = High Spend, Low Age").
- **Explainable Anomalies:** Uses Isolation Forests to catch the weirdos and fraudsters instantly, with a **Contextual Profiler** to tell you _why_ they are weird.

#### 4. 🎮 Decision Simulator (The Time Machine)

- **"What-If" Sliders:** Change variables in real-time. _"If I raise Price by 10% and lower ad spend, do I still profit?"_
- **Monte Carlo Sims:** Run 1,000+ simulations. We don't guess; we calculate the probability of your success.
- **Waterfall Analysis:** The Model breaks down exactly _why_ the prediction changed.
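
To make the Monte Carlo idea concrete, here is a small stdlib-only sketch of the "do I still profit?" question. Everything in it is an assumption for illustration: the `profit_probability` function, the profit formula, and the choice of normally distributed demand are not taken from PyQuery's simulator.

```python
import random


def profit_probability(
    price: float,
    unit_cost: float,
    ad_spend: float,
    mean_demand: float,
    demand_sd: float,
    runs: int = 1000,
    seed: int = 42,
) -> float:
    """Estimate P(profit > 0) by sampling uncertain demand `runs` times."""
    rng = random.Random(seed)  # seeded so repeated calls give the same estimate
    wins = 0
    for _ in range(runs):
        # Draw one demand scenario (assumed normal); demand cannot be negative.
        demand = max(0.0, rng.gauss(mean_demand, demand_sd))
        profit = (price - unit_cost) * demand - ad_spend
        if profit > 0:
            wins += 1
    return wins / runs
```

Calling it twice, once with the current price and once with the price raised by 10% and a lower `ad_spend`, gives two probabilities you can compare side by side, which is the essence of a "What-If" slider.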

#### 5. 📈 Time Series & Visuals That Don't Miss

- **Holt-Winters Forecasting:** Predicting the future with confidence intervals.
- **Decomposition:** Splitting data into Trend, Seasonality, and Noise.
- **Cohort Comparison:** Volcano Plots visualizing "Effect Size" vs "Significance." We bring the science.

---

### 💻 The Integrated IDE (Code is Power)

For those who speak the language of the gods (Python/SQL), we built a **React-based Code Editor** right inside the UI.

- **Embedded Ace Editor:** Syntax highlighting, line numbers, and active line focus. Feels like VS Code, lives in your browser.
- **Intelligent Auto-Completions:** Context-aware suggestions for `pl`, `np`, `math`. Type `col`, get `col("name")`. It knows your schema.
- **Sandboxed Custom Scripts:**
  - **AST-Validated Security:** We parse your code _before_ execution.
  - **Blocked:** `import os`, private attributes, system calls.
  - **Allowed:** `numpy`, `scipy`, `sklearn`. Pure math and logic only.
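
A rough sketch of what AST validation before execution can look like, using only Python's `ast` module. This is not PyQuery's validator: `find_violations` and `ALLOWED_MODULES` are hypothetical, the allowlist is assembled from the modules named above (with `math` added as an assumption), and a real sandbox needs more checks than this (for example calls to `eval` or `__import__`).

```python
import ast

# Assumption: an illustrative allowlist built from the modules named above.
ALLOWED_MODULES = {"numpy", "scipy", "sklearn", "math"}


def find_violations(source: str) -> list[str]:
    """Return the reasons a script should be rejected (empty list = passes)."""
    try:
        tree = ast.parse(source)  # parse only; nothing is executed
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            imported = [node.module or ""]
        else:
            imported = []
        for name in imported:
            if name.split(".")[0] not in ALLOWED_MODULES:
                problems.append(f"import of '{name}' is not allowed")
        # Reject private and dunder attribute access such as obj.__class__.
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            problems.append(f"private attribute '{node.attr}' is not allowed")
    return problems
```

Because the script is only parsed, never run, a rejected script has no chance to do damage before the verdict is in.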

### 🧪 SQL Lab: The Codex (God Mode)

For when the GUI is too easy and you want to flex raw SQL. This isn't SQLite. This is **High-Performance Lazy SQL**.

- **Zero-Lag Querying:** Run `SELECT *` on a **50GB file**? It pulls a preview instantly. The engine effectively cheats physics.
- **Cross-Dataset Joins:** Join `sales.csv` with `targets.xlsx` using standard SQL.
- **Materialize:** Execute complex queries, then save as a new dataset.

---

### 🧹 The Forge (Ruthless ETL)

Backend I/O that actually understands real-world data. Real data is cursed. We planned for that.

- **🧬 Advanced Auto-Encoding Healer:**
  - Scans the first bytes of every CSV to detect its encoding, so a `UnicodeDecodeError` gets fixed automatically instead of killing the load.
  - **Stream-Based Healing:** Processes multi-GB files in 4MB chunks. Memory usage stays flat.
  - **Sanitization:** Strips `Null Bytes`, normalizes newlines, and replaces garbage.
- **🧩 Mixed-Encoding Folder Handling:**
  - If a folder contains files with different encodings, PyQuery detects it and switches strategy automatically.
  - We isolate. We adapt. We continue.
- **📂 Recursive Folder Globbing (Upgraded):**
  - Patterns like `data/**/*.csv` work even when schemas differ slightly or headers are misaligned.
- **🏗️ Staging Ground (Infrastructure Rizz):**
  - Control your intermediate storage. If your `%TEMP%` partition is small, tell PyQuery where the real space is using the `PYQUERY_STAGING_DIR` environment variable.
  ```bash
  # Linux/Mac Power Move
  export PYQUERY_STAGING_DIR="/mnt/fast_ssd/pyquery_cache"
  pyquery run ...
  ```
- **🔍 Advanced File Filtering (Precision Strikes):**
  - Multiple Filter Types: `Glob`, `Regex`, `Contains`, `Not Contains`, `Exact`, `Is Not`.
  - **Stackable Logic:** Must contain `sales` + Must NOT contain `backup` + Must match regex `\d{4}`.
  - This is surgical file selection. No more loading junk and cleaning later.
- **📊 Excel Handling That Respects Your Sanity:**
  - **Multi-Sheet Selection:** Load one sheet, many sheets, or only the ones that matter.
  - **Template-Based Mapping:** Pick a base file, preview its sheets, and apply that selection across all matching files.
  - **Sheet Name Filtering:** Regex-powered selection like `Q[1-4]_Data`.
- **✨ Source Awareness & Cleanliness:**
  - **Metadata Injection:** Automatically add `__source_path__` and `__source_name__`.
  - **Auto Type Inference:** Samples data, infers dtypes, and instantly appends a **Clean & Cast** step.
- **✨ Auto-Typecast:** One click scans rows and forcibly converts `Strings` to `Int`, `Float`, or `Date`.
- **🎭 PII Incinerator:** Detects and obfuscates credit cards and SSNs. Secrets remain secret.
- **🩹 Smart Impute:** Fill the voids. Forward fill, backward fill, median, or specific value injection. No null survives.
- **💥 Explode & Coalesce:** Flatten lists and merge columns like a boss.
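
The stream-based healing idea from the list above can be sketched with the standard library's incremental decoders. This is an assumption-laden illustration, not PyQuery's healer: `heal_stream` is a hypothetical name, the encoding is passed in rather than detected, and "replaces garbage" is interpreted here as substituting U+FFFD for undecodable bytes.

```python
import codecs
import io
from typing import BinaryIO, TextIO

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, matching the chunk size quoted above


def heal_stream(src: BinaryIO, dst: TextIO, encoding: str = "utf-8",
                chunk_size: int = CHUNK_SIZE) -> None:
    """Decode `src` chunk by chunk, sanitise the text, and write it to `dst`."""
    # The incremental decoder buffers multi-byte characters split across
    # chunks; errors="replace" swaps undecodable bytes for U+FFFD.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    pending_cr = False  # True when the previous chunk ended with a bare "\r"
    while True:
        chunk = src.read(chunk_size)
        final = not chunk
        text = decoder.decode(chunk, final=final)
        text = text.replace("\x00", "")  # strip null bytes
        if pending_cr:
            text = "\r" + text
            pending_cr = False
        if text.endswith("\r") and not final:
            # Hold the "\r" back: it may be the first half of "\r\n".
            text, pending_cr = text[:-1], True
        text = text.replace("\r\n", "\n").replace("\r", "\n")
        dst.write(text)
        if final:
            break


# Usage: heal an in-memory byte stream, with tiny chunks to force splits.
healed = io.StringIO()
heal_stream(io.BytesIO(b"a,1\r\nb\x00,2\r"), healed, chunk_size=2)
```

Only one chunk is held in memory at a time, which is why memory usage stays flat regardless of file size. When writing to a real file, open it with `newline=""` so Python does not translate the normalized newlines again.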

---

## 🧠 The Tech Stack (Forbidden Knowledge) 🐐

This isn't just a library. It's a weapon system.

### 1. 🌊 The "Infinite Stream" Glitch (Lazy Execution)

The Old Gods (Pandas) are **Eager**. They try to swallow the ocean (RAM) whole. They choke.
**PyQuery is Lazy.** It waits. It plans.

- **Scan:** "It's a 100GB file. Interesting."
- **Plan:** Filters, joins, math. Nothing executes until the final blow.
- **Stream:** Data flows in chunks. Process. Write. Destroy.
- **Result:** Processing 100GB on a MacBook Air. The laws of physics are optional.
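
The Scan, Plan, Stream flow above can be mimicked with plain Python generators. Treat this as an analogy only: Polars' lazy engine does far more (query optimization, parallelism), and the `scan`, `plan`, and `stream` helpers here are hypothetical names, not PyQuery or Polars functions.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, TextIO

Row = dict


def scan(handle: Iterable[str]) -> Iterator[Row]:
    """Scan: yield rows one at a time; nothing is read until iteration starts."""
    yield from csv.DictReader(handle)


def plan(rows: Iterator[Row], predicate: Callable[[Row], bool],
         transform: Callable[[Row], Row]) -> Iterator[Row]:
    """Plan: chain a filter and a transform without executing either yet."""
    return (transform(row) for row in rows if predicate(row))


def stream(rows: Iterator[Row], out: TextIO, fieldnames: list[str],
           chunk_rows: int = 2) -> int:
    """Stream: pull rows through the plan and write them out in small chunks."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    written, chunk = 0, []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_rows:
            writer.writerows(chunk)
            written += len(chunk)
            chunk = []  # drop the chunk so memory use stays flat
    writer.writerows(chunk)  # flush whatever is left over
    return written + len(chunk)


# Usage: build the whole plan first, then execute it in one streaming pass.
source = io.StringIO("region,sales\nnorth,10\nsouth,3\neast,7\n")
pipeline = plan(scan(source), lambda r: int(r["sales"]) > 5,
                lambda r: {"region": r["region"].upper(), "sales": r["sales"]})
sink = io.StringIO()
count = stream(pipeline, sink, ["region", "sales"])
```

Until `stream` starts iterating, not a single row has been read from `source`; that deferral is the whole trick behind lazy execution.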

### 2. ⚙️ File-Level Execution Control

Most engines think in **datasets**. PyQuery thinks in **files**.

- **Individual File Processing:** Forces the engine to load files one-by-one instead of bulk scanning.
- **Why it matters:** One corrupted CSV no longer nukes the entire pipeline. We fix schemas and clean data _before_ concatenation. This is how PyQuery survives enterprise-grade mess.
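
A minimal sketch of the per-file idea, assuming in-memory CSV text instead of real paths so it stays self-contained. `load_one` and `load_all` are hypothetical helpers, and this version only isolates and skips a bad file; it does not attempt the correction step PyQuery describes.

```python
import csv
import io


def load_one(name: str, text: str, required: set[str]) -> list[dict]:
    """Parse one CSV and verify its schema before it joins the batch."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    missing = required - header
    if missing:
        raise ValueError(f"{name}: missing columns {sorted(missing)}")
    # Keep only the required columns so every file concatenates cleanly.
    return [{col: row[col] for col in sorted(required)} for row in reader]


def load_all(files: dict[str, str],
             required: set[str]) -> tuple[list[dict], dict[str, str]]:
    """Load files one by one; a bad file is recorded and skipped, not fatal."""
    combined, failures = [], {}
    for name, text in files.items():
        try:
            combined.extend(load_one(name, text, required))
        except (ValueError, csv.Error) as exc:
            failures[name] = str(exc)
    return combined, failures
```

The schema check runs per file and before concatenation, so extra columns or a different column order in one file never reach the combined result.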

### 3. 🚀 Streaming I/O Architecture

We rewired the backend for scale.

- **True Streaming Discovery:** Uses generators and lazy iteration. Point at 100k files without crashing.
- **Partial Globbing:** Simple text filters convert to filesystem-level globs. Python never even _sees_ irrelevant files.
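
Here is one way streaming discovery with stackable filters could look, using `os.walk` (itself a generator) and `fnmatch`. The helper names (`contains`, `not_contains`, `matches_regex`, `discover`) are hypothetical, and this sketch filters in Python rather than pushing filters down into filesystem-level globs.

```python
import fnmatch
import os
import re
from typing import Callable, Iterator

Filter = Callable[[str], bool]


def contains(text: str) -> Filter:
    """Keep file names that contain `text`."""
    return lambda name: text in name


def not_contains(text: str) -> Filter:
    """Keep file names that do not contain `text`."""
    return lambda name: text not in name


def matches_regex(pattern: str) -> Filter:
    """Keep file names where `pattern` matches anywhere in the name."""
    compiled = re.compile(pattern)
    return lambda name: compiled.search(name) is not None


def discover(root: str, glob: str, filters: list[Filter]) -> Iterator[str]:
    """Lazily yield paths under `root` whose file name passes every filter."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatchcase(name, glob) and all(f(name) for f in filters):
                yield os.path.join(dirpath, name)
```

Because `discover` is a generator, a caller can start processing the first match while the rest of the tree is still unvisited, and no full file list is ever built in memory.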

### 4. 🛡️ Type Safety (Absolute Order)

Python is dynamic (chaotic). PyQuery imposes **Order**.

- Every step is backed by a **Pydantic Model**.
- If a `String` tries to infiltrate a `Float` column, it is terminated **before** execution.
- No runtime surprises. Only calculated victories.
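
To show the "validate before execution" idea without extra dependencies, here is a sketch that uses a stdlib `dataclass` as a stand-in for a Pydantic model. `ClipParams` and its fields are hypothetical (only the `Clip` step name appears in this README), and real Pydantic models offer much richer validation than this.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ClipParams:
    """Hypothetical parameters for a 'Clip' step (field names are illustrative)."""
    column: str
    lower: float
    upper: float

    def __post_init__(self) -> None:
        # Check declared types first, then the cross-field constraint.
        for f in fields(self):
            value = getattr(self, f.name)
            if f.type is float:
                # Accept ints as numbers, but never bools or strings.
                ok = isinstance(value, (int, float)) and not isinstance(value, bool)
            else:
                ok = isinstance(value, f.type)
            if not ok:
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )
        if self.lower > self.upper:
            raise ValueError("lower must not exceed upper")
```

Constructing `ClipParams("price", "0", 100.0)` raises a `TypeError` at the moment the step is defined, long before any data is scanned.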

---

## 🧾 The Receipts (Benchmarks)

We don't post without proof. We mog the competition.

| Metric            | 🐼 Pandas (Legacy)       | ⚡ PyQuery (Polars)     | The Diff       |
| :---------------- | :----------------------- | :---------------------- | :------------- |
| **Load 10GB CSV** | `MemoryError` (Crash) 💥 | **0.2s** (Lazy Scan) ⚡ | **Infinite**   |
| **Filter Rows**   | 15.4s (Slow)             | **0.5s** (Parallel)     | **30x Faster** |
| **Group By**      | 45s (Painful)            | **2.1s** (Instant)      | **20x Faster** |
| **RAM Usage**     | 12GB+ (Bloated)          | **500MB** (Lean)        | **95% Less**   |

> _Benchmarks run on a standard dev laptop. Results may vary but the vibe remains consistent._

---

## 🎮 Choose Your Fighter (4 Paths to Power)

We don't limit you. Dominate however you choose.

### 📦 Installation

```bash
pip install pyquery-polars
```

### 1. 🌊 The GUI (God Mode)

For when you want to click things, see pretty charts, and feel like a data scientist in a sci-fi movie.

- **Visual Recipe Builder:** Nodes and edges of pure logic.
- **Native File Picker:** Access local filesystem directly.

```bash
pyquery ui
# Launches the Web App on localhost:8501 🚀
```

### 2. 🤖 The API (Headless Beast)

Building a machine? Run PyQuery as the engine.

- **Swagger Docs:** Auto-generated at `/docs`.
- **Async:** Fire and forget jobs via `POST /recipes/run`.

```bash
pyquery api
# Serving high-performance ETL over HTTP at localhost:8000 📡
```

### 3. ⚡ The Batch Runner (Speedrun)

For automation. No interface. Just speed.

```bash
pyquery run -s input.csv -r recipe.json -o output.parquet
# Task complete. ⚡
```

### 4. 🧙‍♂️ The Sorcerer (Python SDK)

For the developers who want to weave PyQuery into their own code.

```python
from pyquery_polars.backend.engine import PyQueryEngine

# Full programmatic control over the recipe engine.
# You are the architect now.
```

---

## 🧰 The Loadout (Arsenal)

Packed with every tool needed to clear the map.

| Category      | The Tools                                            | Why it slaps                            |
| ------------- | ---------------------------------------------------- | --------------------------------------- |
| **Cleaning**  | `Fill Nulls`, `Mask PII`, `Smart Extract`, `Regex`   | Turns garbage data into gold. ✨        |
| **Analytics** | `Rolling Agg`, `Time Bin`, `Rank`, `Diff`, `Z-Score` | High-frequency trading vibes. 📈        |
| **Combining** | `Smart Join`, `Concat`, `Pivot`, `Unpivot`           | Merge datasets without the headache. 🤝 |
| **Math**      | `Log`, `Exp`, `Clip`, `Date Offset`                  | For the scientific girlies. 👩‍🔬          |
| **Text**      | `Slice`, `Case`, `Replace`, `One-Hot`                | String manipulation on steroids. 💪     |
| **I/O**       | `CSV`, `Parquet`, `Excel`, `JSON`, `IPC`             | Speaks every language. 🗣️               |

---

## 🗺️ The Roadmap (Manifesting Destiny) 🔮

We aren't stopping here. We are aiming for the moon. 🚀

- **Phase 1: Native App Supremacy (Rust + Tauri):** The browser has limits. The Native App will have none. GPU-accelerated plotting (10M points at 144Hz) and OLED black themes.
- **Phase 2: Big Data Devourer:** Cloud connectors (S3, GCS, Azure). We drink their milkshakes.

---

## 🧑‍💻 Join the Cult (Developer Guide)

You want to contribute? Good. We need strong allies.

### The Blooding (Adding a Transform) 🖐️

**1. Backend Implementation:**

- **Define Params:** Create a Pydantic model (`src/pyquery_polars/core/params.py`).
- **Backend Logic:** Write a pure Polars function (`src/pyquery_polars/backend/transforms/`).
- **Register:** Add the step to `register_all_steps()` in `registry.py`.

**2. Frontend Implementation:**

- **Renderer:** Create a renderer function (`src/pyquery_polars/frontend/steps/`).
- **Register:** Add the step to `register_frontend()` in `registry_init.py`.

It appears in the CLI, API, and UI **automatically**. 🤯

```python
# Only certified ballers contribute code.
# Are you up for it?
```

---

## 📜 License

**GPL-3.0**. Open source forever. 💖

---

<div align="center">

_Made with ☕, 🦀 (Rust), and 💖 by [Sudharshan TK](https://github.com/tks18)_

</div>
