Metadata-Version: 2.4
Name: informatica-python
Version: 1.0.0
Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: lxml>=4.9.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"

# informatica-python

Convert Informatica PowerCenter workflow XML files to Python/PySpark code.

## Installation

```bash
pip install informatica-python
```

## Quick Start

### Command Line

```bash
# Convert XML to Python files in a directory
informatica-python workflow.xml -o output_dir

# Convert XML to a zip file
informatica-python workflow.xml -z output.zip

# Use a different data library (pandas, dask, polars, vaex, modin)
informatica-python workflow.xml -o output_dir --data-lib polars

# Parse XML to JSON (no code generation)
informatica-python workflow.xml --json

# Save parsed JSON to file
informatica-python workflow.xml --json-file parsed.json
```

### Python API

```python
from informatica_python import InformaticaConverter

# Convert XML to Python files
converter = InformaticaConverter(data_lib="pandas")
converter.convert("workflow.xml", output_dir="output")

# Convert to zip
converter.convert("workflow.xml", output_zip="output.zip")

# Parse XML to JSON dict
result = converter.parse_file("workflow.xml")

# Parse XML from a string (xml_string holds the XML document's text)
result = converter.parse_string(xml_string)
```

## Generated Output Files

| File | Description |
|------|-------------|
| `helper_functions.py` | Database/file I/O functions plus Python equivalents for 50+ Informatica expression functions |
| `mapping_N.py` | One file per mapping, containing that mapping's transformation logic |
| `workflow.py` | Orchestrates the workflow's tasks, running them in dependency (topological) order |
| `config.yml` | Connection configs, source/target metadata, variables |
| `all_sql_queries.sql` | All extracted SQL queries (source qualifiers, lookups, pre/post SQL) |
| `error_log.txt` | Conversion summary, warnings, and coverage statistics |

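The generated `workflow.py` runs tasks in topological order. The sketch below illustrates that idea only and is not the generator's actual code. It orders tasks with Kahn's algorithm, given a mapping from each task to the tasks that must finish before it. The task names are hypothetical. It uses only the standard library (not `graphlib`, which needs Python 3.9) to match the package's Python 3.8 floor.

```python
from collections import deque
from typing import Dict, List


def topological_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return task names so each task comes after all of its upstream tasks.

    `dependencies` maps each task to the tasks that must finish before it.
    """
    # Count unmet upstream dependencies per task and record downstream links.
    indegree = {task: 0 for task in dependencies}
    downstream: Dict[str, List[str]] = {task: [] for task in dependencies}
    for task, upstream in dependencies.items():
        for parent in upstream:
            indegree.setdefault(parent, 0)
            downstream.setdefault(parent, []).append(task)
            indegree[task] += 1

    # Start with tasks that have no upstream dependencies.
    ready = deque(sorted(t for t, n in indegree.items() if n == 0))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in downstream.get(task, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Tasks still holding unmet dependencies can only be part of a cycle.
    if len(order) != len(indegree):
        raise ValueError("cycle detected in task dependencies")
    return order


# Hypothetical workflow: two sessions after a start task, then a summary session.
tasks = {
    "s_load_customers": ["cmd_start"],
    "s_load_orders": ["cmd_start"],
    "s_build_summary": ["s_load_customers", "s_load_orders"],
    "cmd_start": [],
}
print(topological_order(tasks))
# ['cmd_start', 's_load_customers', 's_load_orders', 's_build_summary']
```

Detecting cycles matters because a workflow with circular links has no valid execution order.
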
## Supported Transformation Types

- Source Qualifier / Application Source Qualifier
- Expression
- Filter
- Aggregator
- Sorter
- Joiner
- Lookup Procedure
- Router
- Union
- Update Strategy
- Sequence Generator
- Normalizer
- Rank
- Stored Procedure (placeholder)
- Custom Transformation (placeholder)
- Java Transformation (placeholder)
- SQL Transformation

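Here is a conceptual sketch of one of these types. An Informatica Router evaluates every user-defined group condition for each row and passes the row to each group whose condition it satisfies. Rows that match no group go to the DEFAULT group. The `route` function and the sample data are hypothetical, and the generated mapping code may express the same logic with DataFrame filters rather than a row loop.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def route(rows: List[Row], groups: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Split rows into Router-style output groups plus a DEFAULT group."""
    out: Dict[str, List[Row]] = {name: [] for name in groups}
    out["DEFAULT"] = []  # assumes no user-defined group is itself named DEFAULT
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                # A row is passed to every group whose condition it satisfies.
                out[name].append(row)
                matched = True
        if not matched:
            out["DEFAULT"].append(row)
    return out


orders = [
    {"id": 1, "amount": 50, "region": "EU"},
    {"id": 2, "amount": 500, "region": "US"},
    {"id": 3, "amount": 900, "region": "EU"},
    {"id": 4, "amount": 20, "region": "APAC"},
]
groups = {
    "LARGE": lambda r: r["amount"] >= 100,
    "EU": lambda r: r["region"] == "EU",
}
result = route(orders, groups)
print({name: [r["id"] for r in rows] for name, rows in result.items()})
```

Order 3 lands in both `LARGE` and `EU`. This differs from a chain of Filter transformations, where each row would be evaluated independently per filter and there is no DEFAULT group.
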
## Supported Data Libraries

Choose your preferred data manipulation library with `--data-lib` on the command line, or with the `data_lib` argument to `InformaticaConverter` in the Python API:

- **pandas** (default) — Standard Python data analysis
- **dask** — Parallel computing with pandas-like API
- **polars** — Fast DataFrame library written in Rust
- **vaex** — Out-of-core DataFrames for large datasets
- **modin** — Drop-in pandas replacement with parallel execution

## Informatica Expression Functions

The generated `helper_functions.py` includes Python equivalents for:

`IIF`, `DECODE`, `NVL`, `NVL2`, `ISNULL`, `LTRIM`, `RTRIM`, `UPPER`, `LOWER`, `SUBSTR`, `LPAD`, `RPAD`, `TO_CHAR`, `TO_DATE`, `TO_INTEGER`, `TO_BIGINT`, `TO_FLOAT`, `TO_DECIMAL`, `REPLACECHR`, `REPLACESTR`, `INSTR`, `LENGTH`, `CONCAT`, `REG_EXTRACT`, `REG_MATCH`, `REG_REPLACE`, `GET_DATE_PART`, `ADD_TO_DATE`, `IS_DATE`, `IS_NUMBER`, `IS_SPACES`, `SYSDATE`, `ERROR`, `ABORT`, and more.

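To show what such equivalents involve, here is a minimal sketch of four of them. It follows Informatica's documented semantics for IIF, DECODE (search/result pairs with an optional default, otherwise NULL), and SUBSTR (1-based start, 0 treated as 1, negative start counted from the end). NVL returns the replacement when the value is NULL. These are illustrative implementations only; the functions in the generated `helper_functions.py` may differ in signature and edge-case handling.

```python
from typing import Any, Optional

# Upper-case names mirror the Informatica functions rather than PEP 8.


def IIF(condition: Any, true_value: Any, false_value: Any) -> Any:
    """Return true_value when condition is truthy, otherwise false_value."""
    return true_value if condition else false_value


def DECODE(value: Any, *args: Any) -> Any:
    """Match value against search/result pairs; an odd trailing arg is the default."""
    if len(args) % 2:
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None  # no default: return NULL (None)
    for search, result in zip(pairs[::2], pairs[1::2]):
        if value == search:
            return result
    return default


def NVL(value: Any, replacement: Any) -> Any:
    """Return replacement when value is NULL (None)."""
    return replacement if value is None else value


def SUBSTR(s: Optional[str], start: int, length: Optional[int] = None) -> Optional[str]:
    """Substring with a 1-based start; a negative start counts from the end."""
    if s is None:
        return None
    if start == 0:
        start = 1
    # Convert the Informatica-style position to a 0-based Python index.
    begin = start - 1 if start > 0 else max(len(s) + start, 0)
    if length is None:
        return s[begin:]
    if length <= 0:
        return ""
    return s[begin:begin + length]


print(IIF(10 > 5, "Y", "N"), DECODE(2, 1, "one", 2, "two", "other"), SUBSTR("Informatica", -4))
```
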
## Requirements

- Python >= 3.8
- lxml >= 4.9.0
- PyYAML >= 6.0

## License

MIT
