Metadata-Version: 2.4
Name: informatica-python
Version: 1.9.7
Summary: Convert Informatica PowerCenter workflow XML to Python/PySpark code
Author: Nick
License: MIT
Keywords: informatica,powercenter,etl,code-generator,pandas,pyspark,data-engineering
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Database :: Database Engines/Servers
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lxml>=4.9.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# informatica-python

Convert Informatica PowerCenter workflow XML exports into clean, runnable Python/PySpark code.

**Author:** Nick
**License:** MIT
**PyPI:** [informatica-python](https://pypi.org/project/informatica-python/)

---

## Overview

`informatica-python` parses Informatica PowerCenter XML export files and generates equivalent Python code using your choice of data library. It handles all 72 DTD tags from the PowerCenter XML schema and produces a complete, ready-to-run Python project.

## Installation

```bash
pip install informatica-python
```

## Quick Start

### Command Line

```bash
# Generate Python files to a directory
informatica-python workflow_export.xml -o output_dir

# Generate as a zip archive
informatica-python workflow_export.xml -z output.zip

# Use a different data library
informatica-python workflow_export.xml -o output_dir --data-lib polars

# Include a parameter file
informatica-python workflow_export.xml -o output_dir --param-file workflow.param

# Enable data quality validation on type casts
informatica-python workflow_export.xml -o output_dir --validate-casts

# Parse to JSON only (no code generation)
informatica-python workflow_export.xml --json

# Save parsed JSON to file
informatica-python workflow_export.xml --json-file parsed.json
```

### Python API

```python
from informatica_python import InformaticaConverter

converter = InformaticaConverter()

# Parse and generate files to a directory
converter.convert("workflow_export.xml", output_dir="output_dir")

# Parse and generate zip archive
converter.convert("workflow_export.xml", output_zip="output.zip")

# Parse to structured dict (no code generation)
result = converter.parse_file("workflow_export.xml")

# Use a different data library
converter = InformaticaConverter(data_lib="polars")
converter.convert("workflow_export.xml", output_dir="output_dir")
```

## Generated Output Files

| File | Description |
|------|-------------|
| `helper_functions.py` | Database/file I/O helpers, 90+ Informatica expression equivalents, window/analytic functions, stored procedure execution, state persistence |
| `mapping_{name}.py` | One per mapping, named after the real Informatica mapping name — transformation logic with vectorized expressions, row-count logging, type casting, inline documentation |
| `workflow.py` | Task orchestration with topological ordering, decision branching, worklet calls, and error handling |
| `config.yml` | Connection configs, source/target metadata, runtime parameters |
| `all_sql_queries.sql` | All SQL extracted from Source Qualifiers, Lookups, and SQL transforms (with ANSI-translated variants) |
| `error_log.txt` | Conversion summary with unsupported transform analysis, unmapped port detection, and unknown expression function tracing |

## Supported Data Libraries

Select via the `--data-lib` CLI flag or the `data_lib` constructor parameter:

| Library | Flag | Best For |
|---------|------|----------|
| **pandas** | `pandas` (default) | General-purpose, most compatible |
| **dask** | `dask` | Large datasets, parallel processing |
| **polars** | `polars` | High performance, Rust-backed |
| **vaex** | `vaex` | Out-of-core, billion-row datasets |
| **modin** | `modin` | Drop-in pandas replacement, multi-core |

## Supported Transformations

The code generator produces real, runnable Python for these transformation types (a sketch of the generated style follows the list):

- **Source Qualifier** — SQL override, pre/post SQL, column selection, session connection overrides, `$$PARAM` substitution in SQL
- **Expression** — Field-level expressions converted to vectorized pandas operations (`df["COL"]` style) with 40+ vectorized function handlers
- **Filter** — Row filtering with filter conditions converted to vectorized pandas expressions
- **Joiner** — `pd.merge()` with join type and condition parsing (inner/left/right/outer)
- **Lookup** — `pd.merge()` lookups with connection-aware DB reads, multiple match policies, default values, `$$PARAM` substitution
- **Aggregator** — `groupby().agg()` with SUM/COUNT/AVG/MIN/MAX/FIRST/LAST, computed aggregates
- **Sorter** — `sort_values()` with multi-key sorts and per-field ascending/descending direction taken from the SORTDIRECTION attribute
- **Router** — Multi-group conditional routing with named groups
- **Union** — `pd.concat()` across multiple input groups
- **Update Strategy** — DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT routing with actual target INSERT/UPDATE/DELETE operations, dialect-aware SQL placeholders, auto-detected primary keys; vectorized expression parsing with row-level fallback
- **Sequence Generator** — Auto-incrementing ID columns
- **Normalizer** — `pd.melt()` with auto-detected id/value vars
- **Rank** — `groupby().rank()` with Top-N filtering
- **Stored Procedure** — Full code generation with Oracle/MSSQL/generic support, input/output parameter mapping
- **Custom / Java** — Placeholder stubs with TODO markers
- **SQL Transform** — Direct SQL execution pass-through with `$$PARAM` substitution
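
For illustration, a minimal sketch of the style such transforms compile to, using a Filter followed by an Aggregator (frame and column names are invented, not literal generator output):

```python
import pandas as pd

# Stand-in for a Source Qualifier read
df_sq_customers = pd.DataFrame({
    "CUST_ID": [1, 2, 2, 3],
    "STATUS": ["A", "I", "A", "A"],
    "AMOUNT": [100.0, 50.0, 25.0, 75.0],
})

# Filter transform FIL_ACTIVE: keep STATUS = 'A'
df_fil_active = df_sq_customers[df_sq_customers["STATUS"] == "A"]

# Aggregator transform AGG_TOTALS: group by CUST_ID, SUM(AMOUNT)
df_agg_totals = (
    df_fil_active.groupby("CUST_ID", as_index=False)
    .agg(TOTAL_AMOUNT=("AMOUNT", "sum"))
)
```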

## Supported XML Tags (72 Tags)

**Top-level:** POWERMART, REPOSITORY, FOLDER, FOLDERVERSION

**Source/Target:** SOURCE, SOURCEFIELD, TARGET, TARGETFIELD, TARGETINDEX, TARGETINDEXFIELD, FLATFILE, XMLINFO, XMLTEXT, GROUP, TABLEATTRIBUTE, FIELDATTRIBUTE, METADATAEXTENSION, KEYWORD, ERPSRCINFO

**Mapping/Mapplet:** MAPPING, MAPPLET, TRANSFORMATION, TRANSFORMFIELD, TRANSFORMFIELDATTR, TRANSFORMFIELDATTRDEF, INSTANCE, ASSOCIATED_SOURCE_INSTANCE, CONNECTOR, MAPDEPENDENCY, TARGETLOADORDER, MAPPINGVARIABLE, FIELDDEPENDENCY, INITPROP, ERPINFO

**Task/Session/Workflow:** TASK, TIMER, VALUEPAIR, SCHEDULER, SCHEDULEINFO, STARTOPTIONS, ENDOPTIONS, SCHEDULEOPTIONS, RECURRING, CUSTOM, DAILYFREQUENCY, REPEAT, FILTER, SESSION, CONFIGREFERENCE, SESSTRANSFORMATIONINST, SESSTRANSFORMATIONGROUP, PARTITION, HASHKEY, KEYRANGE, CONFIG, SESSIONCOMPONENT, CONNECTIONREFERENCE, TASKINSTANCE, WORKFLOWLINK, WORKFLOWVARIABLE, WORKFLOWEVENT, WORKLET, WORKFLOW, ATTRIBUTE

**Shortcut:** SHORTCUT

**SAP:** SAPFUNCTION, SAPSTRUCTURE, SAPPROGRAM, SAPOUTPUTPORT, SAPVARIABLE, SAPPROGRAMFLOWOBJECT, SAPTABLEPARAM

## Key Features

### Generated Code Quality (v1.9.3+)

Generated code follows clean formatting and commenting standards:
- Consistent section headers (`# ---`) for Source Qualifiers, Transformations, and Target Writes
- Each section includes metadata: database type, field lists, descriptions
- Column mapping comments (`# Column mapping: source -> target`) and write operation type comments (`# Write to database table` / `# Write to file`)
- Expression inline comments showing original Informatica expression (e.g., `# FULL_NAME = UPPER(FIRST_NAME) || ' ' || UPPER(LAST_NAME)`)
- Clean indentation: no blank line after `try:`, no consecutive blank lines inside function bodies
- Mapping-level `try:/except` wrapper with `logger.error()` for runtime visibility

### Smart Target Write Detection (v1.9.3+)

Targets are automatically classified as database or file writes:
- Targets with `database_type` set (Oracle, SQL Server, etc.) generate `write_to_db()` calls
- Targets with flatfile metadata or file extensions (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) generate `write_file()` calls
- Bare targets (no metadata) default to `write_to_db()` since Informatica targets are typically database tables
- Schema-qualified names (e.g., `dbo.MY_TABLE`) correctly route to database writes
- Session file path overrides take priority when present

### Vectorized Expression Engine (v1.9.2+)

The engine emits column-level pandas operations instead of iterating row by row. The expression converter uses a recursive, parenthesis-aware parser that handles the constructs below (a conversion sketch follows these lists):

**Conditional / Null:**
- `IIF(cond, val, else_val)` → `np.where()` — supports 2-arg form (missing else defaults to `None`)
- `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains
- `DECODE(field, val1, res1, ..., default)` → value-matching `np.where()`
- `NVL(val, default)` → `.fillna()`
- `IS_SPACES(field)` → `field.str.strip().eq("")`
- `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
- `IN(field, val1, val2, ...)` → `field.isin([...])`

**String:**
- `UPPER/LOWER` → `.str.upper()/.str.lower()`
- `LTRIM/RTRIM/TRIM` → `.str.lstrip()/.str.rstrip()/.str.strip()` with custom char support
- `SUBSTR(val, start, len)` → `.str[start:end]`
- `INSTR(val, search)` → `.str.find()`
- `LPAD/RPAD` → `.str.pad()`
- `REVERSE(val)` → `.str[::-1]`
- `INITCAP(val)` → `.str.title()`
- `REPLACECHR/REPLACESTR` → `.str.replace()`
- `REG_EXTRACT/REG_REPLACE` → `.str.extract()/.str.replace(regex=True)`
- `CHR(code)` → `chr(int(code))`
- `||` concatenation → `+` with `.astype(str)` on non-literals

**Date/Time:**
- `TO_DATE(val, fmt)` → `pd.to_datetime()` with Informatica→Python format conversion
- `TO_CHAR(val, fmt)` → `.dt.strftime()`
- `ADD_TO_DATE(date, part, amount)` → `date + pd.to_timedelta()` with full unit mapping (YY/MM/DD/HH/MI/SS)
- `DATE_DIFF(date1, date2, part)` → `(date1 - date2).dt.days` for days, `(date1 - date2).dt.total_seconds() / 3600` for hours, and so on
- `SYSDATE/SYSTIMESTAMP` → `pd.Timestamp.now()`
- `TRUNC(date, 'DD')` → date truncation via `.dt.floor()/.dt.to_period()`
- `MAKE_DATE_TIME(y, m, d, h, mi, s)` → `pd.Timestamp()`

**Numeric:**
- `TO_INTEGER/TO_BIGINT/TO_FLOAT/TO_DECIMAL` → `pd.to_numeric()`
- `TRUNC(val)` → `np.trunc()` for numeric truncation
- `ROUND/ABS/CEIL/FLOOR/POWER/SQRT/MOD/LOG/SIGN` → `np.*` equivalents

**Special:**
- `:LKP.TABLE(args)` — Connected lookup references → `df_lkp_table` merge
- `:PORT.FUNC(args)` — Unconnected lookups → `lookup_func("FUNC", args)` calls
- Inline `--` comment stripping (respects string literals)
- String-literal-aware field substitution
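
To make the conversion style concrete, here is a hedged sketch of a few of the listed translations as they read in pandas, with the original Informatica expressions as comments (column names are invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "FIRST_NAME": ["ann", None],
    "SALARY": [75000.0, None],
    "HIRE_DATE": ["2021-01-15", "2022-06-30"],
})

# IIF(SALARY > 60000, 'HIGH', 'LOW')  ->  np.where()
df["BAND"] = np.where(df["SALARY"] > 60000, "HIGH", "LOW")

# NVL(SALARY, 0)  ->  .fillna()
df["SALARY_NN"] = df["SALARY"].fillna(0)

# UPPER(FIRST_NAME)  ->  .str.upper()
df["FIRST_NAME_U"] = df["FIRST_NAME"].str.upper()

# TO_DATE(HIRE_DATE, 'YYYY-MM-DD')  ->  pd.to_datetime() with format mapping
df["HIRE_DT"] = pd.to_datetime(df["HIRE_DATE"], format="%Y-%m-%d")
```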

### Expression Converter (90+ Row-Level Functions)

All Informatica expression functions are available as row-level Python equivalents in `helper_functions.py` (an illustrative sketch follows the list):

- **String:** `substr`, `ltrim`, `rtrim`, `upper`, `lower`, `lpad`, `rpad`, `instr`, `length`, `concat`, `replacechr`, `replacestr`, `reg_extract`, `reg_replace`, `reg_match`, `reverse_str`, `initcap`, `chr_func`, `ascii_func`, `left_str`, `right_str`, `trim_func`, `indexof`, `metaphone_func`, `soundex_func`, `compress_func`, `decompress_func`
- **Date:** `add_to_date`, `date_diff`, `date_compare`, `get_date_part`, `set_date_part`, `last_day`, `make_date_time`, `to_date`, `to_char`, `to_timestamp_func`, `current_timestamp`, `session_start_time`
- **Numeric:** `round_val`, `trunc`, `mod_val`, `abs_val`, `ceil_val`, `floor_val`, `power_val`, `sqrt_val`, `log_val`, `ln_val`, `exp_val`, `sign_val`, `rand_val`, `greatest_val`, `least_val`
- **Conversion:** `to_integer`, `to_bigint`, `to_float`, `to_decimal`, `cast_func`
- **Null/Conditional:** `iif_expr`, `decode_expr`, `nvl`, `nvl2`, `isnull`, `is_spaces`, `is_number`, `is_date`, `in_expr`, `choose_expr`
- **Aggregate:** `sum_val`, `avg_val`, `count_val`, `min_val`, `max_val`, `first_val`, `last_val`, `median_val`, `stddev_val`, `variance_val`, `percentile_val`
- **Window/Analytic:** `moving_avg`, `moving_avg_df`, `moving_sum`, `moving_sum_df`, `cume`, `cume_df`, `percentile_df`
- **Lookup:** `lookup_func` — Placeholder for runtime lookup resolution
- **Variable:** `get_variable`, `set_variable`, `set_count_variable`
- **Control:** `raise_error`, `abort_func`
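
The shipped signatures live in the generated `helper_functions.py`; purely as an illustration, two of the row-level equivalents could be written like this (hypothetical bodies, not the package's implementations):

```python
def nvl(value, default):
    # Row-level NVL: fall back to default for None or NaN
    if value is None or value != value:  # value != value is True only for NaN
        return default
    return value

def iif_expr(condition, true_val, false_val=None):
    # Row-level IIF; the 2-arg form defaults the else branch to None
    return true_val if condition else false_val

assert nvl(None, 0) == 0
assert nvl("x", 0) == "x"
assert iif_expr(5 > 3, "yes", "no") == "yes"
assert iif_expr(False, "yes") is None
```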

### Row-Count Logging (v1.8+)

Generated code automatically logs row counts at every step of the data pipeline:

```
Source SQ_CUSTOMERS: 10000 rows read
EXP_CALC (Expression): 10000 input rows -> 10000 output rows
FIL_ACTIVE (Filter): 10000 input rows -> 8542 output rows
AGG_TOTALS (Aggregator): 8542 input rows -> 150 output rows
Target TGT_SUMMARY: 150 rows written
```

### Generated Code Documentation (v1.8+)

Every generated mapping function includes a rich docstring describing:
- Mapping name and original Informatica description
- Source and target tables/files
- Transformation pipeline with field counts per step

Each transformation block is annotated with:
- Separator headers for visual scanning
- Transform type and description (from Informatica XML)
- Input and output field lists (truncated at 10 for readability)
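
A hypothetical example of the resulting shape (mapping name, field counts, and pipeline are invented):

```python
def run_mapping_m_customer_load(config):
    """
    Mapping: m_CUSTOMER_LOAD
    Description: Loads active customers into the daily summary target.

    Sources: CUSTOMERS (Oracle)
    Targets: TGT_SUMMARY (SQL Server)

    Pipeline:
        SQ_CUSTOMERS (Source Qualifier, 12 fields)
        -> EXP_CALC (Expression, 14 fields)
        -> FIL_ACTIVE (Filter, 14 fields)
        -> AGG_TOTALS (Aggregator, 3 fields)
    """
    ...  # transformation steps follow
```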

### Update Strategy with Target Operations (v1.7+)

Update Strategy transforms now generate real INSERT/UPDATE/DELETE operations:
- Static strategies (0/1/2/3) map to INSERT/UPDATE/DELETE/REJECT
- DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT expressions parsed from conditions
- Target writer splits rows and routes to appropriate SQL operations
- Dialect-aware SQL placeholders (`?` for MSSQL, `%s` for PostgreSQL/Oracle)
- Primary key columns auto-detected from target field definitions
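
As a hedged sketch of the routing step, using the `_update_strategy` column described under Helper Functions (table and column names invented):

```python
import pandas as pd

# Rows tagged by an Update Strategy expression:
# 0=INSERT, 1=UPDATE, 2=DELETE, 3=REJECT (the static strategy codes)
df = pd.DataFrame({
    "CUST_ID": [1, 2, 3, 4],
    "NAME": ["A", "B", "C", "D"],
    "_update_strategy": [0, 1, 2, 3],
})

inserts = df[df["_update_strategy"] == 0]
updates = df[df["_update_strategy"] == 1]
deletes = df[df["_update_strategy"] == 2]
rejects = df[df["_update_strategy"] == 3]

# Dialect-aware placeholder selection, per the bullet above
placeholder = "?"  # MSSQL; "%s" for PostgreSQL/Oracle
update_sql = (
    f"UPDATE MY_TABLE SET NAME = {placeholder} "
    f"WHERE CUST_ID = {placeholder}"  # CUST_ID assumed to be the detected PK
)
```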

### Window / Analytic Functions (v1.7+)

DataFrame-level analytic functions for aggregation transforms:
- `moving_avg_df(df, col, window)` — rolling mean via `.rolling().mean()`
- `moving_sum_df(df, col, window)` — rolling sum via `.rolling().sum()`
- `cume_df(df, col)` — cumulative sum via `.expanding().sum()`
- `percentile_df(df, col, pct)` — quantile via `.quantile()`
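
Based on these descriptions, minimal stand-in implementations might look like the following (sketches, not the shipped code):

```python
import pandas as pd

def moving_avg_df(df, col, window):
    # Rolling mean over the trailing `window` rows
    return df[col].rolling(window=window).mean()

def cume_df(df, col):
    # Running (cumulative) sum via .expanding().sum()
    return df[col].expanding().sum()

df = pd.DataFrame({"AMOUNT": [10, 20, 30, 40]})
print(moving_avg_df(df, "AMOUNT", 2).tolist())  # [nan, 15.0, 25.0, 35.0]
print(cume_df(df, "AMOUNT").tolist())           # [10.0, 30.0, 60.0, 100.0]
```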

### Stored Procedure Execution (v1.7+)

Full stored procedure code generation (not just stubs):
- Oracle: `cursor.callproc()` with output parameter registration
- MSSQL: `EXEC` with output parameter capture
- Generic: `CALL` syntax for other databases
- Input/output parameter mapping from transformation fields
- Empty-input guard prevents errors on empty upstream DataFrames
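
For the Oracle path this maps naturally onto the DB-API `callproc` interface; a hedged sketch using the `oracledb` driver (procedure name, credentials, and DSN are all invented):

```python
import oracledb

# Hypothetical procedure: GET_CUSTOMER_COUNT(p_region IN VARCHAR2, p_count OUT NUMBER)
conn = oracledb.connect(user="etl_user", password="secret", dsn="dbhost/XEPDB1")
cursor = conn.cursor()

p_count = cursor.var(int)  # register the OUT parameter
cursor.callproc("GET_CUSTOMER_COUNT", ["WEST", p_count])
print(p_count.getvalue())
```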

### State Persistence (v1.7+)

JSON-based variable persistence between workflow runs:
- `load_persistent_state()` / `save_persistent_state()` bracketing workflow execution
- `get_persistent_variable()` / `set_persistent_variable()` scoped by workflow/mapping name
- Mapping variables marked `is_persistent="YES"` automatically load from and save to state file
- Non-persistent variables remain unaffected
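
A minimal sketch of what scoped JSON persistence along these lines can look like (file name and scope format are assumptions, not the package's actual layout):

```python
import json
import os

STATE_FILE = "workflow_state.json"  # hypothetical default location
_state = {}

def load_persistent_state(path=STATE_FILE):
    global _state
    if os.path.exists(path):
        with open(path) as f:
            _state = json.load(f)
    else:
        _state = {}

def save_persistent_state(path=STATE_FILE):
    with open(path, "w") as f:
        json.dump(_state, f, indent=2)

def get_persistent_variable(scope, var, default=None):
    return _state.get(scope, {}).get(var, default)

def set_persistent_variable(scope, var, value):
    _state.setdefault(scope, {})[var] = value

load_persistent_state()
set_persistent_variable("wf_DAILY.m_CUSTOMER_LOAD", "$$LAST_RUN_DATE", "2025-01-01")
save_persistent_state()
```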

### SQL Dialect Translation (v1.6+)

Automatically translates vendor-specific SQL to ANSI equivalents:
- **Oracle:** NVL→COALESCE, SYSDATE→CURRENT_TIMESTAMP, DECODE→CASE, NVL2→CASE, (+)→ANSI JOIN, ROWNUM→LIMIT
- **MSSQL:** GETDATE→CURRENT_TIMESTAMP, ISNULL→COALESCE, TOP N→LIMIT, LEN→LENGTH, CHARINDEX→POSITION
- Auto-detects source dialect; outputs both original and translated SQL
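
A minimal sketch of how two of the Oracle rules could be applied with regexes (illustration only, not the package's translator):

```python
import re

def translate_oracle_to_ansi(sql: str) -> str:
    # NVL(...) -> COALESCE(...); SYSDATE -> CURRENT_TIMESTAMP
    sql = re.sub(r"\bNVL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)
    sql = re.sub(r"\bSYSDATE\b", "CURRENT_TIMESTAMP", sql, flags=re.IGNORECASE)
    return sql

print(translate_oracle_to_ansi("SELECT NVL(NAME, 'N/A'), SYSDATE FROM CUSTOMERS"))
# SELECT COALESCE(NAME, 'N/A'), CURRENT_TIMESTAMP FROM CUSTOMERS
```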

### Enhanced Error Reporting (v1.6+)

Structured error log with three analysis sections:
- **Unsupported Transforms:** Lists each skipped transform with type, field count, and attributes
- **Unmapped Ports:** OUTPUT fields not connected to any downstream transform
- **Unsupported Expression Functions:** Unknown functions with location traces

### Nested Mapplet Support (v1.6+)

Recursively expands mapplet-within-mapplet instances:
- Double-underscore namespacing for nested transforms
- Depth limit of 10 with circular reference protection
- Connector rewiring through the full expansion tree

### Data Quality Validation (v1.6+)

Optional `--validate-casts` flag generates null-count checks before/after type casting:
- Counts null values pre- and post-coercion per column
- Logs warnings when coercion introduces new nulls
- Helps identify data quality issues during test runs
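
A hedged sketch of this kind of check around a single coercion (column name invented):

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

df = pd.DataFrame({"AMOUNT": ["10", "oops", None]})

nulls_before = int(df["AMOUNT"].isna().sum())
df["AMOUNT"] = pd.to_numeric(df["AMOUNT"], errors="coerce")
nulls_after = int(df["AMOUNT"].isna().sum())

if nulls_after > nulls_before:
    logger.warning("Cast of AMOUNT introduced %d new null(s)",
                   nulls_after - nulls_before)
```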

### Parameter File Support (v1.5+)

Standard Informatica `.param` file parsing:
- `[Global]` and `[folder.WF:workflow.ST:session]` section support
- `get_param(config, var_name)` resolution chain: config → env vars → defaults
- CLI `--param-file` flag for specifying parameter files
- `$$PARAM` variables in SQL automatically substituted with `.replace()` calls
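
A minimal sketch of the described resolution chain (the config layout and `$$`-prefixed keys here are assumptions):

```python
import os

def get_param(config, var_name, default=None):
    # 1) config, 2) environment variables, 3) caller-supplied default
    params = config.get("parameters", {})
    if var_name in params:
        return params[var_name]
    return os.environ.get(var_name, default)

config = {"parameters": {"$$LOAD_DATE": "2025-01-01"}}
print(get_param(config, "$$LOAD_DATE"))      # resolved from config
print(get_param(config, "$$REGION", "ALL"))  # falls back to the default
```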

### Session Connection Overrides (v1.4+)

When sessions define per-transform connection overrides (different database, file directory, or filename), the generated code uses those overrides instead of source/target defaults.

### Worklet Support (v1.4+)

Worklet workflows are detected and generate separate `run_worklet_NAME(config)` functions. The main workflow calls these automatically for Worklet task types.

### Type Casting at Target Writes (v1.4+)

Target field datatypes are mapped to pandas types and generate proper casting code:
- Integers: nullable `Int64`/`Int32` or `fillna(0).astype(int)` for NOT NULL
- Dates: `pd.to_datetime(errors='coerce')`
- Decimals/Floats: `pd.to_numeric(errors='coerce')`
- Booleans: `.astype('boolean')`
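
A hedged sketch of those casting patterns applied to invented columns:

```python
import pandas as pd

df = pd.DataFrame({
    "CUST_ID": ["1", "2", None],
    "HIRE_DATE": ["2021-01-15", "not-a-date", None],
    "SALARY": ["50000.5", "bad", "70000"],
})

# Nullable integer for a nullable target column
df["CUST_ID"] = pd.to_numeric(df["CUST_ID"], errors="coerce").astype("Int64")

# Dates: invalid values coerce to NaT
df["HIRE_DATE"] = pd.to_datetime(df["HIRE_DATE"], errors="coerce")

# Decimals/floats: invalid values coerce to NaN
df["SALARY"] = pd.to_numeric(df["SALARY"], errors="coerce")
```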

### Flat File Handling (v1.3+)

Parses FLATFILE metadata for delimiters, fixed-width layouts, header lines, skipped rows, and quote/escape characters. Generates `pd.read_fwf()` for fixed-width files or an enriched `read_file()` call for delimited files.
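
As an illustration, a fixed-width read reconstructed from hypothetical FLATFILE metadata might look like this (file name and column offsets are invented):

```python
import pandas as pd

# Byte ranges and names as they might be derived from FLATFILE metadata
colspecs = [(0, 10), (10, 40), (40, 52)]
names = ["CUST_ID", "NAME", "AMOUNT"]

df = pd.read_fwf("customers.dat", colspecs=colspecs, names=names, skiprows=1)
```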

### Mapplet Inlining (v1.3+)

Expands Mapplet instances into prefixed transforms, rewires connectors, and eliminates duplication.

### Decision Tasks (v1.3+)

Converts Informatica decision conditions to Python if/else branches with proper variable substitution.

## Helper Functions Library

The generated `helper_functions.py` provides a complete runtime library:

### Configuration & Parameters
| Function | Description |
|----------|-------------|
| `load_config(path, param_file)` | Load YAML config with optional `.param` file merge |
| `parse_param_file(path)` | Parse Informatica `.param` files (`[Global]`, `[folder.WF:...]` sections) |
| `get_param(config, var_name, default)` | Resolve parameter: config → env vars → default |
| `get_variable(var_name, config)` | Get workflow/mapping variable from params, env vars, or param store |
| `set_variable(var_name, value)` | Set workflow/mapping variable in param store and env |

### Database Operations
| Function | Description |
|----------|-------------|
| `get_db_connection(config, conn_name)` | Create DB connection (pyodbc/pymssql/sqlalchemy fallback for MSSQL) |
| `read_from_db(config, query, conn_name)` | Execute SQL query and return DataFrame |
| `write_to_db(config, df, table, conn_name)` | Write DataFrame to database table via `.to_sql()` |
| `execute_sql(config, sql, conn_name)` | Execute DDL/DML statement (INSERT, UPDATE, DELETE) |
| `write_with_update_strategy(config, df, table, ...)` | Split rows by `_update_strategy` column into INSERT/UPDATE/DELETE/REJECT operations |
| `call_stored_procedure(config, proc, params, ...)` | Execute stored procedure with input/output parameter mapping (Oracle/MSSQL/generic) |

### File Operations
| Function | Description |
|----------|-------------|
| `read_file(path, file_config)` | Read CSV/DAT/TXT/XML/XLSX/JSON/Parquet with auto-detection |
| `write_file(df, path, file_config)` | Write DataFrame to file with format auto-detection |

### State Persistence
| Function | Description |
|----------|-------------|
| `load_persistent_state(file)` | Load JSON state file for persistent variables |
| `save_persistent_state(file)` | Save persistent variables to JSON state file |
| `get_persistent_variable(scope, var, default)` | Get scoped persistent variable |
| `set_persistent_variable(scope, var, value)` | Set scoped persistent variable |

### Logging & Monitoring
| Function | Description |
|----------|-------------|
| `log_mapping_start(name)` | Log mapping start with timestamp |
| `log_mapping_end(name, start_time, row_count)` | Log mapping completion with elapsed time |
| `validate_row_count(df, name, min_rows)` | Validate minimum row count threshold |
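
Putting a few of these together, a generated mapping might drive the helpers roughly as follows (a sketch only: it assumes `log_mapping_start` returns the start timestamp, and the connection names and tables are invented):

```python
from helper_functions import (
    load_config, read_from_db, write_to_db,
    log_mapping_start, log_mapping_end,
)

def run_mapping_m_customer_load(config):
    start = log_mapping_start("m_CUSTOMER_LOAD")
    df = read_from_db(config, "SELECT * FROM CUSTOMERS", "ORA_SRC")
    df = df[df["STATUS"] == "A"]  # Filter transform
    write_to_db(config, df, "TGT_CUSTOMERS", "MSSQL_TGT")
    log_mapping_end("m_CUSTOMER_LOAD", start, len(df))

if __name__ == "__main__":
    run_mapping_m_customer_load(load_config("config.yml", None))
```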

## Requirements

- Python >= 3.8
- lxml >= 4.9.0
- PyYAML >= 6.0

## Changelog

### v1.9.3
- **Smart target write detection**: Bare targets default to `write_to_db()` instead of `write_file()`; file extension allowlist (`.csv`, `.dat`, `.txt`, `.xml`, `.json`, `.parquet`, `.xlsx`, `.xls`, `.tsv`, `.avro`) for file targets; schema-qualified names (`dbo.TABLE`) correctly route to database
- **DECODE vectorization**: `DECODE(TRUE, cond1, val1, ..., default)` → nested `np.where()` chains; value-matching DECODE; handles IN() conditions and complex boolean nesting
- **IS_SPACES vectorization**: `IS_SPACES(field)` → `field.str.strip().eq("")`
- **2-arg IIF**: `IIF(cond, val)` without else clause defaults to `None`
- **REVERSE vectorization**: `REVERSE(field)` → `field.str[::-1]`
- **IN() vectorization**: `IN(field, val1, val2, ...)` → `field.isin([...])`
- **IS_NUMBER vectorization**: `IS_NUMBER(field)` → `pd.to_numeric(field, errors="coerce").notna()`
- **SYSDATE/SYSTIMESTAMP**: Bare `SYSDATE`/`SYSTIMESTAMP` → `pd.Timestamp.now()` in vectorized mode
- **TRUNC vectorization**: Numeric `TRUNC(field)` → `np.trunc()`; date `TRUNC(field, 'DD')` → `.dt.floor()`
- **ADD_TO_DATE vectorization**: `ADD_TO_DATE(date, part, amount)` → `pd.to_timedelta()` with YY/MM/DD/HH/MI/SS units
- **DATE_DIFF vectorization**: `DATE_DIFF(date1, date2, part)` → arithmetic on timedelta components
- **Unconnected lookup support**: `:PORT.FUNC_NAME(args)` → `lookup_func("FUNC_NAME", args)`
- **Inline comment stripping**: `--` comments removed from expressions (respects string literals)
- **`$$PARAM` SQL substitution**: Source Qualifier, Lookup, and SQL Transform SQL strings auto-substitute `$$VAR` with `get_param(config, 'VAR')` calls
- **Sorter direction**: Reads `SORTDIRECTION` from field attributes, generates per-field `ascending=[True, False, ...]`
- **Pass-through optimization**: Identity expressions skip `.copy()` and use direct reference
- **Duplicate lookup deduplication**: `_gen_lookup_transform` uses `seen_output_cols` set to avoid duplicate column checks
- **Mapping-level error handling**: Generated function body wrapped in `try:/except` with `logger.error()`
- **Update strategy vectorized**: Tries vectorized expression first, falls back to row-level `apply()`
- **Generated code formatting**: Consistent `# ---` section headers for Source Qualifiers, Transforms, and Target Writes; metadata comments (database type, field lists); column mapping and write operation comments; clean blank line handling
- **Source/target detection**: Case-insensitive instance type matching
- **Session→mapping inference**: Longest-suffix-match strategy for ambiguous mapping names
- **663 tests** across unit, integration, expression, and formatting test suites

### v1.9.2 (Phase 8)
- Mapping output files now use real mapping names (e.g., `mapping_m_customer_load.py`) instead of generic numeric indices (`mapping_1.py`)
- Workflow imports automatically match the named mapping files
- **Expression converter rewrite**: Recursive parenthesis-aware parser replacing simple regex; fixes nested IIF/INSTR/LTRIM/RTRIM/REPLACECHR/REPLACESTR/SUBSTR/TO_CHAR/CHR/MAKE_DATE_TIME
- **`:LKP.` references** now properly converted to `lookup_func()` calls in vectorized mode
- **String literal safety**: `||` concatenation no longer applies `.astype(str)` to string literals
- **NULL/TRUE/FALSE**: Correctly resolved as `None`/`True`/`False` before field-name substitution
- **`import pandas as pd`** and `from datetime import datetime` now included in generated mapping files
- **MSSQL connection fallbacks**: `pymssql` and `sqlalchemy` tried when `pyodbc` unavailable

### v1.8.x (Phase 7)
- Row-count logging at every pipeline step (source reads, transforms, target writes)
- Backend-safe logging (try/except wrapped for Dask/lazy backends)
- Rich mapping function docstrings with sources, targets, and transform pipeline summary
- Per-transform documentation headers with description, input/output field lists

### v1.7.x (Phase 6)
- Window/analytic functions (rolling avg/sum, cumulative sum, percentile)
- Update Strategy routing with actual INSERT/UPDATE/DELETE target operations
- Dialect-aware SQL placeholders for MSSQL/PostgreSQL/Oracle
- Full stored procedure code generation (Oracle/MSSQL/generic)
- JSON-based state persistence for mapping and workflow variables
- Primary key auto-detection for update strategy targets

### v1.6.x (Phase 5)
- SQL dialect translation (Oracle/MSSQL → ANSI)
- Enhanced error reporting (unsupported transforms, unmapped ports, unknown functions)
- Nested mapplet expansion with circular reference protection
- Data quality validation warnings on type casting (`--validate-casts`)

### v1.5.x (Phase 4)
- Parameter file support (`.param` files with section parsing)
- Vectorized expression generation (column-level pandas operations)
- Library-specific code adapters (polars/dask/modin/vaex syntax generation)
- 72+ integration tests

### v1.4.x (Phase 3)
- Session connection overrides for sources and targets
- Worklet function generation with safe invocation
- Type casting at target writes based on TARGETFIELD datatypes
- Flat-file session path overrides properly wired

### v1.3.x (Phase 2)
- FLATFILE metadata in source reads and target writes
- Normalizer with `pd.melt()`
- Rank with group-by and Top-N filtering
- Decision tasks with real if/else branches
- Mapplet instance inlining

### v1.2.x (Phase 1)
- Core parser for all 72 XML tags
- Expression converter with 80+ functions
- Aggregator, Joiner, Lookup code generation
- Workflow orchestration with topological task ordering
- Multi-library support (pandas, dask, polars, vaex, modin)

## Development

```bash
# Install in development mode from a cloned checkout
cd informatica_python
pip install -e ".[dev]"

# Run tests (663 tests)
pytest tests/ -v
```

## License

MIT License - Copyright (c) 2025 Nick

See [LICENSE](LICENSE) for details.
