Tjoin
flowtask.components.tJoin
tJoin
Bases: FlowComponent
Overview
The tJoin class is a component for joining two Pandas DataFrames based on specified join conditions. It supports various join types
(such as left, right, inner, and outer joins) and handles different scenarios like missing data, custom join conditions, and multi-source joins.
.. table:: Properties
   :widths: auto

   +-----------+----------+------------------------------------------------------------------------------+
   | Name      | Required | Description                                                                  |
   +===========+==========+==============================================================================+
   | df1       | Yes      | The left DataFrame to join.                                                  |
   +-----------+----------+------------------------------------------------------------------------------+
   | df2       | Yes      | The right DataFrame to join.                                                 |
   +-----------+----------+------------------------------------------------------------------------------+
   | type      | No       | The type of join to perform, defaults to "left". Supported values are        |
   |           |          | "left", "right", "inner", "outer", and "anti-join". When "anti-join" is      |
   |           |          | used, it returns all rows present in df1 but not in df2 (the set             |
   |           |          | difference df1 - df2).                                                       |
   +-----------+----------+------------------------------------------------------------------------------+
   | depends   | Yes      | A list of dependencies defining the sources for the join.                    |
   +-----------+----------+------------------------------------------------------------------------------+
   | operator  | No       | The logical operator to use for join conditions, defaults to "and".          |
   +-----------+----------+------------------------------------------------------------------------------+
   | fk        | No       | The foreign key or list of keys to use for joining DataFrames.               |
   +-----------+----------+------------------------------------------------------------------------------+
   | no_copy   | No       | If True, the DataFrames are not copied before joining. Defaults to True.     |
   +-----------+----------+------------------------------------------------------------------------------+
   | join_with | No       | A list of additional keys to use for join conditions.                        |
   +-----------+----------+------------------------------------------------------------------------------+
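
The "anti-join" semantics described for ``type`` (rows of df1 with no match in df2) can be illustrated in plain pandas. The sketch below is not taken from the tJoin source; the helper name ``anti_join`` and the sample frames are hypothetical, and it only assumes that the keys play the role of ``fk``.

```python
import pandas as pd


def anti_join(df1: pd.DataFrame, df2: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Return the rows of df1 whose key values have no match in df2.

    Hypothetical helper for illustration; not part of flowtask.
    """
    # Keep only the de-duplicated key columns of df2 so the merge cannot
    # multiply rows of df1 or pull in extra columns.
    right_keys = df2[keys].drop_duplicates()
    merged = df1.merge(right_keys, how="left", on=keys, indicator=True)
    # "_merge" is the indicator column added by pandas; "left_only" marks
    # rows of df1 that found no partner in df2.
    result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    return result.reset_index(drop=True)


# Made-up sample data.
stores = pd.DataFrame({"store_number": [1, 2, 3], "city": ["Austin", "Boise", "Cary"]})
sales = pd.DataFrame({"store_number": [2, 3, 3], "amount": [10.0, 5.0, 7.5]})

unmatched = anti_join(stores, sales, ["store_number"])
print(unmatched)
```

Only store 1 has no row in ``sales``, so it is the only row kept. The columns of df2 never appear in the result, which is what distinguishes an anti-join from a left join followed by a null filter on a df2 column.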
Return
The methods in this class manage the joining of two Pandas DataFrames, including initialization, execution, and result handling.
It ensures proper handling of temporary columns and provides metrics on the joined rows.
Example:
```yaml
tJoin:
  depends:
    - TransformRows_2
    - QueryToPandas_3
  type: left
  fk:
    - store_number
  args:
    validate: many_to_many
```
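
For orientation, the configuration above corresponds roughly to the following pandas call. This is a sketch under assumptions rather than the component's actual code: it assumes that ``type`` maps to ``how``, ``fk`` maps to ``on``, and the entries under ``args`` are forwarded to ``pandas.merge`` as keyword arguments. The two DataFrames are made-up stand-ins for the outputs of TransformRows_2 and QueryToPandas_3.

```python
import pandas as pd

# Hypothetical stand-ins for the outputs of the two dependencies.
df1 = pd.DataFrame({"store_number": [101, 102, 103], "region": ["N", "S", "E"]})
df2 = pd.DataFrame({"store_number": [101, 101, 103], "visits": [4, 6, 9]})

# Assumed mapping: type -> how, fk -> on, args -> extra keyword arguments.
joined = pd.merge(
    df1,
    df2,
    how="left",
    on=["store_number"],
    validate="many_to_many",  # accepted by pandas, but performs no uniqueness check
)
print(joined)
```

With a left join every row of df1 is kept: store 101 appears twice because it matches two rows in df2, and store 102 gets a missing value in ``visits`` because it has no match.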