Functions
flowtask.components.TransformRows.functions
Functions.
Tree of TransformRows functions.
add_timestamp_to_time
Takes a pandas DataFrame and combines the values from a date column and a time column to create a new timestamp column.
:param df: pandas DataFrame to be modified.
:param field: Name of the new column to store the combined timestamp.
:param date: Name of the column in the df DataFrame containing date values.
:param time: Name of the column in the df DataFrame containing time values.
:return: Modified pandas DataFrame with the combined timestamp stored in a new column.
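The combination described above can be sketched with plain pandas. This is an illustration, not flowtask's implementation: the helper name `combine_date_and_time` is hypothetical, and it assumes both columns hold values that render as strings such as "2024-01-15" and "08:30:00".

```python
import pandas as pd


def combine_date_and_time(df, field, date, time):
    # Hypothetical stand-in for add_timestamp_to_time.
    # Joins the date and time text, then parses the result as a timestamp.
    combined = df[date].astype(str) + " " + df[time].astype(str)
    df[field] = pd.to_datetime(combined)
    return df


df = pd.DataFrame({"day": ["2024-01-15"], "clock": ["08:30:00"]})
df = combine_date_and_time(df, "visited_at", date="day", time="clock")
```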
any_tuple_valid
Adds a boolean column (named field) to df that is True when
any tuple in columns has all of its columns neither NaN nor empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The DataFrame. | required |
| field | str | The name of the output column. | required |
| columns | list of tuple of str | List of tuples, where each tuple contains column names that must be checked. Example: [("start_lat", "start_long"), ("end_lat", "end_long")] | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: The original DataFrame with the new |
apply_function
Apply any scalar function to a column in the DataFrame.
Parameters:
- df: pandas DataFrame
- field: The column where the result will be stored.
- fname: The name of the function to apply.
- column: The column to which the function is applied (if None, apply to field column).
- **kwargs: Additional arguments to pass to the function.
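A minimal sketch of the pattern these parameters describe. How flowtask resolves `fname` to a callable is not stated here, so the `SCALAR_FUNCTIONS` registry and the helper name `apply_scalar` are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical registry mapping function names to callables.
SCALAR_FUNCTIONS = {"upper": str.upper, "round": round}


def apply_scalar(df, field, fname, column=None, **kwargs):
    # Read from `column` when given, otherwise from `field` itself.
    source = column if column is not None else field
    func = SCALAR_FUNCTIONS[fname]
    # Call the scalar function once per value, forwarding extra arguments.
    df[field] = df[source].apply(lambda value: func(value, **kwargs))
    return df


df = pd.DataFrame({"price": [1.234, 5.678]})
df = apply_scalar(df, "price_rounded", "round", column="price", ndigits=1)
```

Because `column` is passed, the source column is left untouched and the result lands in `price_rounded`.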
bytesio_to_base64
Converts bytes in a DataFrame column to a Base64 encoded string.
:param df: The DataFrame containing the bytes column.
:param field: The name of the field to store the Base64 encoded string.
:param column: The name of the bytes column.
:param as_string: If True, converts the Base64 bytes to a string.
:return: The DataFrame with the Base64 encoded string.
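The encoding step can be sketched with the standard `base64` module. The helper name `encode_bytes_column` is hypothetical, and the sketch assumes the column holds raw `bytes` objects; the real function may also accept `BytesIO` buffers, which is not covered here.

```python
import base64

import pandas as pd


def encode_bytes_column(df, field, column, as_string=True):
    # Hypothetical stand-in for bytesio_to_base64.
    def encode(raw):
        encoded = base64.b64encode(raw)
        # b64encode returns bytes; decode only when a str is requested.
        return encoded.decode("ascii") if as_string else encoded

    df[field] = df[column].apply(encode)
    return df


df = pd.DataFrame({"payload": [b"hello"]})
df = encode_bytes_column(df, "payload_b64", "payload")
```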
calculate_distance
Add a distance column to a dataframe.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | pandas DataFrame with columns 'latitude', 'longitude', 'store_lat', 'store_lng' | required |
| columns | List[tuple] | List of tuples with column names for coordinates. First tuple: [latitude1, longitude1]. Second tuple: [latitude2, longitude2]. | required |
| unit | str | Unit of distance ('km' for kilometers, 'm' for meters, 'mi' for miles) | 'km' |
| chunk_size | int | Number of rows to process at once for large datasets | 1000 |
Returns:
| Type | Description |
|---|---|
| DataFrame | df with additional 'distance_km' column |
convert_timezone
convert_timezone(df, field, *, column=None, from_tz='UTC', to_tz=None, tz_column=None, default_timezone='UTC')
Convert field to a target time zone.
Parameters:
df : DataFrame
field : name of an existing datetime column
column : name of the output column (defaults to field)
from_tz : timezone used to localise naive timestamps
to_tz : target timezone (ignored if tz_column is given)
tz_column : optional column that contains a timezone per row
default_timezone : fallback when a row's tz_column is null/NaN
Returns:
| Type | Description |
|---|---|
| DataFrame | df with converted datetime column |
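The localise-then-convert behaviour described by `from_tz` and `to_tz` corresponds to two pandas calls. The sketch below uses pandas directly rather than `convert_timezone` itself; the column names and the target zone are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"created_at": pd.to_datetime(["2024-01-15 12:00:00"])})

# Step 1: attach the source timezone to naive timestamps (the role of from_tz).
localized = df["created_at"].dt.tz_localize("UTC")

# Step 2: convert to the target timezone (the role of to_tz).
df["created_local"] = localized.dt.tz_convert("America/New_York")
```

Localising does not shift the clock time; converting does. Here noon UTC becomes 07:00 in New York.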
create_attachment_column
Create a column with a list of attachments from one or more path/URL columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame. | required |
| field | str | Name of the new column to store the list of attachments. | required |
| columns | List[str] | Column names to convert. You can pass either the exact column (e.g., "pdf_path_m0") or the base name (e.g., "pdf_path"). | required |
| colnames | Optional[Dict[str, str]] | Optional list of names for the attachments. If not provided, the column names will be used as names. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | The same DataFrame with |
day_of_week
Extracts the day of the week from a date column.
:param df: The DataFrame containing the date column.
:param field: The name of the field to store the day of the week.
:param column: The name of the date column.
:return: The DataFrame with the day of the week.
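The description does not say whether the result is a number or a name, so this sketch shows both pandas accessors on an illustrative column. Which of the two `day_of_week` returns is not confirmed here.

```python
import pandas as pd

df = pd.DataFrame({"visit_date": pd.to_datetime(["2024-01-15", "2024-01-20"])})

# Numeric form: Monday is 0 and Sunday is 6.
df["weekday_number"] = df["visit_date"].dt.dayofweek

# Text form: the English day name.
df["weekday_name"] = df["visit_date"].dt.day_name()
```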
drop_timezone
Drop the timezone information from a datetime column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | pandas DataFrame with a datetime column | required |
| field | str | name of the datetime column | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | df with timezone-free datetime column |
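In plain pandas, one way to remove timezone information is `tz_localize(None)`, sketched below on an illustrative column. Whether `drop_timezone` keeps the local wall time (as this call does) or first converts to UTC is an assumption that the text above does not settle.

```python
import pandas as pd

aware = pd.to_datetime(["2024-01-15 12:00:00"]).tz_localize("UTC")
df = pd.DataFrame({"created_at": aware})

# tz_localize(None) strips the timezone and keeps the wall-clock time.
df["created_at"] = df["created_at"].dt.tz_localize(None)
```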
duration
Converts a duration column to a specified unit.
:param df: The DataFrame containing the duration column.
:param field: The name of the field to store the converted duration.
:param column: The name of the duration column.
:param unit: The unit to convert the duration to.
:return: The DataFrame with the converted duration.
extract_from_dictionary
Extracts a value from a JSON column in the DataFrame.
:param df: The DataFrame containing the JSON column.
:param field: The name of the field to store the extracted value.
:param column: The name of the JSON column.
:param key: The key to extract from the JSON object.
:param conditions: Optional dictionary of conditions to filter rows before extraction.
:param as_timestamp: If True, converts the extracted value to a timestamp.
:return: The DataFrame with the extracted value.
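The core extraction step can be sketched as a per-row dictionary lookup. This assumes the column already holds parsed `dict` objects rather than JSON text, and it leaves out `conditions` and `as_timestamp`, whose exact semantics are not described above. The column and key names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"payload": [{"status": "open", "id": 7}, {"id": 8}]})


def pick_key(value, key):
    # Return the value stored under `key`, or None when it is absent
    # or when the cell is not a dictionary.
    return value.get(key) if isinstance(value, dict) else None


df["status"] = df["payload"].apply(pick_key, key="status")
```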
extract_from_object
Extracts a value from an object column in the DataFrame.
:param df: The DataFrame containing the object column.
:param field: The name of the field to store the extracted value.
:param column: The name of the object column.
:param key: The key to extract from the object.
:param as_string: If True, converts the extracted value to a string.
:param as_timestamp: If True, converts the extracted value to a timestamp.
:return: The DataFrame with the extracted value.
fully_geoloc
Adds a boolean column (named field) to df that is True when,
for each tuple in columns, all the involved columns are neither NaN nor empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The DataFrame. | required |
| field | str | The name of the output column. | required |
| columns | list of tuple of str | List of tuples, where each tuple contains column names that must be valid (non-null and non-empty). Example: [("start_lat", "start_long"), ("end_lat", "end_long")] | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: The original DataFrame with the new |
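The difference between fully_geoloc (every tuple must be valid) and any_tuple_valid (at least one tuple must be valid) can be sketched as follows. The helper `tuple_is_valid` is hypothetical and treats "empty" as an empty or whitespace-only string, which is an assumption.

```python
import pandas as pd


def tuple_is_valid(df, cols):
    # A tuple is valid on a row when every column is non-null and non-empty.
    valid = pd.Series(True, index=df.index)
    for col in cols:
        not_empty = df[col].astype(str).str.strip() != ""
        valid &= df[col].notna() & not_empty
    return valid


df = pd.DataFrame({
    "start_lat": [40.7, None, None],
    "start_long": [-74.0, -74.0, None],
    "end_lat": [41.0, 41.0, None],
    "end_long": [-73.5, -73.5, ""],
})
pairs = [("start_lat", "start_long"), ("end_lat", "end_long")]
checks = pd.concat([tuple_is_valid(df, pair) for pair in pairs], axis=1)

df["any_valid"] = checks.any(axis=1)     # any_tuple_valid semantics
df["fully_geoloc"] = checks.all(axis=1)  # fully_geoloc semantics
```

The second row has a complete end point but an incomplete start point, so it passes the "any" check and fails the "all" check.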
get_moment
:param df: pandas DataFrame.
:param column: Name of the column to compare (e.g. "updated_hour").
:param ranges: List of tuples [(label, (start, end)), ...], e.g. [("night", (0, 7)), ("morning", (7, 10)), ...].
:return: A Series of labels corresponding to each row.
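A minimal sketch of range-based labelling. The helper name `label_moment` is hypothetical, and two behaviours are assumptions because the description does not specify them: each range is treated as half-open (start <= value < end), and values outside every range yield None.

```python
import pandas as pd


def label_moment(df, column, ranges):
    def pick(value):
        # Return the label of the first range that contains the value.
        for label, (start, end) in ranges:
            if start <= value < end:
                return label
        return None

    return df[column].apply(pick)


df = pd.DataFrame({"updated_hour": [3, 8, 23]})
ranges = [("night", (0, 7)), ("morning", (7, 10))]
labels = label_moment(df, "updated_hour", ranges)
```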
get_product
Retrieves product information from the Barcode Lookup API based on a barcode.
:param row: The DataFrame row containing the barcode.
:param field: The name of the field containing the barcode.
:param columns: The list of columns to extract from the API response.
:return: The DataFrame row with the product information.
haversine_distance
Distance between two points on Earth in kilometers.
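For reference, the standard haversine formula can be written with the `math` module alone. The function name, argument order, and the Earth radius of 6371 km are assumptions; the signature and constant used by flowtask are not documented here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lat1, lon1, lat2, lon2):
    # Inputs are decimal degrees; the formula works in radians.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Great-circle distance along the sphere's surface.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


one_degree_at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
```

One degree of longitude at the equator comes out at roughly 111.19 km with this radius, which is a quick sanity check for any implementation.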
path_to_url
Converts a file path in a DataFrame column to a URL. Replaces the base path with the base URL.
:param df: The DataFrame containing the file path column.
:param field: The name of the field to store the URL.
:param column: The name of the file path column (defaults to field).
:param base_path: The base path to replace in the file path.
:param base_url: The base URL to use for the conversion.
:return: The DataFrame with the URL in the specified field.
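The replacement described above amounts to a literal prefix substitution. The sketch below uses pandas string methods with illustrative paths and URLs; it is not the library's own code.

```python
import pandas as pd

df = pd.DataFrame({"file_path": ["/var/data/reports/a.pdf"]})
base_path = "/var/data"
base_url = "https://example.com/files"

# regex=False treats base_path as literal text, so characters such as "."
# in a path are not interpreted as regular-expression syntax.
df["file_url"] = df["file_path"].str.replace(base_path, base_url, regex=False)
```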
string_to_vector
Converts a string representation of a list into an actual list.
:param df: The DataFrame containing the string representation.
:param field: The name of the field to convert.
:return: The DataFrame with the converted field.
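One safe way to perform this conversion is `ast.literal_eval`, which parses Python literals without executing code. Whether string_to_vector uses it, or returns a list rather than a NumPy array, is an assumption; the column name is illustrative.

```python
import ast

import pandas as pd

df = pd.DataFrame({"embedding": ["[0.1, 0.2, 0.3]"]})

# Parse each string into the Python list it represents.
df["embedding"] = df["embedding"].apply(ast.literal_eval)
```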
upc_to_product
upc_to_product(df, field, columns=['barcode_formats', 'mpn', 'asin', 'title', 'category', 'model', 'brand'])
Converts UPC codes in a DataFrame to product information using the Barcode Lookup API.
:param df: The DataFrame containing the UPC codes.
:param field: The name of the field containing the UPC codes.
:param columns: The list of columns to extract from the API response.
:return: The DataFrame with the product information.