Metadata-Version: 2.1
Name: features-creation
Version: 0.1.0
Summary: Creates new DataFrame columns by applying strategically selected operations.
Home-page: https://github.com/andresdigiovanni/features-creation
License: MIT
Author: Andrés Di Giovanni
Author-email: andresdigiovanni@gmail.com
Requires-Python: >=3.9
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: numpy (>=1.25.1,<2.0.0)
Requires-Dist: pandas (>=2.0.3,<3.0.0)
Requires-Dist: tqdm (>=4.65.0,<5.0.0)
Description-Content-Type: text/markdown

# FeaturesCreation

FeaturesCreation creates new DataFrame columns by applying operations to pairs of existing columns and keeping the ones that a classifier you supply ranks as most relevant to the target. It works directly on pandas DataFrames.


# How it works: Transformation Process

FeaturesCreation builds new DataFrame columns in four steps:

1. Instantiation and Fitting:

First, instantiate the FeaturesCreation class, for example `fe_cr = FeaturesCreation()`. The classifier used for selecting operations is not passed to the constructor; it is passed to `fit`.

Then fit the instance to your data by calling `fe_cr.fit(x, y, classifier, n_new_features)`, where `x` is the feature data (input), `y` is the target column (output), `classifier` is the chosen classifier (e.g., `LGBMClassifier`), and `n_new_features` is the number of new features to create. `fit` returns the selected transformations.

2. Transformation Selection:

During fitting, FeaturesCreation uses the provided classifier to evaluate the importance of each candidate transformation and keeps the top-ranked operations.

3. Application of Transformations:

After fitting, the selected transformations can be applied to the original DataFrame. Call `fe_cr.apply_transformation(df, transformations)`, where `df` is the original DataFrame and `transformations` is the set of operations returned by `fit`.

4. Resulting DataFrame:

The `apply_transformation` method returns a DataFrame containing the newly created columns. To keep the original data alongside them, concatenate the result with the original DataFrame, for example with `pd.concat` along `axis=1`.
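
The selection idea in step 2 can be illustrated with a small, self-contained sketch. This is not the library's implementation: the operation list, the helper names `candidate_features` and `select_top`, and the scoring rule are all assumptions made for illustration. In particular, the sketch scores candidates by absolute correlation with the target as a stand-in for the classifier-based importance that FeaturesCreation uses.

```python
import itertools
import operator

import numpy as np
import pandas as pd

# Hypothetical candidate operations; the library's real operation set may differ.
OPERATIONS = ("add", "sub", "mul", "truediv", "mod")


def candidate_features(x: pd.DataFrame) -> pd.DataFrame:
    """Build one candidate column per ordered feature pair and operation."""
    columns = {}
    for left, right in itertools.permutations(x.columns, 2):
        for op_name in OPERATIONS:
            values = getattr(operator, op_name)(x[left], x[right])
            columns[f"{left}__{op_name}__{right}"] = values
    # Division by zero produces infinities; treat them as missing values.
    return pd.DataFrame(columns).replace([np.inf, -np.inf], np.nan)


def select_top(x: pd.DataFrame, y: pd.Series, n_new_features: int) -> list:
    """Rank candidates by a stand-in score and keep the best n_new_features."""
    candidates = candidate_features(x)
    # Stand-in for classifier feature importance: absolute correlation with y.
    scores = candidates.corrwith(y).abs().dropna()
    return scores.sort_values(ascending=False).head(n_new_features).index.tolist()


# Tiny made-up dataset, used only to exercise the sketch.
x = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 1.0, 4.0, 3.0]})
y = pd.Series([0, 0, 1, 1])
selected = select_top(x, y, n_new_features=2)
print(selected)
```

A classifier-based score can capture non-linear relationships that a plain correlation misses, which is presumably why the library asks for a classifier in `fit`.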

## DataFrame Before Transformations

Consider the original DataFrame as follows:

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|:-:|:-----------------:|:----------------:|:-----------------:|:----------------:|
| 0 |        5.1        |        3.5       |        1.4        |        0.2       |
| 1 |        4.9        |        3.0       |        1.4        |        0.2       |
| 2 |        4.7        |        3.2       |        1.3        |        0.2       |
| 3 |        4.6        |        3.1       |        1.5        |        0.2       |
| 4 |        5.0        |        3.6       |        1.4        |        0.2       |

## DataFrame After Transformations

Applying the selected transformations to the original DataFrame yields the following new columns:

|   | sepal length (cm)__mod__petal length (cm) | sepal length (cm)__truediv__petal length (cm) | sepal width (cm)__truediv__petal width (cm) |
|:-:|:-----------------------------------------:|:---------------------------------------------:|:-------------------------------------------:|
| 0 |                    0.9                    |                    3.642857                   |                     17.5                    |
| 1 |                    0.7                    |                    3.500000                   |                     15.0                    |
| 2 |                    0.8                    |                    3.615385                   |                     16.0                    |
| 3 |                    0.1                    |                    3.066667                   |                     15.5                    |
| 4 |                    0.8                    |                    3.571429                   |                     18.0                    |

The new columns are named in the format "feature1__operation__feature2" and contain the transformed values generated by applying the specified operations to the original data.
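
The naming format can be checked by recomputing the table above with plain pandas. The sketch below is an illustration only, not the library's code: the helper `derive_column` is hypothetical, and it assumes the operation name maps to a function in Python's `operator` module and that feature names contain no double underscore.

```python
import operator

import pandas as pd

# The five rows from the "before" table.
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal length (cm)": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal width (cm)": [0.2, 0.2, 0.2, 0.2, 0.2],
})


def derive_column(frame: pd.DataFrame, name: str) -> pd.Series:
    """Compute one derived column from a name like 'a__truediv__b'."""
    # Assumes feature names themselves contain no double underscore.
    left, op_name, right = name.split("__")
    op = getattr(operator, op_name)  # e.g. operator.mod, operator.truediv
    return op(frame[left], frame[right]).rename(name)


names = [
    "sepal length (cm)__mod__petal length (cm)",
    "sepal length (cm)__truediv__petal length (cm)",
    "sepal width (cm)__truediv__petal width (cm)",
]
new_columns = pd.concat([derive_column(df, n) for n in names], axis=1)
print(new_columns.round(6))
```

Rounded to six decimals, the result matches the values in the table above.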


# Examples

Examples can be found in [examples/](examples/).

```python
import pandas as pd
from lightgbm import LGBMClassifier  # not listed in this package's dependencies; install separately

# Assumed import path; adjust if the package exposes the class elsewhere
from features_creation import FeaturesCreation

# `df` is a pandas DataFrame that includes the target column,
# and `target_column` is the name of that column

# Instantiate the FeaturesCreation class and the classifier
fe_cr = FeaturesCreation()
classifier = LGBMClassifier(verbose=-1)

# Define the number of new features to create
n_new_features = 3

# Separate the features (x) and the target column (y)
x, y = df.drop(columns=[target_column]), df[target_column]

# Select the transformations using FeaturesCreation.fit()
transformations = fe_cr.fit(x, y, classifier, n_new_features)

# Apply the transformations to the DataFrame using FeaturesCreation.apply_transformation()
transformed_df = fe_cr.apply_transformation(df, transformations)

# Concatenate the original DataFrame with the new columns
transformed_df = pd.concat([df, transformed_df], axis=1)
```

