Single Month Feature Set
Usage
config.yaml
```yaml
...
features:
  SingleMonthFeatureConfig:
    sum_columns:
      - QTY
      - grandTotal
    count_columns:
      - QTY
    mean_columns:
      - QTY
      - grandTotal
    count_if_one_columns:
      - Unique_SKUs
    calculate_percentage_change: True
    resolve_divide_by_zero: True
```
Features Generated
| Feature Type | Description | Lagged Aggregation | Lagged Periods (Months) |
|---|---|---|---|
| Count | Count of occurrences for specified columns | Yes (Lagged by months) | L1, L3, L6 |
| Sum | Sum of values for specified columns | Yes (Lagged by months) | L1, L3, L6 |
| Mean | Mean (average) of values for specified columns | Yes (Lagged by months) | L1, L3, L6 |
| Count If Equals One | Count of rows where a specified column equals 1 | Yes (Lagged by months) | L1, L3, L6 |
| Ratios | Ratios between the lagged values of each feature above | Partially (uses the ratio between lags) | L6_L1, L6_L3, L3_L1 |
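To illustrate the ratio row, the sketch below derives the L3_L1, L6_L1, and L6_L3 values from one feature's lagged values. It is not the library's implementation. The function name lag_ratios, the choice of the longer lag as the numerator, and the 0.0 fallback used when resolve_divide_by_zero is enabled are all assumptions made for this example.

```python
from itertools import combinations


def lag_ratios(lagged, lags=(1, 3, 6), resolve_divide_by_zero=True):
    """Return ratios between the lagged values of a single feature.

    `lagged` maps a lag in months to the feature value at that lag,
    for example {1: 10.0, 3: 25.0, 6: 40.0}.
    """
    ratios = {}
    # Pair every shorter lag with every longer one: (1, 3), (1, 6), (3, 6).
    for short, long_ in combinations(sorted(lags), 2):
        name = f"L{long_}_L{short}"
        denominator = lagged[short]
        if denominator == 0:
            if not resolve_divide_by_zero:
                raise ZeroDivisionError(f"{name}: value at lag {short} is zero")
            ratios[name] = 0.0  # assumed fallback, not confirmed by the source
        else:
            ratios[name] = lagged[long_] / denominator
    return ratios


print(lag_ratios({1: 10.0, 3: 25.0, 6: 40.0}))
# {'L3_L1': 2.5, 'L6_L1': 4.0, 'L6_L3': 1.6}
```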
Feature Configuration
SingleMonthFeatureConfig Class
SingleMonthFeatureConfig is a configuration class for generating features based on a single month of data.
Attributes
- lag_months: List of integers representing the number of months to lag (default: [1, 3, 6]).
- count_columns: List of column names to generate count features for.
- sum_columns: List of column names to generate sum features for.
- mean_columns: List of column names to generate mean features for.
- count_if_one_columns: List of column names to generate count_if_one features for.
Methods
get_function_dict(): Returns a dictionary of feature functions and their corresponding column names.
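The attributes and get_function_dict() described above could be modelled roughly as follows. This is a hypothetical stand-in, not the actual class: the name SingleMonthFeatureConfigSketch, the use of a dataclass, and the returned shape (feature function name mapped to its column list) are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SingleMonthFeatureConfigSketch:
    """Hypothetical stand-in for SingleMonthFeatureConfig."""

    lag_months: List[int] = field(default_factory=lambda: [1, 3, 6])
    count_columns: List[str] = field(default_factory=list)
    sum_columns: List[str] = field(default_factory=list)
    mean_columns: List[str] = field(default_factory=list)
    count_if_one_columns: List[str] = field(default_factory=list)

    def get_function_dict(self) -> Dict[str, List[str]]:
        # Assumed shape: feature function name -> columns it is applied to.
        return {
            "sum_": self.sum_columns,
            "count_": self.count_columns,
            "mean_": self.mean_columns,
            "count_if_one": self.count_if_one_columns,
        }


# Mirrors the column lists from the config.yaml example in this document.
config = SingleMonthFeatureConfigSketch(
    sum_columns=["QTY", "grandTotal"],
    count_columns=["QTY"],
    mean_columns=["QTY", "grandTotal"],
    count_if_one_columns=["Unique_SKUs"],
)
for name, columns in config.get_function_dict().items():
    print(name, columns)
```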
Feature Generation
SingleMonthFeatureSet Class
SingleMonthFeatureSet is a feature set that generates features based on a single month of data.
Methods
calculate(df, dataset_config, feature_config): Calculates the features for the given DataFrame, DatasetConfig, and FeatureConfig.
Feature Functions
- sum_(df, key_cols, agg_col): Calculate the sum of the given aggregation column for the given key columns.
- count_(df, key_cols, agg_col): Calculate the count of the given aggregation column for the given key columns.
- mean_(df, key_cols, agg_col): Calculate the mean of the given aggregation column for the given key columns.
- count_if_one(df, key_cols, agg_col): Calculate the count of rows where the given aggregation column is equal to 1 for the given key columns.
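The four feature functions share one pattern: group by the key columns, then reduce the aggregation column. The sketch below shows that pattern without dependencies, using a list of dicts in place of a DataFrame. The helper _group, the returned dict shape, and the sample column customer are assumptions for this example; the real functions operate on DataFrames and may treat missing values differently.

```python
from collections import defaultdict


def _group(rows, key_cols, agg_col):
    """Collect the agg_col values of each key_cols group (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in key_cols)].append(row[agg_col])
    return groups


def sum_(rows, key_cols, agg_col):
    return {key: sum(vals) for key, vals in _group(rows, key_cols, agg_col).items()}


def count_(rows, key_cols, agg_col):
    return {key: len(vals) for key, vals in _group(rows, key_cols, agg_col).items()}


def mean_(rows, key_cols, agg_col):
    return {key: sum(vals) / len(vals) for key, vals in _group(rows, key_cols, agg_col).items()}


def count_if_one(rows, key_cols, agg_col):
    # Count only the rows whose aggregation column equals exactly 1.
    return {
        key: sum(1 for v in vals if v == 1)
        for key, vals in _group(rows, key_cols, agg_col).items()
    }


rows = [
    {"customer": "A", "QTY": 2, "Unique_SKUs": 1},
    {"customer": "A", "QTY": 4, "Unique_SKUs": 3},
    {"customer": "B", "QTY": 5, "Unique_SKUs": 1},
]
print(sum_(rows, ["customer"], "QTY"))
print(mean_(rows, ["customer"], "QTY"))
print(count_if_one(rows, ["customer"], "Unique_SKUs"))
```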
Lagged Aggregation
lagged_aggregation Decorator
The lagged_aggregation decorator is used to apply a given aggregation function to a lagged version of a DataFrame. This allows you to calculate features for each single month lag in the specified list of months and join the results to the input DataFrame.
Parameters
- months_list (List[int]): List of integers representing the number of months to lag the DataFrame by.
- month_col (str): Name of the column in the DataFrame that contains the month information.
- include_col_name (bool, optional): Boolean indicating whether or not to include the name of the aggregation column in the resulting column name.
Usage Example
```python
@lagged_aggregation(months_list=[1, 3, 6], month_col="month")
def sum_(df, key_cols, agg_col):
    # Your aggregation logic here
    pass
```
The @lagged_aggregation decorator is applied to aggregation functions like sum_, count_, mean_, and count_if_one within the SingleMonthFeatureSet class to generate lagged features based on the specified months.
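For readers curious how such a decorator might work internally, here is a minimal sketch under several assumptions. It uses lists of dicts instead of a DataFrame and treats the month column as a running integer index. It also invents the output column names (for example sum_QTY_L1). None of these details are confirmed by the source; the actual lagged_aggregation may differ in all of them.

```python
from collections import defaultdict
from functools import wraps


def lagged_aggregation(months_list, month_col, include_col_name=True):
    """Sketch: aggregate per month, then join each lagged month's value back."""

    def decorator(agg_func):
        @wraps(agg_func)
        def wrapper(rows, key_cols, agg_col):
            # Aggregate once per (key columns, month) group.
            per_month = agg_func(rows, list(key_cols) + [month_col], agg_col)
            prefix = agg_func.__name__.rstrip("_")
            out = [dict(row) for row in rows]  # leave the input untouched
            for lag in months_list:
                # Assumed naming scheme, e.g. "sum_QTY_L1" or "sum_L1".
                col = f"{prefix}_{agg_col}_L{lag}" if include_col_name else f"{prefix}_L{lag}"
                for row in out:
                    key = tuple(row[k] for k in key_cols) + (row[month_col] - lag,)
                    row[col] = per_month.get(key)  # None when that month has no data
            return out

        return wrapper

    return decorator


@lagged_aggregation(months_list=[1, 3], month_col="month")
def sum_(rows, key_cols, agg_col):
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in key_cols)] += row[agg_col]
    return dict(totals)


rows = [
    {"customer": "A", "month": 1, "QTY": 2},
    {"customer": "A", "month": 2, "QTY": 3},
    {"customer": "A", "month": 4, "QTY": 7},
]
result = sum_(rows, ["customer"], "QTY")
for row in result:
    print(row)
```

In this sketch each lag refers to a single month (the value exactly n months earlier), not a rolling window, which matches the "single month lag" wording above.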