Single Month Feature Set

Usage

config.yaml
...
features:
  SingleMonthFeatureConfig:
    sum_columns:
      - QTY
      - grandTotal
    count_columns:
      - QTY
    mean_columns:
      - QTY
      - grandTotal
    count_if_one_columns:
      - Unique_SKUs
    calculate_percentage_change: True
    resolve_divide_by_zero: True

Features Generated

| Feature Type | Description | Lagged Aggregation | Lagged Periods (Months) |
|---|---|---|---|
| Count | Count of occurrences for specified columns | Yes (lagged by months) | L1, L3, L6 |
| Sum | Sum of values for specified columns | Yes (lagged by months) | L1, L3, L6 |
| Mean | Mean (average) of values for specified columns | Yes (lagged by months) | L1, L3, L6 |
| Count If Equals One | Count of rows where a specified column equals 1 | Yes (lagged by months) | L1, L3, L6 |
| Ratios | Ratios between all of the above fields | Partially (uses the ratio between lags) | L6_L1, L6_L3, L3_L1 |
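
To make the Ratios row concrete, here is a minimal sketch of how the three lag pairs could be derived from already-computed lagged values. The function name lag_ratios, the direction of the division (L6_L1 read as L6 divided by L1), and the choice to return 0.0 for a zero denominator are all assumptions for illustration. The config shown earlier has a resolve_divide_by_zero option, but its exact behaviour is not documented here.

```python
from typing import Dict, Iterable, Tuple


def lag_ratios(
    lagged: Dict[int, float],
    pairs: Iterable[Tuple[int, int]] = ((6, 1), (6, 3), (3, 1)),
    zero_value: float = 0.0,
) -> Dict[str, float]:
    """Hypothetical helper: ratio between two lagged values of one feature.

    `lagged` maps a lag in months to the feature value at that lag.
    """
    ratios = {}
    for numerator_lag, denominator_lag in pairs:
        denominator = lagged[denominator_lag]
        if denominator == 0:
            # Assumed handling: substitute a fixed value instead of raising.
            ratio = zero_value
        else:
            ratio = lagged[numerator_lag] / denominator
        ratios[f"L{numerator_lag}_L{denominator_lag}"] = ratio
    return ratios


# Example: a summed QTY of 2 one month back, 4 three months back, 8 six months back.
example = lag_ratios({1: 2.0, 3: 4.0, 6: 8.0})
```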

Feature Configuration

SingleMonthFeatureConfig Class

SingleMonthFeatureConfig is a configuration class for generating features based on a single month of data.

Attributes

  • lag_months: List of integers giving the lags, in months, to generate features for. Defaults to [1, 3, 6].
  • count_columns: List of column names to generate count features for.
  • sum_columns: List of column names to generate sum features for.
  • mean_columns: List of column names to generate mean features for.
  • count_if_one_columns: List of column names to generate count_if_one features for.

Methods

  • get_function_dict(): Returns a dictionary of feature functions and their corresponding column names.
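
As a rough illustration of the attributes and method listed above, the sketch below models the configuration as a dataclass. The class name SingleMonthFeatureConfigSketch is hypothetical, and the shape of the returned dictionary (function names as keys, column lists as values) is an assumption; the real class may key the dictionary by the function objects themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SingleMonthFeatureConfigSketch:
    """Hypothetical stand-in for SingleMonthFeatureConfig."""

    lag_months: List[int] = field(default_factory=lambda: [1, 3, 6])
    count_columns: List[str] = field(default_factory=list)
    sum_columns: List[str] = field(default_factory=list)
    mean_columns: List[str] = field(default_factory=list)
    count_if_one_columns: List[str] = field(default_factory=list)

    def get_function_dict(self) -> Dict[str, List[str]]:
        # Assumed shape: feature function name -> columns it is applied to.
        return {
            "sum_": self.sum_columns,
            "count_": self.count_columns,
            "mean_": self.mean_columns,
            "count_if_one": self.count_if_one_columns,
        }


# Mirrors the config.yaml example from the Usage section.
config = SingleMonthFeatureConfigSketch(
    sum_columns=["QTY", "grandTotal"],
    count_columns=["QTY"],
    mean_columns=["QTY", "grandTotal"],
    count_if_one_columns=["Unique_SKUs"],
)
function_dict = config.get_function_dict()
```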

Feature Generation

SingleMonthFeatureSet Class

SingleMonthFeatureSet is a feature set that generates features based on a single month of data.

Methods

  • calculate(df, dataset_config, feature_config): Calculate the features for the given DataFrame, DatasetConfig, and FeatureConfig.

Feature Functions

  • sum_(df, key_cols, agg_col): Calculate the sum of the given aggregation column for the given key columns.
  • count_(df, key_cols, agg_col): Calculate the count of the given aggregation column for the given key columns.
  • mean_(df, key_cols, agg_col): Calculate the mean of the given aggregation column for the given key columns.
  • count_if_one(df, key_cols, agg_col): Calculate the count of rows where the given aggregation column is equal to 1 for the given key columns.
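
The four feature functions above can be sketched as plain group-by aggregations. This version assumes pandas DataFrames, which the documentation does not confirm, and leaves out the lagging step entirely. The customer_id column and the sample data are made up for the example.

```python
import pandas as pd


def sum_(df, key_cols, agg_col):
    # Sum of agg_col within each group of key_cols.
    return df.groupby(key_cols)[agg_col].sum().reset_index()


def count_(df, key_cols, agg_col):
    # Number of non-null agg_col values within each group.
    return df.groupby(key_cols)[agg_col].count().reset_index()


def mean_(df, key_cols, agg_col):
    # Average of agg_col within each group.
    return df.groupby(key_cols)[agg_col].mean().reset_index()


def count_if_one(df, key_cols, agg_col):
    # Turn agg_col into a 0/1 flag for "equals 1", then sum the flags per group.
    flagged = df.assign(**{agg_col: (df[agg_col] == 1).astype(int)})
    return flagged.groupby(key_cols)[agg_col].sum().reset_index()


# Hypothetical sample data: three order rows for two customers.
orders = pd.DataFrame({"customer_id": ["a", "a", "b"], "QTY": [1, 3, 1]})
totals = sum_(orders, ["customer_id"], "QTY")
ones = count_if_one(orders, ["customer_id"], "QTY")
```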

Lagged Aggregation

lagged_aggregation Decorator

The lagged_aggregation decorator wraps an aggregation function so that it is applied to a lagged version of a DataFrame. The aggregation is computed once for each lag in the specified list of months, and the results are joined to the input DataFrame.

Parameters

  • months_list (List[int]): List of integers representing the number of months to lag the DataFrame by.
  • month_col (str): Name of the column in the DataFrame that contains the month information.
  • include_col_name (bool, optional): Whether to include the name of the aggregation column in the resulting column name.

Usage Example

@lagged_aggregation(months_list=[1, 3, 6], month_col="month")
def sum_(df, key_cols, agg_col):
    # Your aggregation logic here
    pass

The @lagged_aggregation decorator is applied to aggregation functions like sum_, count_, mean_, and count_if_one within the SingleMonthFeatureSet class to generate lagged features based on the specified months.
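
For readers who want to see the mechanics, the following is one way such a decorator could be written. It is a sketch under several assumptions rather than the library's implementation: it assumes pandas, assumes month_col holds an integer month index (so that adding a lag is plain arithmetic), and uses a made-up output naming scheme of the form sum_QTY_L1.

```python
import functools

import pandas as pd


def lagged_aggregation(months_list, month_col, include_col_name=True):
    """Sketch of a decorator that adds one lagged feature column per lag."""

    def decorator(agg_func):
        @functools.wraps(agg_func)
        def wrapper(df, key_cols, agg_col):
            group_cols = list(key_cols) + [month_col]
            # Aggregate once per key and month.
            monthly = agg_func(df, group_cols, agg_col)
            prefix = agg_func.__name__.rstrip("_")
            result = df
            for lag in months_list:
                # Assumed naming scheme, e.g. "sum_QTY_L1" or "sum_L1".
                name = f"{prefix}_{agg_col}_L{lag}" if include_col_name else f"{prefix}_L{lag}"
                # Shift the aggregate forward so month m lines up with month m + lag.
                shifted = monthly.assign(**{month_col: monthly[month_col] + lag})
                shifted = shifted.rename(columns={agg_col: name})
                result = result.merge(shifted, on=group_cols, how="left")
            return result

        return wrapper

    return decorator


@lagged_aggregation(months_list=[1, 3], month_col="month")
def sum_(df, key_cols, agg_col):
    return df.groupby(key_cols)[agg_col].sum().reset_index()


# Hypothetical data: one customer with orders in months 1, 2 and 4.
orders = pd.DataFrame(
    {"customer_id": ["a", "a", "a", "a"], "month": [1, 1, 2, 4], "QTY": [2, 3, 5, 7]}
)
features = sum_(orders, ["customer_id"], "QTY")
```

In this sketch a row receives a missing value for a lag when no data exists for the corresponding earlier month. Whether the real decorator fills those gaps is not stated in this documentation.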