Contributing additional Feature Sets
To contribute a FeatureSet to the framework, follow the key steps outlined below:
- Create your Feature Config
- Create your FeatureSet
- Write tests for your FeatureSet
- Add your FeatureSet to the FEATURE_CONFIG_MAP
- Document your FeatureSet
Creating a Feature Config
FeatureSets and FeatureConfigs belong together in the same file. To create a FeatureConfig:
- Create a file for your features in the amee_utils/feature_generator/feature_set directory.
- Use attrs.define and inherit from the base FeatureConfig in amee_utils.feature_generator.config. As an example:
    import attrs
    from amee_utils.feature_generator.config import FeatureConfig

    @attrs.define
    class MyFeatureConfig(FeatureConfig):
        # input your feature configuration
        ...
Note
The name of your FeatureConfig will correspond directly to the feature config in your configuration yaml file.
For example, a user's yaml file will look as follows:
Creating a FeatureSet
In the same file as your FeatureConfig, create a FeatureSet:
- Inherit from the base FeatureSet in amee_utils.feature_generator.feature_set.
- The rest of the implementation of your FeatureSet is up to you.
Note
You need to abide by the FeatureSet interface. This means that you need to implement the calculate method.
All of the columns and configuration items that you need should be available to you in the FeatureConfig that you created. This means that these items must be defined in your FeatureConfig before you can use them in your FeatureSet.
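The relationship between a FeatureConfig and a FeatureSet can be sketched as follows. This is a minimal illustration rather than the framework's actual implementation: the base classes are local stand-ins so the snippet runs on its own, dataclasses stand in for attrs.define, a list of dicts stands in for a Spark DataFrame, and RowCountFeatureConfig, RowCountFeatureSet, customer_col and output_col are hypothetical names. The keyword arguments of calculate mirror the call shown in the Writing Tests example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime


# Stand-ins for the real base classes so that this sketch runs on its own.
# In the framework you would import FeatureConfig and FeatureSet instead.
@dataclass
class FeatureConfig:
    pass


class FeatureSet(ABC):
    @abstractmethod
    def calculate(self, df, feature_config, dataset_config, calculation_date):
        ...


@dataclass
class RowCountFeatureConfig(FeatureConfig):
    # Hypothetical configuration items: which column to group by and
    # what to call the resulting feature column.
    customer_col: str = "customer_id"
    output_col: str = "row_count"


class RowCountFeatureSet(FeatureSet):
    def calculate(self, df, feature_config, dataset_config, calculation_date):
        # Column names come from the config rather than being hard-coded.
        counts = {}
        for row in df:
            key = row[feature_config.customer_col]
            counts[key] = counts.get(key, 0) + 1
        return [
            {feature_config.customer_col: key, feature_config.output_col: n}
            for key, n in counts.items()
        ]


# A list of dicts stands in for a Spark DataFrame in this sketch.
rows = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}]
features = RowCountFeatureSet().calculate(
    df=rows,
    feature_config=RowCountFeatureConfig(),
    dataset_config=None,
    calculation_date=datetime(2022, 10, 1),
)
print(features)
```

Because the column names live in the config, a user could change them without touching the FeatureSet code.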
Writing Tests
You only need to write tests for the FeatureSet you've created. The FeatureConfig is tested automatically, as long as your FeatureSet implementation uses it.
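To illustrate why testing the FeatureSet also covers its config, here is a small self-contained sketch. It assumes a hypothetical config flag, count_includes_missing, that decides whether missing values are counted. CountConfig and count_values are stand-ins invented for this illustration, and the table of cases plays the role that pytest.mark.parametrize would play in a real test file.

```python
from dataclasses import dataclass


@dataclass
class CountConfig:
    # Hypothetical config item: whether missing (None) values are counted.
    count_includes_missing: bool = False


def count_values(values, config):
    """Hypothetical stand-in for a FeatureSet's calculate method."""
    if config.count_includes_missing:
        return len(values)
    return sum(1 for v in values if v is not None)


# Each case pairs a config flag with the expected result, in the same
# spirit as a pytest.mark.parametrize table.
CASES = [
    (False, 2),
    (True, 3),
]


def test_count_values():
    values = [10, None, 20]
    for count_includes_missing, expected in CASES:
        config = CountConfig(count_includes_missing=count_includes_missing)
        assert count_values(values, config) == expected


test_count_values()
```

Each case builds a config and passes it through the calculation, so a broken config field would show up as a failing feature test.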
Example
An example test can be found in the tests/feature_generator/test_single_month.py file. A snippet may look like:
)
df = spark.read.csv(
"tests/feature_generator/fixtures/test_single_month.csv",
header=True,
).withColumn("month", F.to_date(F.col("month"), "yyyy-MM-dd"))
result = SingleMonthFeatureSet().calculate(
df=df,
feature_config=fc,
dataset_config=dc,
calculation_date=datetime(2022, 10, 1),
)
assert_df_equality(
result, expected_df, ignore_row_order=True, ignore_column_order=True
)
@pytest.mark.parametrize(
"count_includes_missing, expected_values",
[
(False, [(1, 2, None, None, 3, None, None)]),
(True, [(1, 3, None, None, 3, None, None)]),
Adding your FeatureSet to the Config Map
For users to be able to add your FeatureSet to their configuration, you need to add it to the FEATURE_CONFIG_MAP in amee_utils/feature_generator/__init__.py. This is a dictionary that maps the name of the FeatureConfig to the FeatureConfig class.
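As a rough sketch of how such a mapping can be used, the snippet below looks up a config class by name and instantiates it. Only the idea of a name-to-class dictionary comes from the description above. The lookup code, the shape of parsed_entry (its name and params keys) and the window_months field are assumptions made for illustration, and a dataclass stands in for an attrs-based FeatureConfig.

```python
from dataclasses import dataclass


@dataclass
class MyFeatureConfig:
    # Hypothetical configuration item.
    window_months: int = 1


# Maps the name of the FeatureConfig to the FeatureConfig class.
FEATURE_CONFIG_MAP = {
    "MyFeatureConfig": MyFeatureConfig,
}

# Hypothetical result of parsing one entry of a user's yaml file;
# the real layout of the file may differ.
parsed_entry = {"name": "MyFeatureConfig", "params": {"window_months": 3}}

# Resolve the class by name, then build the config from the parsed values.
config_cls = FEATURE_CONFIG_MAP[parsed_entry["name"]]
config = config_cls(**parsed_entry["params"])
print(config)
```

A name missing from the dictionary would raise a KeyError in this sketch, which is one reason a new FeatureConfig has to be registered before users can reference it.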
Documenting your FeatureSet
The final step in the development is writing the necessary documentation. This includes:
- Write the documentation for the FeatureSet so people know how to implement the config.yaml file and what features they're getting out.
  Example
  You can use the documentation at docs/tutorials/features/single_month.md as an example.
- Add a file for your FeatureSet into the API Reference.
- Ensure that the mkdocs.yml file is updated.
Note
You can test out your documentation by running poetry run mkdocs serve and navigating to http://127.0.0.1:8000
Pull Request Template
You can use the below as a template for your pull request:
## Description
Please include a summary of the changes you are proposing. This could include a brief overview of the Feature Set you are adding, as well as any relevant context or background information.
## Checklist
Please ensure that you have completed the following steps before submitting your pull request:
- [ ] Created a Feature Config
- [ ] Created a FeatureSet
- [ ] Written tests for your FeatureSet
- [ ] Added your FeatureSet to the FEATURE_CONFIG_MAP
- [ ] Documented your FeatureSet