Feature freshness can be critical in applications where the decisions made by the ML model require that the features used are no older than a certain amount of time. For example, if an online credit card fraud detection ML system only has features that are 1 hour old or older, then it will massively underperform during the first hour that a given credit card is used fraudulently.
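A maximum-age requirement like the one above can be sketched as a simple check at inference time. This is a minimal illustration, not a feature store API: the function name `is_fresh`, the `max_age` threshold, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(feature_ts: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if the feature value is no older than max_age at time `now`."""
    return now - feature_ts <= max_age


# Hypothetical fraud-detection scenario: the model needs features
# that are at most 10 minutes old (an assumed threshold).
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
max_age = timedelta(minutes=10)

hour_old_feature = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
recent_feature = datetime(2024, 1, 1, 11, 55, tzinfo=timezone.utc)

print(is_fresh(hour_old_feature, now, max_age))  # False: 1 hour old
print(is_fresh(recent_feature, now, max_age))    # True: 5 minutes old
```

A system could use a check like this to fall back to a default value or an on-demand computation when a precomputed feature is too stale.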
Feature freshness for training data describes how recent the data used to train a model is compared to the most recent feature data available. Training data is stale when it does not include the latest feature data. You can improve feature freshness for training data by recreating or updating the training dataset with the latest available feature data.
For feature pipelines, you can switch from batch feature pipelines to streaming feature pipelines. This reduces the time between when the data used to compute features becomes available and when the features are written to a feature store, from where they can be used in model inference. If a feature can be computed on demand instead of being precomputed, you may be able to improve its freshness by switching to an on-demand feature. If you are using a high-latency online feature store to retrieve precomputed features, you can improve feature freshness by switching to a lower-latency online feature store.
In batch systems, feature freshness is the total amount of time that has passed from when the batch of features was created until the row is processed by the model in the batch inference program.
In a streaming feature pipeline, feature freshness is the time between when the streaming event (or batch of events) was created and when the streaming feature pipeline writes the updated feature value to the online feature store.
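The two measurements above reduce to timestamp differences. The sketch below assumes you record the relevant timestamps yourself; the function names and example times are hypothetical and do not come from any particular feature store.

```python
from datetime import datetime, timedelta, timezone


def batch_freshness(batch_created_at: datetime, row_scored_at: datetime) -> timedelta:
    """Time from when the feature batch was created until the model processes the row."""
    return row_scored_at - batch_created_at


def streaming_freshness(event_created_at: datetime, written_at: datetime) -> timedelta:
    """Time from event creation until the updated feature value is written online."""
    return written_at - event_created_at


utc = timezone.utc

# Batch: features created at 02:00, row scored by batch inference at 08:30.
batch_lag = batch_freshness(
    datetime(2024, 1, 1, 2, 0, tzinfo=utc),
    datetime(2024, 1, 1, 8, 30, tzinfo=utc),
)
print(batch_lag)  # 6:30:00

# Streaming: event created at 12:00:00, feature written at 12:00:03.
stream_lag = streaming_freshness(
    datetime(2024, 1, 1, 12, 0, 0, tzinfo=utc),
    datetime(2024, 1, 1, 12, 0, 3, tzinfo=utc),
)
print(stream_lag)  # 0:00:03
```

In this made-up example the batch feature is hours old when the model uses it, while the streaming feature is seconds old, which is the gap that switching to a streaming feature pipeline is meant to close.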