
Feature Groups

What are feature groups in a feature store?

A feature group is a logical table of features that provides a single API for updating feature values and two different APIs, an online API and an offline API, for reading feature values. In practice, feature groups are stored in a dual database system: an online database stores the latest feature values in a row-oriented table format (e.g., OLTP databases or key-value stores) for fast feature retrieval during online inference, and an offline database stores the historical feature values in a column-oriented store (e.g., data warehouse/lakehouse or object store). The online API (to the online store) is used to retrieve feature values for online inference, while the offline API (to the offline store) is used to create training data and batch inference data. The feature store should keep the data in the online and offline stores consistent.
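The write-once, read-two-ways design described above can be sketched with a toy in-memory class. This is only an illustration, not a real feature store API: the class name `ToyFeatureGroup` and its methods `insert`, `read_online`, and `read_offline` are hypothetical, and the sketch assumes every row carries an event time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeatureGroup:
    """Hypothetical in-memory stand-in for a feature group (not a real API)."""

    online: dict = field(default_factory=dict)   # latest row per entity
    offline: list = field(default_factory=list)  # append-only history

    def insert(self, entity_id, event_time, features):
        """Single write API: one call updates both stores."""
        row = {"entity_id": entity_id, "event_time": event_time, **features}
        self.offline.append(row)
        latest = self.online.get(entity_id)
        # Only overwrite the online row if the new row is at least as recent.
        if latest is None or event_time >= latest["event_time"]:
            self.online[entity_id] = row

    def read_online(self, entity_id):
        """Online read API: latest feature values for one entity."""
        return self.online.get(entity_id)

    def read_offline(self, start=None, end=None):
        """Offline read API: historical rows, optionally bounded by event time."""
        return [
            r for r in self.offline
            if (start is None or r["event_time"] >= start)
            and (end is None or r["event_time"] <= end)
        ]


fg = ToyFeatureGroup()
fg.insert("card_1", 1, {"amount": 10.0})
fg.insert("card_1", 2, {"amount": 25.0})
fg.insert("card_2", 1, {"amount": 7.5})

latest_card_1 = fg.read_online("card_1")  # used at online inference time
history = fg.read_offline()               # used for training or batch data
```

Because both stores are updated by the same `insert` call, they stay consistent in this sketch. A real feature store has to achieve the same guarantee across two separate databases.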

A feature group has an entity_id (or primary key) and, optionally, an event_time column (indicating when the features in that row were observed) and a partition_key (to lay out the data in the offline store for faster query performance). Each row in the online store is uniquely identified by its entity_id, as the online store only stores the latest values for entities. The offline store, in contrast, stores historical feature values for entities, so each row is uniquely identified by the (entity_id, event_time) pair. For example, for a credit card, the online store would only store the latest transaction feature data (with the credit card number as the entity_id), but the offline store would store all the historical transactions for that credit card, with each transaction uniquely identified by the credit card number and the timestamp for that transaction.
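The difference in row identity between the two stores can be illustrated with primary keys in SQLite. This is a minimal sketch under stated assumptions: the table names `online_fg` and `offline_fg`, the column `cc_num`, and the sample transactions are all made up for illustration, and rows are assumed to arrive in event-time order so that a plain replace keeps the latest one. SQLite is used here only to show the keys, not the row-oriented versus column-oriented storage layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Online store: one row per entity, so entity_id alone is the primary key.
conn.execute(
    "CREATE TABLE online_fg (cc_num TEXT PRIMARY KEY, event_time TEXT, amount REAL)"
)
# Offline store: full history, so the key is the (entity_id, event_time) pair.
conn.execute(
    "CREATE TABLE offline_fg (cc_num TEXT, event_time TEXT, amount REAL, "
    "PRIMARY KEY (cc_num, event_time))"
)

# Hypothetical transactions, listed in event-time order.
transactions = [
    ("card_A", "2024-01-01T09:00:00", 20.0),
    ("card_A", "2024-01-02T14:30:00", 55.0),
    ("card_B", "2024-01-01T12:00:00", 8.0),
]

for row in transactions:
    # History is appended; a duplicate (cc_num, event_time) would be rejected.
    conn.execute("INSERT INTO offline_fg VALUES (?, ?, ?)", row)
    # The latest row replaces any earlier row for the same card.
    conn.execute("INSERT OR REPLACE INTO online_fg VALUES (?, ?, ?)", row)

online_rows = conn.execute(
    "SELECT cc_num, amount FROM online_fg ORDER BY cc_num"
).fetchall()
offline_count = conn.execute("SELECT COUNT(*) FROM offline_fg").fetchone()[0]
```

After the loop, the online table holds one row per card (the most recent transaction), while the offline table holds all three transactions.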

The other columns in a feature group are the features. You can select a particular feature (column) as the partition key for a feature group. The partition key determines how the rows are laid out in the offline store, so that queries that filter on the partition_key can be answered efficiently. For example, if your partition key is the day and you have hundreds of days' worth of data, you can query for a given day or a range of days, and only the data for those days will be read from disk (a push-down filter).
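The effect of a day partition key can be sketched by grouping rows per day and reading only the groups a query asks for. This is a toy model, not how any particular offline store is implemented: the `partitions` dictionary stands in for per-day directories or files, and `write` and `read_days` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import date, timedelta

# partition_key (day) -> rows stored together, like one directory per day
partitions = defaultdict(list)


def write(row):
    """Place each row in the partition given by its day."""
    partitions[row["day"]].append(row)


def read_days(start, end):
    """Return rows with start <= day <= end, touching only matching partitions."""
    # Only partition keys are inspected here; rows in other partitions are never read.
    selected = sorted(d for d in partitions if start <= d <= end)
    rows = [r for d in selected for r in partitions[d]]
    return rows, len(selected)


# Hypothetical data: 300 days, two cards per day.
first_day = date(2024, 1, 1)
for offset in range(300):
    day = first_day + timedelta(days=offset)
    for cc_num in ("card_A", "card_B"):
        write({"cc_num": cc_num, "day": day, "amount": float(offset)})

rows, partitions_read = read_days(date(2024, 3, 1), date(2024, 3, 7))
```

Here a one-week query reads 7 of the 300 partitions. A query that filters on a non-partition column would have to scan every partition instead.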


© Hopsworks 2024. All rights reserved. Various trademarks held by their respective owners.
