
Data Contract

What is a data contract?

A data contract provides schema-level guarantees for a feature group. It also includes metadata such as how and where a feature may be used, the expected freshness of the feature, when it was last updated, and optionally the service level agreement (SLA) for lookups.
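To make the definition concrete, here is a minimal sketch of a data contract as a plain Python data structure. It is not the Hopsworks API: the class `DataContract`, its field names, and the example values are all hypothetical, chosen only to mirror the elements listed above (schema, permitted usage, expected freshness, last update time, optional lookup SLA).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class DataContract:
    """Hypothetical, library-agnostic contract for one feature group."""
    feature_group: str
    schema_version: int
    schema: dict                    # feature name -> expected Python type
    allowed_usage: frozenset        # e.g. {"training", "batch_inference"}
    expected_freshness: timedelta   # maximum acceptable age of the data
    last_updated: datetime
    lookup_sla_ms: Optional[int] = None  # optional SLA for lookups

    def schema_violations(self, row: dict) -> list:
        """Return a list of schema problems found in a single row."""
        problems = []
        for name, expected_type in self.schema.items():
            if name not in row:
                problems.append(f"missing feature: {name}")
            elif not isinstance(row[name], expected_type):
                problems.append(
                    f"wrong type for {name}: expected {expected_type.__name__}"
                )
        for name in row:
            if name not in self.schema:
                problems.append(f"unexpected feature: {name}")
        return problems

    def is_fresh(self, now: datetime) -> bool:
        """True if the data is no older than the expected freshness."""
        return now - self.last_updated <= self.expected_freshness


# Illustrative values only; none of them come from a real system.
contract = DataContract(
    feature_group="customer_purchases",
    schema_version=1,
    schema={"product_name": str, "price": float, "purchase_date": datetime},
    allowed_usage=frozenset({"training", "batch_inference"}),
    expected_freshness=timedelta(hours=24),
    last_updated=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    lookup_sla_ms=10,
)
```

A consumer could call `contract.schema_violations(row)` before using a row and `contract.is_fresh(now)` before trusting the feature group. A real feature store would typically enforce these checks on write rather than leaving them to each reader.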

Why are data contracts important?

Data contracts are important because they ensure consistency and compatibility of feature data across the different stages of a machine learning system (feature pipelines, training pipelines, and inference pipelines). By providing schema-level guarantees, data contracts help to prevent issues with data quality, reliability, and versioning, so that downstream systems and models can depend on the data they receive. Data contracts can also help to enforce governance and compliance requirements for feature data usage.

How to implement an example data contract

Create a feature group containing customer purchase data, with features such as product name, price, and purchase date. The feature group should also carry a schema version and metadata such as a description, the owner, and when and by whom it was last updated. Use custom metadata (e.g., schematized tags) to define governance and compliance requirements for the feature group. Then, for a model staged for deployment, use provenance to automatically find the features the model uses, and check the governance and compliance requirements by looking up the custom metadata for each of those features.
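The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the Hopsworks API: `FeatureGroup`, `MODEL_PROVENANCE`, `compliance_violations`, the `governance` tag layout, and every example value are hypothetical. In a real feature store, the provenance record would be produced by the platform rather than written by hand.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureGroup:
    """Hypothetical feature group with schema version and metadata."""
    name: str
    schema_version: int
    features: list
    description: str
    owner: str
    last_updated_at: str   # ISO 8601 timestamp
    last_updated_by: str
    tags: dict = field(default_factory=dict)  # custom (schematized) metadata


# Illustrative values only; none of them come from a real system.
purchases = FeatureGroup(
    name="customer_purchases",
    schema_version=1,
    features=["product_name", "price", "purchase_date"],
    description="Customer purchase history",
    owner="data-platform-team",
    last_updated_at="2024-01-01T12:00:00Z",
    last_updated_by="feature-pipeline",
    tags={
        "governance": {
            "contains_pii": False,
            "allowed_purposes": ["recommendation", "fraud_detection"],
        }
    },
)

FEATURE_GROUPS = {purchases.name: purchases}

# Hypothetical provenance record: model -> (feature group, feature) pairs.
MODEL_PROVENANCE = {
    "recommender_v3": [
        ("customer_purchases", "product_name"),
        ("customer_purchases", "price"),
    ],
}


def compliance_violations(model_name, purpose, provenance, feature_groups):
    """Check the governance tag of every feature group a model depends on."""
    violations = []
    # Collapse the per-feature provenance entries to distinct feature groups.
    used_groups = sorted({group for group, _feature in provenance[model_name]})
    for group_name in used_groups:
        governance = feature_groups[group_name].tags.get("governance", {})
        if purpose not in governance.get("allowed_purposes", []):
            violations.append(f"{group_name}: purpose '{purpose}' is not allowed")
    return violations
```

A deployment gate could call `compliance_violations("recommender_v3", "recommendation", MODEL_PROVENANCE, FEATURE_GROUPS)` and block the rollout if the returned list is non-empty. Here the governance tag lives on the feature group, so the check is made once per group rather than once per feature.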

© Hopsworks 2024. All rights reserved. Various trademarks held by their respective owners.
