← Back to Dictionary

Schema

What are schemas in feature stores?

A schema defines the shape, order, and type of data stored in ML artifacts, including:  feature groups, feature views, training datasets, and models. The schema enables the validation of the shape, order, and type of data that is either read from or input to a ML artifact. Some ML artifacts, such as feature groups, feature views, and models can support schema versioning. A schema version change indicates that the new version has a breaking schema change compared to the previous version.

Why is a schema important for ML artifacts?

  1. Enforcing data contracts: A schema defines the structure and data types of features within a feature group, ensuring that the data conforms to a specific format. This enforces a data contract between producers and consumers of the features, which is crucial for maintaining consistency and reliability in the machine learning pipeline.
  2. Promoting best practices: By defining a schema, data engineers and data scientists are encouraged to follow best practices in data modeling and management. 
  3. Versioning: Schemas enable versioning of feature groups. If the structure or data types of a feature group change, a new schema version can be created to accommodate the changes without disrupting existing pipelines or models.
  4. Error detection: With a schema in place, errors in data shape, order, or type can be detected early in the pipeline, making it easier to identify and fix issues before they propagate downstream.