Have you ever worked hard on training a great model, only to have everything break in production because of a change in ETL logic in a source system? At its core, a feature store needs to provide reliable features that data scientists can use to build and productionize models. So how can we ensure the data written to the feature store is clean and of high quality, avoiding garbage-in, garbage-out situations? Great Expectations is the most popular library for data validation, used by many thousands of data scientists, and Hopsworks 3.0 comes integrated with it.
In this webinar we will explain the core concepts of Great Expectations and how we made them available in Hopsworks for use within your feature pipelines. Users can define new expectation suites or reuse existing ones and assign an expectation suite to a feature group; Hopsworks then takes care of validating incoming data against that suite, generating validation reports, and triggering alerts.
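To make the flow concrete, here is a toy, self-contained sketch of the idea: a suite of expectations is attached to a dataset, incoming rows are validated against it, and a report is produced. This is illustrative pseudocode in plain Python, not the actual Great Expectations or Hopsworks API; all class and field names here are invented for the example.

```python
# Toy illustration of suite-based data validation -- NOT the real
# Great Expectations / Hopsworks API. Names are hypothetical.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Expectation:
    column: str                       # column the check applies to
    check: Callable[[Any], bool]      # predicate each value must satisfy
    description: str                  # human-readable summary for the report


@dataclass
class ExpectationSuite:
    name: str
    expectations: List[Expectation] = field(default_factory=list)

    def validate(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Validate row dicts against every expectation; return one
        report entry per expectation, similar in spirit to a
        validation report a feature store could surface or alert on."""
        report = []
        for exp in self.expectations:
            failures = [r for r in rows if not exp.check(r[exp.column])]
            report.append({
                "expectation": exp.description,
                "success": not failures,
                "unexpected_count": len(failures),
            })
        return report


# Attach a suite to an (imaginary) transactions feature group's data.
suite = ExpectationSuite(
    name="transactions_suite",
    expectations=[
        Expectation("amount", lambda v: v >= 0, "amount >= 0"),
        Expectation("currency", lambda v: v in {"USD", "EUR"},
                    "currency in {USD, EUR}"),
    ],
)

incoming_rows = [
    {"amount": 10.0, "currency": "USD"},
    {"amount": -3.0, "currency": "SEK"},  # violates both expectations
]
report = suite.validate(incoming_rows)
```

In the real integration, the suite lives alongside the feature group's metadata so that every write is validated automatically; the sketch above only shows the shape of that check-and-report loop.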