Hopsworks is a Data-Intensive AI platform that manages the full AI lifecycle for MLOps, built around its industry-leading Feature Store. Data ingestion tasks, such as validating and inserting data, tend to be long-running and typically run as part of a larger orchestrated data pipeline. It is therefore necessary to have a mechanism for customizing alerts and sending them when different events are triggered within the ingestion pipeline. This tutorial goes through the steps needed to set up alerts in the Hopsworks Feature Store for feature validation and ingestion.
Hopsworks brings new alerting capabilities that enable users to monitor jobs and feature group validations. Because alerting is relatively new to Hopsworks, support for other services is still being added. Currently, alerts in Hopsworks serve two purposes: notifying users about changes in the status of jobs, and notifying users about the validation status of data inserted into a feature group, including feature validations performed after insertion.
There are two ways alerts can be configured for jobs and feature group validations:
In this blog post we will go through the steps needed to trigger alerts for jobs and feature group validations. Alerts can be sent via Slack, email, or PagerDuty. This post uses Slack as the alert receiver, but the steps described here apply to any of these methods.
Alerts can be set up at the project level or at the cluster level. Cluster-wide alerts can only be configured by a platform administrator (a user with the HOPS_ADMIN role), whereas project-level alerts can be configured by any member of a project.
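The permission rule above can be summarized as a small predicate. This is a hypothetical sketch for illustration only. `AlertScope` and `can_configure_alerts` are not part of any Hopsworks API, and the check encodes nothing beyond the two rules stated in the previous paragraph. At project level, membership is treated as the only requirement.

```python
from enum import Enum
from typing import AbstractSet


class AlertScope(Enum):
    CLUSTER = "cluster"  # cluster-wide alerts
    PROJECT = "project"  # alerts scoped to a single project


def can_configure_alerts(
    scope: AlertScope, user_roles: AbstractSet[str], is_project_member: bool
) -> bool:
    """Return True if a user may configure alerts at the given scope.

    Hypothetical model of the stated rule: cluster-wide alerts require the
    HOPS_ADMIN role; project-level alerts require project membership.
    """
    if scope is AlertScope.CLUSTER:
        return "HOPS_ADMIN" in user_roles
    return is_project_member
```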
To follow this tutorial, you need a Hopsworks instance running version 2.4 or later on https://hopsworks.ai. You can register for free without providing credit card information and receive USD 300 worth of free credits to get started. All you need to do is connect your cloud account.
Below is a step-by-step guide showing how to set up Hopsworks to trigger alerts for a PySpark feature engineering job. Before inserting data into the feature group, the job uses the feature validation SDK in hsfs to check the correctness of the newly arrived data. Both the feature validation and the job execution trigger alerts. These alerts are sent to two different engineering teams: one responsible for monitoring jobs and one responsible for the feature data itself.
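The control flow of such a job can be sketched in plain Python. The job validates first, inserts only data that passes, and emits an event at each stage. This sketch deliberately does not use hsfs. The rule format, the `insert` and `emit_event` callbacks, and the event names are hypothetical stand-ins for the hsfs validation SDK and the Hopsworks alerting backend.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]


@dataclass
class ValidationResult:
    passed: bool
    failures: List[str] = field(default_factory=list)


def validate(rows: List[Row], rules: Dict[str, Rule]) -> ValidationResult:
    """Apply every named rule to every row and collect the violations."""
    failures = [
        f"row {i}: {name}"
        for i, row in enumerate(rows)
        for name, rule in rules.items()
        if not rule(row)
    ]
    return ValidationResult(passed=not failures, failures=failures)


def run_ingestion_job(
    rows: List[Row],
    rules: Dict[str, Rule],
    insert: Callable[[List[Row]], None],
    emit_event: Callable[[str, str], None],
) -> ValidationResult:
    """Validate first, insert only clean data, and emit one event per stage."""
    result = validate(rows, rules)
    # A validation event is emitted either way; alerts decide whom to notify.
    emit_event("FEATURE_VALIDATION", "SUCCESS" if result.passed else "FAILURE")
    if result.passed:
        insert(rows)
    # The job itself still finishes, which produces a separate job event.
    emit_event("JOB", "FINISHED")
    return result
```

For example, suppose you use the hypothetical rule `{"amount_non_negative": lambda r: r["amount"] >= 0}` and a batch contains a row whose `amount` is -1. The job emits a validation FAILURE event, skips the insert, and then emits JOB FINISHED. These two events correspond to the two alerts this tutorial sets up.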
In particular, by the end of this example the following events will have occurred:
To send alerts via Slack, you first need to configure the global Slack webhook on the cluster settings page, as shown in the animation below.
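A Slack incoming webhook is an HTTPS URL that accepts a POST request with a JSON body. A `text` field is enough for a plain message. The sketch below only builds such a request with the standard library, so you can see its shape. The webhook URL is a placeholder, and the payload the Hopsworks alert manager actually sends will be richer than this minimal example.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: substitute the webhook generated in your Slack workspace.
request = build_slack_request(
    "https://hooks.slack.com/services/T000/B000/XXXX",
    "Test message: Slack webhook reachable",
)
# urllib.request.urlopen(request) would send it; left out so nothing is posted.
```

Sending a test message like this is one way to confirm that a webhook works before entering it in the cluster settings.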
In addition to the webhook, you can add global receivers (channels), which are available to all users of the cluster.
To send alerts via email or PagerDuty, you need to add their respective configurations. Alternatively, you can add configurations and receivers by editing the alert manager configuration directly, as shown in the image below.
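This sketch assumes the alert manager configuration follows the Prometheus Alertmanager format, with `global`, `route`, and `receivers` sections. Under that assumption, global settings and receivers for all three channels could look roughly like the dictionary below. All hostnames, addresses, receiver names, and the PagerDuty key are hypothetical placeholders, and the exact structure Hopsworks expects may differ. The helper `missing_global_settings` is also hypothetical. It reflects the Alertmanager behavior in which Slack and email receivers fall back to `global.slack_api_url` and the global SMTP settings when they do not set their own.

```python
import json
from typing import Any, Dict, List

# Channel type -> (global key, per-receiver override key) pairs it depends on.
FALLBACKS = {
    "slack_configs": [("slack_api_url", "api_url")],
    "email_configs": [("smtp_smarthost", "smarthost"), ("smtp_from", "from")],
}


def build_alertmanager_config(
    slack_webhook: str, smtp_host: str, smtp_from: str, pagerduty_key: str
) -> Dict[str, Any]:
    """Assemble an Alertmanager-style config as a plain dict (placeholder values)."""
    return {
        "global": {
            "slack_api_url": slack_webhook,  # the global Slack webhook
            "smtp_smarthost": smtp_host,     # host:port of the SMTP relay
            "smtp_from": smtp_from,
        },
        # Default route: alerts not matched elsewhere go to this receiver.
        "route": {"receiver": "global-slack"},
        "receivers": [
            {"name": "global-slack", "slack_configs": [{"channel": "#alerts"}]},
            {"name": "global-email", "email_configs": [{"to": "oncall@example.com"}]},
            {"name": "global-pagerduty", "pagerduty_configs": [{"routing_key": pagerduty_key}]},
        ],
    }


def missing_global_settings(config: Dict[str, Any]) -> List[str]:
    """List global keys that some receiver needs but that are not set anywhere."""
    glob = config.get("global", {})
    missing = set()
    for receiver in config.get("receivers", []):
        for kind, pairs in FALLBACKS.items():
            for entry in receiver.get(kind, []):
                for global_key, local_key in pairs:
                    if not entry.get(local_key) and not glob.get(global_key):
                        missing.add(f"global.{global_key}")
    return sorted(missing)


config = build_alertmanager_config(
    "https://hooks.slack.com/services/T000/B000/XXXX",
    "smtp.example.com:587",
    "alerts@example.com",
    "hypothetical-routing-key",
)
# Render for inspection; JSON output is generally accepted by YAML parsers too.
rendered = json.dumps(config, indent=2)
```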
Detailed information on how to configure the alert manager global settings is available here.
After configuring the global Slack webhook, you can add project-specific receivers in the alerts section of your project settings. Give each receiver a name that identifies the team that will receive the alert. In this example we call it ml-team; it sends alerts to the #ml-team channel and to the user @admin in Slack.
We also created a receiver called op-team that will receive alerts about jobs and feature group validations.
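This sketch again assumes Alertmanager-style notation for the underlying format. In that notation, the two project receivers from this example are named lists of Slack destinations. A receiver can hold several `slack_configs` entries, one per channel or user. `make_slack_receiver` is a hypothetical helper. Its `#`/`@` prefix check is an illustrative sanity check of this sketch, not a rule enforced by Hopsworks.

```python
from typing import Any, Dict, List


def make_slack_receiver(name: str, destinations: List[str]) -> Dict[str, Any]:
    """Build a receiver that posts to every given Slack channel or user."""
    for dest in destinations:
        # Illustrative check only: expect "#channel" or "@user".
        if not dest.startswith(("#", "@")):
            raise ValueError(f"expected '#channel' or '@user', got {dest!r}")
    return {"name": name, "slack_configs": [{"channel": d} for d in destinations]}


project_receivers = [
    make_slack_receiver("ml-team", ["#ml-team", "@admin"]),  # feature data owners
    make_slack_receiver("op-team", ["#op-team"]),             # job monitoring
]
```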
Once the receivers are created, we can go ahead and create the alerts that will be triggered when feature validation fails and when the job finishes.
The validation alert is created in the project settings and is triggered by any validation event of the chosen type in the project; in this demo, that is a failed validation. When creating an alert, we need to specify the trigger, the receiver, and the severity. Here we choose the data validation failure trigger and the ml-team receiver created in the previous section. The severity can be set to any of the available values: info, warning, or critical.
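The three required fields of an alert can be modeled as a small value object that rejects anything outside the allowed values. This is a hypothetical data model, not the Hopsworks API. The `Trigger` names are invented labels for the triggers mentioned in this post, and the allowed severities are the three listed above. Choosing `critical` for the validation alert is arbitrary.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


class Trigger(Enum):
    # Hypothetical labels for the two triggers used in this tutorial.
    VALIDATION_FAILURE = "validation_failure"
    JOB_FINISHED = "job_finished"


@dataclass(frozen=True)
class AlertSpec:
    trigger: Trigger
    receiver: str
    severity: Severity

    @classmethod
    def from_strings(cls, trigger: str, receiver: str, severity: str) -> "AlertSpec":
        """Parse raw form values; Enum lookups raise ValueError for unknown values."""
        if not receiver:
            raise ValueError("an alert needs a receiver")
        return cls(Trigger(trigger), receiver, Severity(severity))


# Project-wide validation alert routed to the ml-team receiver.
validation_alert = AlertSpec.from_strings("validation_failure", "ml-team", "critical")
```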
Now that the validation alert has been created, we are ready to create the job that will create the feature group and populate it with fresh, validated data. When creating the job, we select the advanced options, which let us add alerts for this particular job. We want an alert to be triggered when the job finishes and sent to the op-team receiver with severity info.
Finally, run the job to check that alerts are sent to both teams. If everything works as expected, #ml-team will receive an alert on data validation failure and #op-team will receive an alert when the job finishes. The video below walks through the scenario described above.
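A small routing simulation can help you reason about the expected outcome. Each event is matched against every alert by trigger. A job-level alert additionally matches only events from its own job, while the project-wide validation alert matches any event of its kind. Everything here is hypothetical: the trigger strings, the job name `fg_ingestion_job`, and the matching logic form a simplified model of the behavior described in this post, not Hopsworks internals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Alert:
    trigger: str               # hypothetical names: "validation_failure", "job_finished"
    receiver: str
    severity: str
    job: Optional[str] = None  # None means project-wide; otherwise only this job


@dataclass(frozen=True)
class Event:
    kind: str
    job: str                   # job in which the event occurred


def dispatch(alerts: List[Alert], events: List[Event]) -> List[Tuple[str, str, str]]:
    """Return (receiver, event kind, severity) for every alert an event fires."""
    sent = []
    for event in events:
        for alert in alerts:
            if alert.trigger != event.kind:
                continue
            if alert.job is not None and alert.job != event.job:
                continue  # job-level alert attached to a different job
            sent.append((alert.receiver, event.kind, alert.severity))
    return sent


alerts = [
    Alert("validation_failure", "ml-team", "critical"),               # project settings
    Alert("job_finished", "op-team", "info", job="fg_ingestion_job"),  # job's advanced options
]
# Events from one run in which validation fails but the job still finishes.
run = [Event("validation_failure", "fg_ingestion_job"), Event("job_finished", "fg_ingestion_job")]
notifications = dispatch(alerts, run)
```

With these two alerts, `notifications` contains one entry for ml-team and one for op-team, matching the expected result of the run. A finished event from a different job would fire nothing, because the op-team alert is attached to a single job.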