Hopsworks 3.5 is now generally available. This version includes improvements to the feature view APIs, better management of Airflow DAGs when using Airflow embedded in Hopsworks, support for newer Databricks runtimes, and several dependency upgrades.
Feature view helper columns for training and inference
Users can now mark a set of columns as training or inference helpers when creating a feature view. Training helpers are columns that are not part of the model input but are useful at training time. Similarly, inference helpers are columns that are not part of the model schema but are useful at inference time, e.g. to compute on-demand features.
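As a rough illustration of the paragraph above, the following sketch creates a feature view with both kinds of helper columns. It assumes the hsfs Python client exposes `create_feature_view` with `training_helper_columns` and `inference_helper_columns` keyword arguments; the function name `create_view_with_helpers` and all column names in the usage comment are hypothetical.

```python
from typing import Any, Sequence


def create_view_with_helpers(
    fs: Any,
    query: Any,
    name: str,
    label: str,
    training_helpers: Sequence[str],
    inference_helpers: Sequence[str],
) -> Any:
    """Create a feature view whose query contains helper columns.

    fs    -- a feature store handle, e.g. project.get_feature_store()
    query -- an hsfs query that selects features, label, and helper columns

    ASSUMPTION: the two *_helper_columns keyword names below follow the
    hsfs client; verify them against the version you have installed.
    """
    return fs.create_feature_view(
        name=name,
        version=1,
        query=query,
        labels=[label],
        # Useful while training, but not fed to the model.
        training_helper_columns=list(training_helpers),
        # Useful at inference time, e.g. inputs to on-demand features.
        inference_helper_columns=list(inference_helpers),
    )


# Usage (needs a running Hopsworks cluster; all names are hypothetical):
#   import hopsworks
#   fs = hopsworks.login().get_feature_store()
#   fv = create_view_with_helpers(
#       fs, query, "transactions_view", "fraud_label",
#       training_helpers=["category"],
#       inference_helpers=["longitude", "latitude"],
#   )
```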
When retrieving feature data for model training, users can now select whether or not they want to retrieve the primary keys and event time, as well as the training helper columns.
When retrieving feature data for batch inference (using the get_batch_data method), users can select whether or not they want to include the primary keys and event time, as well as the inference helpers.
Finally, when retrieving feature data for real time inference, users can invoke the new get_inference_helpers method to retrieve the values of the inference helper columns for the specific set of keys needed.
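The three retrieval paths described above might look as follows in the Python client. This is a sketch, not a definitive reference: `get_batch_data` and `get_inference_helpers` are named in this announcement, while the `training_data` method and the keyword names `primary_keys`, `event_time`, `training_helper_columns`, and `inference_helper_columns` are assumptions about the hsfs client. The wrapper function names are hypothetical.

```python
from typing import Any, Dict, Tuple


def read_training_frame(feature_view: Any) -> Tuple[Any, Any]:
    """Return (features, labels), keeping keys, event time, and training helpers."""
    return feature_view.training_data(
        primary_keys=True,             # ASSUMPTION: keyword name
        event_time=True,               # ASSUMPTION: keyword name
        training_helper_columns=True,  # ASSUMPTION: keyword name
    )


def read_batch_frame(feature_view: Any) -> Any:
    """Return batch inference data, keeping keys, event time, and inference helpers."""
    return feature_view.get_batch_data(
        primary_keys=True,
        event_time=True,
        inference_helper_columns=True,  # ASSUMPTION: keyword name
    )


def read_online_helpers(feature_view: Any, entry: Dict[str, Any]) -> Any:
    """Return inference helper values for one set of primary key values."""
    return feature_view.get_inference_helpers(entry)


# Usage (needs a feature view from a running cluster; key name is hypothetical):
#   X, y = read_training_frame(fv)
#   batch_df = read_batch_frame(fv)
#   helpers = read_online_helpers(fv, {"cc_num": 4137})
```

Helper columns retrieved this way still need to be dropped before the data is passed to the model, since they are not part of the model schema.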
Airflow DAG management improvements
Hopsworks 3.5 improves the management of Airflow DAGs when using Airflow deployed within Hopsworks. Starting from this release, Airflow stores DAG files through HopsFS-Mount, a new FUSE driver that makes HopsFS available as a local Linux file system. This allows Airflow DAGs to be stored in a dataset within each Hopsworks project and be available to Airflow for execution.
Additionally, this release adds support for project-level permissions for Airflow DAGs. This allows users within a project to collaborate on writing, operating, and monitoring Airflow DAGs and their executions.
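Because DAG files now live in a project dataset, publishing a DAG can be reduced to a file upload. The sketch below assumes the `upload(local_path, upload_path, overwrite=...)` method of the dataset API in the hopsworks Python client. The dataset name "Airflow" is an assumption and should be checked against your project, and `publish_dag` is a hypothetical helper name.

```python
from typing import Any


def publish_dag(
    dataset_api: Any,
    local_dag_path: str,
    dataset_name: str = "Airflow",  # ASSUMPTION: dataset that Airflow reads DAGs from
) -> str:
    """Upload a local DAG file into a project dataset and return its remote path.

    dataset_api -- the object returned by project.get_dataset_api()
    """
    # overwrite=True replaces a previous version of the same DAG file.
    return dataset_api.upload(local_dag_path, dataset_name, overwrite=True)


# Usage (needs a running Hopsworks cluster; file name is hypothetical):
#   import hopsworks
#   project = hopsworks.login()
#   publish_dag(project.get_dataset_api(), "feature_pipeline_dag.py")
```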
Support for newer Databricks runtimes
Hopsworks 3.5 now supports Databricks runtime 12.2, which leverages Spark 3.3.x. Starting from this release, we also provide a Databricks Bundle and a Terraform example that users can use to configure their Databricks jobs and clusters to read from and write to the Hopsworks feature store.
Federated queries with Hopsworks Feature Query Service
Hopsworks' new feature query engine, built on Arrow Flight and DuckDB, can now query features in external feature groups stored on BigQuery or Snowflake. Users can therefore seamlessly join features stored in any combination of the following offline stores: Hopsworks, Snowflake, and BigQuery.
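A federated join of this kind can be sketched with the query-building calls of the hsfs client. The example assumes that feature groups expose `select_all()` and that queries expose `join(..., on=[...])`; the function name `build_federated_query`, the feature group names, and the join key are hypothetical.

```python
from typing import Any


def build_federated_query(
    hopsworks_fg: Any,
    snowflake_fg: Any,
    bigquery_fg: Any,
    join_key: str,
) -> Any:
    """Join one feature group per offline store on a shared key.

    hopsworks_fg -- a feature group stored in Hopsworks
    snowflake_fg -- an external feature group backed by Snowflake
    bigquery_fg  -- an external feature group backed by BigQuery
    """
    return (
        hopsworks_fg.select_all()
        .join(snowflake_fg.select_all(), on=[join_key])
        .join(bigquery_fg.select_all(), on=[join_key])
    )


# Usage (needs a running Hopsworks cluster; all names are hypothetical):
#   query = build_federated_query(
#       fs.get_feature_group("transactions", version=1),
#       fs.get_external_feature_group("customers_snowflake", version=1),
#       fs.get_external_feature_group("merchants_bigquery", version=1),
#       join_key="customer_id",
#   )
#   df = query.read()
```

The function only builds the query. Which engine executes `query.read()` depends on how the client and cluster are configured.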
Dependency upgrades
This release includes several upgrades of services in Hopsworks. Most notably, the Hopsworks 3.5 Python clients now support Python 3.11. Great Expectations has been updated to version 0.15.12, which improves compatibility of the Hopsworks library with other libraries that depend on the Jinja2 package. Apache Hudi has been upgraded to 0.12.3, the latest bugfix release of the LTS release.