Hopsworks 4.3 GPU Management, Data Contracts and AI Assis...

Hopsworks 4.3 is now generally available. This release introduces a pipeline builder that lets enterprise customers significantly speed up the building of Feature, Training and Inference (FTI) pipelines using external data. It also improves enterprise scheduling, building on the 4.2 release: Hopsworks now supports cohort-based scheduling, which allows better separation of job workloads on GPU infrastructure for training, hyperparameter tuning and LLM fine-tuning. Finally, with the 4.3 release, Hopsworks can better segregate an enterprise's data, allowing an Identity Provider to control who gets access to specific resources (e.g., deployed ML models).

GPU Sharing at Enterprise Scale

Enterprises across a number of verticals, including finance, insurance, entertainment and retail, have invested heavily in compute infrastructure in recent years to handle the workloads involved in building cutting-edge machine learning solutions. One challenge is how best to schedule these varied workloads on that infrastructure. With the release of 4.3, Hopsworks supports several ways to schedule workloads based on cohorts. This approach groups together workloads that meet certain criteria, taking into account the network topology of the infrastructure, data access restrictions and scheduling deadlines.
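To make the idea of a cohort concrete, here is a minimal sketch in plain Python. It is not the Hopsworks scheduler or its API: the Job fields (topology_zone, data_domain, deadline_hours) and the group_into_cohorts function are hypothetical names chosen for illustration, and the grouping rule (same zone and same data domain, ordered by deadline) is an assumption about what "meeting certain criteria" could look like.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical description of a GPU workload (not a Hopsworks type)."""
    name: str
    workload: str        # e.g. "training", "tuning", "fine-tuning"
    topology_zone: str   # assumed label for a rack or network segment
    data_domain: str     # assumed label for a data access restriction
    deadline_hours: int  # hours until the job must have finished


def group_into_cohorts(jobs):
    """Group jobs that share a topology zone and data domain.

    Within each cohort, jobs are ordered by deadline, earliest first.
    """
    cohorts = defaultdict(list)
    for job in jobs:
        cohorts[(job.topology_zone, job.data_domain)].append(job)
    return {
        key: sorted(members, key=lambda job: job.deadline_hours)
        for key, members in cohorts.items()
    }
```

A real scheduler would also have to weigh GPU capacity, priorities and preemption; the sketch only shows the grouping step described above.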

Column-Level Access Control

With the release of 4.3, Hopsworks introduces new functionality that gives enterprises finer control over who can access sensitive data when building machine learning systems. This is ongoing work; future releases will build on it to help Hopsworks enterprise customers handle their data (e.g., GDPR-related information) in line with their users' requests.

Enhanced Data Contracts for Feature Groups

Hopsworks already supports custom metadata for feature groups, which lets you specify a feature group's data contract to clients: data validation rules with Great Expectations, plus your own metadata for data retention, update frequency and indications of when data is delayed. We have now added schema validation for updates to feature groups via the DataFrame API, enforcing schema constraints, such as the maximum length of strings (e.g., varchar(100)), in the Python and Spark clients.
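The kind of check this implies can be sketched in a few lines of plain Python. This is an illustration only, not the Hopsworks client code: StringConstraint, find_violations and validate_before_insert are hypothetical names, and a list of dictionaries stands in for the DataFrame so that the example has no dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringConstraint:
    """Hypothetical constraint: a string column with a maximum length."""
    column: str
    max_length: int  # e.g. 100 for a varchar(100) column


def find_violations(rows, constraints):
    """Return (row_index, column, actual_length) for every over-long string."""
    violations = []
    for index, row in enumerate(rows):
        for constraint in constraints:
            value = row.get(constraint.column)
            # Missing or null values are not length violations in this sketch.
            if isinstance(value, str) and len(value) > constraint.max_length:
                violations.append((index, constraint.column, len(value)))
    return violations


def validate_before_insert(rows, constraints):
    """Raise ValueError rather than pass on rows that break the schema."""
    violations = find_violations(rows, constraints)
    if violations:
        raise ValueError(f"schema violations: {violations}")
    return rows
```

Rejecting the whole batch on the client side, as assumed here, surfaces the problem before any data is written; whether Hopsworks rejects, truncates or reports per row is not stated in this post.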

[Private Preview] Brewer - LLM-Assisted AI Developer/Programmer

To facilitate the creation of FTI pipelines, Hopsworks has released Brewer as part of 4.3, a new type of AI assistant designed to create programs and engineering pipelines. One of the challenges facing enterprises that want to build state-of-the-art machine learning systems is understanding all the different aspects of such solutions while following MLOps best practices. With Brewer, a Hopsworks Enterprise customer can easily generate feature pipelines from their externally stored data as the first step in building such solutions. Setting up these pipelines in minutes, and then passing the data to training and inference pipelines, allows Hopsworks Enterprise customers to apply the full power of Generative AI to their externally stored data.

Packages Update

With the Hopsworks 4.3 release, a number of packages have been updated, including KServe to v0.14.0, Kyverno to 1.13.0 and pandas to 2.2.x. With the increased emphasis on GenAI and LLM support, the following libraries have been added to the Hopsworks pipeline base image: (1) Flash Attention (2.7.4), (2) Transformers (4.51.3) and (3) LangChain (0.3.25).
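To confirm which of these versions a given Python environment actually provides, a short check with the standard library's importlib.metadata is enough. The distribution names passed in below (flash-attn, transformers, langchain, pandas) are the usual PyPI names and are an assumption here, since the post does not list them; installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed PyPI distribution names for the libraries named above.
    names = ["flash-attn", "transformers", "langchain", "pandas"]
    for name, installed in installed_versions(names).items():
        print(f"{name}: {installed or 'not installed'}")
```

Run inside a job or notebook, this prints one line per library, which can then be compared against the versions listed for the release.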