
Hopsworks 3.0 - Connecting Python to the Modern Data Stack

July 20, 2022
7 min read
Jim Dowling
CEO and Co-Founder
Hopsworks
Fabio Buso
VP Engineering
Hopsworks
Lex Avstreikh
GTM & Marketing
Hopsworks

TL;DR

Hopsworks 3.0 is a new release focused on best-in-class Python support, Feature Views that unify offline and online read APIs to the Feature Store, Great Expectations support, KServe support and a Model Registry for KServe, an improved user interface, and support for GCP as a managed platform. Hopsworks 3.0 is designed to provide a first-class, Python-centric compute experience from any Python environment, and to this end we are providing early access to serverless Hopsworks, where you bring your own Python environment and Hopsworks manages your features and models for you.

Today we are proud to announce the general availability of Hopsworks 3.0. This is the latest realization of our vision to connect all teams on all clouds and enable them to develop and operate AI-enabled products using Hopsworks.

This version of Hopsworks bridges the gap between Python and the modern data stack. Python is the language of choice for Data Science, and, through transpilation, we bring the power of SQL to our Python SDK, seamlessly transferring data to and from Data Warehouses that can now be virtual offline feature stores in Hopsworks. In particular, we have introduced improvements in Hopsworks that enable Python to transparently harness the scale and power of the modern data stack via the feature store. Now, you can get started with Hopsworks by just installing the Hopsworks client SDK in Python, and immediately accelerate bringing your machine learning applications to production.

Hopsworks Serverless

Hopsworks is now available as a serverless platform. You can now build an AI-enabled product faster than ever by just installing the Hopsworks client SDK in your Python environment and registering on app.hopsworks.ai. Hopsworks manages your features, training data, and models, always accessible over the network. With Hopsworks Serverless, you run your pipelines in your own Python environment (laptop, Colab, GitHub Action, etc.), while we provide storage for, and access to, your features and training data. You can also manage and deploy your models on our hosted KServe. Hopsworks Serverless is currently in early access with a free-forever tier, so you can build real systems today with confidence.
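Getting connected takes a few lines of Python. The sketch below shows the general shape of the login flow; it assumes an account on app.hopsworks.ai and an API key, so the network call is wrapped in a function rather than run at import time.

```python
# Minimal sketch: connecting to Hopsworks Serverless from any Python environment.
# Requires an account on app.hopsworks.ai and an API key.

def connect(api_key: str):
    """Log in to Hopsworks Serverless and return the project's feature store."""
    import hopsworks  # pip install hopsworks

    # login() defaults to the serverless endpoint at app.hopsworks.ai
    project = hopsworks.login(api_key_value=api_key)
    return project.get_feature_store()
```

From the returned feature store handle you can read and write feature groups exactly as you would against a dedicated cluster.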

New Feature View APIs

Historically, Training Datasets have been the way to read feature data from the Hopsworks feature store. Working closely with customers and users, we realized that the training dataset concept did not quite align with the usage patterns data scientists expected.

We went back to the whiteboard to improve the user experience for data scientists. Feature View is the result of that work. At a high level, a Feature View represents the set of features needed by a given model. It stores metadata about which transformation functions are applied to which features.

Data scientists can then use Feature Views to generate training datasets, potentially over different time windows, ensuring the model schema is the same when the model is retrained.

The same Feature View can then be used by analytical models to generate batch data to score, and by operational models to generate feature vectors for real-time predictions. The Feature View ensures the consistency of both the schema and the transformations applied across model training, batch scoring, and real-time predictions.
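The three usage patterns above can be sketched against a single Feature View. The feature group, feature names, and serving key below are illustrative, not from this article, and the function takes a feature store handle so the snippet stays importable without a live connection.

```python
# Sketch: one Feature View backing training, batch scoring, and online lookups.
# Feature group and column names are hypothetical.

FEATURES = ["amount", "location", "hour_of_day"]  # illustrative model features

def feature_view_workflow(fs):
    """fs is the feature store handle returned by project.get_feature_store()."""
    trans_fg = fs.get_feature_group("transactions", version=1)
    query = trans_fg.select(FEATURES + ["is_fraud"])

    # The Feature View pins the model schema (and any transformation functions),
    # so every consumer below sees the same features, transformed the same way.
    fv = fs.create_feature_view(
        name="fraud_model",
        version=1,
        query=query,
        labels=["is_fraud"],
    )

    # 1) Training: generate a training dataset (optionally over a time window).
    X_train, y_train = fv.training_data()

    # 2) Batch scoring: same schema and transformations for analytical models.
    batch_df = fv.get_batch_data(start_time="2022-07-01", end_time="2022-07-20")

    # 3) Online: a single feature vector for a real-time prediction.
    vector = fv.get_feature_vector(entry={"transaction_id": 42})
    return X_train, y_train, batch_df, vector
```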

Improved write APIs

As the goal of Hopsworks 3.0 is to make the experience more Python-centric for data scientists, we have worked to improve the experience of writing to the feature store.

Hopsworks has always supported writing Pandas dataframes from a Python interpreter; however, the experience was not optimal.

With Hopsworks 3.0, we streamlined the ingestion process, reduced write amplification, and delivered the more interactive experience data scientists expect.

All the improvements were done under the hood, making existing feature engineering pipelines fully compatible with Hopsworks 3.0.
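In practice, the write path is a dataframe insert. The names, primary key, and sample rows below are illustrative; the Hopsworks calls are wrapped in a function so the snippet does not require a live connection to run.

```python
# Sketch: writing a Pandas dataframe to a feature group. Names are illustrative.

ROWS = {
    "transaction_id": [1, 2, 3],
    "amount": [12.5, 99.0, 7.25],
}

def ingest(fs):
    """fs is the feature store handle returned by project.get_feature_store()."""
    import pandas as pd

    df = pd.DataFrame(ROWS)
    fg = fs.get_or_create_feature_group(
        name="transactions",
        version=1,
        primary_key=["transaction_id"],
        online_enabled=True,  # also materialize rows to the online store
    )
    fg.insert(df)  # the streamlined ingestion path in Hopsworks 3.0
    return fg
```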

Great Expectations support

At its core, the feature store needs to provide reliable features that data scientists can use to build and productionize models. It is extremely important that the data written to the feature store is clean and of high quality, to avoid garbage-in, garbage-out situations. Great Expectations is the most popular data validation library, used by thousands of data scientists for production-ready data validation.

Hopsworks 3.0 comes with first-class support for Great Expectations. Users can define expectation suites or reuse existing ones, and assign a suite to a feature group. Hopsworks takes care of validating incoming data against the assigned suite, generating reports, and triggering alerts.
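Attaching a suite to a feature group might look like the sketch below. The expectation and column names are illustrative, and the Great Expectations import lives inside the function so the snippet is importable without the library installed.

```python
# Sketch: assigning a Great Expectations suite to a Hopsworks feature group.
# Expectation and column names are illustrative.

def attach_validation(fg):
    """fg is a feature group handle from the Hopsworks feature store."""
    from great_expectations.core import (
        ExpectationConfiguration,
        ExpectationSuite,
    )

    suite = ExpectationSuite(expectation_suite_name="transactions_suite")
    suite.add_expectation(
        ExpectationConfiguration(
            expectation_type="expect_column_values_to_be_between",
            kwargs={"column": "amount", "min_value": 0},
        )
    )
    # Once attached, Hopsworks validates every insert against the suite,
    # stores the validation report, and can trigger alerts on failures.
    fg.save_expectation_suite(suite)
```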

OpenSearch with k-NN 

Many production machine learning systems use both a Feature Store and a Vector Database, such as real-time personalized recommendations and search. Now, instead of having to deploy Hopsworks as a Feature Store and a separate Vector Database, you only need Hopsworks. We now provide OpenSearch, including its k-NN plugin (a Vector Database). We provide API support for managing the complexity of discovering and securely connecting to a k-NN index for writing embeddings or similarity search with embeddings. OpenSearch indexes are private to projects, enabling self-service access control and CI/CD pipelines within a shared cluster.
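A similarity search against a project's k-NN index might look like the sketch below. The index name, vector field, and embedding are illustrative, and the imports live inside the function so the snippet is importable without the client libraries installed.

```python
# Sketch: k-NN similarity search against a project's private OpenSearch index.
# Index name ("products") and vector field ("my_vector") are illustrative.

def knn_search(project, embedding, k=5):
    """project is the handle returned by hopsworks.login()."""
    from opensearchpy import OpenSearch

    # The SDK handles discovering and securely connecting to the project index.
    os_api = project.get_opensearch_api()
    client = OpenSearch(**os_api.get_default_py_config())

    query = {
        "size": k,
        "query": {"knn": {"my_vector": {"vector": embedding, "k": k}}},
    }
    return client.search(index=os_api.get_project_index("products"), body=query)
```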

KServe and a Model Registry for KServe

Hopsworks now supports KServe for model serving, with new low latency access to model deployments using Istio, and new feature and prediction logging to Kafka. We have built a model registry for KServe with unique support for versioned models and their versioned artifacts - prediction and transformer scripts. Both KServe and its model registry are private to projects, enabling self-service access control and CI/CD pipelines within a shared cluster. We also introduce a new Python SDK for managing models and deployments.
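Registering and deploying a model with the new Python SDK might look like the sketch below. The model name, metrics, and directory are illustrative, and the whole flow is wrapped in a function since it requires a live project.

```python
# Sketch: registering a model version and deploying it on the hosted KServe.
# Model name, metrics, and artifact directory are illustrative.

def register_and_deploy(project, model_dir):
    """project is the handle returned by hopsworks.login()."""
    mr = project.get_model_registry()

    model = mr.python.create_model(
        name="fraud_model",
        metrics={"accuracy": 0.92},
    )
    model.save(model_dir)  # uploads the artifacts, creating a new version

    # Deploy on KServe; predictor and transformer scripts are versioned
    # alongside the model in the registry.
    deployment = model.deploy(name="fraudmodel")
    deployment.start()
    return deployment
```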

Managed Hopsworks on Google Cloud Platform

With Hopsworks 3.0, we also announce the availability of our managed service on the Google Cloud Platform. GCP joins AWS and Azure as supported cloud providers for running managed Hopsworks. Managed Hopsworks on GCP allows users to deploy and manage Hopsworks instances within their own VPC, leveraging the capabilities of the Hopsworks feature store while keeping control of their data and metadata.

Start building now on Hopsworks 3.0

You can start building with Hopsworks right now for free. Register an account on https://app.hopsworks.ai to get started. Feel free to use the tutorials we put together (https://github.com/logicalclocks/hopsworks-tutorials) as a template to build your first prediction service.
