Hopsworks 3.0 - Connecting Python to the Modern Data Stack

July 20, 2022 · 7 min read
Jim Dowling, CEO and Co-Founder, Hopsworks
Fabio Buso, VP Engineering, Hopsworks
Lex Avstreikh, Head of Strategy, Hopsworks

TL;DR

Hopsworks 3.0 is a new release focused on best-in-class Python support, Feature Views that unify the offline and online read APIs to the Feature Store, Great Expectations support, KServe support with a Model Registry for KServe, an improved user interface, and support for GCP as a managed platform. Hopsworks 3.0 is designed to provide a first-class, Python-centric compute experience from any Python environment. To this end, we are providing early access to serverless Hopsworks, where you bring your own Python environment and Hopsworks manages your features and models for you.

Today we are proud to announce the general availability of Hopsworks 3.0. This is the latest realization of our vision to connect all teams on all clouds and enable them to develop and operate AI-enabled products using Hopsworks.

This version of Hopsworks bridges the gap between Python and the modern data stack. Python is the language of choice for Data Science, and, through transpilation, we bring the power of SQL to our Python SDK, seamlessly transferring data to and from Data Warehouses that can now be virtual offline feature stores in Hopsworks. In particular, we have introduced improvements in Hopsworks that enable Python to transparently harness the scale and power of the modern data stack via the feature store. Now, you can get started with Hopsworks by just installing the Hopsworks client SDK in Python, and immediately accelerate bringing your machine learning applications to production.

Hopsworks Serverless

Hopsworks is now available as a serverless platform. You can now build an AI-enabled product faster than ever before by just installing the Hopsworks client SDK in your Python environment and registering on app.hopsworks.ai. Hopsworks manages your features, training data, and models, always accessible over the network. With Hopsworks Serverless, you run your pipelines in your own Python environment (laptop, Colab, GitHub Actions, etc.), while we provide storage for and access to your features and training data. You can also manage and deploy your models on our hosted KServe. Hopsworks Serverless is currently in early access with a free forever tier, so you can build real systems today with confidence.
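Getting connected from any Python environment is a two-step process: install the client SDK and log in. The snippet below is a minimal sketch of that flow using the hopsworks package; it assumes you have already registered on app.hopsworks.ai and created an API key.

# pip install hopsworks
import hopsworks

# Log in to Hopsworks Serverless (app.hopsworks.ai); you will be prompted for an API key
project = hopsworks.login()

# Handle to the project's feature store, used in the examples below
fs = project.get_feature_store()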

New Feature View APIs

Historically, training datasets have been the way to read feature data from the Hopsworks feature store. Working closely with customers and users, we realized that the training dataset concept did not quite align with the usage patterns data scientists expected.

We went back to the whiteboard to improve the user experience for data scientists, and the Feature View is the result of that work. At a high level, a Feature View represents the set of features needed by a given model and stores metadata about which transformation functions are applied to which features.

Data scientists can then use Feature Views to generate training datasets, potentially over different time windows, ensuring the model schema is the same when the model is retrained.

The same Feature View can then be used by analytical models to generate batch data for scoring, and by operational models to generate feature vectors for real-time predictions. The Feature View ensures that both the schema and the transformations applied are consistent across model training, batch scoring, and real-time predictions.
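As a sketch of how this looks in the Python SDK (the feature group, feature, and label names here are illustrative), a Feature View is created from a query over feature groups, and the same object then serves training data, batch data, and online feature vectors:

import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

trans_fg = fs.get_feature_group("transactions", version=1)
profiles_fg = fs.get_feature_group("profiles", version=1)

# Select the model's features (and label) and join them into a single query
query = trans_fg.select(["amount", "category", "is_fraud"]).join(
    profiles_fg.select(["age"])
)

fv = fs.create_feature_view(
    name="fraud_model_fv",
    version=1,
    query=query,
    labels=["is_fraud"],
)

# Training: materialize a train/test split with a consistent model schema
X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.2)

# Batch scoring: read a batch of feature data for analytical models
batch_df = fv.get_batch_data()

# Online serving: fetch a single feature vector for a real-time prediction
feature_vector = fv.get_feature_vector({"cc_num": 4567890123456789})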

Improved write APIs

As the goal of Hopsworks 3.0 is to give data scientists a more Python-centric experience, we have also worked to improve the experience of writing to the feature store.

Hopsworks has always supported writing Pandas DataFrames from a Python interpreter; however, the experience was not optimal.

With Hopsworks 3.0, we have streamlined the ingestion process, reduced write amplification, and delivered the more interactive experience data scientists expect.

All the improvements were done under the hood, making existing feature engineering pipelines fully compatible with Hopsworks 3.0.
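For reference, writing a Pandas DataFrame looks like the sketch below (the feature group name, primary key, and columns are illustrative); existing pipelines using insert() continue to work unchanged:

import hopsworks
import pandas as pd

project = hopsworks.login()
fs = project.get_feature_store()

df = pd.DataFrame({
    "cc_num": [4567890123456789],
    "amount": [42.0],
    "category": ["groceries"],
})

fg = fs.create_feature_group(
    name="transactions",
    version=1,
    primary_key=["cc_num"],
    online_enabled=True,
)

# Writes to the offline store and, since online_enabled=True, also to the online store
fg.insert(df)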

Great Expectations support

At its core, the feature store needs to provide reliable features that data scientists can use to build and productionize models. It is extremely important that the data written to the feature store is clean and of high quality, to avoid garbage-in, garbage-out situations. Great Expectations is the most popular data validation library, used by thousands of data scientists for production-ready data validation.

Hopsworks 3.0 comes with first-class support for Great Expectations. Users can define new expectation suites or reuse existing ones, and assign an expectation suite to a feature group. Hopsworks then takes care of validating incoming data against the assigned suite, generating validation reports, and triggering alerts.
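A minimal sketch of attaching a suite to a feature group is shown below. It uses the classic Great Expectations API (pre-1.0), and the suite, expectation, and column names are illustrative; fg is the feature group from the earlier write example.

import hopsworks
from great_expectations.core import ExpectationSuite, ExpectationConfiguration

project = hopsworks.login()
fs = project.get_feature_store()
fg = fs.get_feature_group("transactions", version=1)

suite = ExpectationSuite(expectation_suite_name="transactions_suite")
suite.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={"column": "amount", "min_value": 0, "max_value": 10000},
    )
)

# Hopsworks validates every subsequent insert into the feature group against this suite
fg.save_expectation_suite(suite)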

OpenSearch with k-NN 

Many production machine learning systems, such as real-time personalized recommendation and search systems, use both a Feature Store and a Vector Database. Now, instead of having to deploy Hopsworks as a Feature Store alongside a separate Vector Database, you only need Hopsworks. We now provide OpenSearch, including its k-NN plugin (a Vector Database), together with API support that manages the complexity of discovering and securely connecting to a k-NN index for writing embeddings or performing similarity search with embeddings. OpenSearch indexes are private to projects, enabling self-service access control and CI/CD pipelines within a shared cluster.
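The sketch below shows how a project can obtain connection details for its private OpenSearch indexes and run a k-NN similarity search; it assumes the opensearch-py client, and the index and embedding field names are illustrative.

import hopsworks
from opensearchpy import OpenSearch

project = hopsworks.login()

# Connection details for the project-private OpenSearch cluster
opensearch_api = project.get_opensearch_api()
client = OpenSearch(**opensearch_api.get_default_py_config())

# Index names are prefixed per project for isolation
index_name = opensearch_api.get_project_index("news_embeddings")

# k-NN similarity search against a stored embedding field
knn_query = {
    "size": 5,
    "query": {"knn": {"embedding": {"vector": [0.1, 0.2, 0.3, 0.4], "k": 5}}},
}
hits = client.search(body=knn_query, index=index_name)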

KServe and a Model Registry for KServe

Hopsworks now supports KServe for model serving, with new low latency access to model deployments using Istio, and new feature and prediction logging to Kafka. We have built a model registry for KServe with unique support for versioned models and their versioned artifacts - prediction and transformer scripts. Both KServe and its model registry are private to projects, enabling self-service access control and CI/CD pipelines within a shared cluster. We also introduce a new Python SDK for managing models and deployments.
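With the new Python SDK, registering and deploying a model looks roughly like the sketch below; the model name, artifact directory, and metrics are illustrative.

import hopsworks

project = hopsworks.login()
mr = project.get_model_registry()

# Register a versioned Python model together with its evaluation metrics
model = mr.python.create_model(
    name="fraud_detector",
    metrics={"accuracy": 0.92},
)
model.save("model_dir")  # uploads the serialized model artifacts to the registry

# Deploy the model on the hosted KServe and start serving
deployment = model.deploy(name="frauddetector")
deployment.start()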

Managed Hopsworks on Google Cloud Platform

With Hopsworks 3.0, we also announce the availability of our managed service on Google Cloud Platform. GCP joins AWS and Azure as supported cloud providers for running managed Hopsworks. Managed Hopsworks on GCP allows users to deploy and manage Hopsworks instances within their own VPC, leveraging the capabilities of the Hopsworks feature store while keeping control of their data and metadata.

Start building now on Hopsworks 3.0

You can start building with Hopsworks right now for free. Register an account on https://app.hopsworks.ai to get started. Feel free to use the tutorials we put together (https://github.com/logicalclocks/hopsworks-tutorials) as a template to build your first prediction service.
