Jim Dowling

CEO and Co-Founder

Hopsworks 3.1 Product Updates: Feature Store & UI Improvements

February 2, 2023

Hopsworks

TL;DR

Hopsworks 3.1 is now generally available. This version includes improvements to the feature store (time-series splits for training data, support for managing thousands of models), to stability, and to the user interface.

Hopsworks 3.0 introduced new Python APIs to extend its support to moderate-sized data challenges, where Python and Pandas are important technologies for feature pipelines to create features, training pipelines to create models, and batch inference pipelines to produce predictions.

Time-Series Splits of Training Data

One of the challenges when creating training data for a model is splitting it into train, test, and validation sets. When the data is time-independent (such as a static dataset that does not change over time), a random split into train/test/validation sets is appropriate. However, much enterprise data is time-dependent - such as consumer, sales, and order data that is seasonal and is affected by exogenous shocks and by changes in human behavior that occur at a slower timescale. For such data, a random split can leak information from the future into the training set, so a split by time range is more appropriate. To this end, we introduce API support for time-series splits of training data from feature views. The example below shows how to create train/test Pandas DataFrames split by time range, using a feature view containing electricity prices.

X_train, X_test, y_train, y_test = electricity_price_feature_view.train_test_split(
    train_start="2022-01-01",
    train_end="2022-05-31",    
    test_start="2022-06-01", 
    test_end="2022-06-30", 
    description='Electricity price prediction training dataset H1 2022'
)
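
To make the idea of a time-range split concrete, here is a minimal sketch in plain Python that does not use the Hopsworks API. The helper `time_range_split`, the `date` and `price` field names, and the price values are all hypothetical, and treating both ends of each range as inclusive is an assumption of this sketch rather than a statement about how the feature view behaves.

```python
from datetime import date, timedelta


def time_range_split(rows, train_start, train_end, test_start, test_end, ts_key="date"):
    """Split rows into train and test lists using inclusive date ranges."""
    if train_end >= test_start:
        raise ValueError("the train range must end before the test range starts")
    train = [r for r in rows if train_start <= r[ts_key] <= train_end]
    test = [r for r in rows if test_start <= r[ts_key] <= test_end]
    return train, test


# Hypothetical daily electricity prices for the first half of 2022 (made-up values).
first_day = date(2022, 1, 1)
rows = [
    {"date": first_day + timedelta(days=i), "price": 50.0 + (i % 7)}
    for i in range(181)  # 1 January to 30 June 2022
]

train, test = time_range_split(
    rows,
    train_start=date(2022, 1, 1),
    train_end=date(2022, 5, 31),
    test_start=date(2022, 6, 1),
    test_end=date(2022, 6, 30),
)
```

Because every training row precedes every test row, the model is evaluated only on data from after the period it was trained on, which a random split would not guarantee.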

Manage Thousands of Models for Personalized AI

Companies are gaining competitive advantage by personalizing their AI, and training models for individual customers or groups of users. Many of these personalized models share the same set of features for training and inference, but differ in the data that is used to train them. For example, you may have 3 models for customers in the US, EU, and Asia. You define the same set of features for all customers, but when you want to train a model or get a batch of inference data for one region, you only want the features for customers from that one region - e.g., training data for customers in the EU.

With training data filters, you can now easily create training data that includes only the desired grouping. The example below shows how to retrieve the electricity price features for the region “SE1”, which we can then use to train a model to predict electricity prices for that region. You could use similar code to retrieve training data for models for the regions “SE2”, “SE3”, and “SE4”. When you create batch inference data (get_batch_data) using the same feature view and the training dataset identifier, it inherits the same training data filters, making it easier to implement batch inference pipelines at scale for each of your personalized models. Training data filters scale to thousands of training datasets, which simplifies managing training data for personalized models.

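# filtered training dataset creation
# NOTE: `electricity_price_fg` is a hypothetical handle to the feature group
# that contains the "region" feature; passing the filter through the
# `extra_filter` argument is an assumption about the API.
X_train, X_test, y_train, y_test = electricity_price_feature_view.train_test_split(
    train_start="20220101",
    train_end="20220531",
    test_start="20220601",
    test_end="20220630",
    description='Electricity price prediction training dataset Jan/May 2022',
    extra_filter=electricity_price_fg.region == "SE1"
)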

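# batch inference
# NOTE: assumes the filtered training dataset above was saved as version 2,
# and that the feature view is initialized with `init_batch_scoring`.
electricity_price_feature_view.init_batch_scoring(training_dataset_version=2)
X_features = electricity_price_feature_view.get_batch_data()
y_preds = model.predict(X_features)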
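
The reason a shared filter helps can be sketched in plain Python, again without the Hopsworks API. The idea is that one filter per personalized model is defined once and reused for both training data and batch inference data. The helper `make_region_filter`, the row layout, and the price values below are hypothetical and only illustrate the concept.

```python
def make_region_filter(region):
    """Return a predicate that keeps only the rows for one region."""
    def keep(row):
        return row["region"] == region
    return keep


# Hypothetical historical feature rows (made-up values).
history = [
    {"region": "SE1", "price": 41.0},
    {"region": "SE2", "price": 43.5},
    {"region": "SE1", "price": 39.2},
    {"region": "SE3", "price": 55.1},
    {"region": "SE4", "price": 60.3},
]

# One filter per personalized model, defined once and stored by region.
filters = {region: make_region_filter(region) for region in ("SE1", "SE2", "SE3", "SE4")}

# Training data: each model sees only the rows for its own region.
training_sets = {
    region: [row for row in history if keep(row)]
    for region, keep in filters.items()
}

# Batch inference: the stored filter is reused, so new data is restricted
# to the same region the model was trained on.
new_rows = [
    {"region": "SE1", "price": 40.0},
    {"region": "SE2", "price": 44.0},
]
batch_se1 = [row for row in new_rows if filters["SE1"](row)]
```

Keeping the filter alongside the training dataset, rather than rewriting it in each pipeline, avoids a mismatch between the data a model was trained on and the data it is asked to score.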

Other Release Notes

We have deprecated our older example notebooks, and our new tutorials are now available on GitHub. We modified the RBAC capabilities of the Data Scientist role: users with this role can no longer create new feature groups or edit existing feature store entities that they did not create. Also, feature stores can now be shared with read-only access rights.
