

Towards better AI-models in the betting industry with a Feature Store

February 20, 2020

6 min read

Jim Dowling

CEO and Co-Founder

Hopsworks

Feature Store

TL;DR

Betting operators face challenges from regulators, from cybercrime, and from their own customer base. One way to respond to these challenges is to harness data and machine learning, for example, to detect fraud, improve user engagement, and promote responsible gambling by identifying at-risk players.

The success of machine learning algorithms in a broad range of areas has led to increasing demand for ML platforms and solutions in the gaming industry. Many operators are currently focused on managing data pipelines and on evaluating or developing machine learning platforms to optimize their operations and stay competitive.

ML is central to how AI-native companies such as Uber, Airbnb, and Twitter create new products and redefine customer-experience standards. The crucial first step in their ML process is feature engineering, which is often the most laborious activity in the model-building lifecycle. Almost all of these AI-native companies have built feature stores to streamline feature engineering across multiple teams and models.

We helped Paddy Power advance their digital transformation by implementing a feature store. This is a central repository of features (the input data used to train ML models) that acts as an enterprise-wide marketplace of features for teams with different remits. The feature store enables reuse of both common and use-case-specific features across several kinds of models: predictive betting models for different sportsbooks, anti-fraud and AML (anti-money laundering) models, and player-management and responsible-gambling models.

Why is having a feature store valuable?

The concept of a feature store was introduced by Uber in 2017 as part of its internal Michelangelo platform for ML. A feature store is a central place to store curated features within an organization. So what is a feature, exactly? A feature is a measurable property of a data sample. For example, it could be the number of a customer's transactions over a period of time (an hour, a day, a week). It could also be the recent performance of a horse in horse racing, or the average number of deposits and exits within the last hour. Features can be extracted directly from files and database tables, or they can be derived values computed from one or more data sources.
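The first example can be made concrete with a minimal Python sketch of a derived feature: the number of transactions per customer in a trailing time window. The function name `transaction_count_feature` and the `(customer_id, timestamp)` input shape are illustrative assumptions and are not part of Hopsworks. In practice this computation would typically run in a feature pipeline over a real transactions table.

```python
from collections import Counter
from datetime import datetime, timedelta


def transaction_count_feature(transactions, as_of, window):
    """Count each customer's transactions in the window (as_of - window, as_of].

    transactions: iterable of (customer_id, timestamp) pairs (hypothetical shape).
    Returns {customer_id: count}; customers with no transactions
    in the window are omitted.
    """
    start = as_of - window
    counts = Counter(
        customer_id
        for customer_id, ts in transactions
        if start < ts <= as_of
    )
    return dict(counts)


if __name__ == "__main__":
    now = datetime(2020, 2, 20, 12, 0)
    txns = [
        ("alice", now - timedelta(minutes=5)),
        ("alice", now - timedelta(minutes=50)),
        ("alice", now - timedelta(hours=3)),  # outside a 1-hour window
        ("bob", now - timedelta(minutes=30)),
    ]
    print(transaction_count_feature(txns, now, timedelta(hours=1)))  # {'alice': 2, 'bob': 1}
    print(transaction_count_feature(txns, now, timedelta(days=1)))   # {'alice': 3, 'bob': 1}
```

Each window length yields a separate feature, so the hour, day, and week variants from the text would each be computed and stored as their own feature.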

Features are the fuel for AI systems. We use them to train machine learning models, and the trained models then make predictions from new feature values they have never seen before.

A feature store enables features to be reused across your gaming operations, because existing features are visible to all potential users (data engineers, data scientists, machine learning engineers, business analysts, etc.). Shared features can then be used to develop models for:

Algorithmic marketing
Anti-money laundering
Responsible gambling
Optimization of betting-outcome predictions

The feature store supports feature enrichment, discovery, ranking, lineage, and lifecycle management.
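As a rough illustration of discovery and lineage, the toy registry below records each feature's owner, description, and upstream sources. It lets users search by keyword and walks sources transitively to recover lineage. `FeatureRegistry` and `FeatureMetadata` are hypothetical names for this sketch and do not mirror the Hopsworks API. Enrichment, ranking, and lifecycle management are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureMetadata:
    name: str
    description: str
    owner: str
    # Upstream inputs: raw table names or names of other registered features.
    sources: list = field(default_factory=list)


class FeatureRegistry:
    """Toy in-memory feature catalog (hypothetical; not the Hopsworks API)."""

    def __init__(self):
        self._features = {}  # feature name -> FeatureMetadata

    def register(self, meta):
        if meta.name in self._features:
            raise ValueError(f"feature {meta.name!r} is already registered")
        self._features[meta.name] = meta

    def search(self, keyword):
        """Discovery: names of features whose name or description contains keyword."""
        kw = keyword.lower()
        return [
            m.name
            for m in self._features.values()
            if kw in m.name.lower() or kw in m.description.lower()
        ]

    def lineage(self, name):
        """All upstream sources of a feature, following registered features transitively."""
        found, stack = [], list(self._features[name].sources)
        while stack:
            src = stack.pop()
            if src in found:
                continue  # already visited (also guards against cycles)
            found.append(src)
            if src in self._features:  # the source is itself a feature: keep walking
                stack.extend(self._features[src].sources)
        return sorted(found)


if __name__ == "__main__":
    reg = FeatureRegistry()
    reg.register(FeatureMetadata("deposits_1h", "Deposits in the last hour", "data-eng", ["payments"]))
    reg.register(FeatureMetadata("deposit_velocity", "Change in hourly deposits", "data-eng", ["deposits_1h"]))
    print(reg.search("deposit"))            # ['deposits_1h', 'deposit_velocity']
    print(reg.lineage("deposit_velocity"))  # ['deposits_1h', 'payments']
```

Because one feature can list another as a source, lineage reaches back to the raw `payments` table. That is what lets a team see which models would be affected if an upstream table changes.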

The feature store plays a valuable role in both model training and model serving. During training, data scientists use the feature store to create training data in the file format of their choice. There is no need to write and run new data pipelines to make feature data available as .tfrecord, .npy, or .csv files. Instead, data scientists can interactively generate train/test data in their preferred file format on their preferred storage platform (S3, HDFS, etc.).
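The following minimal sketch assumes feature rows are already available as a list of dicts. It splits them reproducibly into train and test sets, then writes each split in the requested format. Only CSV and JSON Lines are shown, because they need nothing beyond the Python standard library. `split_rows` and `write_rows` are hypothetical helpers, not the Hopsworks training-dataset API.

```python
import csv
import json
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows reproducibly and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def write_rows(rows, path, fmt="csv"):
    """Write a non-empty list of dicts (all with the same keys) as CSV or JSON Lines."""
    if not rows:
        raise ValueError("no rows to write")
    if fmt == "csv":
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    elif fmt == "jsonl":
        with open(path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
    else:
        raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    features = [{"player_id": i, "n_bets": i % 7, "label": i % 2} for i in range(10)]
    train, test = split_rows(features, test_fraction=0.2)
    write_rows(train, "train.csv", fmt="csv")
    write_rows(test, "test.jsonl", fmt="jsonl")
    print(len(train), len(test))  # 8 2
```

A fixed seed makes the split reproducible, so the same train/test datasets can be regenerated later in a different file format without changing which rows land in each split.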

Once models are in production, the feature store gives batch applications access to large volumes of feature data. For online model serving, it gives online applications low-latency access to feature data.
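The two access patterns can be sketched with a toy store that keeps two views. The offline view holds the full history of feature rows, for batch jobs and training data. The online view holds only the latest row per entity, for low-latency lookups. This is an in-memory illustration under simplifying assumptions; a production feature store would typically back each view with a separate database. `DualFeatureStore` and its methods are hypothetical names. Writing both views from a single insert path is one simple way to keep offline and online data consistent.

```python
from collections import defaultdict


class DualFeatureStore:
    """Toy store with an offline (full history) and an online (latest value) view."""

    def __init__(self):
        self.offline = defaultdict(list)  # entity_id -> [(ts, features), ...]
        self.online = {}                  # entity_id -> (ts, features)

    def insert(self, entity_id, ts, features):
        """Single write path that updates both views."""
        self.offline[entity_id].append((ts, features))
        current = self.online.get(entity_id)
        # Only replace the online value if this row is at least as recent,
        # so a late-arriving old row cannot overwrite newer data.
        if current is None or ts >= current[0]:
            self.online[entity_id] = (ts, features)

    def get_online(self, entity_id):
        """Serving path: dictionary lookup of the latest features, or None."""
        entry = self.online.get(entity_id)
        return None if entry is None else entry[1]

    def get_offline(self, start, end):
        """Batch path: every row with start <= ts < end, across all entities."""
        return [
            (entity_id, ts, features)
            for entity_id, rows in self.offline.items()
            for ts, features in rows
            if start <= ts < end
        ]


if __name__ == "__main__":
    store = DualFeatureStore()
    store.insert("player_1", 1, {"bets_1h": 3})
    store.insert("player_1", 3, {"bets_1h": 5})
    store.insert("player_1", 2, {"bets_1h": 4})  # late-arriving older row
    print(store.get_online("player_1"))   # {'bets_1h': 5}
    print(len(store.get_offline(0, 10)))  # 3
```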

The old way of working

Without a feature store, organizations rely on ad-hoc scripts and programs for feature engineering, with limited sharing of features within or between teams. The same features may be rewritten many times, in different ways, by different developers. Feature pipelines also need to be rewritten when new training file formats appear (Petastorm, for example). Enterprises also have little insight into which features are being used across the organization and which add the most value. On top of this, developers must build infrastructure to keep offline and online feature data consistent, a non-trivial task.

New ways of working

Data engineers write and operate feature engineering pipelines that take data from backend systems, transform and validate it, and then fill the feature store with feature data. Data scientists, now freed from heavy feature engineering, can dedicate more time to developing higher-quality models. They select features and backfill train/test datasets, which they then use to train models. Data scientists are responsible for training and validating their models before those models are deployed for batch or online applications. ML engineers, who operate models in production, can also look up feature data in real time for their applications.
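The following minimal sketch walks through the pipeline steps described above: take raw data, transform it, validate it, and then fill the feature store. The event schema, the two features, and the validation rules are hypothetical examples. A plain Python list stands in for the feature store's insert API.

```python
def compute_features(raw_events):
    """Transform: aggregate raw backend events into one feature row per player.

    Each event is a dict with 'player_id', 'type' ('bet' or 'deposit') and 'amount'.
    """
    rows = {}
    for event in raw_events:
        row = rows.setdefault(
            event["player_id"],
            {"player_id": event["player_id"], "n_bets": 0, "total_deposits": 0.0},
        )
        if event["type"] == "bet":
            row["n_bets"] += 1
        elif event["type"] == "deposit":
            row["total_deposits"] += event["amount"]
    return list(rows.values())


def validate(rows):
    """Validate: split rows into (valid, rejected) using simple expectations."""
    valid, rejected = [], []
    for row in rows:
        ok = bool(row["player_id"]) and row["total_deposits"] >= 0
        (valid if ok else rejected).append(row)
    return valid, rejected


def run_pipeline(raw_events, sink):
    """Transform, validate, then write only valid rows to the sink."""
    valid, rejected = validate(compute_features(raw_events))
    sink.extend(valid)  # stand-in for inserting into a feature store
    return len(valid), len(rejected)


if __name__ == "__main__":
    events = [
        {"player_id": "p1", "type": "bet", "amount": 10.0},
        {"player_id": "p1", "type": "deposit", "amount": 50.0},
        {"player_id": "p2", "type": "deposit", "amount": -5.0},  # bad upstream data
    ]
    feature_table = []
    print(run_pipeline(events, feature_table))  # (1, 1)
    print(feature_table)  # [{'player_id': 'p1', 'n_bets': 1, 'total_deposits': 50.0}]
```

Validating before the write means data scientists only ever see rows that passed the data engineers' checks. The rejected count gives the pipeline owner a simple signal that upstream data has gone wrong.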

Our Offering

The Hopsworks Feature Store enables teams to work effectively together, sharing outputs and assets at all stages in machine learning (ML) pipelines. In effect, the Feature Store:

Acts as an API between data engineering and data science, improving collaboration between data engineers, who engineer the features, and data scientists, who use the features to train models.
Enables features to be registered, discovered, and used as part of ML pipelines, thus making it easier to transform and validate the training data that is fed into machine learning systems.
Meets traditional Enterprise Computing requirements with support for access control, feature versioning, governance (e.g., terms of use), model interpretability, privacy, and auditing.
Is both horizontally scalable and highly available.
Fits seamlessly into both development environments and ML pipelines, whether you are in the cloud or on-premises, with integrations for Databricks, AWS SageMaker, and Kubeflow.
