Real-Time Feature Serving for Personalization at Scale
Zalando is Europe's largest online fashion platform, operating in over 25 countries and serving more than 50 million customers. To power real-time personalization use cases such as recommendations, Zalando needed a scalable, low-latency, and highly available feature serving platform that could support many teams and mission-critical applications.
The Aim
As Zalando scaled its machine learning platform and customer-facing use cases, several challenges emerged:
- Feature data was fragmented across teams, leading to data silos
- Inconsistent feature values between training and production environments (training/serving skew)
- Limited discoverability and reusability of features across teams and departments
- Difficulty ensuring low-latency feature access for real-time customer-facing applications
- High operational overhead in maintaining feature serving infrastructure
- Risk of large blast radius during incidents due to shared infrastructure
- Limited scalability and resilience in earlier EC2-based deployments
- Manual operations and configuration drift across environments
- Challenges meeting strict availability and latency SLOs for critical use cases
Zalando needed a centralized feature platform that could provide strong isolation, high availability, and predictable performance while supporting collaboration across the organization.
Why Hopsworks?
Zalando adopted the Hopsworks Feature Store as a centralized, real-time feature platform to support both online and offline ML workloads.
With Hopsworks, Zalando was able to:
- Centralize feature storage to reduce data silos and improve consistency across training and serving
- Enable feature discovery, versioning, and reuse across multiple teams and projects
- Serve features with strict low-latency requirements for customer-facing applications
- Achieve high availability through multi-availability zone replication
- Reduce blast radius by isolating critical projects in dedicated clusters
- Manage infrastructure through APIs and Infrastructure as Code
- Transition from EC2-based deployments to Kubernetes (EKS) for better isolation and self-healing
- Leverage RonDB for scalable, low-latency online feature serving
- Integrate event-driven pipelines using Kafka for feature freshness
- Consolidate metadata, access control, and governance across federated feature stores
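The event-driven freshness pattern above can be sketched in a few lines. This is an illustrative toy only, not Zalando's or Hopsworks' implementation: an in-memory dict stands in for the RonDB-backed online store, and direct function calls stand in for events consumed from Kafka. All names (ClickEvent, OnlineFeatureStore, the feature keys) are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class ClickEvent:
    """Stand-in for an event consumed from a Kafka topic (hypothetical schema)."""
    customer_id: str
    item_id: str
    ts: float


class OnlineFeatureStore:
    """Toy key-value online store keyed by entity id (stand-in for RonDB)."""

    def __init__(self):
        self._rows: dict[str, dict] = {}

    def upsert(self, customer_id: str, features: dict) -> None:
        # Merge the new feature values into the customer's row.
        self._rows.setdefault(customer_id, {}).update(features)

    def get_feature_vector(self, customer_id: str) -> dict:
        return dict(self._rows.get(customer_id, {}))


def apply_event(store: OnlineFeatureStore, event: ClickEvent) -> None:
    """Event-driven freshness: each incoming event immediately
    refreshes the customer's online features."""
    prev = store.get_feature_vector(event.customer_id)
    store.upsert(event.customer_id, {
        "last_item_id": event.item_id,
        "click_count": prev.get("click_count", 0) + 1,
        "last_seen_ts": event.ts,
    })


store = OnlineFeatureStore()
apply_event(store, ClickEvent("c42", "sku-1", time.time()))
apply_event(store, ClickEvent("c42", "sku-2", time.time()))
print(store.get_feature_vector("c42")["click_count"])  # 2
```

The key property this models is that feature reads never wait on a batch job: the serving path only performs a point lookup, while freshness is maintained by the stream of events.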
Results
Low-Latency Feature Serving at Scale
- Sub-10-millisecond latency for latency-sensitive personalization use cases
- Predictable performance under high request volumes
- Real-time feature freshness for more accurate recommendations
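A latency SLO like the one above is typically verified against a high percentile of observed request latencies rather than the mean. The sketch below is a minimal, generic illustration (not Zalando's monitoring stack) using the nearest-rank percentile method; the sample values are made up.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds.
latencies_ms = [3.1, 4.0, 4.2, 5.5, 6.0, 6.3, 7.1, 7.9, 8.4, 9.6]

p99 = percentile(latencies_ms, 99)
print(p99 <= 10.0)  # True: this sample meets a sub-10 ms p99 SLO
```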
Improved Reliability and Availability
- High-availability feature serving with multi-availability-zone replication
- Reduced blast radius through project-level cluster isolation
- Self-healing services on Kubernetes, minimizing operational incidents
Centralized Feature Governance
- Single source of truth for features across teams and departments
- Centralized metadata, versioning, and access control
- Easier collaboration and reuse of high-quality, production-ready features
Faster Scaling with Lower Operational Overhead
- Automated horizontal scaling based on response time, request rate, and CPU usage
- Seamless capacity expansion without manual intervention
- Infrastructure managed via APIs and Git-based configuration
- Elimination of manual EC2 operations such as SSH access and disk resizing
- Faster rollouts and upgrades using rolling Kubernetes deployments
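The multi-metric scale-out rule described above can be sketched with the formula the Kubernetes HorizontalPodAutoscaler documents, desired = ceil(current * metric / target), taking whichever metric demands the most replicas. This is a simplified model under assumed metric names and targets, not the platform's actual configuration.

```python
import math


def desired_replicas(current_replicas: int,
                     metrics: dict[str, float],
                     targets: dict[str, float],
                     max_replicas: int = 64) -> int:
    """Return the replica count demanded by the most-loaded metric,
    following the HPA-style ratio rule, capped at max_replicas."""
    need = 1
    for name, value in metrics.items():
        # Each metric independently proposes a replica count; take the max
        # so no single dimension (latency, traffic, CPU) is left saturated.
        need = max(need, math.ceil(current_replicas * value / targets[name]))
    return min(need, max_replicas)


# Hypothetical snapshot: latency is over target, so it drives the decision.
replicas = desired_replicas(
    current_replicas=4,
    metrics={"p99_latency_ms": 12.0, "requests_per_sec": 9000, "cpu_utilization": 0.55},
    targets={"p99_latency_ms": 8.0, "requests_per_sec": 10000, "cpu_utilization": 0.70},
)
print(replicas)  # 6
```

Taking the maximum across metrics is what makes the expansion "seamless": whichever of response time, request rate, or CPU saturates first triggers the scale-out, without an operator choosing which signal to watch.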
