Real-Time Feature Serving for Personalization at Scale
Zalando is Europe's largest online fashion platform, operating in over 25 countries and serving more than 50 million customers. To power real-time personalization use cases such as recommendations, Zalando needed a scalable, low-latency, and highly available feature serving platform that could support many teams and mission-critical applications.
The Aim
As Zalando scaled its machine learning platform and customer-facing use cases, several challenges emerged:
- Feature data was fragmented across teams, leading to data silos
- Inconsistent feature values between training and production environments (training/serving skew)
- Limited discoverability and reusability of features across teams and departments
- Difficulty ensuring low-latency feature access for real-time customer-facing applications
- High operational overhead in maintaining feature serving infrastructure
- Risk of large blast radius during incidents due to shared infrastructure
- Limited scalability and resilience in earlier EC2-based deployments
- Manual operations and configuration drift across environments
- Challenges meeting strict availability and latency SLOs for critical use cases
Zalando needed a centralized feature platform that could provide strong isolation, high availability, and predictable performance while supporting collaboration across the organization.
Why Hopsworks?
Zalando adopted the Hopsworks Feature Store as a centralized, real-time feature platform to support both online and offline ML workloads.
With Hopsworks, Zalando was able to:
- Centralize feature storage to reduce data silos and improve consistency across training and serving
- Enable feature discovery, versioning, and reuse across multiple teams and projects
- Serve features with strict low-latency requirements for customer-facing applications
- Achieve high availability through multi-availability zone replication
- Reduce blast radius by isolating critical projects in dedicated clusters
- Manage infrastructure through APIs and Infrastructure as Code
- Transition from EC2-based deployments to Kubernetes (EKS) for better isolation and self-healing
- Leverage RonDB for scalable, low-latency online feature serving
- Integrate event-driven pipelines using Kafka for feature freshness
- Consolidate metadata, access control, and governance across federated feature stores
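The event-driven freshness pattern above can be sketched in a few lines. This is an illustrative toy only, not Zalando's or Hopsworks' implementation: an in-memory dict stands in for the RonDB-backed online store, and direct function calls stand in for events consumed from Kafka. All names (ClickEvent, OnlineFeatureStore, the feature keys) are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class ClickEvent:
    """Stand-in for an event consumed from a Kafka topic (hypothetical schema)."""
    customer_id: str
    item_id: str
    ts: float


class OnlineFeatureStore:
    """Toy key-value online store keyed by entity id (stand-in for RonDB)."""

    def __init__(self):
        self._rows: dict[str, dict] = {}

    def upsert(self, customer_id: str, features: dict) -> None:
        # Merge the new feature values into the customer's row.
        self._rows.setdefault(customer_id, {}).update(features)

    def get_feature_vector(self, customer_id: str) -> dict:
        return dict(self._rows.get(customer_id, {}))


def apply_event(store: OnlineFeatureStore, event: ClickEvent) -> None:
    """Event-driven freshness: each incoming event immediately
    refreshes the customer's online features."""
    prev = store.get_feature_vector(event.customer_id)
    store.upsert(event.customer_id, {
        "last_item_id": event.item_id,
        "click_count": prev.get("click_count", 0) + 1,
        "last_seen_ts": event.ts,
    })


store = OnlineFeatureStore()
apply_event(store, ClickEvent("c42", "sku-1", time.time()))
apply_event(store, ClickEvent("c42", "sku-2", time.time()))
print(store.get_feature_vector("c42")["click_count"])  # 2
```

The key property this models is that feature reads never wait on a batch job: the serving path only performs a point lookup, while freshness is maintained by the stream of events.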
Results
Low-Latency Feature Serving at Scale
- Sub-10-millisecond latency for latency-sensitive personalization use cases
- Predictable performance under high request volumes
- Real-time feature freshness for more accurate recommendations
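A latency SLO like the one above is typically verified against a high percentile of observed request latencies rather than the mean. The sketch below is a minimal, generic illustration (not Zalando's monitoring stack) using the nearest-rank percentile method; the sample values are made up.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds.
latencies_ms = [3.1, 4.0, 4.2, 5.5, 6.0, 6.3, 7.1, 7.9, 8.4, 9.6]

p99 = percentile(latencies_ms, 99)
print(p99 <= 10.0)  # True: this sample meets a sub-10 ms p99 SLO
```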
Improved Reliability and Availability
- High-availability feature serving with multi-availability-zone replication
- Reduced blast radius through project-level cluster isolation
- Self-healing services on Kubernetes, minimizing operational incidents
Centralized Feature Governance
- Single source of truth for features across teams and departments
- Centralized metadata, versioning, and access control
- Easier collaboration and reuse of high-quality, production-ready features
Faster Scaling with Lower Operational Overhead
- Automated horizontal scaling based on response time, request rate, and CPU usage
- Seamless capacity expansion without manual intervention
- Infrastructure managed via APIs and Git-based configuration
- Elimination of manual EC2 operations such as SSH access and disk resizing
- Faster rollouts and upgrades using rolling Kubernetes deployments
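The multi-metric scale-out rule described above can be sketched with the formula the Kubernetes HorizontalPodAutoscaler documents, desired = ceil(current * metric / target), taking whichever metric demands the most replicas. This is a simplified model under assumed metric names and targets, not the platform's actual configuration.

```python
import math


def desired_replicas(current_replicas: int,
                     metrics: dict[str, float],
                     targets: dict[str, float],
                     max_replicas: int = 64) -> int:
    """Return the replica count demanded by the most-loaded metric,
    following the HPA-style ratio rule, capped at max_replicas."""
    need = 1
    for name, value in metrics.items():
        # Each metric independently proposes a replica count; take the max
        # so no single dimension (latency, traffic, CPU) is left saturated.
        need = max(need, math.ceil(current_replicas * value / targets[name]))
    return min(need, max_replicas)


# Hypothetical snapshot: latency is over target, so it drives the decision.
replicas = desired_replicas(
    current_replicas=4,
    metrics={"p99_latency_ms": 12.0, "requests_per_sec": 9000, "cpu_utilization": 0.55},
    targets={"p99_latency_ms": 8.0, "requests_per_sec": 10000, "cpu_utilization": 0.70},
)
print(replicas)  # 6
```

Taking the maximum across metrics is what makes the expansion "seamless": whichever of response time, request rate, or CPU saturates first triggers the scale-out, without an operator choosing which signal to watch.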
