Data Engineering

March 11, 2025
10 min read

Migrating from AWS to a European Cloud - How We Cut Costs by 62%

This post describes how we successfully migrated our serverless offering from AWS US-East to OVHCloud North America, reducing our monthly spend from $8,000 to $3,000 with no loss in service quality.

Jim Dowling
February 25, 2025
20 min read

The 10 Fallacies of MLOps

MLOps fallacies can slow AI deployment and add complexity. This blog breaks down 10 common misconceptions, their impact, and how to avoid them to build scalable, production-ready AI systems efficiently.

Jim Dowling
January 3, 2025
23 min read

Amazon FSx for NetApp ONTAP interoperability test in a Hopsworks 4.x Deployment

By following this tutorial, you can evaluate the interoperability between Hopsworks 4.x and Amazon FSx for NetApp ONTAP.

Javier Cabrera
December 9, 2024
16 min read

Hopsworks PKI: The Unseen Hero

In this article, we explore how our Public Key Infrastructure has changed over the years to reach its current form: a first-class Kubernetes citizen.

Antonios Kouzoupis
October 21, 2024
12 min read

Migrating Hopsworks to Kubernetes

Nearly a year ago, the Hopsworks team embarked on a journey to migrate its infrastructure to Kubernetes. In this article we describe three main pillars of our Kubernetes migration.

Javier Cabrera
September 2, 2024
30 min read

Introducing the AI Lakehouse

We describe the capabilities that need to be added to a Lakehouse to make it an AI Lakehouse that can support building and operating AI-enabled batch and real-time applications as well as LLM applications.

Jim Dowling
August 6, 2024
13 min read

Reproducible Data for the AI Lakehouse

We present how Hopsworks leverages its time-travel capabilities for feature groups to support reproducible creation of training datasets using metadata.

Jim Dowling
July 25, 2024
8 min read

RonDB: A Real-Time Database for Real-Time AI Systems

Learn how Hopsworks (RonDB) outperforms AWS SageMaker and GCP Vertex AI on latency as a real-time AI database, based on a peer-reviewed SIGMOD 2024 benchmark.

Mikael Ronström
July 23, 2024
9 min read

From Lakehouse to AI Lakehouse with a Python-Native Query Engine

Read how Hopsworks generates temporal queries from Python code, and how a native query engine built on Arrow can massively outperform JDBC/ODBC APIs.

Jim Dowling
July 1, 2024
30 min read

The Taxonomy for Data Transformations in AI Systems

This article introduces a taxonomy for data transformations in AI applications that is fundamental for any AI system that wants to reuse feature data in more than one model.

Manu Joseph
June 25, 2024
25 min read

Modularity and Composability for AI Systems with AI Pipelines and Shared Storage

We present a unified software architecture for batch, real-time, and LLM AI systems that is based on a shared storage layer and a decomposition of machine learning pipelines.

Jim Dowling
May 13, 2024
10 min read

Building a Cheque Fraud Detection and Explanation AI System using a fine-tuned LLM

The third edition of the LLM Makerspace dived into an example of an LLM system for detecting cheque fraud.

Jim Dowling
April 17, 2024
10 min read

Job Scheduling & Orchestration using Hopsworks and Airflow

This article covers the different aspects of job scheduling in Hopsworks, including how simple jobs can be scheduled through the Hopsworks UI by non-technical users.

Ehsan Heydari
April 11, 2024
6 min read

Build Your Own Private PDF Search Tool

A summary from our LLM Makerspace event where we built our own PDF search tool using RAG and fine-tuning in one platform. Follow along on the journey to build an LLM application from scratch.

Jim Dowling
April 10, 2024
17 min read

Build Vs Buy: For Machine Learning/AI Feature Stores

When deciding whether to build or buy a feature store, there are strategic and technical components to consider, as the choice impacts both cost and technical debt.

Rik Van Bruggen
April 2, 2024
7 min read

Unlocking the Power of Function Calling with LLMs

This is a summary of our latest LLM Makerspace event, where we pulled back the curtain on an exciting paradigm in AI: function calling with LLMs.

Jim Dowling
January 18, 2024
20 min read

Common Error Messages in Pandas

We go through the most common error messages in Pandas, offer solutions to these errors, and provide efficiency tips for Pandas code.

Haziqa Sajid
November 30, 2023
20 min read

Feature Engineering with DBT for Data Warehouses

Read about the advantages of using DBT for data warehouses and how it's positioned as a preferred solution for many data analytics and engineering teams.

Kais Laribi
November 6, 2023
20 min read

Pandas2 and Polars for Feature Engineering

We review Python libraries, such as Pandas, Pandas2 and Polars, for Feature Engineering, evaluate their performance and explore how they power machine learning use cases.

Haziqa Sajid
October 9, 2023
13 min read

Machine Learning Embeddings as Features for Models

Delve into the profound implications of machine learning embeddings, their diverse applications, and their crucial role in reshaping the way we interact with data.

Prithivee Ramalingam
September 13, 2023
25 min read

From MLOps to ML Systems with Feature/Training/Inference Pipelines

We explain a new framework for ML systems as three independent ML pipelines: feature pipelines, training pipelines, and inference pipelines, creating a unified MLOps architecture.

Jim Dowling
September 4, 2023
18 min read

Feature Engineering with Apache Airflow

Unlock the power of Apache Airflow in the context of feature engineering. We will delve into building a feature pipeline using Airflow, focusing on two tasks: feature binning and aggregations.

Prithivee Ramalingam
August 23, 2023
13 min read

Automated Feature Engineering with FeatureTools

An ML model’s ability to learn and read data patterns largely depends on feature quality. With frameworks such as FeatureTools, ML practitioners can automate the feature engineering process.

Haziqa Sajid
August 9, 2023
13 min read

Faster reading from the Lakehouse to Python with DuckDB/ArrowFlight

In this article, we outline how we leveraged ArrowFlight with DuckDB to build a new service that massively improves the performance of Python clients reading from lakehouse data in the Feature Store.

Till Döhmen
June 21, 2023
8 min read

Building Feature Pipelines with Apache Flink

Find out how to use Flink to compute real-time features and make them available to online models within seconds using Hopsworks.

Fabio Buso
June 20, 2023
10 min read

Feature Engineering for Categorical Features with Pandas

Explore the power of feature engineering for categorical features using Pandas. Learn essential techniques for handling categorical variables and creating new features.

Prithivee Ramalingam
September 21, 2022
10 min read

Data Validation for Enterprise AI: Using Great Expectations with Hopsworks

Learn more about how Hopsworks stores both data and validation artifacts, enabling easy monitoring on the Feature Group UI page.

Victor Jouffrey
September 15, 2022
9 min read

How to use external data stores as an offline feature store in Hopsworks with Connector API

In this blog, we introduce the Hopsworks Connector API, which is used to mount a table in an external data source as an external feature group in Hopsworks.

Dhananjay Mukhedkar
August 23, 2022
20 min read

From Pandas to Features to Models to Predictions - A deep dive into the Hopsworks APIs

Learn how the Hopsworks feature store APIs work and what it takes to go from a Pandas DataFrame to features used by models for both training and inference.

Fabio Buso
June 30, 2022
10 min read

A Spark Join Operator for Point-in-Time Correct Joins

In this blog post we showcase the results of a study that examined point-in-time join optimization using Apache Spark in Hopsworks.

Axel Pettersson
May 27, 2022
15 min read

Feature Types for Machine Learning

Programmers know data types, but what is a feature type to a programmer new to machine learning, given no mainstream programming language has native support for them?

Jim Dowling
April 26, 2022
17 min read

Testing feature logic, transformations, and feature pipelines with pytest

Operational machine learning requires the offline and online testing of both features and models. In this article, we show you how to design, build, and run tests for features.

Jim Dowling
November 23, 2021
3 min read

Show me the code; how we linked notebooks to features

We are introducing a new feature in the Hopsworks UI, feature code preview: the ability to view the notebook used to create a Feature Group or Training Dataset.

Jim Dowling
October 8, 2021
10 min read

End-to-end Deep Learning Pipelines with Earth Observation Data in Hopsworks

In this blog post, we demonstrate how to build an end-to-end deep learning pipeline with real-world Earth observation data in order to develop an iceberg classification model.

Theofilos Kakantousis
July 12, 2021
10 min read

AI Software Architecture for Copernicus Data with Hopsworks

Hopsworks brings support for scale-out AI with the ExtremeEarth project, which focuses on the pressing issues of food security and sea mapping.

Theofilos Kakantousis
June 4, 2021
6 min read

How to build ML models with fastai and Jupyter in Hopsworks

This tutorial gives an overview of how to work with Jupyter on the platform and train a state-of-the-art ML model using the fastai Python library.

Robin Andersson
November 19, 2020
13 min read

HopsFS file system: 100X Faster than AWS S3

Many developers believe S3 is the "end of file system history". It is impossible to build a file/object storage system on AWS that can compete with S3 on cost. But what if you could build on top of S3?

October 14, 2019
15 min read

Hello Asynchronous Search for PySpark

Read how Hopsworks supports easy hyperparameter optimization (both synchronous and asynchronous search) and distributed training using PySpark.

Moritz Meister
October 22, 2018
20 min read

Goodbye Horovod, Hello TensorFlow

Hopsworks is replacing Horovod with Keras/TensorFlow’s new CollectiveAllReduceStrategy, part of the Keras/TensorFlow Estimator framework.

Jim Dowling
© Hopsworks 2025. All rights reserved. Various trademarks held by their respective owners.
