We review Python libraries for feature engineering, such as Pandas, Pandas2, and Polars, evaluate their performance, and explore how they power machine learning use cases.
Delve into the profound implications of machine learning embeddings, their diverse applications, and their crucial role in reshaping the way we interact with data.
We explain a new framework for ML systems as three independent ML pipelines: feature pipelines, training pipelines, and inference pipelines, creating a unified MLOps architecture.
Unlock the power of Apache Airflow in the context of feature engineering. We will delve into building a feature pipeline using Airflow, focusing on two tasks: feature binning and aggregations.
An ML model’s ability to learn and read data patterns largely depends on feature quality. With frameworks such as FeatureTools, ML practitioners can automate the feature engineering process.
In this article, we outline how we leveraged ArrowFlight with DuckDB to build a new service that massively improves the performance of Python clients reading from lakehouse data in the Feature Store.
Find out how to use Flink to compute real-time features and make them available to online models within seconds using Hopsworks.
Explore the power of feature engineering for categorical features using Pandas. Learn essential techniques for handling categorical variables and creating new features.
In this blog, we discuss the state of the art in data management and machine learning pipelines (within the wider field of MLOps) and present the first open-source feature store, Hopsworks.
Learn more about how Hopsworks stores both data and validation artifacts, enabling easy monitoring on the Feature Group UI page.
In this blog, we introduce the Hopsworks Connector API, which is used to mount a table in an external data source as an external feature group in Hopsworks.
Learn how the Hopsworks feature store APIs work and what it takes to go from a Pandas DataFrame to features used by models for both training and inference.
In this blog post we showcase the results of a study that examined point-in-time join optimization using Apache Spark in Hopsworks.
Programmers know data types, but what is a feature type to a programmer new to machine learning, given that no mainstream programming language has native support for feature types?
Many developers believe S3 is the "end of file system history". It is impossible to build a file/object storage system on AWS that can compete with S3 on cost. But what if you could build on top of S3