Hopsworks Feature Store with Spark (EMR, Databricks, Cloudera, HDInsight, DataProc)

If your feature pipelines take large volumes of data as input, Spark can compute features as DataFrames and write them directly to the Hopsworks Feature Store. Spark can write features in both batch and streaming mode.
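The page describes this flow but shows no code. Below is a minimal, self-contained sketch of the batch pattern, compute keyed feature rows from raw events, then upsert them under a primary key, using plain Python dicts as stand-ins for Spark DataFrames and the feature group (in the real Hopsworks client the final step would be a call along the lines of `feature_group.insert(df)`; all names and data here are illustrative).

```python
# Batch stand-in: aggregate raw events into keyed feature rows, then
# upsert them into a store keyed by primary key. Dicts replace Spark
# DataFrames and the Hopsworks feature group (both hypothetical here).
from collections import defaultdict


def compute_avg_amount(events):
    """Aggregate raw transaction events into per-account average amounts."""
    totals = defaultdict(lambda: [0.0, 0])
    for e in events:
        t = totals[e["account_id"]]
        t[0] += e["amount"]
        t[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def upsert_features(store, feature_group, keyed_values):
    """Upsert keyed feature values, mimicking insert-by-primary-key semantics."""
    fg = store.setdefault(feature_group, {})
    fg.update(keyed_values)
    return fg


events = [
    {"account_id": "a1", "amount": 10.0},
    {"account_id": "a1", "amount": 30.0},
    {"account_id": "a2", "amount": 5.0},
]
store = {}
features = compute_avg_amount(events)
upsert_features(store, "account_avg_amount", features)
# store["account_avg_amount"] == {"a1": 20.0, "a2": 5.0}
```

Re-running the pipeline with fresh events overwrites rows for existing keys and adds rows for new ones, which is the upsert behavior a feature store write needs.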

Hopsworks Integrations

Spark can be used to implement both batch and streaming feature pipelines, as well as batch and streaming inference pipelines, for features written to the Hopsworks Feature Store.
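For the streaming case, features are maintained incrementally as events arrive rather than recomputed over a full batch. The sketch below keeps running per-key state in plain Python as a stand-in for Spark Structured Streaming state; in a real pipeline each updated value would be streamed into a Hopsworks feature group (the class, keys, and values are illustrative).

```python
# Streaming stand-in: per-key running averages updated one event at a time,
# mimicking stateful aggregation in a streaming feature pipeline.
class RunningAvg:
    """Maintains a running average per key, updated incrementally."""

    def __init__(self):
        self._state = {}  # key -> (running_sum, count)

    def update(self, key, value):
        s, n = self._state.get(key, (0.0, 0))
        s, n = s + value, n + 1
        self._state[key] = (s, n)
        return s / n  # the freshest feature value for this key


# Simulated event stream; later events for a key refine its feature value.
stream = [("a1", 10.0), ("a2", 5.0), ("a1", 30.0)]
avg = RunningAvg()
latest = {k: avg.update(k, v) for k, v in stream}
# latest == {"a1": 20.0, "a2": 5.0}
```

The key design difference from the batch sketch is that state persists across events, so each new event costs O(1) work instead of a full re-aggregation.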

Other integrations

Parquet (Athena, S3, ADLS, GCS)
AWS SageMaker
Dagster

© Hopsworks 2024. All rights reserved. Various trademarks held by their respective owners.
