
Open Source Hopsworks Feature Store

Quick Install (on a VM, on-premises, or on your own system):

bash <(curl -s https://repo.hops.works/installer/latest/hopsworks-installer.sh)

Quick Install (with the Azure CLI or GCP CLI):

bash <(curl -s https://repo.hops.works/installer/latest/hopsworks-cloud-installer.sh)

A feature store is a powerful centralized store for machine learning features. It allows organizations to reproduce, reuse, improve, and govern their machine learning models and data within an open ecosystem that connects to multiple data sources for ingestion and to data science tools for serving. The feature store provides the most essential pieces of data infrastructure for AI, bridging DataOps and MLOps.
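To make the "centralized, reusable, governed" idea concrete, here is a minimal in-memory sketch of a feature registry. It is not the Hopsworks API: `ToyFeatureRegistry`, `ToyFeatureGroup` and their methods are hypothetical names chosen to echo the `create_feature_group` call used below, and the rule that an existing version is never overwritten is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ToyFeatureGroup:
    """Hypothetical stand-in for a feature group: named, versioned, identified by a primary key."""
    name: str
    version: int
    primary_key: list
    description: str = ""


class ToyFeatureRegistry:
    """One central catalog, so every team resolves the same (name, version) definition."""

    def __init__(self):
        self._groups = {}

    def create_feature_group(self, name, version, primary_key, description=""):
        key = (name, version)
        if key in self._groups:
            # In this sketch an existing version is never overwritten;
            # changing a feature group means registering a new version
            raise ValueError(f"{name} v{version} already exists")
        fg = ToyFeatureGroup(name, version, list(primary_key), description)
        self._groups[key] = fg
        return fg

    def get_feature_group(self, name, version):
        return self._groups[(name, version)]

    def list_versions(self, name):
        return sorted(v for (n, v) in self._groups if n == name)


registry = ToyFeatureRegistry()
rain_v1 = registry.create_feature_group("rain_fg", 1, ["location_id"], "Rainfall per location")
registry.create_feature_group("rain_fg", 2, ["location_id"], "Rainfall per location, revised")
# A second team reuses the exact same definition instead of rebuilding it
assert registry.get_feature_group("rain_fg", 1) is rain_v1
print(registry.list_versions("rain_fg"))  # [1, 2]
```

Keeping old versions addressable is what lets a model trained on version 1 be reproduced after version 2 is published.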

Connect to the feature store

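# Python
import hsfs

# Replace the bracketed placeholders with your cluster host, project name and API key file
connection = hsfs.connection(host="[UUID].cloud.hopsworks.ai",
                             project="[feature-store-name]",
                             api_key_file="[file-with-api-key]")
fs = connection.get_feature_store()

// Scala
import com.logicalclocks.hsfs.HopsworksConnection

// Connect using the builder's default settings
val connection = HopsworksConnection.builder().build()
val fs = connection.getFeatureStore()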

Ingest into the feature store

# Python
import pandas as pd

# Each row: location (latitude and longitude as a string), date, rainfall
data = [['59.314781 18.070232', '2021-02-12', 0.02], ['53.3572 6.4498', '2021-02-12', 0.02]]
df = pd.DataFrame(data, columns=['location_id', 'date', 'rainfall'])

# online_enabled=True also makes the features available for online serving
fg = fs.create_feature_group(name='rain_fg',
                             version=1,
                             primary_key=['location_id'],
                             description='Rainfall at a Location on a given date',
                             online_enabled=True)
fg.save(df)

// Scala
import scala.collection.JavaConverters._

val data = Seq(("59.314781 18.070232", "2021-02-12", 0.02), ("53.3572 6.4498", "2021-02-12", 0.02))
// Name the columns so that the primary key column "location_id" exists
val df = spark.createDataFrame(data).toDF("location_id", "date", "rainfall")
val fg = (fs.createFeatureGroup()
            .name("rain_fg")
            .version(1)
            .description("Rainfall at a Location on a given date")
            .primaryKeys(Seq("location_id").asJava)  // the builder expects a java.util.List
            .onlineEnabled(true)
            .build())
fg.save(df)
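The `primary_key=['location_id']` argument is what identifies a row. As a rough mental model (an assumption, not a description of Hopsworks internals), an online-enabled feature group behaves like a table keyed by the primary key: writing a row whose key already exists replaces the earlier values, so lookups see the most recently written row per key. The hypothetical `upsert_online` helper below sketches that with a plain dict; the third, newer reading is made-up data for illustration.

```python
def upsert_online(online_table, rows, primary_key):
    """Keep only the most recently written row for each primary-key value (sketch)."""
    for row in rows:
        # Build the lookup key from the primary-key columns, in order
        key = tuple(row[col] for col in primary_key)
        # A later row with the same key replaces the earlier one
        online_table[key] = row
    return online_table


rows = [
    {"location_id": "59.314781 18.070232", "date": "2021-02-12", "rainfall": 0.02},
    {"location_id": "53.3572 6.4498", "date": "2021-02-12", "rainfall": 0.02},
    # Hypothetical newer reading for the first location
    {"location_id": "59.314781 18.070232", "date": "2021-02-13", "rainfall": 0.05},
]
online = upsert_online({}, rows, primary_key=["location_id"])
print(len(online))  # 2
print(online[("59.314781 18.070232",)]["rainfall"])  # 0.05
```

This is also why the primary key must be present as a column in the DataFrame being saved: without it, there is nothing to key the rows on.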

Create training data

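# Python
# Assumes temperature_fg and location_fg are existing feature groups sharing the location_id key
rain_fg = fs.get_feature_group("rain_fg", version=1)

# Join the feature groups into one query (by default on matching primary key columns)
feature_join = (rain_fg.select_all()
                .join(temperature_fg.select_all())
                .join(location_fg.select_all()))

td = fs.create_training_dataset(
    name="precipitation",
    description="Precipitation Training dataset",
    data_format="tfrecord",
    splits={"train": 0.7, "test": 0.2, "validate": 0.1},
    version=1)

# Materialize the joined features as a training dataset
td.save(feature_join)

// Scala
import com.logicalclocks.hsfs.DataFormat

// Assumes temperature_fg and location_fg are existing feature groups sharing the location_id key
val rain_fg = fs.getFeatureGroup("rain_fg", 1)

val feature_join = (rain_fg.selectAll()
                    .join(temperature_fg.selectAll())
                    .join(location_fg.selectAll()))

val precipitation_td = (fs.createTrainingDataset()
                        .name("precipitation")
                        .description("Precipitation Training dataset")
                        .dataFormat(DataFormat.TFRECORD)
                        .version(1)
                        .build())

precipitation_td.save(feature_join)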
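Conceptually, this step does two things: it joins the feature groups into one row per key, then it partitions the rows into the named `splits`. The sketch below illustrates both in memory with hypothetical helpers (`join_on_key`, `split_rows`) and made-up rows. It is not how hsfs implements training datasets, and the library's own split sizes need not follow this exact rounding.

```python
import random


def join_on_key(left, right, key):
    """Inner-join two lists of row dicts on a shared key column.

    Assumes the key is unique within `right` (sketch).
    """
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the matching rows; the shared key column appears once
            joined.append({**row, **match})
    return joined


def split_rows(rows, splits, seed=42):
    """Shuffle rows reproducibly, then cut them into named splits by fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    names = list(splits)
    result, start, cumulative = {}, 0, 0.0
    for i, name in enumerate(names):
        cumulative += splits[name]
        # The last split takes the remainder, so rounding never drops a row
        if i == len(names) - 1:
            end = len(shuffled)
        else:
            end = round(cumulative * len(shuffled))
        result[name] = shuffled[start:end]
        start = end
    return result


# Made-up rows for 100 locations in two feature groups
rain = [{"location_id": i, "rainfall": 0.02} for i in range(100)]
temperature = [{"location_id": i, "temperature": 5.0} for i in range(100)]

rows = join_on_key(rain, temperature, "location_id")
parts = split_rows(rows, {"train": 0.7, "test": 0.2, "validate": 0.1})
print({name: len(part) for name, part in parts.items()})
# {'train': 70, 'test': 20, 'validate': 10}
```

Cutting at cumulative fractions, rather than rounding each split size independently, keeps the splits disjoint and guarantees that every row lands in exactly one of them.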

Serve with the feature store

# Python
td = fs.get_training_dataset("precipitation", version=1)

# Dict mapping each primary key name to its value, for the feature groups in the training dataset
input_keys = {"location_id": "59.314781 18.070232"}

# Retrieve a single feature vector from the online feature store
ordered_feature_vector = td.get_serving_vector(input_keys)

// Scala
import java.util.HashMap

// Primary key name/value pairs for the feature groups in the training dataset
val input_keys = new HashMap[String, Object]()
input_keys.put("location_id", "53.3572 6.4498")
precipitation_td.getServingVector(input_keys)
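Roughly, `get_serving_vector` uses the primary key values in `input_keys` to look up the current row in each feature group's online store and returns the features in the training dataset's order. The following sketch models each online store as a dict keyed by a primary-key tuple; the function name mirrors the hsfs method, but this implementation, its parameters and the temperature value are hypothetical.

```python
def get_serving_vector(online_tables, primary_key, feature_order, input_keys):
    """Assemble one feature vector from per-feature-group online lookups (sketch)."""
    # Build the lookup key from the primary-key values, in primary-key order
    key = tuple(input_keys[col] for col in primary_key)
    merged = {}
    for table in online_tables:
        # Each table maps a primary-key tuple to that key's latest feature values;
        # a missing key raises KeyError
        merged.update(table[key])
    # Return the values in the training dataset's fixed feature order
    return [merged[feature] for feature in feature_order]


rain_online = {("59.314781 18.070232",): {"rainfall": 0.02}}
temperature_online = {("59.314781 18.070232",): {"temperature": -1.5}}  # hypothetical value

vector = get_serving_vector(
    [rain_online, temperature_online],
    primary_key=["location_id"],
    feature_order=["rainfall", "temperature"],
    input_keys={"location_id": "59.314781 18.070232"},
)
print(vector)  # [0.02, -1.5]
```

Returning values in a fixed order matters because the model was trained on columns in that order; a dict without a defined order would silently misalign features.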

CORE CAPABILITIES 

The feature store automates and manages feature engineering and serving at scale for both streaming and batch data. Hopsworks is the most advanced open-source feature store, with an end-to-end machine learning platform for developing and operating machine learning models.

Collaboration

Organize work in self-service teams with GDPR-compliant data storage and processing.

Faster Model Training with Many GPUs

Add GPUs to your cluster, and users can use them as needed, subject to quotas.

Scalable

Start small on a single VM and scale to clusters of 1000s of CPUs/GPUs and PBs of data.

First-Class Python

Run Jupyter notebooks as jobs or use the PyCharm plugin. Install libraries with conda or pip (PyPI).

Cost-Effective Platform

Runs on commodity hardware, with support for Ubuntu/Debian and Red Hat/CentOS.

Open Source

You can use Hopsworks to build paid services. However, if you modify the code, you should release your modifications under the AGPL-v3 license.