
Online Inference Pipeline

What is an online inference pipeline?

An online inference pipeline is a program, running in a model deployment, that returns predictions to a client. It uses a model that is downloaded and cached from a model registry, together with features that are either computed on demand or precomputed and retrieved from a feature store.
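The download-and-cache pattern for the model can be sketched in a few lines of Python. The registry dictionary, model name, and cache layout below are illustrative stand-ins, not the API of any real model registry:

```python
import os
import tempfile

# Stand-in for a remote model registry mapping model names to artifacts.
_REGISTRY = {"demo_model": b"model-bytes-v1"}
_CACHE_DIR = tempfile.mkdtemp()

def load_model(name: str) -> str:
    """Download a model artifact from the registry unless it is already
    cached locally; return the path to the cached artifact."""
    path = os.path.join(_CACHE_DIR, name)
    if not os.path.exists(path):  # cache miss: fetch from the registry once
        with open(path, "wb") as f:
            f.write(_REGISTRY[name])
    return path  # subsequent calls reuse the cached copy
```

Caching matters here because the model deployment serves many requests: the artifact is fetched once at startup (or on first use) rather than on every prediction.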

What are the key steps in an online inference pipeline?

An online application sends a prediction request to the model deployment, whose inference pipeline then:

1. processes the input request;
2. computes any on-demand features;
3. retrieves any precomputed features from a feature store;
4. applies model-dependent transformations to the on-demand and precomputed features and builds a feature vector from them;
5. makes a prediction with the model using that feature vector;
6. post-processes the prediction and sends the result back to the client application.
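These steps can be sketched end to end in plain Python. Everything here is an illustrative stand-in: the feature-store dictionary, the feature names, the linear model weights, and the fraud threshold are invented for the example, not taken from a real system:

```python
import math

# Stand-in for a feature store holding precomputed, per-entity features.
FEATURE_STORE = {"u1": {"avg_amount": 200.0, "txn_count": 3}}
WEIGHTS = [0.5, 0.2, 0.1]  # stand-in for a trained model's parameters

def predict(request: dict) -> dict:
    # 1. Process the input request.
    user_id = request["user_id"]
    amount = float(request["amount"])

    # 2. Compute an on-demand feature from the request data alone.
    log_amount = math.log1p(amount)

    # 3. Retrieve precomputed features from the feature store by entity key.
    stored = FEATURE_STORE[user_id]

    # 4. Apply model-dependent transformations and build the feature vector.
    features = [log_amount, stored["avg_amount"] / 100.0, stored["txn_count"]]

    # 5. Make a prediction with the model (a dummy linear model here).
    score = sum(w * x for w, x in zip(WEIGHTS, features))

    # 6. Post-process the prediction into a client-friendly response.
    return {"user_id": user_id, "fraud": score > 1.0, "score": round(score, 3)}
```

In a real deployment, step 3 would be a low-latency lookup against an online feature store, and step 5 would invoke the model downloaded from the registry; the control flow, however, follows the same six steps.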

[Figure: online inference pipeline architecture]

© Hopsworks 2024. All rights reserved. Various trademarks held by their respective owners.
