
Monolithic Machine Learning Pipeline

What is a monolithic ML pipeline?

A monolithic ML pipeline is a single program that runs either (1) a feature pipeline followed by a training pipeline, or (2) a feature pipeline followed by a batch inference pipeline. It is typically parameterized to run in either TRAIN mode or INFERENCE mode. In training mode, it reads historical raw data, computes features from it, trains a model, and saves the model to a model registry. In inference mode, it reads inference data, computes features from it using the same feature logic as in training mode, downloads the model from the model registry, makes predictions on those features, and writes the predictions to a storage sink.
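The TRAIN/INFERENCE switch described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a Hopsworks API: the model registry is simulated with JSON files in a temporary directory, the "model" is a one-feature least-squares line, the storage sink is a Python list, and every name (`Mode`, `run_pipeline`, `compute_features`, the `amount`/`quantity` columns) is hypothetical. The point to notice is that `compute_features` runs in both modes, so training and inference features come from the same logic.

```python
import json
import os
import tempfile
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    TRAIN = "train"
    INFERENCE = "inference"


# Hypothetical stand-in for a model registry: JSON files in a temp directory.
REGISTRY_DIR = tempfile.mkdtemp(prefix="model_registry_")


def compute_features(rows: List[Dict[str, float]]) -> List[float]:
    """Feature logic shared by BOTH modes, so training and inference features match.

    Hypothetical feature: total spend = amount * quantity.
    """
    return [row["amount"] * row["quantity"] for row in rows]


def fit_line(xs: List[float], ys: List[float]) -> Dict[str, float]:
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return {"w": w, "b": mean_y - w * mean_x}


def save_model(name: str, model: Dict[str, float]) -> None:
    with open(os.path.join(REGISTRY_DIR, f"{name}.json"), "w") as f:
        json.dump(model, f)


def load_model(name: str) -> Dict[str, float]:
    with open(os.path.join(REGISTRY_DIR, f"{name}.json")) as f:
        return json.load(f)


def run_pipeline(mode: Mode, rows, labels=None, sink=None, model_name="spend_model"):
    # Feature pipeline step: identical code path in both modes.
    features = compute_features(rows)

    if mode is Mode.TRAIN:
        # Training pipeline step: fit the model and save it to the registry.
        if labels is None:
            raise ValueError("TRAIN mode requires labels")
        model = fit_line(features, labels)
        save_model(model_name, model)
        return model

    # Batch inference pipeline step: download the registered model and score.
    model = load_model(model_name)
    predictions = [model["w"] * x + model["b"] for x in features]
    if sink is not None:
        sink.extend(predictions)  # stand-in for writing to a storage sink
    return predictions


# Usage: the same program trains on historical data, then scores new data.
history = [
    {"amount": 10.0, "quantity": 1},
    {"amount": 5.0, "quantity": 4},
    {"amount": 2.0, "quantity": 3},
]
labels = [21.0, 41.0, 13.0]  # here labels = 2 * spend + 1
model = run_pipeline(Mode.TRAIN, history, labels=labels)

sink: List[float] = []
predictions = run_pipeline(Mode.INFERENCE, [{"amount": 3.0, "quantity": 2}], sink=sink)
print(model, predictions)  # {'w': 2.0, 'b': 1.0} [13.0]
```

In a real batch system the mode would more likely be chosen by a command-line flag or a scheduler parameter, and the registry and sink would be external services. Keeping feature computation in a single shared function is what lets one program serve both modes without a feature store.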

Monolithic ML pipelines are only possible for batch ML systems, because online systems need a separate online inference pipeline. Their advantage over separate feature, training, and inference pipelines is that no feature store is needed. Their disadvantage is that features are computed inside the pipeline rather than materialized in a feature store, so they cannot be reused across different models. Monolithic ML pipelines are also larger programs that are harder to develop and maintain than separate pipelines.

[Figure: Architecture Graph]

