Training Pipeline

What is a Training Pipeline?

A training pipeline is a series of steps or processes that takes input features and labels (for supervised ML algorithms), and produces a model as output. A training pipeline typically reads training data from a feature store, performs model-dependent transformations, trains the model, and evaluates the model before the model is saved to a model registry. If model evaluation is complex, it can also be performed after the model has been saved in a model registry.

The steps involved in training a model typically include the:

selection of the features and the range of data to be used to train the model,
splitting the training data into train/test/validation sets,
encoding/scaling feature data before it is fed into the model for training,
selection of a model architecture (e.g., tree-based, feedforward DNN, transformer),
identification of good hyperparameters for the combination of prediction problem, training data, and model architecture,
fitting the training data to the model (i.e., model training),
model evaluation - validation/testing of the model's performance and checks for any model bias,
registration of the trained model with a model registry.
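
The steps above can be sketched end to end in a few dozen lines. This is a minimal illustration using only the Python standard library, not a reference implementation: `read_training_data` is a hypothetical stand-in for reading from a feature store (it generates synthetic rows), the `registry` dictionary is a hypothetical stand-in for a model registry, and the model is a one-feature ridge regression chosen only because it has a closed-form fit. Bias checks are omitted for brevity.

```python
import random
import statistics

def read_training_data(n_rows=200, seed=42):
    """Hypothetical stand-in for a feature store read: synthetic (x, y) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = rng.uniform(0.0, 10.0)
        rows.append((x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.5)))
    return rows

def split(rows, seed=42, train_frac=0.7, val_frac=0.15):
    """Shuffle a copy of the rows and split into train/validation/test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def fit_scaler(train):
    """Compute scaling statistics on the train set only, to avoid leakage."""
    xs = [x for x, _ in train]
    return statistics.fmean(xs), statistics.pstdev(xs)

def scale(rows, scaler):
    mean, std = scaler
    return [((x - mean) / std, y) for x, y in rows]

def fit_ridge(train, l2):
    """Closed-form 1-D ridge regression y = w*x + b with an L2 penalty on w."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in train)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + l2)
    return w, y_mean - w * x_mean

def mse(model, rows):
    w, b = model
    return statistics.fmean((w * x + b - y) ** 2 for x, y in rows)

def run_training_pipeline(registry, name="demo_model"):
    # 1. Select features and data range (here: everything in the stand-in source).
    rows = read_training_data()
    # 2. Split into train/validation/test sets.
    train, val, test = split(rows)
    # 3. Scale features with statistics fitted on the train set.
    scaler = fit_scaler(train)
    train_s, val_s, test_s = (scale(part, scaler) for part in (train, val, test))
    # 4-6. Fixed architecture; pick the L2 strength with the lowest validation error.
    candidates = [(l2, fit_ridge(train_s, l2)) for l2 in (0.0, 1.0, 10.0, 100.0)]
    best_l2, best_model = min(candidates, key=lambda c: mse(c[1], val_s))
    # 7. Evaluate the chosen model on the held-out test set.
    metrics = {"val_mse": mse(best_model, val_s), "test_mse": mse(best_model, test_s)}
    # 8. Register the model, its scaler, hyperparameters, and metrics as a new version.
    versions = registry.setdefault(name, [])
    entry = {
        "version": len(versions) + 1,
        "model": {"w": best_model[0], "b": best_model[1]},
        "scaler": {"mean": scaler[0], "std": scaler[1]},
        "hyperparameters": {"l2": best_l2},
        "metrics": metrics,
    }
    versions.append(entry)
    return entry

if __name__ == "__main__":
    registry = {}
    print(run_training_pipeline(registry))
```

Because the data read and the split are both seeded, repeated runs produce identical metrics and differ only in the registered version number. The scaler is stored alongside the model so that the same transformation can be applied at inference time.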

Using a feature store in the training pipeline helps to achieve consistency across different training runs and ensures that the features used for training are of high quality and reproducible.
