
The big dictionary of MLOps & LLMOps

Comprehensive Terminology Guide for Building and Managing ML Solutions.

This dictionary/glossary covers terms from MLOps, LLMOps, data engineering, and Feature Stores, but does not cover terms from the broader ML algorithms and frameworks space.

MLOps is the roadmap you follow to go from training models in notebooks to building production ML systems. It is a set of principles and practices that encompass the entire ML System lifecycle, from ideation to data management, feature creation, model training, inference, observability, and operations.

MLOps is based on three principles: observability, automated testing, and versioning of ML artifacts. Observability for ML systems refers to the ability to gain insights into the behavior and performance of production machine learning models. Automated testing will enable you to build ML systems with confidence that tests will catch any potential bugs in your data or code. Versioning will enable you to safely operate ML systems by supporting upgrades and rollback without affecting system operations. MLOps should help tighten your ML development iteration loop by enabling you to roll out fixes and improvements to ML systems faster. Finally, the Feature Store is often called the data layer for MLOps. It acts as a data platform that enables ML pipelines to be decomposed into smaller, more manageable pipelines for feature engineering, model training, and model inference.

LLMOps is MLOps for Large Language Models (LLMs). It is a set of practices for the operationalization of applications that use LLMs to provide intelligent language-based services. This involves managing the fine-tuning of LLMs, prompt engineering, integration with external vector databases and/or feature stores for in-context learning, and infrastructure for training, deploying, and serving LLMs.


An AI Pipeline is a program that takes input and produces one or more ML artifacts as output.
In many time-series datasets, past values influence the current values. This is true for everything from stock market prices to words in a sentence or a longer piece of text.
AutoML stands for Automated Machine Learning and it describes the process of automating various tasks in model training pipelines.


Backfilling is the process of recomputing datasets from raw, historical data.
Backfilling training data from a feature store means creating a point-in-time consistent snapshot of feature data that will be used to train one or more models.
The backpressure pattern consists of a feedback mechanism that allows consumers to inform upstream components when they are ready to handle new messages.
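
One simple way to picture the feedback mechanism above is a bounded buffer: when it is full, the producer learns that the consumer is not ready yet. The sketch below uses Python's standard `queue` module; the `try_send` helper and the buffer size are illustrative assumptions, not part of any particular messaging system.

```python
import queue

# Hypothetical bounded buffer sitting between a producer and a consumer.
buffer = queue.Queue(maxsize=2)

def try_send(message):
    """Return True if the buffer accepted the message, False if it is full."""
    try:
        buffer.put_nowait(message)
        return True
    except queue.Full:
        # Backpressure signal: the consumer has not caught up yet.
        return False

accepted = [try_send(m) for m in ["a", "b", "c"]]  # the third message is refused
buffer.get_nowait()                                 # the consumer handles one message
accepted.append(try_send("c"))                      # capacity is available again
print(accepted)
```

A real producer would typically wait, slow down, or drop messages when it receives the refusal, depending on the application.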
A batch inference pipeline is a program that takes as input a batch of data and a model, and outputs predictions that are typically written to some sink.


Continuous Integration (CI) is the practice of continuously merging code changes from multiple developers into a shared repository.
Compound AI systems represent a profound shift in AI system development, moving away from large standalone models towards more dynamic and collaborative compound systems.
The context window of LLMs is the number of tokens the model can take as input when generating responses.


DAG Processing Model

A DAG (directed acyclic graph) processing model is a method of representing the dependencies between tasks in a workflow or pipeline.
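
As a minimal sketch of the idea above, the standard-library `graphlib` module (Python 3.9+) can turn task dependencies into a valid execution order. The task names are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "feature_pipeline": {"ingest_raw_data"},
    "training_pipeline": {"feature_pipeline"},
    "batch_inference": {"feature_pipeline", "training_pipeline"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because the graph is acyclic, such an order always exists; `TopologicalSorter` raises `CycleError` if a cycle is introduced.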

Data Compatibility

Data compatibility often refers to feature consistency, where the schema of features used in feature pipelines, training pipelines, and inference pipelines is compatible.

Data Contract

A data contract provides schema level guarantees for a feature group or feature view and includes metadata, such as how/where a feature may be used.

Data Lakehouse

A Data Lakehouse is a modern data architecture that combines the benefits of both data lakes and data warehouses.

Data Leakage

Data leakage occurs when data that should be outside of the training dataset is explicitly or implicitly used to train a model. It can result in incorrect estimation of a trained model’s performance.

Data Modeling

Data modeling describes how tables in a data warehouse are structured to create a simplified and easy-to-understand layout that enables efficient, ad-hoc querying and analysis of large datasets.

Data Partitioning

When you create a feature group, you can select one or more features (columns) as the partition key, storing data with the same partition key values in the same directory.

Data Pipelines

Data pipelines are orchestrated programs that move data from one system to another while also performing transformations on the data.

Data Quality

High data quality for ML refers to data that can be used to train high performance models.

Data Transformation

A data transformation is a function that is applied to some input data that changes the data in such a way that the data is easier to consume by downstream applications or users.

Data Type (for features)

A feature value is a data value. In programming languages, a feature is represented as a primitive data type, such as an int, string, array, or boolean.

Data Validation (for features)

ML model training or inference can crash if there are problems with input data. Incorrect or out-of-distribution data can introduce the problem of skew in the inference or training data.

Data-Centric ML

Data-centric ML describes a set of practices for iteratively improving both the quality and the range of feature data available to models.

Dimensional Modeling and Feature Stores

In data warehousing, dimensional modeling is a data modeling technique that identifies entities and then decomposes your data into “facts” and “dimensions” related to those entities.


Downstream indicates that the user/client/application is a consumer of the dataset or feature group.



ELT stands for Extract, Load, and Transform of data.


ETL stands for Extract, Transform, and Load of data.


An embedding is a compressed representation of data such as text or images as continuous vectors in a lower-dimensional space.

Encoding (for Features)

Feature values can be encoded for data compatibility or to improve model performance.
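
One common encoding, shown here only as an illustrative sketch, is one-hot encoding of a categorical feature. The `one_hot` helper and the sample values are assumptions made for this example.

```python
def one_hot(values):
    """One-hot encode a list of categorical values.

    Returns the encoded rows and the category order used for the columns.
    """
    categories = sorted(set(values))
    encoded_rows = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded_rows, categories

encoded, categories = one_hot(["red", "blue", "red"])
print(categories)  # column order of the encoding
print(encoded)
```

The category order must be the same in training and inference, otherwise the model receives columns in a different position than it was trained on.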


In a feature store, an entity is represented as rows in a feature group, where each row corresponds to a single instance of the object or concept.



A feature is a measurable property of some data sample that is used as input for an ML model for training and serving.

Feature Data

Feature data is simply the data that is passed as input to machine learning (ML) models.

Feature Engineering

Feature engineering is the process of selecting, creating, and transforming raw data into features that can be used as input to machine learning algorithms.

Feature Freshness

Feature freshness refers to the time lag between when the data required to compute a feature becomes available and when the feature is available for use in an inference pipeline.

Feature Function

A feature function is a function that computes one or more feature values from input data.

Feature Groups

A feature group is a logical table of features that provides a single API for updating feature values and two different APIs, an online API and an offline API, for reading feature values.

Feature Logic

Feature logic is the series of steps that transform input data into the unencoded data value that represents the feature in the feature store.

Feature Monitoring

Feature monitoring involves continuously monitoring the performance of the features used as model inputs in inference pipelines to identify potential problems in input feature values.

Feature Pipeline

A feature pipeline is a program that orchestrates the execution of a dataflow graph of feature functions where the computed features are written to one or more feature groups.

Feature Platform

A feature platform is a feature store that also provides support for a domain-specific language (DSL) to define feature logic and feature pipelines.

Feature Reuse

Features are computed in a feature pipeline and stored in the feature store. Features are reused if the same feature is used in more than one model.

Feature Selection

Feature selection is the process of finding existing features, in potentially different feature groups, and joining them together along with the label(s) to define a set of features.

Feature Service

A feature service is a feature view that is implemented as a network service that provides both an online and offline API for retrieving feature vectors and batches of feature values, respectively.

Feature Store

A feature store is a data platform that supports the development and operation of machine learning systems by managing the storage and efficient querying of feature data.

Feature Type

A feature type defines the set of valid encodings (model-dependent transformations) that can be performed on a feature value.

Feature Value

A feature value is a measurement (or value) of a feature at a given point in time.

Feature Vector

A feature vector is a row of feature values. A training sample for a model includes a feature vector and the label(s).

Feature View

A feature view is a selection of features (and labels) from one or more feature groups.


Data filtering is an operation on a dataset (such as a DataFrame) that defines which data to extract or remove from the dataset.

Fine-Tuning LLMs

The fine-tuning of an ML model is when you take a base model with frozen weights, add some new layers on top of the frozen layers, and train the new layers using your own training data.

Flash Attention

Flash Attention is a method to improve the efficiency of transformer models, in particular large language models (LLMs), helping reduce both model training time and inference latency.

Function Calling with LLMs

Function Calling refers to the ability of an LLM to infer, from the user prompt, the correct function to execute from a set of available functions and the correct parameters to pass to that function.


Generative AI

Generative AI generally refers to models and techniques that generate new data samples by learning the underlying data distribution.

Gradient Accumulation

Gradient Accumulation is a technique used to improve memory efficiency and stabilize training in neural networks by accumulating gradients over multiple batches before updating the model parameters.
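
The following is a minimal sketch of the idea, assuming a one-parameter model `y_hat = w * x` with mean squared error and equally sized micro-batches. All names and values are illustrative. It shows that averaging gradients over micro-batches gives the same update as one large batch, without holding the large batch in memory at once.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error with respect to w for y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[0:2], data[2:4]]  # equal sizes, so a plain average is exact
w = 0.0
learning_rate = 0.01

# Accumulate scaled gradients; no parameter update happens inside the loop.
accumulated = 0.0
for micro_batch in micro_batches:
    accumulated += grad_mse(w, micro_batch) / len(micro_batches)

full = grad_mse(w, data)           # gradient of the full batch, for comparison
w -= learning_rate * accumulated   # single update after all micro-batches
print(accumulated, full, w)
```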

Grouped Query Attention

Grouped Query Attention is an extension of traditional attention mechanisms that aims to address several challenges associated with processing long sequences efficiently.


Hallucinations in LLMs

Hallucinations are responses by LLMs that appear to be correct but are actually false or not based on the input given.


In training pipelines, a hyperparameter is a parameter that influences the performance of model training but the hyperparameter itself is not updated during model training.

Hyperparameter Tuning

Hyperparameter tuning involves training multiple models each with different hyperparameter values to find good values for hyperparameters that optimize model performance.


Idempotent Machine Learning Pipelines

An idempotent operation produces the same result no matter how many times you execute it.
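
As a hedged illustration, compare an upsert keyed on a primary key (idempotent) with a plain append (not idempotent). The `upsert` and `append` helpers and the `id` key are hypothetical and use plain Python containers rather than a real feature store.

```python
def upsert(table, rows, key="id"):
    """Insert or overwrite rows by primary key; re-running gives the same table."""
    for row in rows:
        table[row[key]] = row
    return table

def append(log, rows):
    """Blindly append rows; re-running duplicates the data."""
    log.extend(rows)
    return log

rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]

table = {}
upsert(table, rows)
upsert(table, rows)   # simulated pipeline retry

log = []
append(log, rows)
append(log, rows)     # simulated pipeline retry

print(len(table), len(log))
```

After the retry the upserted table still holds two rows, while the appended log holds four.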

In Context Learning (ICL)

In-context learning (ICL) is a specific method of prompt engineering where demonstrations of the task are provided to the model as part of the prompt without any additional training.

Inference Data

Inference data is the set of feature values passed as input to a trained model, which outputs a prediction.

Inference Logs

Inference logs are the input and output of inference pipelines.

Inference Pipeline

An inference pipeline is a program that takes input data, optionally transforms that data, then makes predictions on that input data using a model.

Instruction Datasets for Fine-Tuning LLMs

Instruction datasets are used to fine-tune LLMs.


LLM Code Interpreter

LLM code interpreters translate natural language queries into code in a programming language, such as Python, that is then run on behalf of the user.

LLM Temperature

At its core, LLM temperature controls the balance between playing more safely and exploring new possibilities - exploration versus exploitation in the model's output.
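
In common implementations, temperature is applied by dividing the model's logits by the temperature before the softmax. The sketch below assumes that formulation; the logits are made-up values. A low temperature concentrates probability on the top token, while a high temperature spreads it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    scaled = [logit / temperature for logit in logits]
    largest = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # hypothetical next-token scores
cold = softmax_with_temperature(logits, 0.5)   # sharper distribution
hot = softmax_with_temperature(logits, 2.0)    # flatter distribution
print(cold)
print(hot)
```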


LLMOps is MLOps for Large Language Models and it is a set of practices for the operationalization of applications that use LLMs to provide intelligent language-based services.

LLMs - Large Language Models

LLMs stands for Large Language Models.

Lagged features

Lagged features are a feature engineering technique used to capture the temporal dependencies and patterns in time series data.
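
A minimal sketch in plain Python, assuming a single ordered series: each row receives the value observed a fixed number of steps earlier. The `add_lags` helper and the `lag_N` column names are illustrative.

```python
def add_lags(values, lags):
    """Build rows holding the current value and the values `lag` steps earlier."""
    rows = []
    for t, current in enumerate(values):
        row = {"value": current}
        for lag in lags:
            # No history exists for the first `lag` rows, so leave them empty.
            row[f"lag_{lag}"] = values[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows

prices = [10, 11, 13, 12]
rows = add_lags(prices, lags=[1, 2])
for row in rows:
    print(row)
```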


LangChain is an open-source framework for Python and JavaScript, designed to simplify the writing of applications using LLMs.

Latent Space

Latent space is the representation of compressed data, where compressed data is data encoded using fewer bits than the original representation.



ML stands for Machine Learning, which is a subfield of artificial intelligence (AI) that focuses on the development of algorithms and statistical models.

ML Artifacts (ML Assets)

ML artifacts are outputs of ML pipelines that are needed for execution of subsequent pipelines or ML applications.


Machine learning operations (MLOps) describes processes for automated testing of ML pipelines and ML artifact versioning that helps improve developer productivity.


An MVPS is a Minimal Viable Prediction Service.

Machine Learning Infrastructure

Machine learning infrastructure typically refers to the underlying framework, systems, and resources required to support the development, deployment, and operation of software applications.

Machine Learning Logs

Machine learning logs serve as valuable records that document the entire process.

Machine Learning Observability

Machine Learning Observability involves closely monitoring and understanding how machine learning models perform once they're deployed into real-world environments.

Machine Learning Pipeline

A Machine Learning Pipeline is a program that takes input and produces one or more ML artifacts as output.

Machine Learning Systems

Machine Learning Systems can be categorized into four different types: interactive, batch, stream processing, and embedded/edge systems.

Model Architecture

A model architecture is the choice of a machine learning algorithm along with the underlying structure or design of the machine learning model.

Model Bias

Model bias refers to the presence of systematic errors in a model that can cause it to consistently make incorrect predictions.

Model Deployment

A model deployment enables clients to perform inference requests on the model over a network.

Model Development

Model development is the process of building and training a machine learning model using training data.

Model Evaluation (Model Validation)

Model evaluation (or model validation) is the process of assessing the performance of a trained ML model on a (holdout) dataset.

Model Governance

Model governance is the process for managing ML models to ensure they are secure, ethical, trustworthy, explainable, and comply with relevant regulations.

Model Inference

Model inference (or machine learning inference) is when a model makes predictions on new, unseen input data (inference data) and produces predictions as output that are consumed by a user or service.

Model Interpretability

Model interpretability (also known as explainable AI) is the process by which an ML model's predictions can be explained and understood by humans.

Model Monitoring

Model monitoring involves continuously monitoring the performance of predictions made by models to identify potential problems.

Model Performance

Model performance in machine learning (ML) is a measurement of how accurate a model's predictions or classifications are on new, unseen data.

Model Quantization

Model quantization can reduce the memory footprint and computation requirements of deep neural network models.
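
One simple scheme, shown here only as a sketch, is symmetric 8-bit quantization: weights are mapped to integers in [-127, 127] using a single scale factor. The helpers and sample weights are illustrative assumptions; production libraries use more elaborate schemes.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.03]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)
print(restored)
```

Each integer needs 1 byte instead of the 4 bytes of a 32-bit float, at the cost of a rounding error of at most half the scale per weight.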

Model Registry

A model registry is a version control system for models that provides APIs to store and retrieve models and model-related artifacts.

Model Serving

With Model Serving you take a trained ML model and make it accessible for real-world applications via a REST or gRPC API.

Model Training

Model training in MLOps happens as part of a model training pipeline.

Model-Centric ML

Model-centric ML is an approach to machine learning that focuses on iteratively improving model architecture and hyperparameters to enhance model performance.

Model-Dependent Transformations

A model-dependent transformation is a transformation of a feature that is specific to one model, and is consistently applied in training and inference pipelines.

Model-Independent Transformations

Model-independent data transformations produce features that can potentially be reused in training or inference by one or more models.

Monolithic Machine Learning Pipeline

A monolithic ML pipeline is a single program that can be run as either a feature pipeline followed by a training pipeline or a feature pipeline followed by a batch inference pipeline.


Natural Language Processing (NLP)

NLP stands for Natural Language Processing.


Offline Store

The offline store in a feature store stores the historical values of features, enabling efficient and scalable access to large volumes of historical feature data.

On-Demand Features

If a feature is used in an online inference pipeline and it is created using data only available at request-time, then it is an on-demand feature.

On-Demand Transformation

An on-demand transformation is a feature function that is used to compute an on-demand feature.

Online Inference Pipeline

An online inference pipeline is a program that runs in a model deployment and returns predictions to a client using a model, downloaded and cached from a model registry.

Online Store

The online store is a row-oriented database or key-value store that provides low-latency lookups for precomputed feature values using one or more entity IDs (or primary keys).

Online-Offline Feature Skew

Feature skew is when there are significant differences between the feature logic executed in an offline ML pipeline and the feature logic executed in the corresponding online inference pipeline.

Online-Offline Feature Store Consistency

Features that are stored in both the online and offline stores should be consistent. A replication protocol with consistency guarantees ensures that the feature data is kept in sync.


The orchestration of ML pipelines is crucial to making ML pipelines run without human intervention, and run reliably, even in the event of hardware or software errors.


Platform - KServe

KServe is an open-source model serving platform.


PagedAttention is an innovative technique that addresses the significant memory challenges faced by serving LLMs.

Pandas UDF

Pandas UDFs (User-Defined Functions) are functions that allow users to perform feature engineering (or any custom transformations) on a Pandas DataFrame using PySpark.

Parameter-Efficient Fine-Tuning (PEFT) of LLMs

Parameter-Efficient Fine-Tuning (PEFT) enables you to fine-tune a small subset of parameters in a pre-trained LLM.

Point-in-Time Correct Joins

A point-in-time correct join is an ASOF LEFT JOIN.
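
To make the semantics concrete, here is a minimal pure-Python sketch of an as-of left join for a single entity. Each label row keeps the most recent feature value whose timestamp is not later than the label's timestamp, so no future data leaks into the training sample. The field names `ts`, `value`, and `feature` are assumptions for this example.

```python
from bisect import bisect_right

def asof_left_join(labels, features):
    """Attach to each label row the latest feature row with ts <= label ts."""
    features = sorted(features, key=lambda row: row["ts"])
    feature_ts = [row["ts"] for row in features]
    joined = []
    for label in labels:
        # Index of the first feature row strictly after the label timestamp.
        i = bisect_right(feature_ts, label["ts"])
        match = features[i - 1] if i > 0 else None
        # Left join: keep the label row even when no feature row matches.
        joined.append({**label, "feature": match["value"] if match else None})
    return joined

features = [{"ts": 1, "value": 0.1}, {"ts": 5, "value": 0.5}, {"ts": 9, "value": 0.9}]
labels = [{"ts": 0, "label": 0}, {"ts": 5, "label": 1}, {"ts": 7, "label": 0}, {"ts": 10, "label": 1}]
joined = asof_left_join(labels, features)
for row in joined:
    print(row)
```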

Precomputed Features

A precomputed feature is a feature that has been created by a feature pipeline and is stored in a feature store.

Prompt Engineering

Prompt engineering involves designing natural language queries (prompts) to produce desired results from an LLM.

Prompt Store

In the context of LLMs, a prompt store is a system for logging and managing the interactions between users and LLMs.

Prompt Tuning

Prompt tuning (PT) is a parameter-efficient adaptation method for LLMs that adds a small number of tunable embeddings to an otherwise frozen model.

Python UDF

A Python UDF (user-defined function) in ML is a function written by a user, typically to implement a feature function.


RLHF - Reinforcement Learning from Human Feedback

RLHF mitigates some of the problems of training on toxic data and on LLMs producing hallucinations.

Real-Time Machine Learning

Real-time ML refers to ML systems where decisions or predictions must be produced with minimal, predictable latency.

Representation Learning

Representation learning is a set of techniques that allow a system to discover the representations needed for feature detection or classification from raw data.

Retrieval Augmented Generation (RAG) for LLMs

Retrieval-augmented generation for large language models (LLMs) fetches data from an external datastore (outside the foundation LLM) at inference time and enriches the prompt with this data.

RoPE Scaling

LLMs rely on Rotary Position Embeddings (RoPE) to understand the relative position of words within a sequence.


SQL UDF in Python

A SQL UDF (User-Defined Function) is a custom function that extends the capabilities of SQL by allowing users to implement complex logic and transformations that are not available with built-in SQL.

Sample Packing

Sample Packing is a technique used in ML to efficiently process variable-length sequences.
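
A common form of sample packing concatenates several short sequences into one fixed-length training sequence so that fewer positions are wasted on padding. The sketch below assumes a greedy first-fit strategy over token lists; the `pack_sequences` helper, `max_len`, and the sample data are illustrative.

```python
def pack_sequences(sequences, max_len):
    """Greedily group sequences so each group holds at most max_len tokens."""
    packs = []
    # Placing longer sequences first tends to leave fewer gaps.
    for seq in sorted(sequences, key=len, reverse=True):
        if len(seq) > max_len:
            raise ValueError("sequence longer than max_len")
        for pack in packs:
            if sum(len(s) for s in pack) + len(seq) <= max_len:
                pack.append(seq)
                break
        else:
            packs.append([seq])  # no existing pack had room
    return packs

sequences = [[1, 2, 3], [4], [5, 6], [7, 8, 9, 10], [11, 12]]
packs = pack_sequences(sequences, max_len=6)
print(packs)
```

Padding each of the five sequences to length 6 separately would use 30 positions for 12 tokens; the two packs above use 12. In practice, attention masks are also adjusted so that packed samples do not attend to each other.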


A schema defines the shape, order, and type of data stored in ML artifacts, including feature groups, feature views, training datasets, and models.

Similarity Search

Vector similarity search (or similarity search for embeddings) finds the “top K” most similar vectors to a query vector in a vector database.
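
As an illustration, the following brute-force sketch ranks a tiny in-memory set of vectors by cosine similarity to a query. The vectors and names are made up, and a real vector database would typically use an approximate nearest neighbor index instead of scanning every vector.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k):
    """Return the names of the k vectors most similar to the query."""
    ranked = sorted(vectors.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical two-dimensional embeddings.
index = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.0, 1.0]}
result = top_k([1.0, 0.0], index, k=2)
print(result)
```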


In machine learning, skew refers to an imbalance in the distribution of the label (target variable) in a training dataset.

Splitting Training Data

When you train a model, you would like your model to generalize and perform well on new, unseen data. You don’t want your model to overfit to training data.

Streaming Feature Pipeline

A streaming feature pipeline is a program that continuously processes incoming data in real-time, extracting and computing features and writing those features to a feature store.

Streaming Inference Pipeline

A streaming inference pipeline is a streaming application that makes real-time, non-interactive predictions triggered by the arrival of an event and outputs predictions to some sink.


Test Set

The test set is a portion (or partition) of the available training data that is “held back” and not used during model training.

Theory-of-Mind Tasks

Theory of mind (ToM) is considered the ability to impute unobservable mental states in other sentient beings (or agents).

Time travel (for features)

Time travel for features refers to the ability to access historical versions of feature values at previous points in time.

Train (Training) Set

The train (or training) set is the portion of the training data that is used to train a machine learning model.

Training Data

Training data refers to the data set that is used to train and evaluate an ML model.

Training Pipeline

A training pipeline is a series of steps or processes that takes input features and labels (for supervised ML algorithms), and produces a model as output.

Training-Inference Skew

Training-inference skew is when there are (even slightly) different implementations of a transformation between the training and inference pipelines.


A transformation is a function that is applied to some input data and produces processed data as output.

Two-Tower Embedding Model

The two-tower (or twin-tower) embedding model connects embeddings in two different modalities by placing both modalities in the same vector space.

Types of Machine Learning

ML models are trained to solve different business prediction problems like classification (predict what class an example falls into), regression (predict a number), and language modeling.



Upstream indicates that the user/client/application is a producer of data to a given dataset or feature group (that itself is downstream).


Validation Set

The validation set is a subset of the training data used to evaluate the performance of a machine learning model during hyperparameter tuning and model selection.

Vector Database

A vector database for machine learning (ML) is a database that stores, manages, and provides semantic query support for embeddings (high-dimensional vectors).

Versioning (of ML Artifacts)

Versioning of models, features, feature groups, feature views, and training datasets enables the management of dependencies between ML artifacts.


vLLM stands for Virtual Large Language Model and addresses a critical bottleneck in LLM deployment: inefficient inferencing and serving.

© Hopsworks 2024. All rights reserved. Various trademarks held by their respective owners.
