Machine Learning Systems

A machine learning system is a computer system that manages the data and the programs used to train and operate the machine learning models that power an AI-enabled application or service.

Four types of Machine Learning Systems

Machine learning systems (ML systems) can be categorized into four different types:

real-time interactive applications that take user input and use a model to make a prediction;
batch applications that use models to make predictions on a schedule;
stream processing applications that use models to make predictions on streaming data;
embedded/edge applications that use models and sensors in resource-constrained environments.

Real-time, interactive applications differ from the other machine learning systems in that they often call models as external, network-callable services hosted on standalone model serving infrastructure. Batch, stream processing, and embedded/edge machine learning systems typically embed the model as part of the system and invoke it via a function or inter-process call.
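
The difference between the two invocation styles can be sketched in a few lines of Python. This is only an illustration: toy_model, model_server_handle, and the send parameter are hypothetical names, and the network hop is simulated with JSON serialization rather than a real HTTP or gRPC call.

```python
import json


def toy_model(features):
    """Hypothetical model: a fixed linear score over two features."""
    return 0.5 * features["x1"] + 2.0 * features["x2"]


# Embedded style (batch, stream processing, edge):
# the application invokes the model in-process as a plain function call.
def embedded_predict(features):
    return toy_model(features)


# Served style (real-time, interactive):
# the application only sees a request/response boundary.
def model_server_handle(request_body):
    """Stands in for a model serving endpoint: JSON in, JSON out."""
    features = json.loads(request_body)
    return json.dumps({"prediction": toy_model(features)})


def remote_predict(features, send=model_server_handle):
    """`send` is an assumed stand-in for the real network call."""
    response_body = send(json.dumps(features))
    return json.loads(response_body)["prediction"]


features = {"x1": 2.0, "x2": 1.5}
embedded_result = embedded_predict(features)
remote_result = remote_predict(features)
print(embedded_result, remote_result)
```

Both calls return the same prediction. What changes is where the model runs and what can fail: the served variant adds serialization, latency, and a dependency on separate infrastructure, while the embedded variant ties the model's lifecycle to the application's.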

The following are examples of the four different types of machine learning systems:

Batch ML Systems

Dashboards are built from predictions made by a batch ML system.
Predict Air Quality - train a model on air quality observations from sensors, using weather data as features. A batch job can then use the weather forecast (input features) to predict air quality (target) and publish the predictions to a dashboard.
Interactive Systems that use predictions made by a batch ML system.
Google Photos Search - when you upload photos to Google, it runs a classification model to identify the things and places in each photo. Those things and places are indexed against the photo, so that you can search with free text to find matching photos. For example, if you type in “bike”, it will show you your photos that have one or more bicycles in them.
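
The photo search example follows a common batch pattern: a scheduled job runs the model over stored items and writes the predictions to an index, and the interactive application only reads that index. A minimal sketch, assuming a hypothetical toy_classifier that simply returns labels already attached to each record (a real system would run an image model here):

```python
from collections import defaultdict


def toy_classifier(photo):
    """Hypothetical stand-in for an image classification model."""
    return photo["contents"]


def run_batch_job(photos):
    """Batch step: classify every photo and index photo ids by predicted label."""
    index = defaultdict(set)
    for photo in photos:
        for label in toy_classifier(photo):
            index[label].add(photo["id"])
    return index


def search(index, query):
    """Interactive step: look up precomputed predictions, no model call needed."""
    return sorted(index.get(query.lower(), set()))


photos = [
    {"id": "p1", "contents": ["bike", "street"]},
    {"id": "p2", "contents": ["beach"]},
    {"id": "p3", "contents": ["bike", "beach"]},
]
index = run_batch_job(photos)
bike_photos = search(index, "bike")
print(bike_photos)  # ['p1', 'p3']
```

Because the predictions are computed ahead of time, search latency depends only on the index lookup, not on the model.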

Stream Processing ML Systems

Real-time pattern matching systems that do not require user input are often stream processing ML systems.
Network Intrusion Detection - if you use stream processing to extract features about all traffic in a network, you can then use a model to predict anomalies such as network intrusion.
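
As a rough sketch of this idea, the code below computes one streaming feature (connections per source over a sliding time window) and passes it to a model. The names SlidingWindowCounter and toy_anomaly_model are hypothetical, and the "model" is just a fixed threshold standing in for a trained anomaly detector.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events per source over the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, source), oldest first
        self.counts = {}

    def add(self, timestamp, source):
        self.events.append((timestamp, source))
        self.counts[source] = self.counts.get(source, 0) + 1
        # Evict events that have fallen out of the window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, old_source = self.events.popleft()
            self.counts[old_source] -= 1
        return self.counts[source]


def toy_anomaly_model(connections_in_window, threshold=3):
    """Hypothetical model: flags a source that exceeds a connection threshold."""
    return connections_in_window > threshold


def detect(stream, window_seconds=10):
    """Processes events in timestamp order and returns the flagged events."""
    counter = SlidingWindowCounter(window_seconds)
    alerts = []
    for timestamp, source in stream:
        feature = counter.add(timestamp, source)
        if toy_anomaly_model(feature):
            alerts.append((timestamp, source))
    return alerts


stream = [
    (0, "10.0.0.1"),
    (1, "10.0.0.2"),
    (2, "10.0.0.2"),
    (3, "10.0.0.2"),
    (4, "10.0.0.2"),
    (20, "10.0.0.2"),
]
alerts = detect(stream)
print(alerts)  # [(4, '10.0.0.2')]
```

The model is invoked in-process for every event, with no user input involved. The event at time 20 is not flagged because the earlier burst has already left the window.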

Real-Time ML Systems

Interactive systems that make predictions based on user input.
ChatGPT is an example of a system that takes user input (a prompt) and returns an answer in text.

TikTok builds its personalized recommendation engine using ML and a real-time feature store that supplies historical user information and context to improve the recommendations.

Embedded or Edge ML Systems

Real-time pattern matching systems that run on resource-constrained or network-detached devices.
Tesla Autopilot is a driver-assist system powered by ML that uses input from cameras and other sensors to help the ML models make predictions about what driving actions to take (steering, acceleration, braking, etc.).

Offline/Online Architecture for ML Systems

Machine learning systems are both trained and operated using cleaned and processed data (called features), created by a program called a feature pipeline. The feature pipeline writes its output feature data to a feature store that feeds data to both the training pipeline (which trains the model) and the inference pipeline. The inference pipeline makes predictions on new data that comes from the feature pipeline. Real-time, interactive ML systems also take new data as input from the user. Feature pipelines and inference pipelines are operational services, that is, part of the operational ML system.

In contrast, an ML system also has an offline component: model training. The training of models is typically not an operational part of an ML system. Training pipelines can be run on separate systems using separate resources (e.g., GPUs). Models are sometimes retrained on a schedule (e.g., once a day or once a week), but are often retrained when a better model can be produced, e.g., because new training data is available, or because the existing model’s performance has degraded and it needs to be retrained on more recent data.
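
The three pipelines and the feature store can be sketched end to end as follows. Everything here is a simplified assumption for illustration: feature_store and model_registry are in-memory stand-ins, the column names (wind_kmh, pm25) are invented, and the model is a one-variable least-squares fit rather than anything a production system would use.

```python
from statistics import mean

feature_store = []   # hypothetical in-memory stand-in for a feature store
model_registry = {}  # hypothetical stand-in for where trained models are kept


def feature_pipeline(raw_rows):
    """Operational: cleans raw observations and writes feature rows to the store."""
    for row in raw_rows:
        if row.get("wind_kmh") is None:
            continue  # drop incomplete observations
        feature_store.append(
            {"wind": float(row["wind_kmh"]), "pm25": row.get("pm25")}
        )


def training_pipeline():
    """Offline: fits pm25 = a + b * wind by least squares on labelled rows."""
    rows = [r for r in feature_store if r["pm25"] is not None]
    xs = [r["wind"] for r in rows]
    ys = [r["pm25"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    model_registry["latest"] = (a, b)


def inference_pipeline():
    """Operational: predicts the target for feature rows that have no label yet."""
    a, b = model_registry["latest"]
    return [a + b * r["wind"] for r in feature_store if r["pm25"] is None]


feature_pipeline([
    {"wind_kmh": 0, "pm25": 50.0},
    {"wind_kmh": 10, "pm25": 30.0},
    {"wind_kmh": 20, "pm25": 10.0},
    {"wind_kmh": None, "pm25": 99.0},  # incomplete row, dropped by the pipeline
    {"wind_kmh": 5},                   # forecast row: features only, no label
])
training_pipeline()
predictions = inference_pipeline()
print(predictions)  # [40.0]
```

Note that the training and inference pipelines never touch raw data. Both read from the feature store, which is what allows training to run separately, on its own schedule and hardware, from the operational pipelines.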