A model deployment enables clients to perform inference requests on the model over a network. Typically, the model is loaded from a model registry onto model-serving infrastructure, such as KServe. The term refers only to online models; offline models make predictions as part of batch inference pipelines instead.
After a machine learning model has been trained on a dataset, it needs to be deployed so that it can make predictions on new data in the real world. For online models, this involves converting the model into a format supported by the model-serving infrastructure, deploying it there, and serving requests from clients over the network in real time.
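As a concrete illustration, the sketch below sends a real-time inference request to a model deployed behind a REST endpoint. The host name, model name, and payload schema are assumptions made for illustration; a KServe-style deployment exposes a similar `:predict` endpoint, but the exact URL and request format depend on the serving runtime and protocol version you use.

```python
import requests

# Hypothetical endpoint of an online model deployment (e.g., a KServe-style
# InferenceService). The host, model name, and input schema are assumptions.
ENDPOINT = "http://models.example.com/v1/models/fraud-detector:predict"

# A single inference request containing two feature vectors ("instances").
payload = {"instances": [[42.0, 0.3, 7.0], [18.5, 0.9, 2.0]]}

response = requests.post(ENDPOINT, json=payload, timeout=5)
response.raise_for_status()

# The serving infrastructure returns one prediction per instance.
print(response.json())  # e.g., {"predictions": [0, 1]}
```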
Not all machine learning projects require model deployment. For instance, if your project involves offline analysis, batch processing, or one-time predictions, deploying the model may not be necessary; a batch inference pipeline is often sufficient.
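By contrast, a minimal batch inference pipeline might look like the sketch below: it loads a trained model from disk, scores a file of new records in one pass, and writes the predictions out for downstream use. The file paths, model format, and feature columns are assumptions for illustration; in practice they would come from your model registry and feature pipeline.

```python
import joblib
import pandas as pd

# Hypothetical paths and column names used only for this example.
MODEL_PATH = "models/churn_model.pkl"
INPUT_PATH = "data/new_customers.parquet"
OUTPUT_PATH = "data/churn_predictions.parquet"

# Load the trained (offline) model and the batch of new records to score.
model = joblib.load(MODEL_PATH)
batch = pd.read_parquet(INPUT_PATH)

# Run inference on the whole batch at once and persist the results.
batch["prediction"] = model.predict(batch[["tenure_months", "monthly_spend"]])
batch.to_parquet(OUTPUT_PATH, index=False)
```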
However, if your project involves real-time predictions, continuous data streams, or online interaction with users or other systems, then deploying your model becomes essential. Examples of such scenarios include recommendation engines, fraud detection systems, and real-time anomaly detection in IoT devices.