There is a wide array of tools that help data scientists package their models and deploy them in production, ranging from serverless functions to Docker containers. Even so, deploying models in production remains a challenge, particularly when it comes to data access.
Real-time ML systems typically require low-latency access to precomputed features containing history or context data. The code used to create those features should be consistent with the code used to create features used during model training. Similarly, batch ML systems should use the same logic to compute features for training and batch inference.
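One common way to keep training and inference features consistent is to route both paths through a single transformation function. The sketch below illustrates the idea in plain Python; the function and field names (`compute_features`, `pm25_history`, and so on) are hypothetical, not part of any library used in this workshop.

```python
# Illustrative sketch of training/serving feature consistency.
# The same compute_features function is called by the batch (training)
# path and the real-time (inference) path, so the logic cannot drift.

def compute_features(raw: dict) -> dict:
    """Transform one raw record into model-ready features."""
    return {
        "pm25_rolling_mean": sum(raw["pm25_history"]) / len(raw["pm25_history"]),
        "wind_speed": raw["wind_speed"],
    }

def build_training_rows(raw_records: list[dict]) -> list[dict]:
    # Batch path: reuse compute_features for every historical record.
    return [compute_features(r) for r in raw_records]

def predict_online(raw: dict, model) -> float:
    # Real-time path: the exact same transformation, applied per request.
    return model(compute_features(raw))
```

Because both paths share one function, a change to the feature logic automatically applies to training and serving alike, avoiding training-serving skew.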
The FTI (Feature, Training, Inference) pipeline architecture is a unified pattern for building batch and real-time ML systems. It enables the independent development and operation of feature pipelines (that transform raw data into features/labels), training pipelines (that take features/labels as input and produce models as output), and inference pipelines (that take model(s) and features as input and produce predictions as output). The pipelines have clear inputs and outputs, and can even be implemented using different technologies (e.g., Spark for feature pipelines, and Python for training and inference pipelines).
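The three pipelines and their inputs/outputs can be sketched as three independent Python functions. This is a minimal illustration of the FTI contract only; the function names and the trivial mean-predictor "model" are invented for this sketch, and a real system would read from and write to a feature store and model registry instead of passing values in memory.

```python
# Minimal sketch of the FTI pipeline interfaces.
# Each pipeline has clear inputs and outputs and could be run
# as a separate program, even on different technologies.

def feature_pipeline(raw_data: list[dict]) -> list[dict]:
    """Raw data in, features/labels out."""
    return [{"x": r["measurement"], "y": r["target"]} for r in raw_data]

def training_pipeline(rows: list[dict]):
    """Features/labels in, trained model out (here: a mean predictor)."""
    mean_y = sum(r["y"] for r in rows) / len(rows)
    return lambda features: mean_y  # the "model" ignores features

def inference_pipeline(model, features: dict) -> float:
    """Model and features in, prediction out."""
    return model(features)
```

Each function depends only on its declared inputs, which is what lets the pipelines be developed, tested, and scheduled independently.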
In this workshop, we will build an ML system to predict air quality that consists of three programs - one for each of the FTI pipelines. The system will be built using only Python. To run the three pipelines in production and manage the ML artifacts (features and models), we will use the free serverless tools Modal and Hopsworks, respectively.