This tutorial will produce three different Python programs that, when plugged together, make up a production ML system. First, we will look at the data sources: public, crowd-sourced air quality measurements that can be retrieved with an API key or scraped from a web page, and weather forecasts/observations that can be retrieved from free API services. The prediction problem is to predict air quality at the locations of existing air quality sensors, using weather forecast data as the primary features. We will show you how to write a Python program as a feature pipeline that can both scrape new data and backfill historical data (air quality observations and weather forecasts). We will also show you how to schedule this feature pipeline to run daily using Modal (you could also use GitHub Actions or any of the many free Python orchestration services available today).
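The core of such a feature pipeline is joining each day's air quality observation with that day's weather data into feature rows. Here is a minimal sketch of that join step; the JSON shapes, field names, and the `build_feature_rows` helper are all hypothetical stand-ins for whatever the real air quality and weather APIs return:

```python
import json

def build_feature_rows(aq_json: str, wx_json: str) -> list[dict]:
    """Join one day's air quality observations with that day's weather
    data into feature rows, one per sensor. The JSON shapes here are
    hypothetical stand-ins for the real API responses."""
    aq = json.loads(aq_json)  # e.g. {"observations": [{"sensor_id", "date", "pm25"}, ...]}
    wx = json.loads(wx_json)  # e.g. {"date", "temp_c", "wind_kmh", "precip_mm"}
    rows = []
    for obs in aq["observations"]:
        if obs["date"] == wx["date"]:  # keep only same-day pairs
            rows.append({
                "sensor_id": obs["sensor_id"],
                "date": obs["date"],
                "pm25": obs["pm25"],      # the label we will later predict
                "temp_c": wx["temp_c"],   # weather features
                "wind_kmh": wx["wind_kmh"],
                "precip_mm": wx["precip_mm"],
            })
    return rows
```

In the scheduled pipeline, a function like this would run inside a daily job (e.g. a Modal function on a one-day schedule) after fetching the two API responses, with the resulting rows written onward for storage.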
Our feature pipeline will store our features in a free serverless feature store (Hopsworks) and then we will write a training pipeline that reads features and air quality observations (labels) to train a model to predict air quality given a weather forecast (a set of weather features).
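At its heart, the training pipeline fits a regression model mapping weather features to the air quality label. As a toy illustration only, here is an ordinary-least-squares fit for a single feature, a stand-in for the full model (e.g. scikit-learn or XGBoost) a real training pipeline would train on features read from the feature store:

```python
def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form ordinary least squares for one feature: y ~ a*x + b.
    A toy stand-in for the real model trained on the full weather
    feature set, with ys as the air quality labels."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var          # slope
    b = my - a * mx        # intercept
    return a, b
```

The important point is the interface, not the model: the pipeline reads (features, labels) pairs, fits a model, and registers the fitted model so the inference step can retrieve it later.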
Finally, we will develop a UI using Hugging Face Spaces that includes a batch inference program to retrieve the latest weather forecast features and the trained model, and to predict air quality.
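Batch inference is then just applying the retrieved model to the latest forecast rows, one prediction per sensor per forecast day. A minimal sketch, assuming the toy (slope, intercept) model above and hypothetical field names:

```python
def batch_predict(model: tuple[float, float], forecast_rows: list[dict]) -> list[dict]:
    """Apply a fitted (slope, intercept) model to the latest weather
    forecast rows, producing one pm2.5 prediction per sensor per day.
    Field names are hypothetical; a real pipeline would pass the full
    feature vector to the registered model's predict method."""
    a, b = model
    return [
        {"sensor_id": r["sensor_id"], "date": r["date"],
         "predicted_pm25": a * r["temp_c"] + b}
        for r in forecast_rows
    ]
```

The UI's job is then just presentation: rendering these per-sensor predictions (e.g. as a map or chart) inside the Space.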
We will show you how to log predictions, so that you can build a continually improving ML system that provides hindcasts with insights into its historical performance.
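Hindcasting works by joining each logged prediction with the observation that arrives later for the same sensor and date, then computing an error metric over the matched pairs. A sketch of that join-and-score step, with hypothetical field names:

```python
def hindcast_mae(predictions: list[dict], observations: list[dict]) -> float:
    """Join logged predictions with the air quality observations that
    arrived later (keyed on sensor_id and date) and return the mean
    absolute error -- the kind of rolling metric a hindcast view would
    plot to show the system's historical performance."""
    actual = {(o["sensor_id"], o["date"]): o["pm25"] for o in observations}
    errors = [
        abs(p["predicted_pm25"] - actual[(p["sensor_id"], p["date"])])
        for p in predictions
        if (p["sensor_id"], p["date"]) in actual  # skip days with no observation yet
    ]
    return sum(errors) / len(errors)
```

Because predictions are logged daily, this metric can be recomputed on every pipeline run, giving a continually updated picture of model quality.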
For this tutorial, you will need experience programming in Python, a laptop, and Internet access. More info here.