Orchestrating ML pipelines is crucial to making them run without human intervention, and to making them run reliably, even when hardware or software errors occur.
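Much of that reliability comes from automatically rerunning work that fails for transient reasons. Orchestrators typically expose this as a per-task retry setting. The sketch below shows only the underlying idea: a failing pipeline is rerun with exponential backoff. `run_with_retries` and `flaky_feature_pipeline` are hypothetical names, and the injectable `sleep` parameter exists only so the example runs without actually waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error so someone can be alerted
            # Back off before retrying: base_delay_s, then 2x, 4x, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage: a hypothetical feature pipeline that fails twice with a transient error.
attempts = {"count": 0}


def flaky_feature_pipeline() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient network failure")
    return "features written"


delays: List[float] = []
result = run_with_retries(flaky_feature_pipeline, sleep=delays.append)
print(result, delays)  # features written [1.0, 2.0]
```

This sketch retries on any exception. A production setup would more likely retry only errors that are plausibly transient, such as network or node failures, and fail fast on bugs in the pipeline code itself.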
Orchestration of ML pipelines refers to automating the execution of the feature, training, and inference pipelines. Ideally, a single orchestrator should manage the execution of all of these pipelines.
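To illustrate a single orchestrator managing the different ML pipelines, the sketch below models the feature, training, and batch inference pipelines as a small dependency graph (a DAG). It then runs them in dependency order using Python's standard `graphlib` module. The pipeline functions and the `run_once` helper are hypothetical placeholders. In practice, the three pipelines often run on separate schedules rather than back-to-back in one run, and real orchestrators layer scheduling, retries, and monitoring on top of this ordering.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set

# Hypothetical stand-ins for the three pipelines; each just records that it ran.
run_log: List[str] = []


def feature_pipeline() -> None:
    run_log.append("feature")  # would compute features and write them to a feature store


def training_pipeline() -> None:
    run_log.append("training")  # would read features and train a model


def batch_inference_pipeline() -> None:
    run_log.append("inference")  # would score new feature data with the trained model


# Each pipeline maps to the set of pipelines that must finish before it starts.
DEPENDENCIES: Dict[str, Set[str]] = {
    "feature": set(),
    "training": {"feature"},
    "inference": {"feature", "training"},
}

TASKS: Dict[str, Callable[[], None]] = {
    "feature": feature_pipeline,
    "training": training_pipeline,
    "inference": batch_inference_pipeline,
}


def run_once(deps: Dict[str, Set[str]], tasks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every pipeline once, never starting one before its dependencies finish."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()  # an exception here stops the run, so later pipelines are not started
    return order


print(run_once(DEPENDENCIES, TASKS))  # ['feature', 'training', 'inference']
```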
The different ML pipelines have different orchestration requirements. Batch feature pipelines and batch inference pipelines need to be orchestrated, and training pipelines can optionally be orchestrated as well.
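Online inference pipelines and streaming ML pipelines do not need to be orchestrated, as they run continuously or in response to requests rather than being started on a schedule.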
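The rules above can be written as a small classification function. This is a hypothetical sketch: the `Kind` and `Mode` enums, the `Pipeline` dataclass, and the example pipeline names are illustrative and do not belong to any orchestrator's API. The function simply encodes which pipelines need an orchestrator, which may optionally use one, and which do not need one.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FEATURE = "feature"
    TRAINING = "training"
    INFERENCE = "inference"


class Mode(Enum):
    BATCH = "batch"          # starts, processes a batch of data, then exits
    STREAMING = "streaming"  # runs continuously over an unbounded stream
    ONLINE = "online"        # runs once per incoming request


@dataclass(frozen=True)
class Pipeline:
    name: str
    kind: Kind
    mode: Mode


def orchestration_requirement(p: Pipeline) -> str:
    """Classify a pipeline as 'required', 'optional', or 'not needed'."""
    if p.mode in (Mode.STREAMING, Mode.ONLINE):
        return "not needed"  # always-on or request-driven: nothing to schedule
    if p.kind is Kind.TRAINING:
        return "optional"    # e.g. retrain on a schedule, or run it by hand
    return "required"        # batch feature and batch inference pipelines


# Usage with hypothetical pipeline names:
pipelines = [
    Pipeline("daily_features", Kind.FEATURE, Mode.BATCH),
    Pipeline("model_training", Kind.TRAINING, Mode.BATCH),
    Pipeline("nightly_scoring", Kind.INFERENCE, Mode.BATCH),
    Pipeline("clickstream_features", Kind.FEATURE, Mode.STREAMING),
    Pipeline("recommendation_api", Kind.INFERENCE, Mode.ONLINE),
]
for p in pipelines:
    print(f"{p.name}: {orchestration_requirement(p)}")
```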
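A good orchestrator will run pipelines on their schedules without human intervention, detect and recover from failed runs, and let you monitor which runs succeeded or failed.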
Examples of tools and frameworks used for ML pipeline orchestration include Apache Airflow, Flyte, Kubeflow, Azure Data Factory, and AWS Step Functions. These tools provide features such as workflow scheduling, monitoring, fault tolerance, and version control that help orchestrate ML pipelines. Platforms such as GitHub Actions and Modal offer simpler, cron-based scheduling, which is useful when prototyping.
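To make cron-based scheduling concrete, here is a minimal sketch of how a scheduler could decide when a pipeline is next due from a standard five-field cron expression (minute, hour, day of month, month, day of week). The helpers `parse_field`, `cron_matches`, and `next_run` are hypothetical. They handle only a subset of cron syntax (no month or weekday names, no `@daily`-style macros) and treat times as naive datetimes. Hosted platforms fix a time zone for schedules; GitHub Actions, for example, uses UTC.

```python
from datetime import datetime, timedelta
from typing import Set


def parse_field(field: str, lo: int, hi: int) -> Set[int]:
    """Expand one cron field into the set of values it allows.

    Supported subset: '*', '*/n', 'a', 'a-b', 'a-b/n', and comma lists of these.
    """
    values: Set[int] = set()
    for part in field.split(","):
        base, _, step_str = part.partition("/")
        step = int(step_str) if step_str else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            a, b = base.split("-")
            start, end = int(a), int(b)
        elif step_str:
            raise ValueError(f"unsupported cron syntax: {part!r}")
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, t: datetime) -> bool:
    """True if minute-resolution time `t` satisfies the five-field cron `expr`."""
    minute, hour, dom, month, dow = expr.split()
    if t.minute not in parse_field(minute, 0, 59):
        return False
    if t.hour not in parse_field(hour, 0, 23):
        return False
    if t.month not in parse_field(month, 1, 12):
        return False
    dow_values = parse_field(dow, 0, 7)
    if 7 in dow_values:
        dow_values.add(0)  # cron accepts both 0 and 7 for Sunday
    dom_ok = t.day in parse_field(dom, 1, 31)
    dow_ok = (t.weekday() + 1) % 7 in dow_values  # Python: Monday=0; cron: Sunday=0
    if not dom.startswith("*") and not dow.startswith("*"):
        return dom_ok or dow_ok  # classic cron: if both day fields are restricted, either may match
    return dom_ok and dow_ok


def next_run(expr: str, after: datetime) -> datetime:
    """Return the first whole minute strictly after `after` that matches `expr`."""
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # brute-force search up to about a year ahead
        if cron_matches(expr, t):
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"no matching time within a year for {expr!r}")


# Usage: when would a daily, a quarter-hourly, and a weekly pipeline next run?
now = datetime(2024, 5, 1, 13, 30)  # a Wednesday
print(next_run("0 2 * * *", now))     # 2024-05-02 02:00:00
print(next_run("*/15 * * * *", now))  # 2024-05-01 13:45:00
print(next_run("0 6 * * 1", now))     # 2024-05-06 06:00:00
```

Stepping minute by minute keeps the logic easy to check, at the cost of speed. Real schedulers compute the next matching time field by field instead.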