Training-Inference Skew
What is training-inference skew?
Model-dependent transformations are applied in both the training and inference pipelines. Training-inference skew occurs when a transformation is implemented differently, even slightly, in the two pipelines. It can silently degrade model performance and is a hard bug to detect.
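As an illustration, the sketch below shows how a small implementation difference can produce skew. It assumes a hypothetical feature that is standardized in both pipelines; the function names and sample values are invented for this example. The training code divides by the population standard deviation, while the re-implemented inference code divides by the sample standard deviation.

```python
import statistics

# Hypothetical raw feature values seen at training time (illustrative only).
train_values = [10.0, 12.0, 14.0, 16.0, 18.0]

def scale_training(x, values):
    """Training pipeline: standardize with the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # divides by n
    return (x - mean) / std

def scale_inference(x, values):
    """Inference pipeline: re-implemented with the sample standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # divides by n - 1
    return (x - mean) / std

x = 18.0
train_feature = scale_training(x, train_values)
infer_feature = scale_inference(x, train_values)
skew = abs(train_feature - infer_feature)

# The same raw input yields two different feature values.
print(train_feature, infer_feature, skew)
```

Neither function raises an error or looks wrong in isolation, which is why this kind of mismatch tends to go unnoticed until model quality drops.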
Why is it important to watch for training-inference skew?
Training-inference skew arises when the data preprocessing or feature transformation steps differ between the training and inference pipelines. Such inconsistencies can degrade model performance and cause hard-to-detect issues in real-world applications. Watching for it matters for several reasons:
- Model performance: Discrepancies between training and inference pipelines can result in the model performing poorly when deployed, even if it performed well during training and validation.
- Debugging and troubleshooting: Training-inference skew can be challenging to identify and diagnose, as the issues often stem from subtle differences in the implementation of data preprocessing or feature transformations.
- Reproducibility: Ensuring that the same data preprocessing and feature transformation steps are used in both pipelines is essential for achieving reproducible results.
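One way to act on the points above is to keep a single transformation implementation that both pipelines call, and to run a parity check that compares their outputs on the same inputs. The following is a minimal sketch under that assumption; `shared_transform`, `find_mismatches`, the `PARAMS` values, and the sample inputs are all hypothetical names and numbers chosen for illustration.

```python
import math

# Hypothetical parameters, fitted once at training time and stored with the model.
PARAMS = {"mean": 2.0, "std": 0.5}

def shared_transform(value, mean, std):
    """The single implementation both pipelines are meant to call."""
    return (math.log1p(value) - mean) / std

def training_transform(value):
    return shared_transform(value, **PARAMS)

def inference_transform(value):
    return shared_transform(value, **PARAMS)

def skewed_inference_transform(value):
    """A divergent re-implementation that uses log(x) instead of log(1 + x)."""
    return (math.log(value) - PARAMS["mean"]) / PARAMS["std"]

def find_mismatches(train_fn, infer_fn, values, tol=1e-9):
    """Return the input values on which the two pipelines disagree."""
    return [v for v in values if abs(train_fn(v) - infer_fn(v)) > tol]

sample = [1.0, 3.5, 120.0]
consistent = find_mismatches(training_transform, inference_transform, sample)
divergent = find_mismatches(training_transform, skewed_inference_transform, sample)

# An empty list means the pipelines agree on every sampled input.
print(consistent, divergent)
```

A check like this could run as a unit test or on a sample of logged production inputs. The tolerance is a judgment call: it should be loose enough to allow for floating-point noise and tight enough to flag real implementation differences.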