Skew - MLOps Dictionary

What is skew in machine learning?

In machine learning, skew refers to an imbalance in the distribution of the label (target variable) in a training dataset. A training dataset is said to be skewed if the distribution of its target variable is asymmetric around its mean value; that is, it is not balanced, and some values are much more heavily represented than others. For example, in a dataset of credit card transactions where only a small fraction of the transactions are fraudulent, the training data is skewed towards non-fraudulent transactions.
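As a rough illustration of the definition above, the following sketch measures skew in two ways: the share of each class for a categorical label, and the sample skewness (third standardized moment) for a numeric target. The function names and the 990/10 fraud split are hypothetical, chosen only to mirror the credit card example; they do not come from any particular library.

```python
from collections import Counter


def class_distribution(labels):
    """Return the fraction of examples that belong to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Return (size of largest class) / (size of smallest class)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def skewness(values):
    """Sample skewness m3 / m2**1.5 of a numeric target.

    0 means symmetric around the mean, > 0 means a longer right tail,
    < 0 means a longer left tail. Assumes the values are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical labels: 0 = legitimate transaction, 1 = fraudulent transaction.
labels = [0] * 990 + [1] * 10
dist = class_distribution(labels)
ratio = imbalance_ratio(labels)

print(dist)   # share of each class
print(ratio)  # how many majority examples exist per minority example
```

A large imbalance ratio, or a skewness far from zero for a numeric target, is a hint that the label distribution deserves attention before training.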

Skew can affect the accuracy of predictive models: a model trained on imbalanced data may have difficulty accurately predicting minority classes or rare values. In such cases, techniques such as oversampling (adding copies of under-represented examples) or undersampling (dropping some over-represented examples) can be used to balance the data distribution and improve model performance.
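The two rebalancing techniques mentioned above can be sketched with the Python standard library alone. This is a minimal illustration of random oversampling and random undersampling, not a production implementation; the function names, the `seed` parameter, and the toy data are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(rows, labels):
    """Collect the rows that belong to each label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Sampling with replacement; k is 0 for the largest class.
        extra = rng.choices(group, k=target - len(group))
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Sampling without replacement.
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical data: row ids stand in for feature vectors.
rows = list(range(1000))
labels = [0] * 990 + [1] * 10

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))   # both classes now have 990 examples
print(Counter(under_labels))  # both classes now have 10 examples
```

Resampling is normally applied to the training split only, so that evaluation data keeps the real-world class distribution. Oversampling keeps all the original data but repeats minority examples, while undersampling discards majority examples; which trade-off is acceptable depends on how much data is available.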
