
Feature Selection

What is feature selection?

Feature selection is the process of finding existing features, potentially spread across different feature groups, and joining them together with the label(s) to define the set of features used for training and inference for your model. Feature selection is performed with respect to the prediction problem that you want your model to solve. The selected features should not be redundant, must not be prohibited for use, and should be feasible to compute in both training and serving.
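To make the "not redundant" requirement concrete, here is a minimal sketch in plain Python. It assumes that redundancy can be approximated by a high absolute Pearson correlation between two numeric features. That is one simple heuristic, not a method prescribed by this article. The column names, the values, and the 0.95 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.95):
    """Keep a feature only if it is not a near-duplicate of one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data, made up purely for illustration.
columns = {
    "age_years": [25, 40, 31, 58, 46],
    "age_months": [300, 480, 372, 696, 552],  # exact multiple of age_years
    "account_balance": [1200.0, 150.0, 980.0, 40.0, 3100.0],
}

selected = drop_redundant(columns)
print(selected)
```

In this toy data, age_months is dropped because it is a rescaled copy of age_years. Correlation only detects linear redundancy between pairs of numeric features, so treat it as a first-pass filter at best.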

When do I perform feature selection?

When you want to train a model to solve a prediction problem, you select the features that you think the model needs and include them in a feature view. With the feature view, you can create training datasets for training your model, feature vectors for online models, and batches of inference data for offline (batch) models. Over time, as your production model degrades in performance, you may need to change the set of selected features to train an improved version of your model. This involves creating a new version of your feature view and connecting it to your inference pipelines for use with the new model.

Example of feature selection

In a customer churn prediction model, the feature selection process may involve evaluating various features such as customer age, account balance, and transaction history to determine which ones have the greatest predictive power for the model. The selected features would then be used in the training and inference pipelines for the model.
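As a rough illustration of what "evaluating predictive power" can mean, the sketch below ranks candidate features by the absolute Pearson correlation between each feature and the churn label. This is an assumed, deliberately simple scoring rule, and in practice you might prefer mutual information or model-based feature importance. The feature names and all values are hypothetical toy data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_by_relevance(features, label):
    """Return (name, score) pairs sorted by |correlation with the label|, highest first."""
    scores = {name: abs(pearson(values, label)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = churned, 0 = stayed.
churn_status = [1, 0, 1, 0, 1, 0]
features = {
    "customer_age": [30, 45, 52, 38, 41, 29],
    "account_balance": [50.0, 2100.0, 120.0, 1800.0, 300.0, 2500.0],
    "transactions_last_30d": [1, 14, 2, 9, 3, 12],
}

ranking = rank_by_relevance(features, churn_status)
top_two = [name for name, _ in ranking[:2]]
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With this made-up data, account_balance and transactions_last_30d score far higher than customer_age, so they would be the first candidates to keep. A univariate score like this ignores interactions between features, so it is a starting point for selection, not a final answer.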

Here's an example of feature selection for churn prediction in Hopsworks. The selected features define a feature view, which, in turn, is used to create the training data for the churn prediction problem:

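import hopsworks

# connect to Hopsworks; the feature group names and versions are assumed for illustration
project = hopsworks.login()
fs = project.get_feature_store()
customer_fg = fs.get_feature_group("customers", version=1)
transactions_fg = fs.get_feature_group("transactions", version=1)

# select features from different feature groups
selected_features = customer_fg.select_all() \
    .join(transactions_fg.select_except(["cust_id"]))

# define the feature view; churn_status must be one of the selected columns
fv = fs.create_feature_view(
    name='customer_churn',
    query=selected_features,
    labels=['churn_status']
)

# create training data using the selected features in the feature view
X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.2)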

