AI and machine learning (ML) are in a state of constant change and development, with new frameworks, technologies and implementations appearing continuously. Navigating this influx of information can be challenging, and it is easy to misunderstand or confuse a particular machine learning concept. In this article we highlight and break down some common misconceptions about MLOps and machine learning in production.
MLOps should be considered a set of principles that involves people, processes and technologies (including frameworks). It encompasses the entire machine learning lifecycle and requires collaboration among data scientists, engineers, and operations teams. One place where many go wrong, however, is treating MLOps as a waterfall-style software process.
The MAD (Machine Learning, Artificial Intelligence & Data) landscape (pictured above) covers the vast majority of tools, software and technologies used throughout the different lifecycle stages of MLOps.
In our blog “From MLOps to ML Systems” we present a different approach to building machine learning (ML) systems with the FTI (Feature, Training & Inference) pipelines. With the FTI architecture, there is no single monolithic ML pipeline but three independently developed and operated pipelines: a feature pipeline, a training pipeline and an inference pipeline. This architecture describes both batch and real-time ML systems, making it easier for developers to move between batch and real-time systems. Ultimately, the FTI approach enables easier collaboration between data and ML teams.
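To make the separation concrete, here is a minimal sketch of the FTI pattern in Python. Plain dictionaries stand in for a real feature store and model registry, and the synthetic data, column names and model choice are illustrative assumptions; the point is only that each pipeline can be developed, scheduled and operated independently.

```python
# A minimal FTI sketch: three pipelines that only communicate through a
# feature store and a model registry (plain dicts stand in for real
# infrastructure; data, columns and model are illustrative assumptions).

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def feature_pipeline(feature_store):
    """Turns raw data into features; runs on its own schedule."""
    raw = pd.DataFrame({                      # stand-in for a real raw source
        "user_id": range(100),
        "clicks_7d": rng.integers(0, 50, size=100),
        "churned": rng.integers(0, 2, size=100),
    })
    feature_store["user_features"] = raw      # publish reusable features

def training_pipeline(feature_store, model_registry):
    """Reads features, trains a model, publishes it; runs independently."""
    df = feature_store["user_features"]
    model = LogisticRegression().fit(df[["clicks_7d"]], df["churned"])
    model_registry["churn_model"] = model

def inference_pipeline(feature_store, model_registry, user_ids):
    """Joins fresh features with the latest model to produce predictions."""
    df = feature_store["user_features"]
    rows = df[df["user_id"].isin(user_ids)]
    return model_registry["churn_model"].predict(rows[["clicks_7d"]])
```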
While model development is a crucial phase, it is only one part of the entire ML lifecycle, and deploying a machine learning model to production can be time-consuming or may never happen at all. Most companies take more than a month to first train and deploy a machine learning model in production, largely because many organizations lack the systems needed to organize the pipelines that connect models to their data sources (the features!).
In order to get your models to production, you need to feed them data and connect your feature, training and inference pipelines to a unified infrastructure. One way of demonstrating how fast machine learning models can be put in production is by creating an MVPS (Minimal Viable Production Service). An MVPS is the simplest end-to-end ML system that includes the three main ML pipelines (a feature pipeline, a training pipeline and an inference pipeline) and uses a feature store and a model registry. The idea behind the MVPS is to quickly demonstrate how the model will be used in the production environment to AI-enable a new or existing predictive service, as well as to provide a baseline for improvement and show how model performance relates to business KPIs.
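Building on the hypothetical FTI pipelines sketched above, here is what running such an MVPS end to end might look like; the per-user value used in the KPI estimate is a made-up assumption.

```python
# Hypothetical end-to-end run of an MVPS: three pipelines sharing a
# feature store and a model registry, plus a simple business-KPI estimate.
feature_store, model_registry = {}, {}

feature_pipeline(feature_store)
training_pipeline(feature_store, model_registry)
predictions = inference_pipeline(feature_store, model_registry, user_ids=[1, 2, 3])

# Tie model output back to a business KPI, e.g. revenue retained by
# targeting predicted churners ($9.99/user is a made-up assumption).
print(f"Predicted churners: {predictions.sum()}, "
      f"est. value: ${predictions.sum() * 9.99:.2f}")
```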
Continuous monitoring, feedback loops, and model retraining are essential for maintaining model performance over time. Practitioners should recognize the need for continuous integration and continuous deployment (CI/CD) pipelines to facilitate seamless updates and improvements to ML systems.
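To make this concrete, here is a minimal sketch of the kind of evaluation gate a CI/CD pipeline for ML might run before promoting a new model; the synthetic data, accuracy metric and 0.80 threshold are illustrative assumptions, not a prescribed setup.

```python
# A minimal CI/CD gate sketch: retrain, evaluate on a holdout set, and
# fail the pipeline unless the candidate model clears a quality bar.
# The dataset, metric and threshold are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

candidate = LogisticRegression(max_iter=1000).fit(X_train, y_train)
score = accuracy_score(y_test, candidate.predict(X_test))

MINIMUM_ACCURACY = 0.80  # deployment gate, tuned per use case
if score >= MINIMUM_ACCURACY:
    print(f"accuracy={score:.3f} >= {MINIMUM_ACCURACY}: promote model")
else:
    raise SystemExit(f"accuracy={score:.3f} below gate: failing the pipeline")
```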
ML models are not perfect: their performance can degrade over time due to changes in data distributions or other external factors. Continuous monitoring is essential to detect issues such as data quality problems or changes in the environment that can impact model accuracy. Additionally, teams should strive to improve the accuracy of their models in production, so strategies to log feedback and retrain the model should be in place. A machine learning system with models in operation should be seen as an ever-evolving system; nothing is static.
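One common monitoring technique is to test whether the distribution of a feature in production has drifted away from its training distribution. Below is a minimal sketch using a two-sample Kolmogorov–Smirnov test; the synthetic data and the 0.05 significance threshold are illustrative assumptions.

```python
# A minimal data-drift check: compare a feature's serving-time distribution
# against its training distribution with a two-sample KS test.
# The shifted synthetic data and 0.05 threshold are assumptions.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_values = rng.normal(loc=0.0, scale=1.0, size=5_000)  # seen at training
serving_values = rng.normal(loc=0.4, scale=1.0, size=5_000)   # seen in production

statistic, p_value = ks_2samp(training_values, serving_values)
if p_value < 0.05:
    print(f"Drift detected (KS={statistic:.3f}, p={p_value:.4f}): consider retraining")
else:
    print("No significant drift detected")
```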
If you are curious to learn more about feature monitoring, we did a webinar on how to monitor features with a feature store.
Even small to medium-sized businesses, as well as ML freelancers working with smaller amounts of data, will benefit from implementing a feature store. By using a feature store, data science teams can streamline their production ML workflows, improve collaboration, structure their data and ensure the reliability of their ML systems. Ultimately, an ML system with a feature store can be adapted to fit the scale, requirements and data volumes of different organizations.
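As one concrete illustration, here is roughly what publishing features might look like with the open-source Hopsworks Python client; the feature group name, schema and exact API details are assumptions that may vary across versions, so treat this as a sketch rather than a definitive recipe.

```python
# A sketch of publishing features to a feature store, using the Hopsworks
# Python client as one example (exact API details may differ by version).

import pandas as pd
import hopsworks

project = hopsworks.login()          # authenticate against a Hopsworks project
fs = project.get_feature_store()

df = pd.DataFrame({                  # illustrative schema, not a requirement
    "user_id": [1, 2, 3],
    "clicks_7d": [12, 3, 27],
    "event_time": pd.to_datetime(["2024-01-01"] * 3),
})

fg = fs.get_or_create_feature_group(
    name="user_features",
    version=1,
    primary_key=["user_id"],
    event_time="event_time",
    description="Rolling click counts per user",
)
fg.insert(df)                        # features are now shared and reusable
```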
Another common misconception is that MLOps is expensive and that the hassle of installing different software makes it difficult to integrate into your infrastructure. In fact, there are plenty of open source and free ML platforms that can be integrated into your ML system’s infrastructure seamlessly and at little or no cost. With this in mind, even small ML teams or individual users can create streamlined, serverless workflows for machine learning models in production.
If you need inspiration on where to begin, there is a great community out there for people who are passionate about building serverless ML systems as well as a free course that teaches you all you need to know about how to build one of these systems.
If your application requires time-dependent information about your users, such as their latest clicks or actions, to provide context for an LLM’s recommendations, then your ML system needs access to fresh data, fast. This is where vector databases need to be complemented with a robust feature store layer that supports embeddings and fast retrieval.
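As a sketch of the idea, the snippet below assembles an LLM prompt from two sources: semantically similar documents from a vector database and the user’s freshest features from an online feature store. Both `vector_db.search()` and `feature_store.get_latest()` are hypothetical placeholder interfaces, not a specific product’s API.

```python
# A hypothetical sketch of enriching an LLM prompt with fresh, user-specific
# context. vector_db.search() and feature_store.get_latest() are placeholder
# interfaces, not a specific product's API.

def build_prompt(user_id: int, query: str, vector_db, feature_store) -> str:
    # 1. Semantic context: documents similar to the query (precomputed embeddings).
    docs = vector_db.search(query, top_k=3)

    # 2. Fresh context: the user's most recent actions from the online store.
    recent = feature_store.get_latest("user_activity", key=user_id)

    return (
        f"User's recent actions: {recent}\n"
        f"Relevant documents: {docs}\n"
        f"Question: {query}"
    )
```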
Watch our webinar to get a deep dive into LLMs: