Hopsworks AI Lakehouse Now Supports NVIDIA NIM Microservices

Run thousands of LLMs with enterprise-grade reliability on your own infrastructure.
June 11, 2025
4 min read
Lex Avstreikh
Head of Strategy
Hopsworks

TL;DR

Hopsworks now supports NVIDIA NIM microservices, letting you deploy thousands of LLMs—including Mistral, Llama, and more—on your own infrastructure in minutes. The integration delivers enterprise-grade reliability, automatic optimization by NVIDIA, and real-time feature integration via Hopsworks for production-ready AI.

We at Hopsworks are pleased to announce our integration with NVIDIA's newly released NIM capability, unveiled at GTC Paris. This advancement enables deployment of a broad range of LLMs from Hugging Face for enterprise-ready inference on NVIDIA accelerated infrastructure. One container, thousands of models, optimized automatically.

We've embedded this capability into the Hopsworks AI Lakehouse to address a critical challenge: making LLMs genuinely useful with real data, proper governance, and cost-effective infrastructure utilization.

What This Means 

European enterprises pursuing AI sovereignty need practical solutions. The integration of Hopsworks with NIM microservices delivers:

  • Your models, your infrastructure: Deploy Mistral, Llama, or specialized European language models your team has fine-tuned. On-premises, in your cloud, or hybrid deployments.
  • Production-ready from day one: NVIDIA handles optimization automatically, selecting between TensorRT-LLM, vLLM, or SGLang based on model requirements.
  • Real data integration: Your LLM connects directly to your feature store, real-time data pipelines, and knowledge graphs through the Hopsworks platform.

The Architecture

Image 1. Reference architecture: NVIDIA + Hopsworks, GTC Paris

In the Hopsworks FTI architecture (feature, training, inference pipelines), the LLM (via a single NIM microservice container) operates within the inference pipeline, powered by:

  • Feature Store: Real-time and historical context including user behavior, recent transactions, and domain-specific data.
  • Vector Index: High-performance semantic search and RAG capabilities.
  • Knowledge Graph: Structured relationships for enhanced reasoning.

The NIM microservice enables model switching without pipeline reconstruction. Change models via configuration, run A/B tests seamlessly.
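
As a concrete sketch of what this looks like in practice, the snippet below routes traffic between two NIM containers started with different NIM_MODEL_NAME values. The endpoint URLs, variant names, and traffic split are illustrative assumptions, not part of the Hopsworks or NIM APIs.

import random
import requests

# Hypothetical: each variant is a NIM container started with a different NIM_MODEL_NAME
VARIANTS = {
    "mistral-7b": {"url": "http://nim-mistral:8000/v1/completions", "traffic": 0.8},
    "llama-3-8b": {"url": "http://nim-llama:8000/v1/completions", "traffic": 0.2},
}

def route_request(prompt: str) -> str:
    # Pick a variant according to the configured traffic split
    name = random.choices(
        population=list(VARIANTS),
        weights=[v["traffic"] for v in VARIANTS.values()],
    )[0]
    resp = requests.post(
        VARIANTS[name]["url"],
        json={"prompt": prompt, "max_tokens": 200},
    )
    return resp.json()["choices"][0]["text"]

Swapping or promoting a model then becomes a configuration change rather than a rewrite of the inference pipeline.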

Function Calling: Enhanced Capabilities

Function calling transforms LLM capabilities by enabling:

  • Real-time feature store queries for user context
  • Structured data retrieval from data warehouses
  • Business logic execution through existing APIs

We've observed 10x accuracy improvements when LLMs access structured context compared to pure document retrieval.
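
To illustrate the flow, here is a minimal sketch of function calling against a NIM endpoint, assuming the container exposes the OpenAI-compatible /v1/chat/completions API and the loaded model supports tool calling. The tool name, feature values, and endpoint URL are hypothetical; in a real pipeline the tool body would query the Hopsworks feature store.

import json
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical local NIM endpoint
MODEL = "mistralai/Mistral-7B-v0.1"

# Describe a feature store lookup as a tool the LLM is allowed to call
tools = [{
    "type": "function",
    "function": {
        "name": "get_customer_features",
        "description": "Fetch real-time features for a customer from the feature store",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

def get_customer_features(customer_id: str) -> dict:
    # Placeholder: in practice this would call feature_view.get_feature_vector(...)
    return {"age": 42, "recent_purchases": ["running shoes", "energy gels"]}

messages = [{"role": "user", "content": "What should we recommend to customer 123?"}]
reply = requests.post(NIM_URL, json={"model": MODEL, "messages": messages,
                                     "tools": tools}).json()["choices"][0]["message"]

tool_calls = reply.get("tool_calls") or []
if tool_calls:
    # The model asked for structured context: run the tool and send back the result
    call = tool_calls[0]
    args = json.loads(call["function"]["arguments"])
    messages += [reply, {"role": "tool", "tool_call_id": call["id"],
                         "content": json.dumps(get_customer_features(**args))}]
    reply = requests.post(NIM_URL, json={"model": MODEL, "messages": messages}
                          ).json()["choices"][0]["message"]

print(reply["content"])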

Why This Works

  1. NVIDIA provides: Optimized inference, hardware acceleration, enterprise support for high-performance model serving.
  2. Hopsworks provides: Data pipelines, feature engineering, monitoring, governance - the infrastructure that transforms models into products.
  3. You maintain: Complete control over data, use cases, and compliance requirements.

Getting Started

# Deploy any Hugging Face model with NIM
docker run --gpus all \
  -p 8000:8000 \
  -e HF_TOKEN=<your_token> \
  -e NIM_MODEL_NAME=hf://mistralai/Mistral-7B-v0.1 \
  nvcr.io/nim/llm-nim:latest  

# Connect to Hopsworks (requests is used later to call the NIM endpoint)
import requests

import hopsworks
from hopsworks import udf

project = hopsworks.login()
fs = project.get_feature_store()

# Create a serving endpoint combining NIM with Hopsworks
@udf(return_type="string")
def nim_with_context(customer_id: str, query: str) -> str:
    # Pull real-time features from Hopsworks
    customer_fv = fs.get_feature_view("customer_360", version=1)
    features = customer_fv.get_feature_vector(
        {"customer_id": customer_id},
        return_type="dict"
    )
    
    # Retrieve relevant documents from the vector index; in practice the raw
    # query text would first be embedded with the same model used to build the index
    vector_index = fs.get_feature_view("document_embeddings", version=1)
    relevant_docs = vector_index.find_neighbors(
        embedding=query,
        k=5
    )
    
    # Enhanced prompt with real-time context
    prompt = f"""
    Customer Context: Age: {features['age']}, 
    Recent Activity: {features['recent_purchases']}
    Relevant Information: {relevant_docs}
    
    Query: {query}
    """
    
    # Call NIM endpoint
    response = requests.post(
        "http://localhost:8000/v1/completions",
        json={"prompt": prompt, "max_tokens": 200}
    )
    
    return response.json()["choices"][0]["text"]

# Deploy as Hopsworks Model Serving
model = project.get_model_registry().python.create_model(
    name="llm-with-context",
    description="NIM + Hopsworks Context"
)
# Register the model (pointing at a local artifact directory) before deploying
model.save("nim_model_artifacts")

deployment = model.deploy(
    name="nim-contextual-llm",
    serving_tool="PYTHON",
    script_file="nim_serving.py"
)

# Your model now seamlessly combines:
# - NVIDIA optimized inference
# - Real-time feature access
# - Vector similarity search
# - Full observability
print(f"✅ Contextual LLM endpoint: {deployment.get_url()}")

Five minutes to deploy, another ten to connect your data.
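
Once created, the deployment can be started and called like any other Hopsworks model deployment. A minimal sketch, assuming the deployment object from the code above; the input payload shape is an assumption and depends on how nim_serving.py parses its inputs:

# Start the deployment and wait for it to come online
deployment.start(await_running=300)

# Call the contextual LLM endpoint (payload shape assumed by this sketch)
response = deployment.predict(inputs=[{"customer_id": "123",
                                       "query": "Suggest a relevant follow-up offer"}])
print(response)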

Bottom Line

We're excited to bring this integration to our customers. Sovereign AI requires maintaining control while leveraging best-in-class tools. NIM microservices provide access to a broad range of open models. Hopsworks makes those models production-ready with your data.

The combination delivers what enterprises need: flexibility, control, and reliability.
