Hopsworks now supports NVIDIA NIM microservices, letting you deploy thousands of LLMs—including Mistral, Llama, and more—on your own infrastructure in minutes. The integration delivers enterprise-grade reliability, automatic optimization by NVIDIA, and real-time feature integration via Hopsworks for production-ready AI.
We at Hopsworks are pleased to announce our integration with NVIDIA NIM, the newly released capability unveiled at GTC Paris that enables deployment of a broad range of LLMs from Hugging Face for enterprise-ready inference on NVIDIA-accelerated infrastructure. One container, thousands of models, optimized automatically.
We've embedded this capability into the Hopsworks AI Lakehouse to address a critical challenge: making LLMs genuinely useful with real data, proper governance, and cost-effective infrastructure utilization.
What This Means
European enterprises pursuing AI sovereignty need practical solutions. The integration of Hopsworks with NIM microservices delivers:
Your models, your infrastructure: Deploy Mistral, Llama, or specialized European language models your team has fine-tuned. On-premises, in your cloud, or hybrid deployments.
Production-ready from day one: NVIDIA handles optimization automatically, selecting between TensorRT-LLM, vLLM, or SGLang based on model requirements.
Real data integration: Your LLM connects directly to your feature store, real-time data pipelines, and knowledge graphs through the Hopsworks platform.
The Architecture
Image 1. Reference architecture: Hopsworks AI Lakehouse with NVIDIA NIM (GTC Paris)
Around the NIM inference endpoint, the Hopsworks platform supplies three sources of context:
Feature Store: Real-time and historical context, including user behavior, recent transactions, and domain-specific data.
Vector Index: High-performance semantic search and RAG capabilities.
Knowledge Graph: Structured relationships for enhanced reasoning.
The NIM microservice enables model switching without pipeline reconstruction. Change models via configuration, run A/B tests seamlessly.
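As a minimal sketch of what that looks like in practice (the endpoint URLs and model choices below are illustrative, not part of any Hopsworks or NIM API), an A/B test between two NIM deployments reduces to a routing config:

# Minimal sketch of config-driven model switching between two NIM
# deployments. Endpoint URLs and model names are illustrative.
import requests

NIM_ENDPOINTS = {
    "control": "http://nim-mistral:8000/v1/completions",    # e.g. Mistral-7B
    "treatment": "http://nim-llama:8000/v1/completions",    # e.g. Llama 3 8B
}

def complete(prompt: str, variant: str = "control") -> str:
    # The request shape is identical for every model: NIM exposes an
    # OpenAI-compatible API, so only the URL (or model name) changes.
    response = requests.post(
        NIM_ENDPOINTS[variant],
        json={"prompt": prompt, "max_tokens": 100},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["text"]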
Function Calling: Enhanced Capabilities
Function calling transforms LLM capabilities by enabling:
Real-time feature store queries for user context
Structured data retrieval from data warehouses
Business logic execution through existing APIs
We've observed 10x accuracy improvements when LLMs access structured context compared to pure document retrieval.
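As an illustration, here is a hedged sketch of a tool-calling request against a NIM endpoint. The get_customer_features tool and its schema are hypothetical, and it assumes the served model supports OpenAI-style tool calling:

# Hedged sketch: OpenAI-style function calling against a NIM endpoint.
# The tool name and schema are hypothetical; the served model must
# support tool calling for this to work.
import json
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_customer_features",
        "description": "Fetch real-time customer features from the Hopsworks feature store",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mistralai/Mistral-7B-v0.1",
        "messages": [{"role": "user", "content": "What should we offer customer 42?"}],
        "tools": tools,
    },
    timeout=30,
)

message = response.json()["choices"][0]["message"]
for call in message.get("tool_calls", []):
    args = json.loads(call["function"]["arguments"])
    # ...look up the features in Hopsworks, then return the result to
    # the model in a follow-up "tool" message...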
Why This Works
NVIDIA provides: Optimized inference, hardware acceleration, and enterprise support for high-performance model serving.
Hopsworks provides: Data pipelines, feature engineering, monitoring, and governance - the infrastructure that transforms models into products.
You maintain: Complete control over data, use cases, and compliance requirements.
Getting Started
# Deploy any Hugging Face model with NIM
docker run --gpus all \
  -p 8000:8000 \
  -e HF_TOKEN=<your_token> \
  -e NIM_MODEL_NAME=hf://mistralai/Mistral-7B-v0.1 \
  nvcr.io/nim/llm-nim:latest
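Once the container reports ready, a quick request against its OpenAI-compatible API confirms the model is live (a sketch assuming the default port above):

# Sanity check: NIM serves an OpenAI-compatible API, so listing the
# models confirms the container is up and the model is loaded.
import requests

print(requests.get("http://localhost:8000/v1/models", timeout=10).json())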
# Connect to Hopsworks
import hopsworks
project = hopsworks.login()
fs = project.get_feature_store()
# Create a serving endpoint combining NIM with Hopsworks
import requests  # needed for the NIM HTTP call below

@udf(return_type="string")
def nim_with_context(customer_id: str, query: str) -> str:
    # Pull real-time features from Hopsworks
    customer_fv = fs.get_feature_view("customer_360", version=1)
    features = customer_fv.get_feature_vector(
        {"customer_id": customer_id},
        return_type="dict",
    )

    # Retrieve relevant documents from the vector index
    # (find_neighbors expects an embedding vector, so embed the query
    # string upstream before passing it here)
    vector_index = fs.get_feature_view("document_embeddings", version=1)
    relevant_docs = vector_index.find_neighbors(
        embedding=query,
        k=5,
    )

    # Build an enhanced prompt with real-time context
    prompt = f"""
    Customer Context: Age: {features['age']},
    Recent Activity: {features['recent_purchases']}
    Relevant Information: {relevant_docs}
    Query: {query}
    """

    # Call the NIM endpoint (OpenAI-compatible completions API)
    response = requests.post(
        "http://localhost:8000/v1/completions",
        json={"prompt": prompt, "max_tokens": 200},
    )
    return response.json()["choices"][0]["text"]
# Deploy as Hopsworks Model Serving
model = project.get_model_registry().python.create_model(
    name="llm-with-context",
    description="NIM + Hopsworks Context",
)
deployment = model.deploy(
    name="nim-contextual-llm",
    serving_tool="PYTHON",
    script_file="nim_serving.py",
)
# Your model now seamlessly combines:
# - NVIDIA optimized inference
# - Real-time feature access
# - Vector similarity search
# - Full observability
print(f"✅ Contextual LLM endpoint: {deployment.get_url()}")
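Calling the endpoint afterward is a one-liner. This is a sketch: the exact payload shape depends on how nim_serving.py parses its inputs.

# Sketch of invoking the deployment; the input keys mirror the
# nim_with_context signature above and the payload shape is illustrative.
deployment.start()
result = deployment.predict(inputs={"customer_id": "42", "query": "What should I buy next?"})
print(result)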
Five minutes to deploy, another ten to connect your data.
Bottom Line
We're excited to bring this integration to our customers. Sovereign AI means maintaining control while leveraging best-in-class tools. NIM microservices provide access to a broad range of open models; Hopsworks makes those models production-ready with your data.
The combination delivers what enterprises need: flexibility, control, and reliability.