Hopsworks AI Lakehouse Now Supports NVIDIA NIM Microservices

Run thousands of LLMs with enterprise-grade reliability on your own infrastructure.
June 11, 2025
4 min read
Lex Avstreikh
Head of Strategy
Hopsworks

TL;DR

Hopsworks now supports NVIDIA NIM microservices, letting you deploy thousands of LLMs—including Mistral, Llama, and more—on your own infrastructure in minutes. The integration delivers enterprise-grade reliability, automatic optimization by NVIDIA, and real-time feature integration via Hopsworks for production-ready AI.

We at Hopsworks are pleased to announce our integration with NVIDIA's newly released NIM capability, unveiled at GTC Paris. This advancement enables deployment of a broad range of LLMs from Hugging Face for enterprise-ready inference on NVIDIA accelerated infrastructure. One container, thousands of models, optimized automatically.

We've embedded this capability into the Hopsworks AI Lakehouse to address a critical challenge: making LLMs genuinely useful with real data, proper governance, and cost-effective infrastructure utilization.

What This Means 

European enterprises pursuing AI sovereignty need practical solutions. The integration of Hopsworks with NIM microservices delivers:

  • Your models, your infrastructure: Deploy Mistral, Llama, or specialized European language models your team has fine-tuned. On-premises, in your cloud, or hybrid deployments.
  • Production-ready from day one: NVIDIA handles optimization automatically, selecting between TensorRT-LLM, vLLM, or SGLang based on model requirements.
  • Real data integration: Your LLM connects directly to your feature store, real-time data pipelines, and knowledge graphs through the Hopsworks platform.

The Architecture

Image 1. Reference architecture: NVIDIA + Hopsworks, GTC Paris

In the Hopsworks FTI architecture (feature, training, inference pipelines), the LLM (via a single NIM microservice container) operates within the inference pipeline, powered by:

  • Feature Store: Real-time and historical context including user behavior, recent transactions, and domain-specific data.
  • Vector Index: High-performance semantic search and RAG capabilities.
  • Knowledge Graph: Structured relationships for enhanced reasoning.

The NIM microservice enables model switching without pipeline reconstruction. Change models via configuration, run A/B tests seamlessly.
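
As a concrete sketch of what this looks like in practice, the snippet below routes traffic between two NIM containers started with different NIM_MODEL_NAME values. The endpoint URLs, variant names, and traffic split are illustrative assumptions, not part of the Hopsworks or NIM APIs.

import random
import requests

# Hypothetical: each variant is a NIM container started with a different NIM_MODEL_NAME
VARIANTS = {
    "mistral-7b": {"url": "http://nim-mistral:8000/v1/completions", "traffic": 0.8},
    "llama-3-8b": {"url": "http://nim-llama:8000/v1/completions", "traffic": 0.2},
}

def route_request(prompt: str) -> str:
    # Pick a variant according to the configured traffic split
    name = random.choices(
        population=list(VARIANTS),
        weights=[v["traffic"] for v in VARIANTS.values()],
    )[0]
    resp = requests.post(
        VARIANTS[name]["url"],
        json={"prompt": prompt, "max_tokens": 200},
    )
    return resp.json()["choices"][0]["text"]

Swapping or promoting a model then becomes a configuration change rather than a rewrite of the inference pipeline.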

Function Calling: Enhanced Capabilities

Function calling transforms LLM capabilities by enabling:

  • Real-time feature store queries for user context
  • Structured data retrieval from data warehouses
  • Business logic execution through existing APIs

We've observed 10x accuracy improvements when LLMs access structured context compared to pure document retrieval.
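
To illustrate the flow, here is a minimal sketch of function calling against a NIM endpoint, assuming the container exposes the OpenAI-compatible /v1/chat/completions API and the loaded model supports tool calling. The tool name, feature values, and endpoint URL are hypothetical; in a real pipeline the tool body would query the Hopsworks feature store.

import json
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical local NIM endpoint
MODEL = "mistralai/Mistral-7B-v0.1"

# Describe a feature store lookup as a tool the LLM is allowed to call
tools = [{
    "type": "function",
    "function": {
        "name": "get_customer_features",
        "description": "Fetch real-time features for a customer from the feature store",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

def get_customer_features(customer_id: str) -> dict:
    # Placeholder: in practice this would call feature_view.get_feature_vector(...)
    return {"age": 42, "recent_purchases": ["running shoes", "energy gels"]}

messages = [{"role": "user", "content": "What should we recommend to customer 123?"}]
reply = requests.post(NIM_URL, json={"model": MODEL, "messages": messages,
                                     "tools": tools}).json()["choices"][0]["message"]

tool_calls = reply.get("tool_calls") or []
if tool_calls:
    # The model asked for structured context: run the tool and send back the result
    call = tool_calls[0]
    args = json.loads(call["function"]["arguments"])
    messages += [reply, {"role": "tool", "tool_call_id": call["id"],
                         "content": json.dumps(get_customer_features(**args))}]
    reply = requests.post(NIM_URL, json={"model": MODEL, "messages": messages}
                          ).json()["choices"][0]["message"]

print(reply["content"])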

Why This Works

  1. NVIDIA provides: Optimized inference, hardware acceleration, enterprise support for high-performance model serving.
  2. Hopsworks provides: Data pipelines, feature engineering, monitoring, governance - the infrastructure that transforms models into products.
  3. You maintain: Complete control over data, use cases, and compliance requirements.

Getting Started

# Deploy any Hugging Face model with NIM
docker run --gpus all \
  -p 8000:8000 \
  -e HF_TOKEN=<your_token> \
  -e NIM_MODEL_NAME=hf://mistralai/Mistral-7B-v0.1 \
  nvcr.io/nim/llm-nim:latest  

# Connect to Hopsworks (requests is used later to call the NIM endpoint)
import requests

import hopsworks
from hopsworks import udf

project = hopsworks.login()
fs = project.get_feature_store()

# Create a serving endpoint combining NIM with Hopsworks
@udf(return_type="string")
def nim_with_context(customer_id: str, query: str) -> str:
    # Pull real-time features from Hopsworks
    customer_fv = fs.get_feature_view("customer_360", version=1)
    features = customer_fv.get_feature_vector(
        {"customer_id": customer_id},
        return_type="dict"
    )
    
    # Retrieve relevant documents from the vector index; in practice the raw
    # query text would first be embedded with the same model used to build the index
    vector_index = fs.get_feature_view("document_embeddings", version=1)
    relevant_docs = vector_index.find_neighbors(
        embedding=query,
        k=5
    )
    
    # Enhanced prompt with real-time context
    prompt = f"""
    Customer Context: Age: {features['age']}, 
    Recent Activity: {features['recent_purchases']}
    Relevant Information: {relevant_docs}
    
    Query: {query}
    """
    
    # Call NIM endpoint
    response = requests.post(
        "http://localhost:8000/v1/completions",
        json={"prompt": prompt, "max_tokens": 200}
    )
    
    return response.json()["choices"][0]["text"]

# Deploy as Hopsworks Model Serving
model = project.get_model_registry().python.create_model(
    name="llm-with-context",
    description="NIM + Hopsworks Context"
)
# Register the model (pointing at a local artifact directory) before deploying
model.save("nim_model_artifacts")

deployment = model.deploy(
    name="nim-contextual-llm",
    serving_tool="PYTHON",
    script_file="nim_serving.py"
)

# Your model now seamlessly combines:
# - NVIDIA optimized inference
# - Real-time feature access
# - Vector similarity search
# - Full observability
print(f"✅ Contextual LLM endpoint: {deployment.get_url()}")

Five minutes to deploy, another ten to connect your data.
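
Once created, the deployment can be started and called like any other Hopsworks model deployment. A minimal sketch, assuming the deployment object from the code above; the input payload shape is an assumption and depends on how nim_serving.py parses its inputs:

# Start the deployment and wait for it to come online
deployment.start(await_running=300)

# Call the contextual LLM endpoint (payload shape assumed by this sketch)
response = deployment.predict(inputs=[{"customer_id": "123",
                                       "query": "Suggest a relevant follow-up offer"}])
print(response)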

Bottom Line

We're excited to bring this integration to our customers. Sovereign AI requires maintaining control while leveraging best-in-class tools. NIM microservices provide access to a broad range of open models. Hopsworks makes those models production-ready with your data.

The combination delivers what enterprises need: flexibility, control, and reliability.
