Gibson Chikafa

Software Engineer

How to Build a Python Environment with Custom Docker Commands

Track Python Environment History in Hopsworks

October 26, 2023

8 min read

Hopsworks

Data Science

TL;DR

Hopsworks comes with a prepackaged Python environment that contains libraries for data engineering, machine learning, and more general data science development. You can also install additional packages from several sources, e.g., PyPI, a Conda channel, or a public or private Git repository. In some cases, these libraries require Linux/OS-level packages to be installed as well. It is also important to track how the environment evolves over time.

Introduction

In Hopsworks 3.4, we introduced new capabilities to help you manage the Python environment:

Running custom bash commands, which can be used to install Linux/OS-level packages and add more complex configuration to your environment, e.g., configuring an Oracle database.
Showing the history of the Python environment, i.e., which libraries were installed each time a new environment was created.

The Hopsworks installation ships with a Miniconda environment that comes preinstalled with the most popular libraries you can find in a data scientist toolkit, including TensorFlow, PyTorch and scikit-learn. The environment is managed using the Hopsworks Python service to install libraries which may then be used in Jupyter notebooks or the Jobs service in the platform.

Some Python libraries require OS-level libraries to be installed. In other cases, you may need to add more complex configuration to your environment. Both require writing your own commands and executing them on top of the existing environment.

The Python environment is shared by different members of the project. When a member of the project introduces a change to the environment i.e., installs/uninstalls a library, a new environment is created and it becomes the de facto environment for everyone in the project. It is therefore important to track how the environment has been changing over time i.e., what libraries were installed, uninstalled, upgraded, or downgraded when the environment was created and who introduced the changes.

In this blog post, we will describe how you can run custom commands to install OS-level packages or add extra configuration to the Python environment in Hopsworks. We will also show how you can track the changes to your Python environment.

Prerequisite

To follow this tutorial you should have an instance of Hopsworks version 3.4 or above.

Running custom commands

In this section, we will see how you can run custom bash commands in Hopsworks to configure your Python environment.

In Hopsworks, we maintain a Docker image built on top of the Ubuntu Linux distribution. You can run generic bash commands on top of the project environment from the UI or the REST API.

Setting up the bash script and artifacts from the UI

To use the UI, navigate to the Python environment in the Project settings. On the Python environment page, navigate to custom commands. There you can write the bash commands in the textbox provided. These bash commands will be uploaded and executed when building your new environment. You can also include build artifacts, e.g., binaries that you would like to execute or include when building the environment.

***Figure 1:*** *Provide script and artifacts for custom commands.*

Setting up the bash script and artifacts from the REST API

To use the REST API, first upload the bash script and any artifacts to HopsFS, the Hopsworks file system, and then provide their HopsFS paths in the request. The REST API endpoint for running custom commands is hopsworks-api/api/project/<projectId>/python/environments/<pythonVersion>/commands/custom, and the POST request body should look like this:

{
"commandsFile": "",
"artifacts": ""
}
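
As a rough illustration, the request above could be sent with curl along the following lines. This is a minimal sketch, not the documented client: the host name, project id, Python version, HopsFS paths, the HOPSWORKS_API_KEY variable, and the "ApiKey" authorization scheme are all assumptions for illustration, as is passing a single path in "artifacts".

```shell
#!/bin/bash
# Sketch only: host, project id, Python version, API key and HopsFS paths
# are placeholder assumptions, not values taken from the article.

# Build the JSON request body from two HopsFS paths.
build_body() {
  printf '{"commandsFile": "%s", "artifacts": "%s"}' "$1" "$2"
}

# Build the endpoint URL from host, project id and Python version.
build_url() {
  printf 'https://%s/hopsworks-api/api/project/%s/python/environments/%s/commands/custom' "$1" "$2" "$3"
}

# POST the request with curl.
# Arguments: host, project id, Python version, commands file path, artifact path.
# The "ApiKey" authorization scheme is an assumption; check your instance.
submit_custom_commands() {
  curl --fail --silent --show-error -X POST \
    -H "Authorization: ApiKey ${HOPSWORKS_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(build_body "$4" "$5")" \
    "$(build_url "$1" "$2" "$3")"
}
```

A hypothetical call would then look like: submit_custom_commands my.hopsworks.host 119 3.10 /Projects/demo/Resources/commands.sh /Projects/demo/Resources/files.tgz. Keeping the body and URL builders separate makes it easy to print and inspect the request before sending it.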

Example Bash Script

The bash script below shows how you can install an OS-level package, use the artifacts included in the build, and install a Python package.

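#!/bin/bash
# Install an OS-level package (-y answers any confirmation prompt)
sudo apt-get install -y net-tools
# List the build artifacts that were copied into the build directory
ls /srv/hops/build/
# Extract the uploaded archive into /tmp (-C sets the target directory)
tar -xvf /srv/hops/build/files.tgz -C /tmp
# Install a Python package into the project's conda environment
/srv/hops/anaconda/envs/theenv/bin/pip install spotify==0.10.2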

Now let’s look at what each command in the script does.

The first line of your bash script should always be #!/bin/bash (known as a shebang) so that the script is interpreted and executed by the Bash shell.
The first command installs the net-tools package. You can use apt, apt-get, or dpkg (for .deb files) to install packages, and you should always run these commands with sudo. Some commands ask for user input; in that case, supply the expected input as part of the command, e.g., sudo apt -y install, otherwise the build will fail. We have already configured apt-get to be non-interactive.
The build artifacts are copied to /srv/hops/build, and you can use them in your script via this path. The path is also available in the environment variable BUILD_PATH. If you want to use many artifacts, it is advisable to bundle them into a single archive (e.g., a zip or tgz file) and upload it to HopsFS in one of your project datasets. You can then include the archive as one of the artifacts.
In your bash script, you can include a command to extract the archive. If the files are on a remote server, you can use a download tool such as wget in your bash script.
The conda environment is located in /srv/hops/anaconda/envs/theenv. To install or uninstall packages within this conda environment, call the pip binary in that directory, as in the last command of the example. If the command requires user input, include the expected input with the command to prevent build failures.
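
The points about BUILD_PATH and archived artifacts can be sketched as a small helper for a commands file. This is only an illustration under assumptions: the function name extract_artifact, the archive name files.tgz, and the target directory are hypothetical, and the fallback path is the build directory mentioned above.

```shell
#!/bin/bash
# Sketch only: "extract_artifact" and the file names used with it are
# illustrative, not part of Hopsworks.

# Extract a gzip-compressed tar artifact into a target directory.
# BUILD_PATH is set during the build; fall back to the documented path.
extract_artifact() {
  artifact="${BUILD_PATH:-/srv/hops/build}/$1"
  target="$2"
  # Fail with a clear message if the artifact was not uploaded.
  if [ ! -f "$artifact" ]; then
    echo "artifact not found: $artifact" >&2
    return 1
  fi
  mkdir -p "$target"
  tar -xzf "$artifact" -C "$target"
}
```

In a commands file, a call such as extract_artifact files.tgz /tmp/files would unpack the archive, and returning a non-zero status when the artifact is missing makes the problem visible in the build output instead of surfacing later as a missing file.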

Python Environment

The Python environment evolves over time as libraries are installed, uninstalled, upgraded, and downgraded. To help you keep track of these changes, you can now access the Python environment history via the UI. This feature allows you to review the specific changes made in each new iteration of the environment. Hopsworks retains a versioned YAML file for each environment, enabling you to revert to an earlier environment if necessary. To compare the changes between environments, simply click the button shown in Figure 2. This will display the differences between the current environment and the previous one from which it was derived.

***Figure 2:*** *View difference between environments*

As shown in Figure 3, if an environment was built using custom commands, you can also review those commands in the UI.

***Figure 3:*** *Review custom commands details from history.*
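
The comparison between two environment versions is done for you in the UI, but the idea can be approximated locally. The sketch below assumes you have two environment YAML files saved on disk in a conda-style layout, where each dependency is a "- name=version" list item. The file names and that layout are assumptions, not something the article specifies.

```shell
#!/bin/bash
# Sketch only: operates on hypothetical local copies of two environment
# YAML files; Hopsworks itself shows this comparison in the UI.

# Print every YAML list item ("- value") in a file, stripped and sorted.
list_items() {
  grep -E '^[[:space:]]*-[[:space:]]' "$1" | sed -E 's/^[[:space:]]*-[[:space:]]*//' | sort -u
}

# Print items only in the old file with "- " and items only in the new
# file with "+ ". An upgrade shows up as one removal and one addition.
env_diff() {
  old_list="$(mktemp)"
  new_list="$(mktemp)"
  list_items "$1" > "$old_list"
  list_items "$2" > "$new_list"
  comm -23 "$old_list" "$new_list" | sed 's/^/- /'
  comm -13 "$old_list" "$new_list" | sed 's/^/+ /'
  rm -f "$old_list" "$new_list"
}
```

Calling env_diff old.yml new.yml lists what was dropped and what was added between the two versions. List items that appear in both files, such as channel names, cancel out and are not printed.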

Summary

In this article, we have shown how you can write and execute custom commands to add more sophisticated configuration to your Python environment. We have also shown how you can track the changes made to your Python environment in the UI.
