In April 2021 the European Commission presented the Artificial Intelligence (AI) Act, with the expectation that it will become law in 2023. The regulation lays out a set of rules that apply to any AI system used, or serving outputs, within the European Union (EU). Data governance and oversight are at the heart of this legislation, with significant fines for failure to comply. Competition is being won today with AI, and future-proofing your AI strategy now will ensure you’re prepared for these regulations. This blog provides an introduction to the EU AI Act and explains why Feature Stores provide a great solution to the obligations imposed by the regulation.
The proposed EU AI Act is a significant development against a backdrop of increasing global policy governing the use and development of AI. Whilst the draft regulation will certainly change before it is implemented, the expectation is to have the Act published by the end of 2023. Looking at how the EU implemented GDPR, we can be fairly certain that one thing which won’t change is the threat of punitive fines of up to 6% of annual revenue for failing to comply.
The Act is set to have committees vote on the final text in October 2022 and will extend to any AI system used within the EU. This means that if you’re using AI to inform decisions which service customers in the EU you need to pay attention now. Our research shows that the majority of companies still have significant work to do to prepare for this regulation and address the risks accordingly. Fewer than 22% of the companies we surveyed had started to ensure that the relevant parts of their organisation are familiar with the draft and are thinking actively about how they’ll put the appropriate levels of governance in place.
Now is the time to start standardising the processes that support the governance of Data Powered Prediction Services, as the EU AI Act is going to make this non-optional. Given the cross-functional teams required to get these services successfully into Production, putting proper systems in place which ensure auditability and consistency will pay huge dividends.
The EU draft regulation’s broad definition of AI aims to be as technology-neutral and future-proof as possible, taking into account the fast developments related to AI. AI applications will be classified around a set of three risk categories with a layered approach to enforcement. Full details of the EU’s current thinking on how it plans to distinguish between the three groups are still in flight, and as of April 2022 the co-rapporteurs working on the text were still negotiating key amendments. As a starting point for undertaking an impact assessment it seems sensible to use this framework for developing your own internal risk classification.
A lighter set of regulations will be applied to AI falling into the “Minimal Risk” category, and applications with unacceptable risk will be banned. Applications which fall into the High-Risk AI Systems (HRAIS) category will require third party independent certification through a mandatory CE-marking procedure. Organisations like the Research Institute of Sweden (RISE) are on hand to help customers navigate the complexities of these new standards.
The rules for HRAIS place an emphasis on ensuring “appropriate data governance and management practices” and that datasets being used for training the models are “relevant, representative and free of errors”. Companies must be able to demonstrate that their systems comply with the rules: What are the data sources? How have the datasets been created? What are the accompanying metrics that demonstrate how the model is performing? How do you enable human oversight to reduce risks? For CE certification, RISE is emphasising the importance of having Explainable AI. With appropriate tools you should be able to explain why a system has made a particular decision, and to reproduce what has happened in order to show what may have gone wrong.
Feature Stores, such as the Hopsworks Platform, provide a ready-made solution to this conundrum. Designed from the ground up to address the thorny data challenges associated with productionising Machine Learning, they “facilitate the discovery, documentation, and reuse of features and to ensure their correctness, whether they’re used for batch or online applications”. So how exactly do they do that?
In order for Machine Learning models to make high quality predictions they require large volumes of data. That data often takes the form of Features: individual measurable properties which result from transforming raw data into something that can be used to feed a predictive model. Feature Stores are specialist datastores designed to store these Features, together with their associated metadata, principally for the purpose of sharing Features across different teams and breaking down data silos. Historical values of Features are stored by default so they can be used to create, and inform, training datasets and score batch applications within analytical models.
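To make the idea of a Feature concrete, here is a minimal sketch in plain Python. The transaction data, the `build_features` helper and the feature names `txn_count` and `avg_txn_amount` are all hypothetical and chosen purely for illustration; a real Feature Store would run this kind of transformation inside a managed pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw events: (customer_id, transaction_amount).
raw_transactions = [
    ("c1", 20.0),
    ("c1", 40.0),
    ("c2", 5.0),
]


def build_features(transactions):
    """Aggregate raw transactions into per-customer Features."""
    amounts = defaultdict(list)
    for customer_id, amount in transactions:
        amounts[customer_id].append(amount)
    return {
        customer_id: {
            "txn_count": len(values),
            "avg_txn_amount": mean(values),
        }
        for customer_id, values in amounts.items()
    }


features = build_features(raw_transactions)
print(features["c1"])  # {'txn_count': 2, 'avg_txn_amount': 30.0}
```

Each key in the resulting dictionary is one measurable property derived from the raw data, which is what would be stored, documented and shared.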
Feature Stores, such as the Hopsworks platform, are purpose built to enable cross-functional teams across an organisation to collaborate on the production deployment of ML models. They deliver on this promise through a range of capabilities which simplify the building, testing and deployment of ML models through their entire life cycle. From the raw source data being used to create Features and their associated definitions, through to how these are combined to train and serve models, the Feature Store contains a complete history of how things have changed over time. Capabilities such as time travel allow you to reconstruct training datasets used to train models in the past, replay events and understand how your AI models are changing over time.
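The principle behind time travel can be sketched as an "as of" lookup over an append-only history. The `FeatureHistory` class below is a hypothetical illustration written with the Python standard library only; it is not the Hopsworks API, and it assumes writes arrive in chronological order.

```python
from bisect import bisect_right
from datetime import datetime


class FeatureHistory:
    """Append-only history of one Feature's values for one entity."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def write(self, timestamp, value):
        # Assumption for this sketch: writes arrive in chronological order.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("writes must be in chronological order")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def as_of(self, timestamp):
        """Return the latest value written at or before `timestamp`."""
        index = bisect_right(self._timestamps, timestamp)
        if index == 0:
            return None  # nothing had been written yet
        return self._values[index - 1]


history = FeatureHistory()
history.write(datetime(2022, 1, 1), 30.0)
history.write(datetime(2022, 3, 1), 45.0)

value_in_feb = history.as_of(datetime(2022, 2, 1))
value_in_apr = history.as_of(datetime(2022, 4, 1))
value_before = history.as_of(datetime(2021, 12, 1))
```

Because old values are never overwritten, a training dataset built in February can be rebuilt later with exactly the values the model saw at the time, which is the property an auditor would ask for.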
Data Scientists, ML Engineers, Data Engineers and Business Analysts collaborate through sharing and easily searching for Features within the Feature Store, and either reuse these Features directly or create a new version of the existing Feature.
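As a rough illustration of sharing, searching and versioning, the sketch below models a tiny in-memory registry. `FeatureRegistry`, `FeatureDefinition` and the example feature names and owners are hypothetical and are not taken from any real Feature Store API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    version: int
    description: str
    owner: str


class FeatureRegistry:
    """Hypothetical in-memory registry of versioned Feature definitions."""

    def __init__(self):
        self._definitions = {}  # feature name -> list of versions, oldest first

    def register(self, name, description, owner):
        """Add a new version; existing versions are never modified."""
        versions = self._definitions.setdefault(name, [])
        definition = FeatureDefinition(name, len(versions) + 1, description, owner)
        versions.append(definition)
        return definition

    def latest(self, name):
        return self._definitions[name][-1]

    def versions(self, name):
        return list(self._definitions[name])

    def search(self, keyword):
        """Return the latest version of every Feature matching the keyword."""
        keyword = keyword.lower()
        return [
            versions[-1]
            for versions in self._definitions.values()
            if keyword in versions[-1].name.lower()
            or keyword in versions[-1].description.lower()
        ]


registry = FeatureRegistry()
registry.register("avg_txn_amount", "Mean transaction amount per customer", "risk-team")
registry.register("avg_txn_amount", "Mean transaction amount over the last 30 days", "risk-team")
registry.register("support_tickets", "Open support tickets per customer", "service-team")

latest = registry.latest("avg_txn_amount")
matches = registry.search("transaction")
```

Creating a new version rather than editing in place means models trained against version 1 keep a stable definition to point back to.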
These native Feature Store capabilities provide a ready-made answer to the complex data governance questions posed by the EU AI Act. They make it dramatically simpler to reproduce what has happened, or to perform root cause analysis if things have gone wrong.
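One way to picture the answer to those governance questions is a single audit record per training run. The sketch below is an assumption about what such a record might contain; the `TrainingRunRecord` class, its field names and every value in the example are hypothetical placeholders, not real results or a format prescribed by the Act.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TrainingRunRecord:
    """Hypothetical audit record linking a model to the data behind it."""

    model_name: str
    model_version: int
    data_sources: list        # where the raw data came from
    feature_versions: dict    # Feature name -> version used for training
    training_data_as_of: str  # ISO timestamp of the training snapshot
    metrics: dict             # how the model performed
    reviewed_by: str          # human oversight sign-off


# Placeholder values for illustration only.
record = TrainingRunRecord(
    model_name="fraud_model",
    model_version=3,
    data_sources=["payments_db.transactions"],
    feature_versions={"avg_txn_amount": 2, "txn_count": 1},
    training_data_as_of="2022-03-01T00:00:00",
    metrics={"auc": 0.0},
    reviewed_by="model-risk-committee",
)

# Serialise deterministically so the record can be stored and compared later.
audit_json = json.dumps(asdict(record), sort_keys=True, indent=2)
```

The fields line up with the questions a regulator might ask: what the data sources were, how the dataset was created, which metrics describe performance, and who provided human oversight.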
One of the biggest benefits a Feature Store brings to the productionising of ML is the reduction of data pipeline complexity. Without a Feature Store, teams have to build separate data pipelines for each ML model they want to deploy. As data sources can often cross multiple teams, tracking which pipeline went wrong, and when, becomes an organisational challenge.
In the diagram below you can see how complexity builds over time, increasing the data governance challenges associated with the EU AI Act. Each new model adds its own pipelines, so the complexity grows rapidly.
The Hopsworks Feature Store Platform removes this complexity by providing a single, centralised warehouse for Features. By providing a single place where model development and refinement can take place, the Feature Store removes the complexity associated with understanding and scaling the data pipelines needed to power ML in production. As a by-product it provides a clearly documented repository of exactly what data was used to train a model, and how that has changed over time.
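A back-of-the-envelope comparison shows why centralising Features reduces the number of pipelines to maintain. The model names and Feature sets below are hypothetical, and the sketch makes the simplifying assumption that each Feature needs exactly one pipeline.

```python
# Hypothetical models and the Features each one needs.
model_features = {
    "fraud_model": {"avg_txn_amount", "txn_count", "account_age"},
    "churn_model": {"txn_count", "account_age", "support_tickets"},
    "credit_model": {"avg_txn_amount", "account_age"},
}


def pipelines_without_store(models):
    """Every model builds its own pipeline for every Feature it uses."""
    return sum(len(features) for features in models.values())


def pipelines_with_store(models):
    """Each distinct Feature is built once and shared by all models."""
    return len(set().union(*models.values()))


separate = pipelines_without_store(model_features)
shared = pipelines_with_store(model_features)
print(separate, shared)  # 8 4
```

In this toy case three models need eight pipelines when built separately but only four when Features are shared, and the gap widens as more models reuse the same Features.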
At Hopsworks we’re focused on delivering the world’s best Feature Store for a growing list of customers. Our Platform, available on any Cloud or on-premise, helps them productionise ML at scale by facilitating cross-team collaboration and breaking down data silos. Seamless integration with Data Validation solutions like Great Expectations makes it simple to address a wide range of data related challenges. As a result we provide the bridge between Enterprise Data and Enterprise AI.
As a European-headquartered company we’re acutely aware of the importance of strong Data Governance. We are closely following the development of the EU AI Act so that our Platform continues to provide a great solution to the obligations it imposes.
Whilst the details of the EU AI Act are still some way from being fully ratified, one thing which is abundantly clear is the importance of data governance in order to be CE-certified. As the primary benefits of Feature Stores centre around productionising ML at scale through collaboration and breaking down data silos, they’re a great place to focus attention if you want to start preparing your organisation.