“It is now expected by regulators that compliance programmes are not only effective but can be demonstrated through data to be effective.”
Tapan Debnath, Head of Integrity, Regulatory Affairs & Data Privacy at ABB
Private network access, encrypted customer data, and data access control.
Legal framework and compliance for specific AML regulations and data regulations (GDPR, CCPA, HIPAA).
Roles, policies, standards, and processes embedded both in the organisation that uses the framework and in the service provider(s).
“Traditional technological approaches to combat [...] evolving threats are meeting with less success resulting in large numbers of ‘false positives.’”
Radish Singh, AML Specialist at Deloitte Forensic
Existing approaches to identifying fraud and money laundering rely heavily on databases of human-engineered rules that attempt to match patterns indicative of fraud. As new fraud schemes are identified, new rules are added to the rule engines (Rules-Based Fraud Detection). For example, in money laundering, smurfing is a well-known scheme in which many private accounts funnel money to hubs through small, under-the-radar transactions for later extraction.
A suspicious customer meets certain predefined, limited criteria.
The transaction is flagged.
Flagged items are added to a list.
Investigators review the items on the list.
In the Rules-Based Fraud Detection code example below, you can see the rule-based approach to identifying suspicious financial transactions. Here, you define a large set of rules that are applied to every financial transaction. If a transaction matches any of the rules, an alert is triggered. If the alert was triggered incorrectly (a false positive), it incurs the cost of an investigation. If no alert was triggered but one should have been (a false negative), you must design a new rule to identify the fraud scheme, if that is possible at all. Companies maintain these rule databases and routinely ship updates to customers.
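A minimal sketch of such a rule engine, assuming each transaction carries an amount and a counterparty country. The rule names, thresholds, and country codes are hypothetical placeholders, not rules from any real product.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transaction:
    tx_id: str
    sender: str
    receiver: str
    amount: float
    country: str  # counterparty country code


# A rule is a human-readable name plus a predicate over a single transaction.
Rule = Tuple[str, Callable[[Transaction], bool]]

HIGH_RISK_COUNTRIES = {"XX", "YY"}  # hypothetical placeholder codes

# Hypothetical human-engineered rules; real rule databases hold far more.
RULES: List[Rule] = [
    ("large_amount", lambda t: t.amount >= 10_000),
    ("high_risk_country", lambda t: t.country in HIGH_RISK_COUNTRIES),
    ("round_amount", lambda t: t.amount >= 1_000 and t.amount % 1_000 == 0),
]


def generate_alerts(transactions: List[Transaction], rules: List[Rule]):
    """Apply every rule to every transaction; alert if any rule matches."""
    alerts = []
    for tx in transactions:
        matched = [name for name, predicate in rules if predicate(tx)]
        if matched:
            alerts.append((tx.tx_id, matched))
    return alerts


transactions = [
    Transaction("t1", "A", "B", 12_500.0, "SE"),
    Transaction("t2", "C", "D", 250.0, "SE"),
    Transaction("t3", "E", "F", 3_000.0, "XX"),
]
alerts = generate_alerts(transactions, RULES)
for tx_id, matched in alerts:
    print(tx_id, matched)
# t1 ['large_amount']
# t3 ['high_risk_country', 'round_amount']
```

Every newly discovered scheme means another entry in `RULES`, and every alert still needs a human review, which is where the cost of false positives comes from.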
The problem with Rules-Based Fraud Detection systems is the huge number of false-positive alerts, each of which takes time and money to run down. In addition, they cannot detect evolving threats, as rules do not generalize to similar but slightly modified schemes. More alarmingly, threats that involve patterns across many related transactions, such as smurfing, cannot be identified using existing rule-based systems.
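To make the smurfing problem concrete, the sketch below uses hypothetical data and a hypothetical reporting threshold of 10,000. Each deposit into `hub`, looked at on its own, stays under the threshold, so a rule that inspects one transaction at a time never fires. The structure only becomes visible once related transactions are grouped by receiver over a time window (seven days here, also an assumption).

```python
from collections import defaultdict
from datetime import date, timedelta

REPORTING_THRESHOLD = 10_000  # hypothetical per-transaction reporting limit

# (sender, receiver, amount, date): five accounts each send just under the limit to one hub.
transactions = [
    ("acct1", "hub", 9_500, date(2021, 1, 4)),
    ("acct2", "hub", 9_800, date(2021, 1, 5)),
    ("acct3", "hub", 9_900, date(2021, 1, 5)),
    ("acct4", "hub", 9_700, date(2021, 1, 6)),
    ("acct5", "hub", 9_600, date(2021, 1, 7)),
    ("acct6", "shop", 120, date(2021, 1, 6)),
]


def per_transaction_rule(amount):
    """A classic rule: flag any single transaction at or above the threshold."""
    return amount >= REPORTING_THRESHOLD


def aggregate_inflows(transactions, start, window_days):
    """Total inflow and number of distinct senders per receiver within a window."""
    end = start + timedelta(days=window_days)
    totals = defaultdict(float)
    senders = defaultdict(set)
    for sender, receiver, amount, day in transactions:
        if start <= day < end:
            totals[receiver] += amount
            senders[receiver].add(sender)
    return {r: (totals[r], len(senders[r])) for r in totals}


flagged = [t for t in transactions if per_transaction_rule(t[2])]
print("per-transaction alerts:", flagged)  # per-transaction alerts: []

inflows = aggregate_inflows(transactions, date(2021, 1, 1), 7)
print(inflows["hub"])  # (48500.0, 5)
```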
To overcome those challenges, we have developed a new state-of-the-art solution for identifying suspicious activities based on semi-supervised deep learning and anomaly detection.
The key insight behind anomaly detection with deep learning is that a model can generalize from training data to identify anomalous transaction patterns that are indicative of fraud. Deep learning thrives on large amounts of data: the more examples of “normal” financial transactions you train a model with, the more accurate it becomes. The result is an anomaly detection engine that makes it harder for money launderers to evade detection by making small changes to how they launder money.
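The principle of learning what “normal” looks like and scoring deviations from it can be shown without any deep learning. The sketch below swaps in a much simpler technique, a per-feature z-score profile fitted only on normal transactions; the features (amount, hour of day) and the data are hypothetical. A deep model plays the same role with a far richer, learned notion of normal.

```python
from statistics import mean, pstdev


def fit_normal_profile(normal_rows):
    """Learn a per-feature mean and standard deviation from normal transactions only."""
    columns = list(zip(*normal_rows))
    # Fall back to 1.0 if a feature never varies, to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def anomaly_score(row, profile):
    """Largest absolute z-score across features: how far a row is from 'normal'."""
    return max(abs(x - mu) / sigma for x, (mu, sigma) in zip(row, profile))


# Hypothetical features per transaction: (amount, hour of day).
normal_transactions = [(100, 10), (120, 11), (90, 9), (110, 14), (105, 12)]
profile = fit_normal_profile(normal_transactions)

print(anomaly_score((108, 11), profile))  # close to normal
print(anomaly_score((900, 3), profile))   # far from anything seen in training
```

In this toy data the second transaction lies nearly 80 standard deviations from the normal amount, while the first stays well within the range seen during training.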
The model is trained on historical financial transactions, including examples of fraud and non-fraud.
In real time, the model predicts whether a transaction is fraudulent.
If it is, the transaction is flagged.
Graph visualisation lets investigators explore flagged items and their relationships.
In the Train Fraud Detection Model code snippet below, you can see that you must first curate a labeled training dataset, financial_transactions. With that dataset, you train the model, and the trained model can then be used on new financial transactions to predict whether they are fraud or not fraud. An alert is sent if a financial transaction is suspected of fraud.
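A minimal, self-contained sketch of that train-then-predict workflow, assuming a tiny hypothetical version of `financial_transactions` with two made-up, pre-scaled features. Plain logistic regression trained by gradient descent stands in for the deep model, and `train_model`, `predict_fraud_probability`, and `send_alert` are illustrative names, not an existing API.

```python
import math
from typing import List, Tuple

# Hypothetical labeled dataset: (features, label), label 1 = fraud, 0 = not fraud.
# Features: (amount / 10,000, transfers by the sender in the last 24 h / 10).
financial_transactions: List[Tuple[List[float], int]] = [
    ([0.01, 0.1], 0), ([0.03, 0.2], 0), ([0.02, 0.1], 0), ([0.05, 0.1], 0),
    ([0.95, 0.8], 1), ([0.80, 1.2], 1), ([0.99, 1.0], 1), ([0.75, 0.9], 1),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_fraud_probability(model, x: List[float]) -> float:
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_model(data, epochs: int = 2000, lr: float = 1.0):
    """Fit logistic regression with plain batch gradient descent on mean log loss."""
    n = len(data)
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in data:
            error = predict_fraud_probability((weights, bias), x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def send_alert(tx_id: str, probability: float) -> None:
    # Hypothetical hook; a real deployment would notify a case-management system.
    print(f"ALERT {tx_id}: fraud probability {probability:.2f}")


model = train_model(financial_transactions)
new_transactions = {"tx-101": [0.03, 0.12], "tx-102": [0.9, 1.0]}
alerted = []
for tx_id, features in new_transactions.items():
    p = predict_fraud_probability(model, features)
    if p >= 0.5:  # alert threshold, also an assumption
        send_alert(tx_id, p)
        alerted.append(tx_id)
```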
Generative Adversarial Networks (GANs) are a natural choice for financial fraud prediction, as they can learn the patterns of lawful transactions from historical data. For every new financial transaction, the model computes an anomaly score, and transactions with high scores are labeled as suspicious.
GANs have a reputation for being both complex to understand and difficult to train. During the training phase, the generator is trained to mimic real transactions, the encoder learns to encode real transactions into the latent representation the generator works from, and the discriminator learns to distinguish real data from generated (fake) data.
As each component improves against the others, the system as a whole becomes better at both generating and recognizing realistic transactions. The goal is to match, as closely as possible, the patterns seen in a real environment.
During the serving phase, the generator and encoder have fixed parameters, and the discriminator is discarded. Each real transaction is encoded and then reconstructed by the generator, and the anomaly score measures how far the transaction is from its reconstruction: because the system has learned what normal transactions look like, a transaction it cannot reconstruct well is likely to be anomalous. The threshold for triggering an alert is configurable, and investigators can interpret the anomaly score itself. Currently, deep learning approaches are not approved by regulators for identifying money laundering, so our approach is used as a decision support system: it runs alongside a classic rules-based system and makes investigators more productive by helping them prioritize alerts, so that those with the highest anomaly scores are investigated first.
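The serving-phase scoring can be sketched with stand-ins for the trained networks. Here the encoder and generator are hypothetical linear maps to and from a one-dimensional latent code, so `generate(encode(x))` projects a transaction onto the line along which the “normal” toy transactions lie; a trained GAN would use neural networks instead. The anomaly score is the distance between a transaction and its reconstruction, the alert threshold of 0.5 is arbitrary, and alerts are sorted by score to build the investigation queue.

```python
import math

# Hypothetical stand-in for what training learned: normal transactions lie close
# to this unit direction in feature space (0.6**2 + 0.8**2 == 1).
DIRECTION = (0.6, 0.8)


def encode(x):
    """Encoder stand-in: map a transaction's features to a 1-D latent code."""
    return x[0] * DIRECTION[0] + x[1] * DIRECTION[1]


def generate(z):
    """Generator stand-in: map a latent code back to feature space."""
    return (z * DIRECTION[0], z * DIRECTION[1])


def anomaly_score(x):
    """Distance between a transaction and its reconstruction generate(encode(x))."""
    return math.dist(x, generate(encode(x)))


ALERT_THRESHOLD = 0.5  # configurable; the value here is arbitrary

incoming = {
    "tx-1": (3.0, 4.0),  # lies on the normal direction
    "tx-2": (3.0, 4.5),  # slightly off
    "tx-3": (4.0, 1.0),  # far from normal behaviour
    "tx-4": (2.0, 0.0),  # also off, but less so than tx-3
}
scores = {tx_id: anomaly_score(x) for tx_id, x in incoming.items()}
alerts = [tx_id for tx_id, s in scores.items() if s > ALERT_THRESHOLD]

# Decision support: investigators work through alerts from the highest score down.
queue = sorted(alerts, key=scores.get, reverse=True)
print(queue)  # ['tx-3', 'tx-4']
```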
GANs are challenging both to train and to deploy in production: they need GPUs, parallel hyperparameter search and, when training on large volumes of data, support for distributed training.
To detect fraudulent patterns and trigger alerts, you can use graph features and tabular features as inputs to the GAN techniques described earlier. Graphs consist of nodes and edges. In a graph of financial transactions, the nodes represent businesses and individuals, while an edge represents a financial transaction between two nodes.
To show the utility of graphs, consider an example in which businesses and individuals are labeled differently: businesses as “Corp” and individuals as “Indiv”. Edges represent transactions, each with an associated date and amount, and the direction of each edge shows the direction in which the money flows.
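A sketch of how such a graph might be held in code, assuming plain Python data structures rather than any particular graph library: nodes carry a Corp/Indiv label, and each directed edge carries a date and an amount. The example (one company paying three individuals) and the per-node features (degrees and total flows) are hypothetical illustrations of graph features that could be fed to a model alongside tabular ones.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Edge:
    """A directed edge: money flows from `src` to `dst` on `day`."""
    src: str
    dst: str
    day: date
    amount: float


# Node labels follow the convention above: "Corp" for businesses, "Indiv" for individuals.
node_type = {"Corp1": "Corp", "Indiv1": "Indiv", "Indiv2": "Indiv", "Indiv3": "Indiv"}

# Hypothetical salary run: one company pays three individuals on the same date.
edges = [
    Edge("Corp1", "Indiv1", date(2021, 1, 25), 3_000.0),
    Edge("Corp1", "Indiv2", date(2021, 1, 25), 3_100.0),
    Edge("Corp1", "Indiv3", date(2021, 1, 25), 2_900.0),
]


def node_features(edges):
    """Per-node graph features: in/out degree and total inflow/outflow."""
    feats = defaultdict(lambda: {"in_deg": 0, "out_deg": 0, "inflow": 0.0, "outflow": 0.0})
    for e in edges:
        feats[e.src]["out_deg"] += 1
        feats[e.src]["outflow"] += e.amount
        feats[e.dst]["in_deg"] += 1
        feats[e.dst]["inflow"] += e.amount
    return dict(feats)


features = node_features(edges)
for node, f in sorted(features.items()):
    print(node, node_type[node], f)
```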
There are various expected graph patterns, such as the normal scatter pattern, also known as a dandelion, which occurs when an organization pays salaries. Such a pattern occurs on certain dates, salaries are relatively fixed, and the money flows outbound from a single payer. An anomalous scatter pattern, by contrast, shows either a sudden burst of transactions never seen before for the nodes involved or bidirectional money flows.
Figure 5 shows a gather-scatter pattern, in which money first flows inbound to the central node in January and is then sent outbound to other nodes in February. In money laundering, this gather-scatter pattern is used to hide the distribution of funds from financial institutions. Similarly, Figure 6 shows a scatter-gather pattern, which again has a bidirectional flow of money on different dates; in this case, the source and destination of the money are two different central entities.
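The gather-scatter and scatter-gather shapes can be made concrete with two hand-written checks over hypothetical data. `is_gather_scatter` looks for a node that receives from several sources in one month and pays several destinations in a later month; `is_scatter_gather` looks for a source whose payments to intermediaries are later forwarded to a single sink. The thresholds (`min_in`, `min_out`, `min_paths`) and the month granularity are assumptions; in the approach described here, a GAN is meant to pick up such shapes from graph features rather than rely on fixed checks like these.

```python
from collections import namedtuple
from datetime import date

Tx = namedtuple("Tx", "src dst day amount")


def month(d: date):
    return (d.year, d.month)


def is_gather_scatter(node, txs, min_in=2, min_out=2):
    """Money gathers into `node` from several sources in one month and is
    scattered to several destinations in a later month."""
    inbound, outbound = {}, {}
    for t in txs:
        if t.dst == node:
            inbound.setdefault(month(t.day), set()).add(t.src)
        if t.src == node:
            outbound.setdefault(month(t.day), set()).add(t.dst)
    return any(
        m_out > m_in and len(srcs) >= min_in and len(dsts) >= min_out
        for m_in, srcs in inbound.items()
        for m_out, dsts in outbound.items()
    )


def is_scatter_gather(source, sink, txs, min_paths=2):
    """`source` pays several intermediaries, which later pay `sink`."""
    # Date on which each intermediary was (last) paid by the source.
    paid_on = {t.dst: t.day for t in txs if t.src == source}
    intermediaries = {
        t.src for t in txs
        if t.dst == sink and t.src in paid_on and t.day > paid_on[t.src]
    }
    return len(intermediaries) >= min_paths


txs = [
    # Gather in January, scatter in February around "Hub".
    Tx("A", "Hub", date(2021, 1, 5), 4_000), Tx("B", "Hub", date(2021, 1, 9), 3_500),
    Tx("C", "Hub", date(2021, 1, 20), 4_200), Tx("Hub", "D", date(2021, 2, 3), 3_900),
    Tx("Hub", "E", date(2021, 2, 10), 3_800), Tx("Hub", "F", date(2021, 2, 18), 4_000),
    # Normal salary payments (outbound only).
    Tx("Corp1", "Indiv1", date(2021, 1, 25), 3_000), Tx("Corp1", "Indiv2", date(2021, 1, 25), 3_100),
    # Scatter from "S" in March, gather at "T" in April.
    Tx("S", "X", date(2021, 3, 1), 5_000), Tx("S", "Y", date(2021, 3, 1), 5_000),
    Tx("X", "T", date(2021, 4, 2), 4_900), Tx("Y", "T", date(2021, 4, 3), 4_900),
]
print(is_gather_scatter("Hub", txs), is_gather_scatter("Corp1", txs))
print(is_scatter_gather("S", "T", txs))
```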
Based on tabular features as well as graph features, GAN methods can detect such fraud patterns. These methods coexist with rule-based techniques, and the combination yields better results: higher accuracy and fewer misclassifications in the confusion matrix.
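One way to combine the two kinds of input is to concatenate, for each transaction, its tabular features with graph features of its sender and receiver. The sketch below assumes hypothetical features (amount, hour of day, and the in/out degrees of both endpoints); rows like these are the kind of input an anomaly-detection model such as the GAN described earlier would consume.

```python
from collections import Counter

# Hypothetical raw transactions: (tx_id, sender, receiver, amount, hour_of_day).
transactions = [
    ("t1", "A", "Hub", 4000.0, 10),
    ("t2", "B", "Hub", 3500.0, 23),
    ("t3", "C", "Hub", 4200.0, 2),
    ("t4", "Hub", "D", 3900.0, 11),
]


def graph_features(transactions):
    """Out-degree of every sender and in-degree of every receiver."""
    out_deg = Counter(sender for _, sender, _, _, _ in transactions)
    in_deg = Counter(receiver for _, _, receiver, _, _ in transactions)
    return out_deg, in_deg


def feature_vector(tx, out_deg, in_deg):
    """Tabular features of the transaction followed by graph features of both endpoints."""
    _, sender, receiver, amount, hour = tx
    tabular = [amount, float(hour)]
    # Counter returns 0 for nodes that never appear in that role.
    graph = [float(out_deg[sender]), float(in_deg[sender]),
             float(out_deg[receiver]), float(in_deg[receiver])]
    return tabular + graph


out_deg, in_deg = graph_features(transactions)
rows = {tx[0]: feature_vector(tx, out_deg, in_deg) for tx in transactions}
print(rows["t1"])  # [4000.0, 10.0, 1.0, 0.0, 1.0, 3.0]
```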
Figure 7 shows the confusion matrix of a binary classifier for financial fraud. For problems such as money laundering, false negatives should be weighted significantly more heavily than false positives, because a missed laundering case is typically far costlier than an unnecessary investigation. Use a variant of the F1 score to evaluate models, as precision, recall, and fallout should not be weighted equally.
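The text does not name the F1 variant; one common choice is the F-beta score, which attaches beta times as much importance to recall as to precision, so beta = 2 penalizes false negatives more heavily. The sketch below computes the confusion matrix, precision, recall, fallout (the false-positive rate), and F-beta for two hypothetical models with identical F1 scores.

```python
def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix with 1 = fraud (positive), 0 = not fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    return tp, fp, fn, tn


def evaluate(tp, fp, fn, tn, beta=2.0):
    """Precision, recall, fallout and the F-beta score (beta > 1 favours recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fallout = fp / (fp + tn) if fp + tn else 0.0  # false-positive rate
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fallout": fallout, "f_beta": f_beta}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # 1 missed fraud, 2 false alarms
model_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 missed frauds, no false alarms

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, evaluate(*confusion_matrix(y_true, pred)))
```

Both models score 2/3 on F1, but F2 prefers model A, which misses one fraud instead of two, even though it raises two false alarms.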
There are other challenges in detecting money laundering patterns:
We have published a full end-to-end example for detecting fraud as open source:
The code can be run on any Hopsworks cluster, including managed Hopsworks clusters on AWS and Microsoft Azure as well as on-premises installations of Hopsworks. Hopsworks clusters can manage up to hundreds of GPUs and allocate them to applications on demand.
Swedbank is the largest financial centre in Scandinavia, offering retail banking, asset management, and other financial services to 7 million private customers and 546,000 companies. The company’s main challenge was to increase the detection rate and reduce the costs of transactions associated with financial crime. We helped them introduce our model-based approach to AML using the Hopsworks platform.
Swedbank employs a rule-based system that generates up to 99 false positives for every 100 alerts. The financial institution used our deep learning approach to anomaly detection, with more than 40TB of training data in the Hopsworks Feature Store and models trained on GPUs. In pre-production evaluation, the company reduced this to only 1 false positive for every 2 alerts, that is, from 99 false positives per true positive to 1 (a 99% reduction).
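The arithmetic behind the 99% figure, read as a reduction in false positives per true positive: 99 false positives in 100 alerts leaves 1 true positive, while 1 false positive in 2 alerts leaves 1 true positive for every false one.

```python
def false_positives_per_true_positive(false_positives, alerts):
    """How many false alarms investigators review for each real case."""
    true_positives = alerts - false_positives
    return false_positives / true_positives


before = false_positives_per_true_positive(99, 100)  # rule-based system
after = false_positives_per_true_positive(1, 2)      # deep learning model
reduction = 1 - after / before
print(f"{before:.0f} -> {after:.0f} false positives per true positive ({reduction:.1%} reduction)")
# 99 -> 1 false positives per true positive (99.0% reduction)
```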