Leveraging Interindustry Data to Discover Mule Accounts
Guy Sheppard on Secure Data Sharing and How to Leverage Artificial Intelligence
Suparna Goswami • July 8, 2022 • 20 Minutes

How do you apply artificial intelligence to make sense of data from different industries when determining whether a customer is creditworthy or whether an account is being set up as a mule to facilitate transfers of stolen cash? Guy Sheppard, general manager of financial services at Aboitiz Data Innovation, discusses how his company analyzed data from the power industry to determine the authenticity of an account.
"As a result of the uniqueness of our business, we have power data as well as bank data. So we are able to develop an alternative credit-scoring model for customers," Sheppard says.
"We want fintechs to reach out to us for business, but fintechs have irregular business models. So we determine their credit ability by looking at data points outside of a legacy series of documents. The model was constructed on the basis of power consumption data and that was mapped to get different social and demographic data points for us then to be able to understand their propensity to default."
In an audio interview with Information Security Media Group, Sheppard also discusses:
- Details of the case study;
- How to make the best use of artificial intelligence;
- How to share data while keeping privacy and security intact.
Sheppard previously led SWIFT's Asia-Pacific financial crime compliance, intelligence and cyber teams. He has more than 18 years of experience in financial crime compliance, predominantly in analytics, entity resolution and correspondent banking KYC.
Suparna Goswami: Hi, I'm Suparna Goswami, associate editor with Information Security Media Group. I have the pleasure of speaking with Guy Sheppard, former head of APAC Financial Crime Compliance Initiatives at Swift, and currently the general manager of financial services at Aboitiz Data Innovation (ADI). Guy, we have been talking for the past couple of years now.
Guy Sheppard: We have.
Goswami: Guy, you have moved to Aboitiz. It's a more than 100-year-old conglomerate headquartered in the Philippines. Please tell us about Aboitiz Data Innovation, and briefly about your role there.
Sheppard: Aboitiz Group is 105 years old, and it has many different verticals. There's Aboitiz Power, and the financial services business units include Union Bank, City Savings Bank and Petnet – along with what we hope will be UnionDigital, which is in the process of having its digital banking license approved. The story of ADI is unique. It started, predictably, as a group function, and because of its strategic priority as an enabler for the group to become the most technology-enabled conglomerate – a techglomerate – by 2025, ADI is now a separate subsidiary of Aboitiz Equity Ventures. We are based out of Singapore and the Philippines. ADI is a data science/data modeling powerhouse, in short.
Goswami: Guy, as you mentioned, Aboitiz Data Innovation is in the data science and data engineering space. We know that the key to data science starts with data access and data sharing. Aboitiz is a big conglomerate. How are you approaching these issues of data sharing with multiple companies under Aboitiz, without compromising on security or privacy?
Sheppard: This was one of the biggest fundamental challenges for us as a huge conglomerate group spanning different industries. Industries have different levels of maturity with data and technology. On one end, there is financial services; on the other, industries like construction that don't necessarily go hand in hand with data modeling and technology-based innovation.
We have the whole spectrum, and we're in the Philippines – we're not in Singapore. It's a market that still has emerging and developing views on data protection, the use of data and data modeling. The biggest single issue was being able to gain access to sensitive datasets and production data, which could be customer transactional payment information, pricing, customer segmentation information and various other things. The challenges we encountered in the early stages were the usual suspects: the need for data cleanup, better taxonomy and some universal labeling system, but also physically allowing our data scientists to interact with a data product without lengthy legal agreements, and being able to reassure our business owners that their data would be managed and interacted with securely and that they wouldn't lose control.

We partnered with a business called Harbor and with Amazon Web Services to develop a product we call Parlay, which is a secure-by-design data exchange platform. It fundamentally gives a data product owner perpetual control over who can access their data, to the point where it's smooth and slick, which helps us manage our internal customers. Owners can elect different named users to subscribe to a data set and enable those privileged users down to the individual. It's not "if you have an email address @aboitiz.com, you can access it," but named users. At the same time, you can create a secure workspace with data modeling and data analysis tools, like Python or other programming languages, or SQL.

The analogy here is that if I went to the Louvre in Paris and wanted to see the Mona Lisa, I'd get to interact with that piece of art, but I wouldn't expect to take it home with me. The challenge with the traditional data sharing agreement is that even if I have data tokenization, I still have to trust you, Suparna, that after this piece of work is concluded, you will delete and destroy all records or backup records of my data. That's the fundamental premise: once the data has left the institution, you effectively lose control of it. Whereas in our world, we wanted a way to federate access to data without it ever leaving the secure enclave. That also matters because, through AWS and the different hosting arrangements we have with them, particularly in Asia-Pacific, you still have lots of governments and regulators insisting on data not leaving sovereign shores. There are multiple ways in which we can satisfy those requirements. You have ISO 27001 certification and Cyber Essentials.

It is secure by design, and I say this with confidence because I have seen lots of products that claim to be secure by design, and what that means is you can't ever access anything or interact with anything. This, however, is a free-moving user experience, which has enabled us to federate data from all different parts of Aboitiz and create models much faster – on average 50% faster – than before we were using Harbor. We have a faster time to market, we have collaborative tools and workspaces built into our platform, and our goal at the moment is to develop this as an ecosystem for the group.
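To illustrate the access model described above – data products that never leave the secure enclave, with access granted to individual named users rather than to a whole email domain – here is a minimal sketch in Python. The class, field and method names are hypothetical and are not Parlay's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data set that stays inside the enclave; only access is granted."""
    name: str
    owner: str
    # Named subscribers, not a whole email domain.
    subscribers: set = field(default_factory=set)

    def grant(self, user_email: str) -> None:
        """The product owner explicitly adds an individual named user."""
        self.subscribers.add(user_email)

    def revoke(self, user_email: str) -> None:
        """The owner keeps perpetual control and can withdraw access at any time."""
        self.subscribers.discard(user_email)

    def open_workspace(self, user_email: str) -> str:
        """Return a handle to an in-enclave workspace; raw data is never exported."""
        if user_email not in self.subscribers:
            raise PermissionError(f"{user_email} is not a named subscriber of {self.name}")
        return f"workspace://{self.name}/{user_email}"


# Usage: the owner grants access to one analyst, not to everyone @aboitiz.com.
pricing = DataProduct(name="power-pricing", owner="data-owner@example.com")
pricing.grant("analyst@example.com")
print(pricing.open_workspace("analyst@example.com"))
```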
Goswami: The journey you describe is absolutely fascinating. There are a lot of learnings for the financial industry on how to share data securely. Can you share some examples for our audience from the financial industry – or any industry, for that matter?
Sheppard: Yeah, if I take my Swift background into account, the depth of my experience is in financial crime prevention, and the challenge for so many practitioners is around data access. A grim acceptance across the industry is that we have rules-based engines that are inefficient but meet a regulatory expectation of the type of thing you should have bolted in as your core AML toolset and framework. The key to AI and machine learning is access to data – and it has to be production data – and the ability to rapidly prototype and then fine-tune, because machine learning needs to learn in order to improve levels of effectiveness, efficiency and accuracy. The key is being able to monetize data.

The challenge is that regulators are pushing a fintech and regtech agenda, whether you're in Hong Kong, Singapore, Mumbai, Tokyo or Beijing. This is a constant: you should be using technology to get faster, smarter and better at this job. The challenge becomes that final mile – if I use the telecoms example – of how you fundamentally share your data with a fintech or regtech, a third party or even some of your customers. Whether it's Harbor or Parlay, that's where you need a secure mechanism that will satisfy legal and governance procedures so you can interact and collaborate with these third-party vendors. This has been the steepest learning curve for me.

We have a model that we're in the process of deploying at a number of our internal institutions, which is mule account detection. Since we have access to this federated data, we can use as a baseline all of the previously detected or reported mule accounts, and we can model that transactional behavior. You almost have a thumbprint of what the machine knows is historically mule account behavior. Then we can pattern match all of our existing account behaviors against that, because we can process this in real time, and we start to get not just a score – because a score is a clue – but an understanding of which accounts are behaving differently. That's the first conclusion. There may be positive reasons for that: the customer may have outgrown their existing segmentation, and there are revenue uplift opportunities where we can cross-sell additional products to someone who has matured as a customer. And then there are those that, unfortunately, start to echo the behaviors that the machine knows are mule behaviors, or that appear or could be perceived to be new behaviors. Those would then go into a dashboard for much closer inspection and monitoring by our financial crime teams. From there, we have a new series of alerts that are generated. It's going from a reactive – I wouldn't say hit and miss, but a less accurate – type of approach to a real-time and dynamic model.

The exciting thing is that the model learns. When we first deployed this, we had 90% or more certainty that an account was not a mule, and by inference we were scoring about 80% accuracy in detecting mule behaviors, to then further segment to "is this a true or false positive?" Those percentages have increased exponentially as we continue to train these models with more data, and there are additional sub-criteria that explain that. This is not artificial intelligence locked in a box that we never interact with. It's an interactive, codependent experience. That's what has been the big learning curve for me. This is not some Skynet setup where it's sentient and does what it needs to do.
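A rough sketch of the pattern matching described above – building a behavioral "thumbprint" from previously reported mule accounts and scoring live accounts against it. The features, similarity measure and thresholding here are illustrative assumptions, not ADI's production model.

```python
import numpy as np


def behavior_vector(txns: list[dict]) -> np.ndarray:
    """Reduce an account's transactions to a small behavioral feature vector.
    Hypothetical features: volume, amount statistics, share of rapid pass-through transfers."""
    amounts = np.array([t["amount"] for t in txns])
    pass_through = np.mean([t["forwarded_within_hours"] < 24 for t in txns])
    return np.array([len(txns), amounts.mean(), amounts.std(), pass_through])


def mule_fingerprint(known_mule_accounts: list[list[dict]]) -> np.ndarray:
    """Average the vectors of previously detected or reported mule accounts."""
    return np.mean([behavior_vector(a) for a in known_mule_accounts], axis=0)


def mule_score(account_txns: list[dict], fingerprint: np.ndarray) -> float:
    """Cosine similarity between an account's behavior and the mule fingerprint."""
    v = behavior_vector(account_txns)
    return float(np.dot(v, fingerprint) /
                 (np.linalg.norm(v) * np.linalg.norm(fingerprint) + 1e-9))

# Accounts scoring above a tuned threshold would be routed to a dashboard
# for closer inspection by the financial crime team; the rest are left alone.
```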
You have to constantly spend time with these models to train and fine-tune them. The win is super clear: we are much better at detecting this type of behavior. It has also led us to use cluster analysis to start looking at the additional relationships we see in the data – between accounts that have been targeted or are suspected of illicit behavior and whoever they interact with across the wider network. That's where it starts to get interesting. If I break it down, you have to have fast, ready access to production levels of data in a way that makes your business comfortable. Without it, everything else is just pie in the sky. Once you've achieved that, you can start to model and interact with your data to identify known unknowns – "we don't know why this is happening" – and drill down to granular levels.

It's not all doom and gloom. One of the areas we're proud of is that, as a result of the uniqueness of our business, we have power data as well as bank data. We've been able to develop an alternative credit-scoring model for customers in more emergent or rural parts of the Philippines, which addresses a problem that is scalable and applicable to wider parts of Asia. One of our key mission statements at Union Bank is to be the go-to bank for fintechs. That's what we want to do. As we've heard from across the industry, fintechs have irregular business models, which means that when they approach banks to open corporate bank accounts, the rejection rate is high. This is a real challenge. We can score those customers using non-traditional methods, looking at data points outside of the legacy series of documents that you would expect for opening a bank account – documents they may not have, or that may not meet a risk appetite – so we can lend with confidence to a much wider range of customers than we would have done previously. That model was constructed from power consumption data and then mapped against different social and demographic data points for us to be able to understand a propensity to default. It's a series of problems that you can look at, once you have the data, from every different angle. That's what makes data science exciting.
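A minimal sketch of the alternative credit scoring described above – power consumption mapped against socio-demographic data points to estimate a propensity to default. The file name, column names and the choice of logistic regression are assumptions for illustration, not ADI's actual model.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical training table: power consumption plus socio-demographic fields,
# labeled with historical default outcomes.
df = pd.read_csv("power_and_demographics.csv")  # assumed file layout
features = ["avg_monthly_kwh", "payment_delays_12m", "household_size", "region_code"]
X, y = df[features], df["defaulted"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Propensity to default for a new applicant with no traditional credit file.
applicant = pd.DataFrame([{"avg_monthly_kwh": 120, "payment_delays_12m": 1,
                           "household_size": 4, "region_code": 7}])
print(model.predict_proba(applicant)[:, 1])
```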
Goswami: I know you worked with Swift for around eight years and have a good understanding of how AI can be leveraged. My previous conversation with you was around AI. Please tell our audience how AI can be used to detect suspicious transactions and how to move from a static to, say, a dynamic rule set.
Sheppard: The big challenge, which we've touched on, is particularly with transaction monitoring. It's a wide spectrum that you're looking across – this is the original needle-in-the-haystack situation. There are more typologies from regulators than we have seen at any single point in history: animal trafficking typologies, human trafficking typologies and fraud typologies. We're looking for more than we've ever had to look for before. The fundamental challenge for a lot of financial institutions (FIs) is that there is a regulatory expectation that a known vendor, or a system with fairly non-complex construction and methodology, is what the regulator is going to find when they come knocking for an inspection. We have moved to a space where you have these unwieldy rules-based systems that are spitting out either far too many or not enough alerts; usually it's the former. It's incredibly difficult for FIs to reconfigure and adopt a much more agile way of interacting with their systems. Many of the original programming team or the original onboarding team have long since left the bank. These become sacrosanct holy cows that sit there, and no one wants to interact with them. I don't think there's a single compliance practitioner who would disagree with the fact that the levels of efficiency in transaction monitoring have been deplorable – 99.95%, or more than 99%, is the usual inefficiency rate. It is needle-in-a-haystack territory.

The difference you get when you move to AI or machine learning systems is twofold. First, you move away from a very static, retrospective approach to changing something. We're in a constant state of change, so you need to be agile to respond to that. Second, you have legacy rules that are still spitting out alerts whose parameters have long since changed, but we don't want to switch them off, because there was a very sound reason for having deployed them in the first place. I think these complement one another. Where the industry is working back from is: now that we have a huge ton of alerts from a system, how do we prioritize them? Most FIs are already on that page, and what some of the more forward-thinking FIs are looking at is how to run these in parallel and start to compare the results.

That was a similar experience for us at Union, where we took as a baseline the universe of customers who satisfy some but not all of our core alerts. We're still staying true to what we see as the risks across a transactional data set, but we're asking: which of those that haven't generated a match should have done, but have slightly different behaviors than those prescribed by our system – which we know at the moment is a source of truth, but it's one view, not necessarily all-encompassing? That created some interesting new targets for us to look at. You have to believe that the glass is half full, and you have to believe that whoever sat down on day one and constructed those rules that would generate an alert or flag Guy Sheppard as a person of interest – there was truth behind that, there was thinking, and there was a methodology that has stood the test of time. To prescribe throwing all of that out the window and starting from a blank page is disrespectful and inefficient. We're more about a process of improving on what we know is inefficient, to start targeting parts of our transaction population that may have just changed behaviors – still in that risk bracket, but not showing up on an alert.
We're trying to point the torch at a different part of the room.
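To make the "some but not all of our core alerts" idea concrete, here is a minimal sketch that surfaces customers who trip several legacy rules without ever crossing the full-alert threshold. The rule definitions and thresholds are hypothetical, not a bank's actual rule set.

```python
# Each legacy rule is a predicate over a customer's transaction history.
RULES = {
    "high_velocity": lambda txns: len(txns) > 50,
    "round_amounts": lambda txns: sum(t["amount"] % 1000 == 0 for t in txns) > 10,
    "new_counterparties": lambda txns: len({t["counterparty"] for t in txns}) > 30,
    "cross_border": lambda txns: any(t["cross_border"] for t in txns),
}


def partial_matches(customers: dict, full_alert_threshold: int = 4, floor: int = 2) -> dict:
    """Customers who satisfy some but not all core rules: they never generate a
    full alert, yet their behavior still sits inside the risk bracket and is
    worth a closer look in parallel with the legacy system."""
    results = {}
    for cust_id, txns in customers.items():
        hits = [name for name, rule in RULES.items() if rule(txns)]
        if floor <= len(hits) < full_alert_threshold:
            results[cust_id] = hits
    return results
```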
Goswami: Guy, this was a fascinating conversation. It's amazing to know how, within the conglomerate, you are able to correlate data from completely different industries – power or construction – with the financial industry. Thank you so much. I'm sure there are a lot of learnings here for the financial industry, or any industry out there.
Sheppard: Thank you very much, Suparna.
Goswami: You were listening to Guy Sheppard. For ISMG, this is Suparna Goswami. Thank you so much for listening.