Identifying data sources and developing AI-based solutions to analyse online fraud
6 November 2024
Research summary
According to the National Crime Agency (NCA), fraud is the most common crime in the UK, costing the UK public billions of pounds each year. Although the true cost of online fraud is not known, 80% of fraud cases are carried out online (NCA 2023). Online fraud causes not only financial harm to the public and businesses but also emotional and reputational harm that is hard to assess.
Detecting and preventing online fraud requires understanding how it is committed, collecting and sharing appropriate data for analysis, mitigating the vulnerabilities that were exploited, and advising and educating potential victims on how to reduce their risk of victimisation.
Applications and deployments of Artificial Intelligence (AI)-based solutions have increased over the years owing to advances in AI techniques, including machine learning (ML) and its subfield deep learning (DL), used for recognising patterns and making predictions, and natural language processing (NLP), used for understanding and generating human language. Greater availability of computing power, the development of graphics processing units, large datasets and open-source libraries for developing ML/DL and NLP models have enabled the successful application of AI across a wide range of domains, including security and crime prevention, from intrusion detection and biometric identification to detecting cyber grooming and other harmful online behaviour.
However, this powerful technology is also being used by criminals as a tool to design better scams. Criminals are leveraging AI techniques for malicious purposes, and online fraud is no exception. As was acutely demonstrated during the pandemic, methods of fraud are constantly evolving, and it is predicted that fraud will become ever more challenging to tackle with the emergence of AI techniques such as generative AI (McKinsey 2023), which enable the creation at scale of deepfake videos, text, images and sound. Tackling online fraud therefore requires the development of effective AI solutions that can be updated in real time to monitor and defend against evolving threats. Effective and reliable AI models, in turn, require high-quality data for training.
Content generated on the web, on social network platforms and in other online communities and forums has provided a valuable source of data for identifying patterns, categorising activities and measuring public opinion on a wide range of topics, including crime. Application areas include using social media data to understand scams such as fake consumer reviews (Hu, Liu et al. 2011); corporate fraud (Dong 2018), such as intentional disclosure of false information (Xiong 2018); identification of insurance fraud (Diaz-Granados 2015); and investigating the characteristics of criminals who use fake profiles on a social media platform to persuade individuals to invest in cryptocurrency scams (Chergarova 2022).
However, there is a clear gap in this area: few studies exist, and we are far from understanding whether and how unstructured text data from the web (e.g. social media conversations, comments on public websites, online communities and forums) can be used to investigate online fraud. Similarly, to the best of our knowledge, AI techniques, in particular Natural Language Processing, have not been applied to fraud data (i.e. self-reports by the public or industry, or narrative reports by police officers) collected by UK police (Action Fraud) and fraud prevention services such as Cifas to gain meaningful insights for understanding and preventing fraud.
The objective of this project is to fill this gap by developing effective AI-based solutions that use web data and police data to understand existing and emerging online fraud trends and patterns, help investigate and detect new modus operandi, and prevent online fraud. The project will: i) explore the application of Machine Learning, including Deep Learning, and Natural Language Processing to build solutions that monitor and predict changing trends and patterns related to online fraud; ii) predict new forms of online fraud, their lifecycle and the modus operandi involved; and iii) use this insight to educate victims and to detect and prevent future fraud cases. The solutions will be developed with explainability in mind to allow transparency, reproducibility and effective interpretation of the findings.
| Lead Investigator(s) | |
|---|---|
| Project partner | UCL Centre for Advanced Research Computing |
| Outputs | |