Machine Learning With Engineered Features To Identify Fraud In Point-Of-Sale Systems | Open Access

Recently Published Thesis/Dissertation on Machine Learning | 17.12.2019

The majority of crimes against companies are committed by their own employees. Independently run businesses, such as family-owned restaurants, are especially susceptible because they lack the resources to put systems in place to detect and prevent crimes. This dissertation applies machine learning (ML) to detecting server fraud in restaurant point-of-sale (POS) systems to determine if it is feasible to develop a low-cost automated solution that could be used by restaurant owners.

Automated fraud detection has been actively researched in a number of sectors, such as credit cards, finance, insurance, and retail, yet no such research has been conducted on the restaurant POS business. The results presented in the papers on each of these types of fraud make it clear that, while fraud detection shares some characteristics across data domains, such as imbalanced data, the features and marginal distributions of each type of data make the domains more different than they are similar. Therefore, determining the best algorithms, methods, and techniques for applying ML requires research unique to each fraud domain.

Restaurant servers are an example of an insider threat to the security of restaurant financial data. This paper applies machine learning (ML) techniques, including neural networks, support vector machines, Random Forest, and AdaBoost, to detect the digital representation of the malevolent behavior of restaurant employees in restaurant point-of-sale data. The results of this research could be used to notify restaurant owners in real time when fraud is being committed. By developing engineered features, applying undersampling and oversampling class-balancing techniques, and using statistically generated data, we show that ML techniques can improve fraud detection performance. In particular, detection with a Random Forest model using cross-validation can be increased by 55% by oversampling the minority class to the same size as the majority class. Results with a neural network model trained to detect fraud in the first year the restaurant opened and tested on data from the following year can be improved by 50% by decreasing the majority class to the same size as the minority class. We also show that, with statistically generated data, the performance of preprocessing features matches the performance of engineered features and achieves F1 scores of 99.9%.
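The two class-balancing strategies described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the two-feature transaction rows, and the use of plain random resampling are assumptions made for illustration, and the abstract does not say which resampling method was actually used.

```python
import random
from collections import Counter


def group_by_class(rows, labels):
    """Collect feature rows under their class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes
    are the same size as the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Draw extra rows with replacement to fill the gap to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def undersample_majority(rows, labels, seed=0):
    """Randomly drop rows of larger classes until all classes
    are the same size as the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Sample without replacement so no row is kept twice.
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: (void_count, discount_total) per transaction,
# with label 1 = fraud (minority) and 0 = legitimate (majority).
rows = [(0, 0.0), (1, 2.5), (0, 1.0), (0, 0.0), (1, 0.0),
        (0, 3.0), (0, 0.0), (1, 1.5), (4, 20.0), (5, 35.0)]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

over_rows, over_labels = oversample_minority(rows, labels)
under_rows, under_labels = undersample_majority(rows, labels)
print(Counter(over_labels), Counter(under_labels))
```

In a cross-validation setting, resampling of this kind is normally applied to the training folds only, so that duplicated minority rows do not leak into the held-out data.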

By Christine G. Hines