An analytic model for fraud detection and advanced online course

17 June 2022

The WCO is pleased to announce that the LITE DATE model for fraud detection, together with the Advanced Data Analytics course, is available in English and French on CLiKC! as of 17 June, with the support of the Customs Cooperation Fund of Korea (CCF-K).

In line with the WCO’s focus on data analytics for Customs purposes, the BACUDA project continues to deliver open-source algorithms tailored to Members’ needs, along with accompanying capacity building activities.

The latest addition is the LITE DATE model, whose source code and explanations are included in the Advanced Data Analytics course on the CLiKC! e-learning platform.

LITE DATE, easy and simple to use  

LITE DATE is a lite version of the DATE model, which was developed in 2020 as part of the BACUDA project: the structure of the model has been simplified for wider use without losing its functionality. After the DATE model was published and the Intermediate course covering its algorithm and source code was launched, Members reported that the model was difficult to use because it requires complex programming techniques, libraries and a GPU.

The LITE DATE model was developed in response to this feedback and these recommendations, as a model that is easy to practice with, understand and execute without a high-performing GPU.

While the original DATE model aims both to detect fraud and to predict the amount of additional revenue, the LITE DATE model performs only the first of these two tasks.

Key aspects of LITE DATE  

The objective of the LITE DATE model is to detect fraudulent transactions. Technically speaking, it is a binary classification model that classifies a given transaction as either illegal or non-illegal. The binary classification uses the XGBoost classifier, a tree-based ensemble algorithm that uses the boosting method.
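To give a feel for what "a tree-based ensemble that uses the boosting method" means, the following is a minimal sketch in plain Python. It is not the LITE DATE source code and not XGBoost itself: it uses one-split trees (stumps) on a single hypothetical feature, and a simplified first-order update, whereas XGBoost adds second-order information and regularization. The feature values, labels and function names are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Pick the threshold whose two leaf means best fit the residuals (least squares)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm

def fit_boosted_stumps(xs, ys, rounds=20, lr=0.5):
    """Boosting: each new stump is fitted to the errors left by the previous ones."""
    scores = [0.0] * len(xs)  # raw scores (log-odds), start neutral
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the log-loss with respect to the raw score
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        scores = [s + lr * (lm if x <= t else rm) for x, s in zip(xs, scores)]
    return stumps

def predict_proba(stumps, x, lr=0.5):
    """Probability that a transaction with feature value x is illegal."""
    return sigmoid(sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps))

# Hypothetical single feature per declaration (e.g. a risk indicator), 1 = illegal
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = fit_boosted_stumps(xs, ys)
low_risk = predict_proba(stumps, 0.2)
high_risk = predict_proba(stumps, 0.8)
```

In this toy data the two classes are cleanly separated, so the ensemble assigns a probability below 0.5 to the low value and above 0.5 to the high value. Real Customs declarations have many features and far noisier labels, which is where a full library implementation becomes necessary.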

The data used to train the model is the synthetic Customs declaration dataset created and shared by the project team. The dataset comprises 100,000 synthetic Customs declarations labeled with synthetic inspection results.

In terms of performance, the model shows a recall of 79.9% and a precision of 20.8%. (Recall, the share of actual fraud cases that the model detects, is a metric commonly used to evaluate AI-based fraud detection.)
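For readers less familiar with these two metrics, the short sketch below shows how they are computed from inspection outcomes. The counts are made-up round numbers chosen for illustration only; they are not the project's results, and the function name is an assumption.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision: share of flagged transactions that are really illicit.
    Recall: share of illicit transactions that were flagged."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Illustrative counts: 100 illicit transactions, 80 of them flagged,
# plus 320 legitimate transactions flagged by mistake.
precision, recall = precision_recall(true_pos=80, false_pos=320, false_neg=20)
```

With these illustrative counts, precision is 80 / 400 and recall is 80 / 100, so a model can catch most fraud (high recall) while most of its alerts still turn out to be legitimate (low precision).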

According to these results, the model is expected to detect about 80% of all illicit transactions and to reduce the number of transactions subject to human inspection to about one-third, while tripling inspection efficacy from 7% to 20.8%.
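The relationship between these figures can be checked with a few lines of arithmetic. The sketch below assumes that the 7% figure is the share of illicit transactions in the data (that is, the hit rate if every transaction were inspected); this interpretation is an assumption, as are the variable names.

```python
recall = 0.799      # share of illicit transactions the model flags
precision = 0.208   # share of flagged transactions that are illicit
base_rate = 0.07    # ASSUMED share of illicit transactions overall

# Flagged illicit transactions = recall * base_rate (as a share of all transactions).
# Dividing by precision gives the share of all transactions that are flagged.
flagged_share = recall * base_rate / precision

# How much more often an inspection finds fraud compared with inspecting everything
efficacy_gain = precision / base_rate
```

Under this assumption, roughly 27% of transactions would be sent for inspection, which is in the same region as the one-third quoted above, and the hit rate per inspection rises by a factor of close to three.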

For further customized support and any questions on the course, the WCO invites Members to contact the WCO BACUDA project team (