Unlocking Data Sharing Potential: Introducing the New Online Course on Synthetic Data Generation

16 June 2023

Customs Administrations worldwide have recognized the significance of advanced data analytics algorithms, such as AI HS and DATE, to enhance operational efficiency and performance. With the growing interest in data sharing among Members, the World Customs Organization (WCO) is pleased to announce the release of a new online course on Synthetic Data Generation. This course, developed by the BACUDA project team with funding from CCF-Korea, aims to address the challenges of data sharing while preserving data security and privacy.

Data sharing is pivotal to effective analytics, because models generally become more accurate as the volume of available data grows. The WCO's Working Group on Data and Statistics has identified data sharing as a core exercise; however, concerns about privacy and security have hindered widespread adoption. Recognizing this, experts from the BACUDA project introduced a solution for creating synthesized datasets during a hands-on session at the PICARD conference in December 2022, receiving positive feedback from participants worldwide.

Building upon the insights gained from the session, the BACUDA project team has developed a comprehensive online course on the WCO E-Learning platform, CLiKC!, that allows learners to delve into the intricacies of synthetic data generation. Through interactive lessons, participants can explore the Python code and apply it to their own datasets, gaining practical experience in this innovative technique. Traditional data anonymization and pseudonymization methods often sacrifice either the statistical value of the data or its privacy and security. Synthetic data generation addresses this trade-off by creating entirely new data based on statistical features extracted from the original dataset. Because no generated record corresponds to a real one, this approach greatly reduces the risk of reidentification while preserving the statistical characteristics of the data, protecting the privacy and security of sensitive information.
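The idea of generating new records from extracted statistical features can be illustrated with a deliberately simple sketch. This is not the course's method: it fits only a mean, a standard deviation and category frequencies, and it samples each column independently, so it ignores relationships between columns. The records and field meanings below are hypothetical.

```python
import random
import statistics
from collections import Counter

# Hypothetical toy "original" records: (declared value in USD, country of origin).
original = [
    (120.0, "KR"), (95.5, "KR"), (300.0, "BE"),
    (210.0, "NG"), (130.0, "KR"), (280.0, "BE"),
]

def fit_features(rows):
    """Extract simple statistical features from the original rows."""
    values = [value for value, _ in rows]
    countries = Counter(country for _, country in rows)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "countries": list(countries.keys()),
        "weights": list(countries.values()),
    }

def sample_synthetic(features, n, seed=0):
    """Draw n new rows from the fitted features rather than copying original rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Values are drawn from a normal distribution and clipped at zero.
        value = max(0.0, rng.gauss(features["mean"], features["stdev"]))
        # Countries are drawn in proportion to their original frequencies.
        country = rng.choices(features["countries"], weights=features["weights"])[0]
        rows.append((round(value, 2), country))
    return rows

features = fit_features(original)
synthetic = sample_synthetic(features, 5)
```

Only the fitted features, not the original rows, are needed to call `sample_synthetic`. A GAN-based generator plays the same role but learns far richer structure, including dependencies between columns.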

Within the course, learners will encounter CTGAN, a powerful method for generating synthetic tabular data. CTGAN uses a Generative Adversarial Network (GAN) to produce data within desired distributions. Using a user-friendly Python library and Google Colaboratory, participants can generate synthetic data without installing additional software on their own computers. The cloud-based infrastructure, equipped with CPU and GPU resources, ensures accessibility for all learners.
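The article does not name the Python library used in the course. As a minimal sketch, assuming the open-source `ctgan` package together with `pandas`, training and sampling might look like the following. The column names, the helper `find_discrete_columns` and the `epochs` value are hypothetical choices for illustration.

```python
def find_discrete_columns(records):
    """Return the names of columns whose values are all strings (treated as categorical)."""
    if not records:
        return []
    return [
        name for name in records[0]
        if all(isinstance(row[name], str) for row in records)
    ]

def generate_synthetic(records, n_rows, epochs=10):
    """Train CTGAN on a list of dict records and return n_rows synthetic rows as a DataFrame."""
    # Imported lazily: both packages are third-party (pip install ctgan pandas).
    import pandas as pd
    from ctgan import CTGAN

    real_data = pd.DataFrame(records)
    model = CTGAN(epochs=epochs)
    # CTGAN needs to be told which columns are categorical.
    model.fit(real_data, discrete_columns=find_discrete_columns(records))
    return model.sample(n_rows)

# Hypothetical import declarations; real training needs far more rows than this.
records = [
    {"hs_code": "850440", "origin": "KR", "value_usd": 1200.0},
    {"hs_code": "870323", "origin": "BE", "value_usd": 18500.0},
]
discrete = find_discrete_columns(records)
```

With a realistically sized dataset, calling `generate_synthetic(records, n_rows=1000)` would return a DataFrame of 1,000 generated rows. In Google Colaboratory the packages can be installed in a notebook cell, and a GPU runtime typically shortens training.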

The course also shows learners how to evaluate the quality of the generated data. This involves checking the unique values in each column, using correlation analysis to verify that statistical characteristics remain consistent with the original data, and comparing performance indicators in analysis projects. With these techniques, participants can assess whether synthetic data is suitable and effective for their analytical needs. By empowering Customs administrations with the knowledge and skills to use synthetic data generation, the WCO aims to streamline the analysis of Customs data and foster more active exchanges among Member countries. This technique holds great potential for facilitating data sharing while upholding data security and privacy.
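Two of the quality checks mentioned above, unique values and correlation analysis, can be sketched in plain Python. This is an illustrative sketch, not the course's evaluation code: the function names and toy columns are hypothetical, and the comparison of performance indicators is not covered here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def unseen_categories(real_column, synthetic_column):
    """Categories present in the synthetic column but absent from the real one."""
    return set(synthetic_column) - set(real_column)

def correlation_gap(real_x, real_y, syn_x, syn_y):
    """Absolute difference between the real and synthetic correlations of two columns."""
    return abs(pearson(real_x, real_y) - pearson(syn_x, syn_y))

# Hypothetical toy columns: shipment weight and declared value.
real_weight, real_value = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
syn_weight, syn_value = [1, 2, 3, 4, 5], [12, 18, 33, 39, 52]

gap = correlation_gap(real_weight, real_value, syn_weight, syn_value)
extra = unseen_categories(["KR", "BE"], ["KR", "BE", "KR"])
```

A small `gap` suggests the relationship between the two columns survived generation, and an empty `extra` set means the generator did not invent categories that never occurred in the original data. What counts as an acceptable gap depends on the analysis at hand.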

The WCO invites its Members to embark on this transformative online course on Synthetic Data Generation. By participating in this unique learning opportunity, Customs administrations can unlock the full potential of data sharing, ensuring more accurate analytics, enhanced efficiency, and a secure environment for handling sensitive information. Join us in this forward-thinking initiative as we shape the future of data analytics in the Customs domain.

Technical support

For further customized support, the WCO invites Members to visit the new BACUDA website (bacuda.wcoomd.org) or contact the WCO BACUDA project team (bacuda@wcoomd.org). Updates relating to the BACUDA project will be published on the BACUDA website.