SOM-XG: Self-Organizing Map Based Resampling with Sample Extraction and Generation

Tharinda Dilshan Piyadasa; Kasun Gunawardana

Published Feb 9, 2024

Tharinda Dilshan Piyadasa

University of Colombo School of Computing

A/172/23, Gangani Gardens, Avissawella

Kasun Gunawardana

University of Colombo School of Computing

https://orcid.org/0000-0002-6458-1874

Abstract

The “Data Imbalance Problem” is a well-defined and challenging problem in machine learning that has been studied for decades. With the emergence of Big Data, data imbalance has re-emerged as an active research topic, because traditional solutions are inadequate for the increasing volume and dimensionality of data. A wide range of solutions, from data-level to algorithm-level, have been proposed to address the problem. Among these, data-level approaches are popular in the scientific community because they are inherently classifier-independent, which makes them generalizable across many domains. Oversampling is one such data-level technique, frequently explored by researchers, especially in extreme imbalance scenarios. This study introduces SOM-XG, an oversampling technique capable of addressing even extreme imbalance scenarios. The proposed technique uses two Self-Organizing Maps (SOMs) and exploits their properties to address within-class and between-class imbalance and to preserve decision boundaries, generating synthetic samples that are topologically similar to the original samples in the dataset. Empirical results on datasets with imbalance ratios from 1.38 to 130, 3 to 300 features, and 150 to 145,751 samples show that oversampling with SOM-XG improves classification results and consistently outperforms other state-of-the-art techniques.
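The abstract describes SOM-XG only at a high level. As a rough illustration of the general idea of SOM-guided oversampling, and not the authors' algorithm, the sketch below trains a single small SOM on the minority class and creates synthetic samples by interpolating between minority samples whose best-matching units (BMUs) lie on the same or adjacent map nodes. SOM-XG uses two SOMs and also handles between-class imbalance and decision-boundary preservation, which this sketch does not attempt. The function names `imbalance_ratio`, `train_som`, and `som_oversample`, the grid size, the learning schedule, and the neighbour rule are all hypothetical choices; the imbalance ratio is assumed to be the majority-class count divided by the minority-class count.

```python
import numpy as np

# Hypothetical illustration of SOM-guided oversampling; NOT the SOM-XG algorithm.


def imbalance_ratio(y):
    """Assumed definition: majority-class count divided by minority-class count."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.min()


def train_som(X, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM with online updates and a Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each node: (0, 0), (0, 1), ...
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen training samples (copied).
    weights = X[rng.integers(0, len(X), rows * cols)].astype(float)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood radius
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)         # squared grid distance to BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))               # neighbourhood weights
            weights += lr * h[:, None] * (x - weights)
    return weights, grid


def som_oversample(X_min, n_new, seed=0, **som_kwargs):
    """Create n_new synthetic minority samples by interpolating between samples
    whose BMUs are topologically close on the trained SOM."""
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(seed)
    weights, grid = train_som(X_min, seed=seed, **som_kwargs)
    bmus = np.array([np.argmin(((weights - x) ** 2).sum(axis=1)) for x in X_min])
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Manhattan grid distance between sample i's BMU and every sample's BMU.
        dist = np.abs(grid[bmus] - grid[bmus[i]]).sum(axis=1)
        dist[i] = np.inf  # never pair a sample with itself
        # Partners: samples on the same or an adjacent node; if none exist,
        # fall back to the samples on the topologically closest occupied node.
        candidates = np.flatnonzero(dist <= max(1.0, dist.min()))
        j = rng.choice(candidates)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(n_new, X_min.shape[1])
```

For a binary dataset, calling `som_oversample(X_min, n_majority - n_minority)` would generate enough synthetic samples to balance the classes, assuming `X_min` is a 2-D NumPy array of minority samples. Restricting interpolation partners to nearby SOM nodes, rather than to the k nearest neighbours in feature space as in SMOTE, is one way to keep synthetic samples within the local topology of the minority class.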

Issue

Vol 16 No 4 (2023): 2023 December Issue

Industry sponsor: CodeGen. Managed and published by the University of Colombo School of Computing.