The “Data Imbalance Problem” is a well-defined and challenging problem in Machine Learning that has been studied for decades. With the emergence of Big Data, it has re-emerged as an active research topic because traditional solutions become inadequate as the volume and dimensionality of data increase. A wide range of solutions, from the data level to the algorithmic level, has been proposed to address it. Among these, data-level approaches are popular in the scientific community because they are inherently independent of the classifier, which makes them generalizable across many domains. Oversampling is one such data-level technique frequently explored by researchers, especially in extreme imbalance scenarios. This study introduces SOM-XG, an oversampling technique capable of addressing even extreme imbalance scenarios. The proposed technique uses two Self-Organizing Maps and exploits their properties to address within-class and between-class imbalances while preserving the decision boundary, generating synthetic samples that are topologically similar to the original samples in the dataset. Empirical results on datasets with imbalance ratios ranging from 1.38 to 130, feature counts from 3 to 300, and sample counts from 150 to 145,751 show that oversampling with SOM-XG improves classification results and consistently outperforms other state-of-the-art techniques.
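To make the terms above concrete, the following minimal sketch shows how an imbalance ratio is commonly computed and what a data-level oversampler does in general. It is not SOM-XG: the abstract does not give the algorithm, so the oversampler here is a naive stand-in that interpolates between random pairs of same-class samples. The function names `imbalance_ratio` and `interpolate_oversample` are hypothetical, and the definition of the imbalance ratio as majority count divided by minority count is an assumption about how the reported values (1.38 to 130) were obtained.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class.

    Assumption: this is the common definition; the abstract does not state
    which definition the study uses.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def interpolate_oversample(samples, labels, seed=0):
    """Naive stand-in oversampler (NOT SOM-XG).

    Every class smaller than the largest one is topped up with synthetic
    points drawn on the line segment between two randomly chosen samples
    of that class, until all classes have the same size.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_samples, new_labels = list(samples), list(labels)
    for cls, n in counts.items():
        members = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()  # interpolation weight in [0, 1)
            new_samples.append([x + t * (z - x) for x, z in zip(a, b)])
            new_labels.append(cls)
    return new_samples, new_labels


# Toy data: 6 majority samples (class 0) and 2 minority samples (class 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [1.0, 1.0], [1.2, 0.8]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

ratio_before = imbalance_ratio(y)
X_bal, y_bal = interpolate_oversample(X, y)
ratio_after = imbalance_ratio(y_bal)
```

Because the oversampler only touches the training data, any classifier can be trained on `X_bal` and `y_bal` afterwards, which is the classifier independence the abstract attributes to data-level approaches. Plain interpolation of this kind ignores within-class structure and the decision boundary, the two aspects the abstract says SOM-XG addresses with its Self-Organizing Maps.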