A Review on Oversampling Techniques for Solving the Data Imbalance Problem in Classification

Tharinda Dilshan Piyadasa
Kasun Gunawardana

Abstract

The data imbalance problem is a widely explored area in machine learning. With the rapid advancement of computing infrastructure and the continual growth in the volume and variety of data generated, the data imbalance problem has persisted and changed form, calling for novel approaches to address it. Of the approaches that exist to address it, such as data-level and algorithmic-level approaches, data-level approaches are more popular in the scientific community because they are independent of the classifier. Current trends in data-level approaches show that oversampling is frequently explored because it adapts well to scenarios where extreme data imbalance is present. This paper reviews different oversampling techniques. It gives a comprehensive analysis of the strategies they use and identifies promising areas for further exploration toward more advanced oversampling techniques.
