A Review on Oversampling Techniques for Solving the Data Imbalance Problem in Classification

Tharinda Dilshan Piyadasa; Kasun Gunawardana

Published Jun 13, 2023

Tharinda Dilshan Piyadasa

University of Colombo School of Computing

Kasun Gunawardana

University of Colombo School of Computing

Abstract

The data imbalance problem is a widely explored area in the Machine Learning domain. With the rapid advancement of computing infrastructure and the continual increase in the amount and variety of data generated, the problem has persisted and changed shape, demanding novel approaches to address it. Of the approaches that exist to address data imbalance, such as data-level and algorithmic-level approaches, data-level approaches are more popular among the scientific community because they are classifier-independent. Current trends in data-level approaches show that oversampling is frequently explored because it adapts well to scenarios where extreme data imbalance is present. This paper presents a review of different oversampling techniques, with a comprehensive analysis of the strategies they use and of promising areas to explore further in developing more advanced oversampling techniques.

Issue

Vol 16 No 1 (2023): 2023 March Issue
