Density Based Query By Committee - Robust Active Learning Approach for Data Streams
Abstract
Acquiring precise labels for continuously flowing data streams is resource-intensive and costly. Active learning offers a potential strategy for training accurate models with minimal annotation effort. However, adapting active learning to streaming data is difficult because the data distribution changes over time, a phenomenon known as concept drift. Existing approaches for active learning in data streams rely predominantly on uncertainty sampling and Query by Committee (QBC) because of their simplicity and ease of implementation. This paper introduces a novel and robust active learning approach tailored for data streams that merges key elements of QBC and density-weighted sampling to address the challenges posed by concept drift. Through a comprehensive analysis on benchmark datasets widely used in the data stream literature, we demonstrate the superior performance of the proposed method across various data stream scenarios, including streams with no concept drift, streams with concept drift, and streams with severe concept drift. In addition, the results reveal that strategies based on uncertainty sampling and its variants exhibit limitations in the presence of concept drift, whereas QBC and its variants prove inadequate under severe concept drift. In contrast, our approach, which combines the strengths of QBC and density-weighted sampling using Gower’s distance as a similarity measure, exhibits remarkable adaptability to evolving data distributions.
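To make the combination of QBC and density weighting concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the scoring rule, so the sketch assumes one common form: committee disagreement measured by vote entropy, multiplied by the mean Gower similarity of the incoming instance to a window of recent instances. All function names, the `beta` exponent, and the `numeric_ranges` argument are hypothetical.

```python
import math
from collections import Counter


def gower_distance(a, b, numeric_ranges):
    """Gower distance between two mixed-type records.

    numeric_ranges[i] is the observed range of feature i when it is numeric,
    or None when the feature is categorical (assumed encoding).
    """
    total = 0.0
    for x, y, r in zip(a, b, numeric_ranges):
        if r is None:
            # Categorical feature: simple mismatch.
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Numeric feature: absolute difference scaled by the range.
            total += abs(x - y) / r
        # A numeric feature with zero range contributes nothing.
    return total / len(a)


def vote_entropy(votes):
    """Disagreement of a committee, given one predicted label per member."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def density(x, window, numeric_ranges):
    """Mean Gower similarity (1 - distance) of x to recent instances."""
    if not window:
        return 1.0  # Assumption: no history yet, so do not down-weight.
    sims = [1.0 - gower_distance(x, w, numeric_ranges) for w in window]
    return sum(sims) / len(sims)


def query_score(x, votes, window, numeric_ranges, beta=1.0):
    """Assumed score: disagreement weighted by density raised to beta."""
    return vote_entropy(votes) * density(x, window, numeric_ranges) ** beta
```

In a streaming setting, a label would be requested when `query_score` exceeds a budget-dependent threshold; the threshold policy and the window size are design choices that this sketch leaves open.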