Title: The Effect of Parallelism on Data Reduction
Authors: Ponos, Pavlos
Ougiaroglou, Stefanos
Evangelidis, Georgios
Type: Conference Paper
Subjects: FRASCATI::Natural sciences::Computer and information sciences
Keywords: k-NN Classification
Data Reduction
Prototype Merging
Parallel Implementation
Issue Date: 26-Sep-2019
First Page: 1
Last Page: 4
Volume Title: Proceedings of the 9th Balkan Conference on Informatics
Abstract: In this paper, we investigate the effect of parallelism on two data reduction algorithms that use k-Means clustering to find homogeneous clusters in the training set. By homogeneous, we mean clusters in which all instances belong to the same class. Our approach divides the training set into subsets and applies the data reduction algorithm to each subset in parallel. The reduced subsets are then merged into the final reduced set. In our experimental study, we split the datasets into 8, 16, 32 and 64 subsets. The results reveal that parallelism can achieve very low preprocessing costs. Moreover, when the number of subsets is high, in some datasets the accuracy of k-NN classification is almost equal to (if not better than) that achieved with the standard execution of the reduction algorithms, with a small loss in the reduction rate.
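The split, reduce and merge scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names (`reduce_subset`, `parallel_reduce`), the round-robin split, the seeding of k-Means with class means, and the toy data are all assumptions made for the example. The reduction step is a simplified homogeneous-cluster procedure: it keeps splitting a cluster with k-Means until every cluster contains a single class, then replaces each cluster with its mean. A thread pool is used only to keep the sketch portable; CPU-bound work in CPython would need a process pool to see a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reduce_subset(data, iters=10):
    """Simplified homogeneous-cluster reduction (an assumption, not the paper's code).

    data: list of (point, label). Returns a list of (prototype, label).
    """
    prototypes = []
    stack = [data]
    while stack:
        cluster = stack.pop()
        by_class = defaultdict(list)
        for p, y in cluster:
            by_class[y].append(p)
        if len(by_class) == 1:
            # Homogeneous cluster: keep only its mean as a prototype.
            label = next(iter(by_class))
            prototypes.append((mean(by_class[label]), label))
            continue
        # k-Means with one seed per class, seeded at the class means.
        centers = [mean(pts) for pts in by_class.values()]
        for _ in range(iters):
            groups = [[] for _ in centers]
            for p, y in cluster:
                j = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
                groups[j].append((p, y))
            centers = [mean([p for p, _ in g]) if g else c
                       for g, c in zip(groups, centers)]
        non_empty = [g for g in groups if g]
        if len(non_empty) == 1:
            # Cannot split further (e.g. coinciding points): one prototype per class.
            prototypes.extend((mean(pts), y) for y, pts in by_class.items())
        else:
            stack.extend(non_empty)
    return prototypes


def parallel_reduce(data, n_subsets, workers=4):
    """Split round-robin (assumed), reduce each subset concurrently, merge."""
    subsets = [data[i::n_subsets] for i in range(n_subsets)]
    subsets = [s for s in subsets if s]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(reduce_subset, subsets))
    return [proto for part in reduced for proto in part]


# Toy data (hypothetical): two well-separated classes on a line.
data = ([((float(i), 0.0), "a") for i in range(8)]
        + [((float(i) + 100.0, 0.0), "b") for i in range(8)])
standard = reduce_subset(data)                  # 2 prototypes
condensed = parallel_reduce(data, n_subsets=4)  # 8 prototypes (2 per subset)
```

On this toy input the standard run keeps 2 prototypes while the 4-subset run keeps 8, because each subset contributes its own prototypes. That mirrors, in miniature, the loss in reduction rate that the abstract mentions.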
ISBN: 9781450371933
Other Identifiers: 10.1145/3351556.3351584
Appears in Collections: Department of Applied Informatics

Files in This Item:
File: 2019_BCI_POE.pdf
Size: 594,39 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License