Please use this identifier to cite or link to this item: https://ruomo.lib.uom.gr/handle/7000/1582
Title: Very fast variations of training set size reduction algorithms for instance-based classification
Authors: Ougiaroglou, Stefanos
Evangelidis, Georgios
Type: Conference Paper
Subjects: FRASCATI::Natural sciences::Computer and information sciences
Keywords: data reduction
prototype generation
RHC
homogeneous clusters
k-NN Classification
Issue Date: May-2023
Publisher: Association for Computing Machinery
First Page: 64
Last Page: 70
Volume Title: International Database Engineered Applications Symposium Conference
Abstract: Reduction through Homogeneous Clustering (RHC) and its editing variant (ERHC) are effective data reduction techniques for the k-NN classifier. They are based on an iterative k-means clustering task that discovers homogeneous clusters. The centers of the resulting homogeneous clusters constitute the instances of the reduced training set. Although RHC and ERHC are quite fast compared to several well-known data reduction techniques, the iterative execution of k-means clustering renders both of them inappropriate for data reduction tasks that need to be performed quickly, especially when run over large training datasets. The present paper proposes simple and very fast variations of the algorithms, which are appropriate for such environments. The variations are called RHC2 and ERHC2 and replace the complete execution of k-means clustering with a fast task that assigns instances to the class centers. The experimental study, based on fourteen datasets, and the corresponding statistical tests show that the proposed RHC2 and ERHC2 variations are very fast and, at the cost of a small penalty on classification accuracy, achieve higher reduction rates than their predecessors and two other well-known data reduction techniques. They are good candidates when fast reduction on large datasets is required.
URI: https://doi.org/10.1145/3589462.3589493
https://ruomo.lib.uom.gr/handle/7000/1582
ISBN: 9798400707445
Other Identifiers: 10.1145/3589462.3589493
Appears in Collections: Department of Applied Informatics
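
The idea summarized in the Abstract can be illustrated with a minimal sketch. It is reconstructed from the abstract alone and is not the authors' implementation: the function name `rhc2_sketch`, the helpers `mean` and `sq_dist`, the queue-based control flow, and the guard for coinciding class centers are all assumptions made for illustration. Each non-homogeneous cluster is split by a single pass that assigns every instance to its nearest class center, instead of running k-means to convergence; the center of each homogeneous cluster becomes a prototype of the reduced training set.

```python
from collections import defaultdict, deque


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rhc2_sketch(X, y):
    """Reduce (X, y) to one prototype per homogeneous cluster.

    Hypothetical sketch based only on the abstract, not the authors' code.
    """
    prototypes, proto_labels = [], []
    queue = deque([list(range(len(X)))])  # start with one cluster: all instances
    while queue:
        idx = queue.popleft()

        # Group the instance indices of this cluster by class label.
        by_class = defaultdict(list)
        for i in idx:
            by_class[y[i]].append(i)

        if len(by_class) == 1:
            # Homogeneous cluster: its center becomes a prototype.
            prototypes.append(mean([X[i] for i in idx]))
            proto_labels.append(y[idx[0]])
            continue

        # One center per class present in the cluster.
        centers = [mean([X[i] for i in members]) for members in by_class.values()]

        # Single assignment pass (no k-means iterations).
        groups = defaultdict(list)
        for i in idx:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(X[i], centers[c]))
            groups[nearest].append(i)

        if len(groups) == 1:
            # Assumed guard (not from the source): if every instance lands on
            # the same center, split by class label so the loop terminates.
            groups = by_class

        queue.extend(groups.values())
    return prototypes, proto_labels
```

On a toy set of six two-dimensional instances forming two well-separated classes, the sketch returns two prototypes, which corresponds to a reduction rate of 1 - 2/6, roughly 67%. The editing variant (ERHC2) is not covered here, since the abstract does not describe its editing step.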

Files in This Item:
File: IDEAS2023_RHC2.pdf
Size: 168,42 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.