|Efficient dataset size reduction by finding homogeneous clusters
|FRASCATI::Natural sciences::Computer and information sciences
|Proceedings of the Fifth Balkan Conference in Informatics on - BCI '12
|Although the k-Nearest Neighbor classifier is one of the most widely used classification methods, it suffers from high computational cost and storage requirements. These drawbacks have been an active research topic over the last decades. This paper proposes an effective data reduction algorithm that has low preprocessing cost and reduces storage requirements while keeping classification accuracy at an acceptably high level. The proposed algorithm is based on a fast preprocessing clustering procedure that creates homogeneous clusters, that is, clusters whose items all belong to the same class. The centroids of these clusters constitute the reduced training set. Experimental results on real-life datasets show that the proposed algorithm is faster and achieves higher reduction rates than three known existing methods, without significantly reducing classification accuracy.
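The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea (split the data until every cluster is homogeneous, then keep one centroid per cluster). It assumes a recursive split that uses the per-class means of a mixed cluster as seeds and a single nearest-seed assignment pass. The function names `mean` and `reduce_by_homogeneous_clusters` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def reduce_by_homogeneous_clusters(points, labels):
    """Return (centroid, label) pairs that form the reduced training set.

    Assumption (not from the abstract): mixed clusters are split by
    assigning each item to the nearest per-class mean of that cluster.
    """
    reduced = []
    queue = deque([list(zip(points, labels))])  # start with one cluster: all data
    while queue:
        cluster = queue.popleft()

        # Group the cluster's items by class label.
        by_class = defaultdict(list)
        for p, y in cluster:
            by_class[y].append(p)

        if len(by_class) == 1:
            # Homogeneous cluster: keep only its centroid.
            (y, members), = by_class.items()
            reduced.append((mean(members), y))
            continue

        # Mixed cluster: use one seed per class and reassign every item.
        seeds = [mean(members) for members in by_class.values()]
        parts = defaultdict(list)
        for p, y in cluster:
            nearest = min(range(len(seeds)), key=lambda i: dist(p, seeds[i]))
            parts[nearest].append((p, y))

        if len(parts) == 1:
            # The split made no progress (e.g. coinciding class means),
            # so fall back to one centroid per class to guarantee termination.
            for y, members in by_class.items():
                reduced.append((mean(members), y))
        else:
            queue.extend(parts.values())
    return reduced
```

For two well-separated classes of three points each, this sketch returns two prototypes, one per class. A k-NN classifier would then search these prototypes instead of the full training set, which is where the storage and time savings described above would come from.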
|Appears in Collections:
|Department of Applied Informatics
This item is licensed under a Creative Commons License