Please use this identifier to cite or link to this item: https://ruomo.lib.uom.gr/handle/7000/344
Title: Efficient editing and data abstraction by finding homogeneous clusters
Authors: Ougiaroglou, Stefanos
Evangelidis, Georgios
Subjects: FRASCATI::Engineering and technology
Issue Date: Apr-2016
Source: Annals of Mathematics and Artificial Intelligence
Volume: 76
Issue: 3-4
First Page: 327
Last Page: 349
Abstract: The efficiency of the k-Nearest Neighbour classifier depends on the size of the training set as well as the level of noise in it. Large datasets with high levels of noise lead to less accurate classifiers with high computational cost and storage requirements. The goal of editing is to improve accuracy by improving the quality of the training datasets. To obtain such datasets, editing removes noise and mislabeled data and smooths the decision boundaries between the discrete classes. On the other hand, prototype abstraction aims to reduce the computational cost and the storage requirements of classifiers by condensing the training data. This paper proposes an editing algorithm called Editing through Homogeneous Clusters (EHC). Then, it extends the idea by introducing a prototype abstraction algorithm that integrates the EHC mechanism and is capable of creating a small, noise-free representative set of the initial training data. This algorithm is called Editing and Reduction through Homogeneous Clusters (ERHC). Both are based on a fast, parameter-free iterative execution of k-means clustering that forms homogeneous clusters, that is, clusters whose items all belong to one class. Both treat clusters consisting of a single item as noise and remove them. In addition, ERHC summarizes the items of each remaining cluster by storing its mean item in the representative set. EHC and ERHC are tested on several datasets. The results show that both run very fast and achieve high accuracy. In addition, ERHC achieves high reduction rates.
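The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how k-means is seeded or how many clusters each split produces, so this sketch assumes that a mixed cluster is split with k equal to the number of classes it contains, seeded with the per-class means. The function names (`_kmeans`, `homogeneous_clusters`, `ehc`, `erhc`) and the class-wise fallback split are hypothetical and exist only for illustration.

```python
import numpy as np

def _kmeans(X, centroids, max_iter=100):
    """Plain Lloyd's k-means seeded with the given centroids; returns one label per row."""
    labels = None
    for _ in range(max_iter):
        # Squared Euclidean distance from every item to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

def homogeneous_clusters(X, y):
    """Split the data until every cluster holds items of a single class.

    Assumption (not stated in the abstract): a mixed cluster is split with
    k = number of classes present, seeded with the per-class means.
    """
    queue = [np.arange(len(X))] if len(X) else []
    done = []
    while queue:
        idx = queue.pop()
        classes = np.unique(y[idx])
        if len(classes) == 1:
            done.append(idx)  # homogeneous: nothing more to split
            continue
        seeds = np.array([X[idx][y[idx] == c].mean(axis=0) for c in classes])
        labels = _kmeans(X[idx], seeds)
        parts = [idx[labels == j] for j in range(len(classes))]
        parts = [p for p in parts if len(p)]
        if len(parts) == 1:
            # Hypothetical fallback: k-means could not separate the items
            # (e.g. identical points with different labels), so split by
            # class to guarantee termination.
            parts = [idx[y[idx] == c] for c in classes]
        queue.extend(parts)
    return done

def ehc(X, y):
    """Editing: drop items that end up alone in a cluster, keep the rest."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    kept = [c for c in homogeneous_clusters(X, y) if len(c) > 1]
    if not kept:
        return X[:0], y[:0]
    keep = np.sort(np.concatenate(kept))
    return X[keep], y[keep]

def erhc(X, y):
    """Editing and reduction: one mean item per non-singleton cluster."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    clusters = [c for c in homogeneous_clusters(X, y) if len(c) > 1]
    protos = np.array([X[c].mean(axis=0) for c in clusters]).reshape(-1, X.shape[1])
    labels = np.array([y[c[0]] for c in clusters], dtype=y.dtype)
    return protos, labels

# Toy data: two separated groups plus one item labeled 1 near the class-0 group.
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1], [3, 3],
          [10, 10], [10, 11], [11, 10], [11, 11]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1]
edited_X, edited_y = ehc(X_demo, y_demo)
prototypes, prototype_labels = erhc(X_demo, y_demo)
```

On this toy data the item at (3, 3) ends up alone in its cluster and is discarded by both functions, while `erhc` condenses the remaining eight items to two mean items. Different seeding or splitting rules would produce different clusters, so the sketch shows only the general idea.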
URI: https://doi.org/10.1007/s10472-015-9472-8
https://ruomo.lib.uom.gr/handle/7000/344
ISSN: 1012-2443
1573-7470
Other Identifiers: 10.1007/s10472-015-9472-8
Appears in Collections: Department of Applied Informatics

Files in This Item:
File: AMAI.pdf
Size: 747,24 kB
Format: Adobe PDF

