Title: Generating Fixed-Size Training Sets for Large and Streaming Datasets
Authors: Ougiaroglou, Stefanos
Arampatzis, Georgios
Dervos, Dimitris A.
Evangelidis, Georgios
Type: Conference Paper
Subjects: FRASCATI::Natural sciences::Computer and information sciences
Keywords: k-NN Classification
Data Reduction
Prototype Generation
Data Streams
Issue Date: 25-Aug-2017
Volume: 10509
First Page: 88
Last Page: 102
Volume Title: Advances in Databases and Information Systems
Part of Series: Lecture Notes in Computer Science
Abstract: The k-Nearest Neighbor (k-NN) classifier is popular and versatile, but it requires a relatively small training set in order to perform adequately, a prerequisite that cannot be satisfied with the large volumes of training data now available from streaming environments. Conventional Data Reduction Techniques that select or generate training prototypes are also inappropriate in such environments. Dynamic RHC (dRHC) is a prototype generation algorithm that can update its condensing set when new training data arrives. However, after repeated updates, the size of the condensing set may become unpredictably large. This paper proposes dRHC2, a new variation of dRHC that remedies this drawback. dRHC2 keeps the condensing set at a size the classifier can conveniently manage by ranking the prototypes and removing the least important ones. dRHC2 is tested on several datasets, and the experimental results reveal that it is more efficient and noise tolerant than dRHC and comparable to dRHC in terms of accuracy.
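The abstract describes the fixed-size mechanism only at a high level. A minimal Python sketch of the general idea follows, under loudly stated assumptions: each prototype carries a weight counting the training items it summarizes; each new data segment is merged with the existing weighted prototypes through a simplified RHC-style step (recursive k-means seeded with class means, splitting until every cluster is class-homogeneous); and a prototype's "importance" is assumed to be its weight. All names (`Prototype`, `rhc`, `update`, `classify`, `max_size`) are hypothetical illustrations, not the authors' implementation, and the paper's actual ranking criterion may differ.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Prototype:
    x: tuple[float, ...]
    label: str
    weight: int  # number of training items this prototype summarizes


def _mean(group: list[Prototype]) -> tuple[float, ...]:
    """Weighted mean of the vectors in a group (weights = prototype weights)."""
    total = sum(p.weight for p in group)
    dim = len(group[0].x)
    return tuple(sum(p.weight * p.x[d] for p in group) / total for d in range(dim))


def _dist2(a, b) -> float:
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def _merge(group: list[Prototype]) -> Prototype:
    """Collapse a single-class group into one weighted prototype."""
    return Prototype(_mean(group), group[0].label, sum(p.weight for p in group))


def rhc(items: list[Prototype], max_iter: int = 50) -> list[Prototype]:
    """Simplified RHC-style condensing (hypothetical sketch): split with k-means
    until every cluster is class-homogeneous, then replace each cluster by its mean."""
    if not items:
        return []
    by_label: dict[str, list[Prototype]] = defaultdict(list)
    for p in items:
        by_label[p.label].append(p)
    if len(by_label) == 1:
        return [_merge(items)]
    # Seed k-means with one weighted class mean per class.
    centroids = [_mean(g) for g in by_label.values()]
    for _ in range(max_iter):
        clusters: list[list[Prototype]] = [[] for _ in centroids]
        for p in items:
            j = min(range(len(centroids)), key=lambda c: _dist2(p.x, centroids[c]))
            clusters[j].append(p)
        clusters = [c for c in clusters if c]
        new_centroids = [_mean(c) for c in clusters]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    if len(clusters) == 1:
        # k-means could not split the set; fall back to one prototype per class.
        return [_merge(g) for g in by_label.values()]
    # Each cluster is strictly smaller than `items`, so the recursion terminates.
    return [proto for c in clusters for proto in rhc(c, max_iter)]


def update(condensing: list[Prototype], segment: list[Prototype], max_size: int) -> list[Prototype]:
    """Fixed-size update: re-condense the old prototypes with the new segment,
    then keep only the max_size most important ones.
    ASSUMPTION: importance = weight (number of items a prototype represents)."""
    merged = rhc(condensing + segment)
    merged.sort(key=lambda p: p.weight, reverse=True)
    return merged[:max_size]


def classify(condensing: list[Prototype], x: tuple[float, ...], k: int = 1) -> str:
    """Plain k-NN majority vote over the condensing set."""
    nearest = sorted(condensing, key=lambda p: _dist2(p.x, x))[:k]
    votes: dict[str, int] = defaultdict(int)
    for p in nearest:
        votes[p.label] += 1
    return max(votes, key=votes.get)


# Usage: two small segments arriving from a stream, with the set capped at 2.
stream = [
    [Prototype((0.0, 0.0), "a", 1), Prototype((0.2, 0.1), "a", 1), Prototype((5.0, 5.0), "b", 1)],
    [Prototype((5.1, 4.9), "b", 1), Prototype((0.1, 0.3), "a", 1), Prototype((9.0, 0.0), "c", 1)],
]
condensing: list[Prototype] = []
for segment in stream:
    condensing = update(condensing, segment, max_size=2)
print([(p.label, p.weight) for p in condensing])  # [('a', 3), ('b', 2)]
print(classify(condensing, (0.05, 0.1)))  # a
```

Capping the set after each update bounds the classifier's per-query cost by `max_size` rather than by how much data has streamed past. The trade-off visible in the usage example is that a rarely seen class (here "c", with weight 1) can be dropped entirely, which is why the choice of ranking criterion matters.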
ISBN: 978-3-319-66916-8
ISSN: 0302-9743
Other Identifiers: 10.1007/978-3-319-66917-5_7
Appears in Collections: Department of Applied Informatics

Files in This Item: 2017_ADBIS.pdf (Adobe PDF, 315.94 kB)

This item is licensed under a Creative Commons License.