Please use this identifier to cite or link to this item:
Title: Efficient indexing methods in the data mining context
Authors: Kouiroukidis, Nikolaos
Evangelidis, Georgios
Type: Article
Subjects: FRASCATI::Natural sciences::Computer and information sciences
Keywords: Multidimensional Indexing
Data Mining
Issue Date: 2011
Source: International Journal on Integrated Information Management
Abstract: The proliferation of applications manipulating enormous sizes of multidimensional vector data, that require indexing in order to support operations like kNN searching in an efficient manner, has revived the interest in multidimensional indexing. It is well established that as the dimensionality of data increases the performance of queries such as range queries and nearest neighbor (1NN or kNN) queries decreases leading to a problem described as “the curse of dimensionality”. In this paper we point out problems that arise when indexing multidimensional data in fixed dimensions, such as 8 or 16, because, usually, when dealing with data of higher dimensionality it is common to first apply dimensionality reduction techniques such as principal component analysis. Although there is a plethora of research papers proposing multidimensional indexing structures, most of them report empirical results with relatively small and low dimensionality datasets. We attempt a fair comparison of many state of the art indexing structures designed exclusively to index multi-dimensional points like the Hybrid tree, the iDistance and the P+tree. We include in our comparison the R*tree, a state of the art index designed both for multidimensional points and regions. It is an improvement of the well-known R-tree, and also has been revised and improved further recently. We compare the behavior of the indexes on kNN queries (for k=1 and k=10) with varying dataset sizes and dimensionality. Our goal is to determine the structure(s) that could be used efficiently in the area of data mining. We obtained the source code of the implementations of the related structures from the authors and the R-tree portal.
ISSN: 2241-827X
Other Identifiers: 10.15556/IJIIM.01.01.002
Appears in Collections:Department of Applied Informatics

Files in This Item:
File Description SizeFormat 
2011_IJIIM_kouirukidis.pdf160,49 kBAdobe PDFView/Open

This item is licensed under a Creative Commons License Creative Commons