Title: PCA-based Similarity Search: Pre-processing & Distance Measures
Authors: Karamitopoulos, Leonidas; Evangelidis, Georgios
Type: Conference Paper
Subjects: FRASCATI::Natural sciences::Computer and information sciences
Issue Date: 2007
First Page: 318
Last Page: 327
Volume Title: Proceedings of the 2nd International Scientific Conference, eRA: The Contribution of Information Technology to Science, Economy, Society and Education, Athens, Greece
Abstract: Time series appear frequently in domains such as multimedia, business, industry, and medicine. A multivariate time series dataset is a set of co-evolving time series that relate to a specific object (e.g. the motion of a person). The increasing need to analyze the huge amount of this information efficiently leads to the application of data mining techniques. At the core of these techniques lies the concept of similarity, since most of them require searching for similar patterns, as in query by content, clustering, or classification. When dealing with multivariate time series datasets, however, similarity should be sought between the whole datasets and not only between the individual time series, since there are usually important correlations among them that should not be lost. In this paper, we discuss the application of Principal Component Analysis (PCA) to multivariate time series datasets for the purpose of similarity search. PCA is applied to reduce the high dimensionality of such data while retaining as much as possible of the variation present in the data. We provide a thorough description of the pre-processing phase with respect to the assumptions and limitations of PCA, as well as to the distortions that appear most frequently in data. Furthermore, we experimentally explore the potential usefulness of incorporating Piecewise Aggregate Approximation into this phase. Finally, we discuss the various aspects of the proposed PCA-based similarity (dissimilarity) measures.
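Illustrative sketch (not taken from the paper): the pipeline outlined in the abstract can be approximated in a few lines of Python with NumPy. The function names `paa`, `principal_components`, and `pca_similarity` are hypothetical, the z-normalization step is an assumed pre-processing choice, and the similarity shown is Krzanowski's PCA similarity factor (the mean squared cosine between the first k principal components of two datasets), chosen here only as one well-known PCA-based measure; the abstract does not state which measures the paper covers.

```python
import numpy as np

def paa(X, n_segments):
    """Piecewise Aggregate Approximation of a (time, variables) array.

    Each variable is replaced by the means of n_segments equal-length frames.
    Assumption: the number of time points is divisible by n_segments.
    """
    X = np.asarray(X, dtype=float)
    n_time, n_vars = X.shape
    if n_time % n_segments != 0:
        raise ValueError("length must be divisible by n_segments")
    return X.reshape(n_segments, n_time // n_segments, n_vars).mean(axis=1)

def principal_components(X, k):
    """Return the first k principal components (as rows) of a (time, variables) array.

    Assumption: every variable is z-normalized first, so the components are
    those of the correlation matrix; a constant variable would divide by zero.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of vt are unit-length principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return vt[:k]

def pca_similarity(A, B, k):
    """Krzanowski's PCA similarity factor between two datasets, in [0, 1]."""
    La = principal_components(A, k)
    Lb = principal_components(B, k)
    # Sum of squared cosines between all pairs of retained components, averaged over k.
    return float(np.sum((La @ Lb.T) ** 2) / k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(120, 4))   # synthetic data, for illustration only
    B = rng.normal(size=(120, 4))
    print(pca_similarity(paa(A, 30), paa(B, 30), k=2))
```

Both datasets must have the same number of variables; they need not have the same length, because the comparison uses only the principal directions in variable space.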
Appears in Collections: Department of Applied Informatics

Files in This Item:
File: 2007_ERA_Karamitopoulos.pdf
Size: 191.6 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License.