|Title:||Compact Binary: an Efficient Non-parameterized Code for Index Compression|
|Authors:||Dervos, Dimitris A.|
|Subjects:||FRASCATI::Natural sciences::Computer and information sciences|
|Volume Title:||Proceedings of the 1st Balkan Conference in Informatics (BCI), Thessaloniki|
|Abstract:||Inverted file indexes are nowadays the most popular method for indexing text databases. Integer compression codes are applied to the inverted document id lists to produce compact inverted file indexes. Non-parameterized codes form a class of index compression codes that is insensitive to variations in the statistics of dynamic text collections. In the present study, we introduce compact-binary (cb): a new non-parameterized coding scheme that combines the Golomb code and the binary representation of integers. The performance of the new code is compared to that of existing popular codes. Experimental results obtained from a number of TREC document collections reveal an overall 7.7% improvement over the most efficient of the existing non-parameterized codes. The outcome is backed by analysis and represents a significant gain given the large sizes of the target text database collections.|
|Appears in Collections:||Department of Applied Informatics|
This item is licensed under a Creative Commons License
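
The abstract describes compressing inverted document id lists with non-parameterized integer codes. This record does not give the construction of the compact-binary (cb) code, so the sketch below does not attempt to reproduce it. As an illustration only, it stores a posting list as gaps between successive document ids and encodes each gap with the Elias gamma code, a well-known non-parameterized code of the kind cb is compared against. The function names and the sample posting list are hypothetical, and bit strings are used in place of a packed bit stream for readability.

```python
def to_gaps(doc_ids):
    """Convert strictly increasing positive document ids into d-gaps."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        if doc_id <= prev:
            raise ValueError("document ids must be positive and strictly increasing")
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Rebuild the document ids by accumulating the d-gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def gamma_encode(n):
    """Elias gamma code of a positive integer as a string of '0'/'1'.

    The code is (len - 1) zeros followed by the binary form of n,
    where len is the number of bits in that binary form.
    """
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode_all(bits):
    """Decode a concatenation of Elias gamma codewords (assumed well formed)."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


# Hypothetical posting list for one term (assumption, not from the paper).
posting_list = [3, 7, 8, 20]
encoded = "".join(gamma_encode(g) for g in to_gaps(posting_list))
decoded = from_gaps(gamma_decode_all(encoded))
```

No parameter has to be estimated from the collection before encoding, which is the property that keeps such codes insensitive to changing statistics in a dynamic collection. Small gaps, which dominate the lists of frequent terms, receive the shortest codewords.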