Index Compression vs. Retrieval Time of Inverted Files for XML Documents
Norbert Fuhr
Norbert Gövert
Technical Report
University of Dortmund

Query languages for the retrieval of XML documents allow conditions that refer to both the content and the structure of documents. To process such queries efficiently, inverted files must also contain structural information, which leads to index sizes that exceed the storage space of the original data. In this paper, we investigate two approaches for reducing index space. Besides compressing index entries, we develop a new data structure called the XS tree, which holds the structural description of a document in a compact form so that these descriptions can be kept in main memory. We evaluate several variants of both approaches on two large XML document collections. The results show that very high compression rates can be achieved for the indexes. However, any compression increases retrieval time, so retrieval time is minimized when uncompressed indexes are used. Highly compressed indexes may nevertheless be feasible for specific applications.
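
To make the space/time trade-off concrete, the following is a minimal sketch of one generic way to compress the entries of an inverted file: storing gaps between ascending document ids and encoding each gap with a variable-byte code. This is an assumption chosen for illustration; the abstract does not state which compression schemes are evaluated, and the function names (`vbyte_encode`, `compress_postings`, and so on) are hypothetical.

```python
from typing import Iterable, List


def vbyte_encode(numbers: Iterable[int]) -> bytes:
    """Encode non-negative integers with 7 payload bits per byte.

    The high bit is set on the last byte of each number (stop bit).
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("only non-negative integers can be encoded")
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk[0] |= 0x80  # lowest-order group is written last and carries the stop bit
        out.extend(reversed(chunk))
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode."""
    numbers = []
    n = 0
    for byte in data:
        if byte & 0x80:  # stop bit: this byte completes the current number
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


def compress_postings(doc_ids: List[int]) -> bytes:
    """Gap-encode an ascending list of document ids, then vbyte-encode the gaps."""
    gaps = []
    prev = 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return vbyte_encode(gaps)


def decompress_postings(data: bytes) -> List[int]:
    """Rebuild the document ids by summing the decoded gaps."""
    doc_ids = []
    total = 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


if __name__ == "__main__":
    postings = [3, 7, 300, 100000]
    packed = compress_postings(postings)
    assert decompress_postings(packed) == postings
    print(len(packed), "bytes compressed")
```

In this sketch the four ids occupy 7 bytes instead of the 16 bytes needed for fixed 32-bit integers, but every lookup must first run `decompress_postings`. That extra decoding step is the kind of cost behind the observation that compression increases retrieval time.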

