Induced sorting suffixes in external memory with better design and less space

Wei Jun LIU, Ge NONG, Wai Hong CHAN, Yi WU

Research output: Chapter in Book/Report/Conference proceedingChapters

Abstract

Recently, several attempts have been made to extend the internal memory suffix array (SA) construction algorithm SA-IS to the external memory model, e.g., eSAIS, EM-SA-DS and DSA-IS. While the developed programs for these algorithms achieve remarkable performance in terms of I/O complexity and speed, their designs are quite complex and their disk requirements remain rather heavy. Currently, the core algorithmic part of each of these programs consists of thousands of lines in C++, and the average peak disk requirement is over 20n bytes for an input string of size (n<2^{40}). We re-investigate the problem of induced sorting suffixes in external memory and propose a new algorithm SAIS-PQ (SAIS with Priority Queue) and its enhanced alternative SAIS-PQ+. Using the library STXXL, the core algorithmic parts of SAIS-PQ and SAIS-PQ+ are coded in around 800 and 1600 lines in C++, respectively. The time and space performance of these two programs are evaluated in comparison with eSAIS that is also implemented using STXXL. In our experiment, eSAIS runs the fastest for the input strings not larger than 16 GiB, but it is slower than SAIS-PQ+ for the only two input strings of 32 and 48.44 GiB. For the average peak disk requirements, eSAIS and SAIS-PQ+ are around 23n and 15n bytes, respectively. Copyright © 2015 Springer International Publishing Switzerland.
Original languageEnglish
Title of host publicationString processing and information retrieval: 22nd international symposium, SPIRE 2015, London, UK, September 1-4, 2015, proceedings
EditorsCostas ILIOPOULOS, Simon PUGLISI, Emine YILMAZ
Place of PublicationSwitzerland
PublisherSpringer International Publishing
Pages83-94
ISBN (Electronic)9783319238265
ISBN (Print)9783319238258
DOIs
Publication statusPublished - 2015

Citation

Liu, W. J., Nong, G., Chan, W. H., & Wu, Y. (2015). Induced sorting suffixes in external memory with better design and less space. In C. Iliopoulos, S. Puglisi, & E. Yilmaz (Eds.), String processing and information retrieval: 22nd international symposium, SPIRE 2015, London, UK, September 1-4, 2015, proceedings (pp. 83-94). Switzerland: Springer International Publishing.

Keywords

  • Suffix array
  • Sorting algorithm
  • External memory

Fingerprint

Dive into the research topics of 'Induced sorting suffixes in external memory with better design and less space'. Together they form a unique fingerprint.