Induced sorting suffixes in external memory with better design and less space

Wei Jun LIU, Ge NONG, Wai Hong CHAN, Yi WU

Research output: Contribution to conferencePaper

9 Citations (Scopus)

Abstract

Recently, several attempts have been made to extend the internal memory suffix array (SA) construction algorithm SA-IS to the external memory model, e.g., eSAIS, EM-SA-DS and DSA-IS. While the developed programs for these algorithms achieve remarkable performance in terms of I/O complexity and speed, their designs are quite complex and their disk requirements remain rather heavy. Currently, the core algorithmic part of each of these programs consists of thousands of lines in C++, and the average peak disk requirement is over 20n bytes for an input string of size (n<2^{40}). We re-investigate the problem of induced sorting suffixes in external memory and propose a new algorithm SAIS-PQ (SAIS with Priority Queue) and its enhanced alternative SAIS-PQ+. Using the library STXXL, the core algorithmic parts of SAIS-PQ and SAIS-PQ+ are coded in around 800 and 1600 lines in C++, respectively. The time and space performance of these two programs are evaluated in comparison with eSAIS that is also implemented using STXXL. In our experiment, eSAIS runs the fastest for the input strings not larger than 16 GiB, but it is slower than SAIS-PQ+ for the only two input strings of 32 and 48.44 GiB. For the average peak disk requirements, eSAIS and SAIS-PQ+ are around 23n and 15n bytes, respectively.
Original languageEnglish
Publication statusPublished - Sep 2015

Citation

Liu, W. J., Nong, G., Chan, W. H., & Wu, Y. (2015, September). Induced sorting suffixes in external memory with better design and less space. Paper presented at the 22nd edition of the International Symposium on String Processing and Information Retrieval (SPIRE 2015), King's College, London, UK

Keywords

  • Suffix array
  • Sorting algorithm
  • External memory

Fingerprint Dive into the research topics of 'Induced sorting suffixes in external memory with better design and less space'. Together they form a unique fingerprint.