Suffix array construction in external memory using d-critical substrings

Ge NONG, Wai Hong CHAN, Sen ZHANG, Xiao Feng GUAN

Research output: Contribution to journalArticles

18 Citations (Scopus)

Abstract

We present a new suffix array construction algorithm that aims to build, in external memory, the suffix array for an input string of length n measured in the magnitude of tens of Giga characters over a constant or integer alphabet. The core of this algorithm is adapted from the framework of the original internal memory SA-DS algorithm that samples fixed-size d-critical substrings. This new external-memory algorithm, called EM-SA-DS, uses novel cache data structures to construct a suffix array in a sequential scanning manner with good data spatial locality: data is read from or written to disk sequentially. On the assumed externalmemory model with RAM capacity σ((nB)0.5), disk capacity O(n), and size of each I/O block B, allmeasured in log n-bit words, the I/O complexity of EM-SA-DS is O(n/B). This work provides a general cache-based solution that could be further exploited to develop external-memory solutions for other suffix-array-related problems, for example, computing the longest-common-prefix array, using a modern personal computer with a typical memory configuration of 4GB RAM and a single disk. Copyright © 2014 ACM.
Original languageEnglish
Article number1
JournalACM Transactions on Information Systems
Volume32
Issue number1
DOIs
Publication statusPublished - Jan 2014

Citation

Nong, G., Chan, W. H., Zhang, S., & Guan, X. F. (2014). Suffix array construction in external memory using d-critical substrings. ACM Transactions on Information Systems, 32(1), Article 1.

Keywords

  • External memory
  • Sorting algorithm
  • Suffix array

Fingerprint Dive into the research topics of 'Suffix array construction in external memory using d-critical substrings'. Together they form a unique fingerprint.