Checking big suffix and LCP arrays by probabilistic methods

Yi WU, Ge NONG, Wai Hong CHAN, Ling Bo HAN

Research output: Contribution to journalArticle

1 Citation (Scopus)

Abstract

For full-text indexing of massive data, the suffix and LCP (longest common prefix) arrays have been recognized as fundamental data structures, and there are at least two needs in practice for checking their correctness, i.e., program debugging and verifying the arrays constructed by probabilistic algorithms. Two probabilistic methods are proposed to check the suffix and LCP arrays of constant or integer alphabets in external memory using a Karp-Rabin fingerprinting technique, where the checking is wrong only with a negligible error probability. The first method checks the lexicographical order and the LCP-value of two suffixes by computing and comparing the fingerprints of their LCPs. This method is general in terms of that it can verify any full or sparse suffix/LCP array of any order. The second method uses less space, it first employs the fingerprinting technique to verify a subset of the given suffix and LCP arrays, from which two new suffix and LCP arrays are induced and compared with the given arrays for verification, where the induced suffix and LCP arrays can be removed for constant alphabets to save space. Copyright © 2017 IEEE.

Original languageEnglish
Pages (from-to)1667-1675
JournalIEEE Transactions on Computers
Volume66
Issue number10
Early online dateMay 2017
DOIs
Publication statusPublished - 2017

Citation

Wu, Y., Nong, G., Chan, W. H., & Han, L. B. (2017). Checking big suffix and LCP arrays by probabilistic methods. IEEE Transactions on Computers, 66(10), 1667-1675. doi: 10.1109/TC.2017.2702642

Keywords

  • Suffix and LCP arrays
  • Verification
  • Karp-rabin fingerprinting
  • External memory

Fingerprint Dive into the research topics of 'Checking big suffix and LCP arrays by probabilistic methods'. Together they form a unique fingerprint.