For full-text indexing of massive data, the suffix and LCP (longest common prefix) arrays have been recognized as fundamental data structures, and there are at least two needs in practice for checking their correctness, i.e., program debugging and verifying the arrays constructed by probabilistic algorithms. Two probabilistic methods are proposed to check the suffix and LCP arrays of constant or integer alphabets in external memory using a Karp-Rabin fingerprinting technique, where the checking is wrong only with a negligible error probability. The first method checks the lexicographical order and the LCP-value of two suffixes by computing and comparing the fingerprints of their LCPs. This method is general in terms of that it can verify any full or sparse suffix/LCP array of any order. The second method uses less space, it first employs the fingerprinting technique to verify a subset of the given suffix and LCP arrays, from which two new suffix and LCP arrays are induced and compared with the given arrays for verification, where the induced suffix and LCP arrays can be removed for constant alphabets to save space. Copyright © 2017 IEEE.
CitationWu, Y., Nong, G., Chan, W. H., & Han, L. B. (2017). Checking big suffix and LCP arrays by probabilistic methods. IEEE Transactions on Computers, 66(10), 1667-1675. doi: 10.1109/TC.2017.2702642
- Suffix and LCP arrays
- Karp-rabin fingerprinting
- External memory