Given a size-n input text T and its suffix array, a new method is proposed to compute the K-order longest common prefix (LCP) array for T, in terms of that the maximum LCP of two suffixes is truncated to be at most K. This method employs a fingerprint function to convert a comparison of two variable-length strings into a comparison of their fingerprints encoded as fixed-size integers. This method takes O (n log K) time and O (n) space on internal and external memory models. It is also scalable for a typical distributed model consisting of d computing nodes, where the time and space complexities are evenly divided onto each node as O (n log K/d) and O (n/d), respectively. For performance evaluation, an experimental study has been conducted on both external memory and distributed models. From our perspective, a cluster of computers in a local area network is commonly available in practice, but there is currently a lack of scalable LCP-array construction algorithm for such a distributed model. Our method provides a candidate solution to meet this demand. Copyright © 2017 Springer Nature Singapore Pte Ltd.
|Title of host publication||Parallel architecture, algorithm and programming: 8th International Symposium, PAAP 2017, Haikou, China, June 17-18, 2017, Proceedings|
|Editors||Guoliang CHEN, Hong SHEN , Mingrui CHEN|
|Place of Publication||Singapore|
|Publication status||Published - 2017|
CitationWu, Y., Han, L. B., Chan, W. H., & Nong, G. (2017). Scalable K-order LCP array construction for massive data. In G. Chen, H. Shen, & M. Chen (Eds.), Parallel architecture, algorithm and programming: 8th International Symposium, PAAP 2017, Haikou, China, June 17-18, 2017, Proceedings (pp. 579-593). Singapore: Springer.
- Fingerprint function
- K-order LCP-array construction
- External memory and distributed models