Fundamental Algorithms for Suffix Array

    Project: Other project

    Project Details


    Suffix arrays—a space-efficient alternative of suffix trees, were initially proposed in 1990 by Manber and Myers in SODA [21] and SICOMP [22]. For an n-character text S, its suffix array SA(S) is an array of pointers, which requires O(nlogn)-bit space, for all the suffixes in the S sorted in the lexicographically ascending order. Since then, suffix arrays have been employed widely in data indexing, retrieving, sorting and processing [6, 9]. As suffix arrays are used in a broad range of applications such as compression, string matching and computational biology [12], constructing suffix arrays generally constitutes the foundation for subsequent tasks. Recently, it has been observed that many applications for suffix arrays involve huge input of texts that are measured in billions of characters (e.g. biology genome database) [13, 24, 31]. This observation motivates the currently intensive research on time-and-space efficient suffix array construction algorithm (SACAs).
    In this project, we will not only continue our previous work on developing a fast and space-efficient linear SACA, but also investigate some major application techniques such as algorithmic simple compressed suffix array storage, string matching and data mining based on compressed SA.
    Effective start/end date01/02/0831/01/10