An algorithm for suffix stripping
8,177
Citation
4
Reference
10
Related Paper
Citation Trend
Abstract:
The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.Keywords:
Stripping (fiber)
Suffix array
Generalized suffix tree
Compressed suffix array
SIMPLE algorithm
Cite
Generalized suffix tree
Compressed suffix array
Suffix array
Cite
Citations (26)
Generalized suffix tree
Compressed suffix array
Suffix array
Sequence (biology)
Cite
Citations (19)
Generalized suffix tree
Compressed suffix array
Suffix array
Linear space
Cite
Citations (182)
We are proposing the genome indexing algorithm, which depends upon compressed form of suffix trees, in which every node has four parts; suffix array number, suffix start number, LCP count, and a pointer to another node. The proposed algorithm does not use the whole suffix array, it just takes some necessary information like LCP of two suffix array, compare them and suffix start number, to align the suffix to proper position and suffix array number to distinguish among all the partitions. The use of compressed suffix array minimizes the number of trees, eventually; it also minimizes the random access to input data, as it creates the compressed suffix tree for two suffix arrays using pairwise sorting, sequentially.
Compressed suffix array
Generalized suffix tree
Suffix array
Cite
Citations (0)
To perform fast searching in massive data such as DNA strings, the most efficient method is to construct full-text index data structures of given strings. The widely used full-text index structures are suffix trees and suffix arrays. Since the suffix may uses less space than the suffix tree, the suffix array is proper for DNA strings. Previously developed construction algorithms of suffix arrays are not suitable for DNA strings since those are designed for integer alphabets. We propose a fast algorithm to construct suffix arrays on DNA strings whose alphabet sizes are fixed by 4. We reduce the construction time by improving encoding and merging steps on Kim et al.[1]'s algorithm. Experimental results show that our algorithm constructs suffix arrays on DNA strings 1.3-1.6 times faster than Kim et al.'s algorithm, and also for other algorithms in most cases.
Compressed suffix array
Generalized suffix tree
Suffix array
Cite
Citations (0)
Generalized suffix tree
Compressed suffix array
Suffix array
Tree (set theory)
Cite
Citations (123)
Generalized suffix tree
Compressed suffix array
Suffix array
Cite
Citations (220)
Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to linear-time construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple linear-time construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover . This view leads to a generalized algorithm, DC, that allows a space-efficient implementation and, moreover, supports the choice of a space--time tradeoff. For any v ∈ [1, √n ], it runs in O( vn ) time using O( n / √v ) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
Compressed suffix array
Generalized suffix tree
Suffix array
Cite
Citations (397)
Compressed suffix array
Generalized suffix tree
Suffix array
Cite
Citations (73)
Generalized suffix tree
Compressed suffix array
Suffix array
Lexicographical order
Tree (set theory)
Edit distance
Cite
Citations (23)