Data structures for categorical path counting queries

2022 
Consider an ordinal tree on nodes, each of which is assigned a from an alphabet . We preprocess the tree in order to support , which ask for the number of distinct categories occurring on the path in between two query nodes and . For this problem, we propose a linear-space data structure with query time , where is the word size in the word-RAM. As shown in our proof, from the assumption that matrix multiplication cannot be solved in time polynomially faster than cubic (with only combinatorial methods), our result is optimal, save for polylogarithmic speed-ups. For a trade-off parameter , we propose an - word, query time data structure. We also consider - approximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path, by counting each such category at least once and at most times. We describe a linear-space data structure that supports 2- approximate categorical path counting queries in time.Next, we generalize the categorical path counting queries to weighted trees. Here, a query specifies two nodes and an orthogonal range . The answer to thus formed query is the number of distinct categories occurring on the path from to , if only the nodes with weights falling inside are considered. We propose an - word data structure with query time, or an - word data structure with query time. For an appropriate choice of the trade-off parameter , this implies a linear-space data structure with query time. We then extend the approach to the trees weighted with vectors from , where is a constant integer greater than or equal to 2. We present a data structure with words of space and query time. For an - space solution, one thus has query time.The inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on . In unweighted trees, we propose a sketching data structure to solve the approximate categorical path counting problem which asks for a - approximation (i.e. within of the true answer) of the number of distinct categories on the given path, with probability , where are constants. The data structure occupies words of space, for the query time of . For trees weighted with - dimensional weight vectors (), we propose a data structure with words of space and query time.All these problems generalize the corresponding categorical range counting problems in Euclidean space , for respective , by replacing one of the dimensions with a tree topology.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []