Top-k Queries over Distributed Uncertain Categorical Data

2020 
Uncertain data arises in many modern applications including sensor networks, data integration, and information extraction. Often this data is distributed and there is a need to do efficient query processing over the data in situ. We focus on answering top-k queries and propose a distributed algorithm TDUD, to efficiently answer top-k queries over distributed uncertain categorical data in queries single round of communication. TDUD uses a distributed index structure composed of local uncertain indexes (LUIs) on local sites and a single global uncertain index (GUI) on a coordinator site. Our algorithm minimizes the amount of communication needed to answer a top-k query by maintaining the mean sum dispersion of the probability distribution on each site. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed methods in terms of communication costs and response time. We show empirically that TDUD is near-optimal in that it can typically retrieve the top-k query answers by communicating only k tuples in a single round.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    24
    References
    0
    Citations
    NaN
    KQI
    []