Scalable graph similarity search in large graph databases

2015 
We consider the problem of searching a collection of graphs D to find graphs that are most similar to a query graph Q. This has several applications in areas like computational biology, drug design, computational chemistry, collaborative networks, social networks etc. We use graphlet kernel to define similarity between graphs. In order to make the similarity search faster, we build an efficient nearest neighbor data structure on the graph collection using locality sensitive hashing technique. The graphs are embedded into a vector space and these vectors are used to build the nearest neighbor data structure. Computing the vector space embedding is the most compute intensive part in our algorithm. To scale our algorithm to large graph collections, we give an efficient Map-Reduce implementation for vector space embedding. We perform experiments on real world datasets (AIDS dataset) and synthetic datasets to show the effectiveness of our algorithm.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    32
    References
    1
    Citations
    NaN
    KQI
    []