Consensus Embeddings for Networks with Multiple Versions

2021 
Machine learning applications on large-scale network-structured data commonly encode network information in the form of node embeddings. Network embedding algorithms map the nodes into a low-dimensional space such that nodes that are "similar" with respect to network topology are also close to each other in the embedding space. Many real-world networks used in machine learning have multiple versions that come from different sources, are stored in different databases, or belong to different parties. Due to efficiency or privacy concerns, it may be desirable to compute consensus embeddings for the integrated network directly from the node embeddings of individual versions, without explicitly constructing the integrated network. Here, we systematically assess the potential of consensus embeddings in the context of processing link prediction queries on user-chosen combinations of different versions of a network. For the computation of consensus embeddings, we use linear (singular value decomposition) and non-linear (variational auto-encoder) dimensionality reduction methods. Our results on a large selection of protein-protein interaction (PPI) networks (eight versions with 255 potential combinations) show that consensus embeddings enable real-time processing of link prediction queries on user-defined combinations of networks, without requiring explicit construction of the integrated network. We observe that linear dimensionality reduction delivers better accuracy and higher efficiency than non-linear dimensionality reduction. We also observe that the performance of consensus embeddings improves as the number of networks in the database grows, demonstrating the scalability of consensus embeddings to growing numbers of network versions.
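The linear approach described above can be illustrated with a minimal sketch: per-version node embeddings (rows aligned to a shared node ordering) are concatenated side by side, and truncated SVD reduces the stacked matrix to a single consensus embedding. This is an illustrative reconstruction under those assumptions, not the authors' exact pipeline; the function name and interface are hypothetical.

```python
import numpy as np

def consensus_embedding(version_embeddings, d):
    """Sketch of a linear (SVD-based) consensus embedding.

    version_embeddings: list of (n_nodes, d_i) arrays, one per network
        version, with rows aligned to the same node ordering.
    d: target dimensionality of the consensus embedding.
    """
    # Stack per-version embeddings side by side: (n_nodes, sum(d_i)).
    stacked = np.concatenate(version_embeddings, axis=1)
    # Truncated SVD: keep the top-d left singular vectors, scaled by
    # their singular values, as the consensus embedding.
    U, S, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :d] * S[:d]

# Example: two versions of a 5-node network with 4- and 3-dim embeddings.
rng = np.random.default_rng(0)
emb = consensus_embedding(
    [rng.normal(size=(5, 4)), rng.normal(size=(5, 3))], d=2
)
print(emb.shape)  # (5, 2)
```

Because the SVD operates only on the stacked embedding matrices, a user-chosen combination of versions can be served without ever materializing the integrated network, which is the efficiency property the abstract highlights.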