A method to verify originality of sequences secretly on distributed computing environment

2004 
In molecular biology, it is important to find gene sequences related to phenomena such as diseases and chemical reactions. Once a target gene has been sequenced, it must be confirmed whether the sequence is already known. If the sequence has not yet been deposited in any database, it is novel and valuable. This check is generally done by comparing the exact query sequence with the database sequences using a homology search program. In that case, the exact sequences, not only of the genomic databases but also of the newly sequenced genes, must be made public. Therefore, if we do not want to expose the databases and/or the new sequences on public networks, we must purchase the databases and search them locally. We propose a method to verify the originality of gene sequences secretly on public networks. First, the raw target sequences are processed so that they cannot be reconstructed. Next, the method hashes all the genomic sequences, and only the processed data are exposed on public networks. Finally, the hashed files are compared with each other in parallel using the sorting method we proposed earlier (Kurata et al., 2003). The hashed files are stored on the genomic databases in distributed form. We describe how to implement this method on a grid computing environment and report calculation results on a worldwide grid spanning Japan, Switzerland and France. The method successfully verified the originality of the sequence SSB against E. coli K-12 and B. subtilis.
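The abstract does not say how the sequences are manipulated, hashed, or compared, so the following is only a minimal sketch of the general idea under stated assumptions. Each sequence is cut into overlapping words of a hypothetical length k. Each word is hashed with HMAC-SHA256 under a secret key shared by the two parties; this keying is an assumption, chosen because plain hashes of short DNA words could be reversed by enumerating all 4^k possible words. The two sorted digest lists are then compared with a merge scan. The merge scan is a generic sort-merge stand-in, not the sorting method of Kurata et al. (2003), and the names `kmers`, `hash_words`, `count_shared` and `shared_fraction` are hypothetical.

```python
import hashlib
import hmac


def kmers(seq: str, k: int) -> list[str]:
    """Split a sequence into overlapping words of length k (k is a hypothetical parameter)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hash_words(seq: str, k: int, key: bytes) -> list[bytes]:
    """Hash every k-mer with HMAC-SHA256 (an assumed scheme) and return the digests sorted.

    Only these sorted digests would be published; the raw sequence stays local.
    """
    return sorted(
        hmac.new(key, word.encode("ascii"), hashlib.sha256).digest()
        for word in kmers(seq, k)
    )


def count_shared(a: list[bytes], b: list[bytes]) -> int:
    """Merge-scan two sorted digest lists; count entries of `a` that also occur in `b`."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            shared += 1
            i += 1  # keep j so a repeated digest in `a` can match the same entry in `b`
    return shared


def shared_fraction(query: list[bytes], database: list[bytes]) -> float:
    """Fraction of query words whose digest appears in the database digests."""
    if not query:
        raise ValueError("query is shorter than k")
    return count_shared(query, database) / len(query)


if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical key agreed between the two parties
    db = hash_words("ATGGCTAGCTAGGATCCATGCA", 5, key)
    known = hash_words("GCTAGGATCC", 5, key)  # substring of the database sequence
    novel = hash_words("TTTTTTTTTT", 5, key)  # no 5-mer in common with the database
    print(f"known: {shared_fraction(known, db)}")  # prints "known: 1.0"
    print(f"novel: {shared_fraction(novel, db)}")  # prints "novel: 0.0"
```

A fraction near zero would suggest the query is original with respect to that database. In a distributed setting, each site could hold a sorted slice of the database digests and run the same merge scan against the published query digests; that partitioning is likewise an assumption, not the paper's reported grid implementation.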