
Data synchronization

Data synchronization is the process of establishing consistency between data from a source and a target data store, and the continuous harmonization of that data over time. It is fundamental to a wide variety of applications, including file synchronization and mobile device synchronization, e.g., for PDAs. Synchronization can also be useful in encryption, for example for synchronizing public key servers.

Tools are available for file synchronization, version control (CVS, Subversion, etc.), distributed filesystems (Coda, etc.), and mirroring (rsync, etc.), in that all of these attempt to keep sets of files synchronized. However, only version control and file synchronization tools can deal with modifications to more than one copy of the files.

Several theoretical models of data synchronization exist in the research literature, and the problem is also related to Slepian–Wolf coding in information theory. The models are classified by how they treat the data to be synchronized. The problem of synchronizing unordered data (also known as the set reconciliation problem) is modeled as computing the symmetric difference S_A ⊕ S_B = (S_A − S_B) ∪ (S_B − S_A) between two remote sets S_A and S_B of b-bit numbers. A number of solutions to this problem have been proposed in the literature.

In the ordered case, two remote strings σ_A and σ_B need to be reconciled.
Typically, it is assumed that these strings differ by up to a fixed number of edits (i.e., character insertions, deletions, or modifications). Data synchronization is then the process of reducing the edit distance between σ_A and σ_B, ideally down to zero. This model applies to all filesystem-based synchronization (where the data is ordered). Many practical applications of this are discussed or referenced above. It is sometimes possible to transform the problem into one of unordered data through a process known as shingling (splitting the strings into overlapping substrings, or shingles).

In fault-tolerant systems, distributed databases must be able to cope with the loss or corruption of (part of) their data. The first step is usually replication, which involves making multiple copies of the data and keeping them all up to date as changes are made. However, it is then necessary to decide which copy to rely on when loss or corruption of an instance occurs.
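As a minimal sketch of the set reconciliation model described above, the symmetric difference can be computed directly when both sets are available locally; a real reconciliation protocol would instead exchange compact sketches of the sets over the network, and the sample values here are hypothetical:

```python
def symmetric_difference(s_a: set, s_b: set) -> set:
    """Return S_A xor S_B = (S_A - S_B) union (S_B - S_A)."""
    return (s_a - s_b) | (s_b - s_a)

# Hypothetical sets of b-bit numbers held by two hosts.
host_a = {1, 2, 3, 5, 8}
host_b = {1, 2, 4, 8}

# The elements each side must learn from the other.
diff = symmetric_difference(host_a, host_b)
print(sorted(diff))  # → [3, 4, 5]
```

Once each host knows the symmetric difference, sending the missing elements in each direction makes the two sets identical.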

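Shingling, mentioned above, can be sketched as follows: each string is split into the set of its overlapping k-character substrings, which reduces ordered-data reconciliation to set reconciliation. The shingle length k = 3 and the sample strings are illustrative assumptions, not part of any particular protocol:

```python
def shingles(s: str, k: int = 3) -> set:
    """Split a string into the set of its overlapping k-character shingles."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}

# Two hypothetical remote strings differing by a single character edit.
sigma_a = "synchronization"
sigma_b = "synchronisation"

# Reconciling the shingle sets localizes where the strings differ:
# only the shingles overlapping the edited character disagree.
only_a = shingles(sigma_a) - shingles(sigma_b)
only_b = shingles(sigma_b) - shingles(sigma_a)
print(sorted(only_a), sorted(only_b))
```

The shingle sets can then be reconciled with any set reconciliation method; reassembling a string from received shingles is the harder half of the transformation and is omitted here.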