With its high bandwidth and low latency, the InfiniBand protocol has been widely used in distributed databases and high-performance computing in recent years. Compared with the traditional PCI bus, 10 Gigabit Ethernet, and Myrinet, InfiniBand not only leads in latency and bandwidth but also offers better quality of service. This paper first introduces the fundamentals of the InfiniBand protocol and its current applications in high-performance computing. It then proposes a high-performance message passing method based on remote direct memory access (RDMA) over InfiniBand, which performs very well on metrics such as latency and peak bandwidth. Experimental results show that, compared with traditional techniques, the proposed method reduces latency by more than 20% and more than doubles bandwidth, improving overall system performance by shortening the transmission time of control messages.
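As a rough illustration of how the latency and peak-bandwidth metrics cited above are commonly measured for a message passing path, the following Python sketch times a ping-pong exchange and a bulk transfer. It uses plain TCP sockets purely for illustration and assumes a matching echo/ack server (not shown); it does not reproduce the paper's InfiniBand RDMA implementation.

```python
# Hypothetical measurement sketch: one-way latency via ping-pong, bandwidth via
# bulk one-way transfers. A cooperating echo/ack server is assumed to exist.
import socket
import time

def pingpong_latency(host, port, msg_size=8, rounds=1000):
    """Estimate one-way latency by halving the average round-trip time."""
    with socket.create_connection((host, port)) as s:
        payload = b"x" * msg_size
        start = time.perf_counter()
        for _ in range(rounds):
            s.sendall(payload)
            received = 0
            while received < msg_size:              # wait for the echo
                received += len(s.recv(msg_size - received))
        elapsed = time.perf_counter() - start
    return elapsed / rounds / 2                     # seconds, one way

def bulk_bandwidth(host, port, msg_size=1 << 20, rounds=100):
    """Estimate achievable bandwidth with large one-way transfers."""
    with socket.create_connection((host, port)) as s:
        payload = b"x" * msg_size
        start = time.perf_counter()
        for _ in range(rounds):
            s.sendall(payload)
        s.recv(1)                                   # final ack from the server
        elapsed = time.perf_counter() - start
    return rounds * msg_size / elapsed              # bytes per second
```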
Redundancy is a traditional technique for improving the reliability of distributed systems, but in many scenarios redundancy is unavailable or infeasible. How to assign the tasks of parallel applications to the processors of a distributed system so as to maximize system reliability therefore becomes an important research problem. This paper proposes a dynamic non-redundant data allocation method for distributed database systems, which specifies fragment update parameters and dynamic cost parameters in order to find the best plan for reallocating data fragments. Specifically, we use these parameters to iteratively estimate the cost of reassigning each fragment to each node and migrate the data to the node with the lowest cost. The results show that this scheme makes full use of the advantages of non-redundant data allocation and speeds up communication, further improving the consistency and reliability of the distributed database system.
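The following Python sketch shows the general shape of such a cost-driven reallocation step: for every candidate node, an estimated cost of hosting the fragment there is computed, and the fragment migrates to the cheapest node. The particular cost model (remote-access traffic, update propagation, and a one-off migration charge) is an assumption for the example; the paper's update and dynamic cost parameters may be defined differently.

```python
# Illustrative fragment-reallocation step: pick the lowest-cost target node.
def choose_target(current_node, nodes, access_rates, update_rate,
                  link_cost, migration_size):
    """Return (node, cost) minimizing the estimated cost of hosting a fragment.

    access_rates[n]  -- how often node n accesses this fragment
    update_rate      -- how often the fragment is updated
    link_cost[a][b]  -- per-unit communication cost between nodes a and b
    migration_size   -- one-off cost of moving the fragment itself
    """
    best_node, best_cost = current_node, float("inf")
    for candidate in nodes:
        # remote accesses from every other node pay the inter-node link cost
        cost = sum(access_rates.get(n, 0) * link_cost[n][candidate]
                   for n in nodes if n != candidate)
        # update propagation back to the fragment's current home
        cost += update_rate * link_cost[current_node][candidate]
        # moving the fragment itself costs something unless it stays put
        if candidate != current_node:
            cost += migration_size * link_cost[current_node][candidate]
        if cost < best_cost:
            best_node, best_cost = candidate, cost
    return best_node, best_cost
```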
Massively parallel processing (MPP) architectures are widely used in current enterprise-class distributed database systems. They support large-scale data analysis by distributing storage and query processing across multiple nodes, with applications ranging from simple reporting to complex analytical workloads. However, because of the shared-nothing nature of MPP, it is difficult to run large-scale analytical queries while maintaining data consistency. This paper presents a relational SQL-based query parsing, data distribution, and parallel processing technique for distributed MPP systems, with the goal of maintaining the consistency of distributed data while improving query speed. First, in the SQL parsing stage, the query goes through syntax analysis, semantic analysis, and statement parsing, in that order. In the data distribution stage, the system is organized into a distribution node and data nodes: all tasks are dispatched from the distribution node, and all results to be processed are returned to it. During parallel processing, each data node keeps a copy of the lookup table, and the SQL statements of each query are executed concurrently on every node. Experimental results show that the proposed MPP data distribution and parallel processing scheme supports large data volumes and improves query processing speed while ensuring data consistency.
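A minimal Python sketch of the distribution-node/data-node interaction described above follows: the distribution node fans a parsed SQL statement out to every data node, the nodes execute it concurrently against their local partitions, and all partial results flow back to the distribution node for merging. The node names and the `execute_local` helper are placeholders, not part of the paper's system.

```python
# Sketch: fan-out of one query from a distribution node to data nodes.
from concurrent.futures import ThreadPoolExecutor

def execute_local(node, sql):
    """Stand-in for running `sql` against the partition stored on `node`.

    A real data node would execute the statement on its local storage engine,
    consulting its local copy of the lookup table to resolve references.
    """
    return [(node, sql)]                      # dummy partial result

def distribute_query(sql, data_nodes):
    """Dispatch from the distribution node, run concurrently, merge results."""
    with ThreadPoolExecutor(max_workers=len(data_nodes) or 1) as pool:
        partials = list(pool.map(lambda n: execute_local(n, sql), data_nodes))
    merged = []
    for rows in partials:                     # all results return to the distribution node
        merged.extend(rows)
    return merged

# Example: distribute_query("SELECT COUNT(*) FROM orders", ["node1", "node2"])
```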
Traditional distributed transaction protocols typically rely on two-phase locking (2PL) or optimistic concurrency control (OCC). However, both run into problems when concurrency conflicts increase under large data volumes and heavy processing demands. In this paper, we propose a merged concurrency control method within a single distributed transaction that handles conflicts more efficiently. TPC-C experiments show that our merged distributed transaction scheme handles concurrency conflicts better, especially when conflicts are frequent. It supports read, write, and recall operations while fully exploiting temporal redundancy and the data network, and is suitable for large-scale, computationally intensive workloads.
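As a hedged illustration of one way 2PL-style and OCC-style behavior can be merged (not necessarily the paper's exact protocol), the Python sketch below first attempts an optimistic read/validate/write on a record; if validation detects a concurrent writer, it retries the update under an explicit lock. The `Record` class and `merged_update` helper are hypothetical names for the example.

```python
# Sketch: optimistic attempt with a pessimistic (locked) fallback on conflict.
import threading

class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()

def merged_update(record, compute_new_value):
    # --- optimistic phase: read a snapshot, compute, then validate ---
    snapshot_value, snapshot_version = record.value, record.version
    new_value = compute_new_value(snapshot_value)
    with record.lock:
        if record.version == snapshot_version:   # no concurrent writer slipped in
            record.value = new_value
            record.version += 1
            return "committed optimistically"
    # --- conflict detected: fall back to execution under the lock ---
    with record.lock:
        record.value = compute_new_value(record.value)
        record.version += 1
        return "committed under lock"
```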
This paper proposes a distributed data clustering technique based on a deep neural network. First, each record in the distributed database is treated as an input vector, and its features are extracted and fed to the input layer of the deep neural network. The connection weights are trained with the back-propagation (BP) algorithm, and the network output is learned by adjusting these weights. Finally, the clustering result is determined from the similarity of the output vectors corresponding to the current inputs. Experimental results on small-scale distributed systems show that this method achieves better test-set accuracy than the traditional k-means clustering method and is better suited to large-scale data clustering in distributed environments.
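The following NumPy sketch illustrates the pipeline described above in miniature: records become input vectors, a small network is trained with back-propagation, and records are grouped by the similarity of their output vectors. The tiny one-hidden-layer autoencoder, its size, and the cosine-similarity threshold are illustrative assumptions, not the paper's architecture.

```python
# Sketch: train a small network with BP, then cluster by output similarity.
import numpy as np

def train_network(X, hidden=8, epochs=200, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with plain back-propagation."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, d))
    for _ in range(epochs):
        H = np.tanh(X @ W1)                  # forward pass
        Y = H @ W2
        err = Y - X                          # reconstruction error
        gW2 = H.T @ err / n                  # back-propagated gradients
        gH = err @ W2.T * (1 - H ** 2)
        gW1 = X.T @ gH / n
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

def cluster_by_output_similarity(X, W1, W2, threshold=0.95):
    """Group records whose output vectors are nearly parallel (cosine test)."""
    Y = np.tanh(X @ W1) @ W2
    Y = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12)
    labels, centers = [-1] * len(Y), []
    for i, y in enumerate(Y):
        for c, center in enumerate(centers):
            if y @ center >= threshold:      # similar enough: join cluster c
                labels[i] = c
                break
        else:                                # otherwise start a new cluster
            centers.append(y)
            labels[i] = len(centers) - 1
    return labels
```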
Existing distributed database management systems can provide data storage with high access bandwidth through clustering, with reliable data replication, fault detection, and fast automatic recovery. However, with existing network platform software and computing models, there is still a need to establish a virtual organization that can satisfy specific requirements based on the data manipulation mechanisms and data access patterns in the network. This paper proposes a large-scale data storage and management scheme based on a distributed database which, compared with other technical solutions, is better suited to controlling large-scale data access and supporting other types of applications and data access modes as needed. The scheme can cover data exchange, data processing, and input/output volumes spanning multiple orders of magnitude. Experimental results show that the scheme offers good reliability, ease of operation, and support for processing large amounts of data.
Distributed systems have shown good robustness, extensibility, and effectiveness in processing, storing, and transmitting large volumes of data. With the growth in data volume and applications, however, guaranteeing transactions in a distributed and heterogeneous environment has become a challenging task. The sub-transactions of a distributed transaction must be coordinated not only with other local transactions but also with other sub-transactions spawned by the global manager. Based on a discussion of the distributed transaction processing model and its commit protocol, this paper analyzes the causes of failure observed when applying the general distributed transaction processing model to our information database system, and presents an implementation of distributed transaction processing based on the interface of a relational database management system and a hypertext preprocessing language. The research results show that this distributed transaction processing method is reliable and simplifies the implementation of global programs.
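To make the coordination of sub-transactions concrete, the Python sketch below shows a two-phase-commit-style coordinator as a representative commit protocol (the abstract does not state which protocol the implementation uses). The `Participant` class is a placeholder for the per-node resource managers holding the sub-transactions.

```python
# Sketch: a global manager coordinating sub-transaction commit, 2PC-style.
class Participant:
    """Stand-in for a local resource manager holding one sub-transaction."""
    def prepare(self, txn_id):   # vote: can this sub-transaction commit?
        return True
    def commit(self, txn_id):
        pass
    def rollback(self, txn_id):
        pass

def coordinate(txn_id, participants):
    """Phase 1: collect votes.  Phase 2: commit only if everyone voted yes."""
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare(txn_id))
        except Exception:
            votes.append(False)              # an unreachable participant counts as "no"
    if all(votes):
        for p in participants:
            p.commit(txn_id)
        return "committed"
    for p in participants:
        p.rollback(txn_id)
    return "rolled back"
```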
Distributed storage systems offer superior capacity, I/O throughput, and scalability, but the inexpensive components they are built from are not highly reliable, and failures can easily lead to catastrophic data loss. Fault detection is a key technology for providing high availability in distributed systems. This paper proposes a new adaptive fault detection system architecture that takes both availability and disaster recovery capability into account and is divided into four layers: 1) the client layer, in which clients stay in contact with the local network or an external disaster recovery/backup center; 2) the server layer, in which high availability is guaranteed by its backup scheme; 3) the local recovery layer, in which data backup and recovery are realized through local real-time replication; and 4) the remote recovery layer, in which data backup and recovery are realized (with a delay) through a remote backup database. We evaluated and tested this method in our laboratory network environment.
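As a small illustration of the kind of adaptive fault detection such an architecture relies on, the Python sketch below adjusts its failure timeout from observed heartbeat inter-arrival times rather than using a fixed value, so a node is only suspected (and the recovery layers engaged) when a heartbeat is unusually late. The window size and safety margin are assumptions for the example, not parameters from the paper.

```python
# Sketch: heartbeat-based failure detector with an adaptive timeout.
import time
from collections import deque

class AdaptiveFailureDetector:
    def __init__(self, window=100, margin=2.0):
        self.arrivals = deque(maxlen=window)   # recent heartbeat inter-arrival times
        self.last_heartbeat = None
        self.margin = margin                   # safety factor over the mean gap

    def heartbeat(self):
        """Record a heartbeat received from the monitored node."""
        now = time.monotonic()
        if self.last_heartbeat is not None:
            self.arrivals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def suspected_failed(self):
        """True if the next heartbeat is later than the adaptive timeout."""
        if self.last_heartbeat is None or not self.arrivals:
            return False
        mean_gap = sum(self.arrivals) / len(self.arrivals)
        return time.monotonic() - self.last_heartbeat > self.margin * mean_gap
```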