    SeeMoRe: A Fault-Tolerant Protocol for Hybrid Cloud Environments.
    Abstract:
    Large-scale data management systems utilize State Machine Replication to provide fault tolerance and to enhance performance. Fault-tolerant protocols are extensively used in the distributed database infrastructure of large enterprises such as Google, Amazon, and Facebook, as well as permissioned blockchain systems like IBM's Hyperledger Fabric. However, despite years of intensive research, existing fault-tolerant protocols do not adequately address all the characteristics of distributed system applications. In particular, hybrid cloud environments consisting of private and public clouds are widely used by enterprises, yet fault-tolerant protocols have not been adapted for such environments. In this paper, we introduce SeeMoRe, a hybrid State Machine Replication protocol to handle both crash and malicious failures in a public/private cloud environment. SeeMoRe considers a private cloud consisting of nonmalicious nodes (either correct or crashed) and a public cloud with both Byzantine faulty and correct nodes. SeeMoRe has three different modes, which can be used depending on the private cloud load and the communication latency between the public and the private cloud. We also introduce a dynamic mode switching technique to transition from one mode to another. Furthermore, we evaluate SeeMoRe using a series of benchmarks. The experiments reveal that SeeMoRe's performance is close to that of state-of-the-art crash fault-tolerant protocols while tolerating malicious failures.
    Keywords:
    IBM
    Replication
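As general background, and not taken from the SeeMoRe abstract itself: the classical lower bounds on replica counts are 2f + 1 replicas to tolerate f crash failures and 3f + 1 replicas to tolerate f Byzantine (malicious) failures. The sketch below only illustrates why it can pay off to know which nodes are crash-only; the function names are hypothetical, and SeeMoRe's own quorum and network sizes are not stated in the abstract and are not reproduced here.

```python
def min_replicas_crash(f: int) -> int:
    """Classical minimum replica count to tolerate f crash failures (2f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def min_replicas_byzantine(f: int) -> int:
    """Classical minimum replica count to tolerate f Byzantine failures (3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


if __name__ == "__main__":
    # Compare the two classical bounds for a few failure budgets.
    for f in range(1, 4):
        print(f, min_replicas_crash(f), min_replicas_byzantine(f))
```

Treating every node as potentially malicious costs f extra replicas compared with the crash-only bound, which is the kind of overhead a hybrid failure model tries to avoid for the trusted private-cloud nodes.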
    Much of our critical infrastructure is controlled by large software systems whose participants are distributed across the Internet. As our dependence on these critical systems continues to grow, it becomes increasingly important that they meet strict availability and performance requirements, even in the face of malicious attacks, including those that are successful in compromising parts of the system. This dissertation presents the first replication protocols capable of guaranteeing correctness, availability, and good performance even when some of the servers are compromised, enabling the construction of highly available and highly resilient systems for our critical infrastructure. Prior to this work, intrusion-tolerant replication protocols were designed to perform well in fault-free executions, and this is how they were evaluated. In this dissertation we point out that many state-of-the-art protocols are vulnerable to significant performance degradation by a small number of malicious processors. We define a new performance-oriented correctness criterion, BOUNDED-DELAY, against which intrusion-tolerant replication protocols can be evaluated. Protocols that meet BOUNDED-DELAY are required to provide a consistent level of performance, even when the system is under attack by an adversary that controls some of the processors. We present Prime, an intrusion-tolerant replication protocol that meets BOUNDED-DELAY and thus offers a stronger performance guarantee under attack than previous state-of-the-art protocols. An evaluation of a prototype implementation shows that Prime performs competitively with existing protocols in fault-free executions and achieves an order of magnitude performance improvement in under-attack executions in 4-server and 7-server configurations. Using Prime as a building block, we show how to design and implement an attack-resilient, large-scale intrusion-tolerant replication system for wide-area networks. 
The system is hierarchical and is suited to deployments consisting of several wide-area sites, each with a cluster of replication servers. We present three mechanisms for attack-resilient and efficient inter-site communication, which enable the system to perform well in bandwidth-constrained wide-area networks without making it susceptible to performance degradation caused by malicious servers. Our results provide evidence that it is possible to construct highly resilient, large-scale survivable systems that perform well even when some of the servers (and some entire sites) are compromised.
    Intrusion tolerance
    Replication
    Citations (2)
    Recent work on intrusion-tolerance has shown that resilience to sophisticated network attacks requires system replicas to be deployed across at least three geographically distributed sites. While commodity data centers offer an attractive solution for hosting these sites due to low cost and management overhead, their use raises significant confidentiality concerns: system operators may not want private data or proprietary algorithms exposed to servers outside their direct control. We present a new model for Byzantine Fault Tolerant replicated systems that moves toward "intrusion tolerance as a service". Under this model, application logic and data are only exposed to servers hosted on the system operator's premises. Additional offsite servers hosted in data centers can support the needed resilience without executing application logic or accessing unencrypted state. We have implemented this approach in the open-source Spire system, and our evaluation shows that the performance overhead of providing confidentiality can be less than 4% in terms of latency.
    Intrusion tolerance
    Resilience
    The dependability issue including fault tolerance and security is a basic stumbling block to the practical and commercial application of the mobile code technology. This short paper introduces the SeCode approach to fault-tolerant and secure execution of mobile code. The research focus is on the development of a method and an architectural framework to support mobile code against unintentional/intentional faults and malicious attacks from its operating environment. The proposed approach makes no assumption about the operating environment (i.e. remote hosts) for mobile code. It integrates work on fault tolerance and security within a well-defined formal system model, and offers a powerful ability to detect and identify faulty hosts and malicious attacks by means of redundant data structures with advanced fault diagnosis and cryptography techniques.
    Code (set theory)
    Software fault tolerance
    Citations (0)
    Given the complexity of infrastructures, the current state of security technology, and limited budgets, any security defense system can be overwhelmed by a sufficient number of random sequential failures, e.g. due to multiple DoS attacks. Complementary to the regular solutions, in which several identical dedicated redundant nodes are added per node, a resource-sharing approach between undedicated nodes aims to build a large-scale cluster of redundant nodes and to approximate perpetual availability of the nodes that distribute security. In this work, principles are drawn from related and unrelated fields to build a distributed defense system (DDS) that relies on resource sharing. The proposed protocol set, called Medusa, achieves this DDS by dissociating trust authority from identity and hardware, making trust a movable, emancipated commodity. As a movable object, trust can benefit from traditional fault tolerance techniques through process migration.
    Shared resource
    Mobile agents offer a new paradigm in the field of distributed computing. A mobile agent is a program that can migrate from one machine to another in order to fulfill the client's needs. Since it moves from machine to machine at the client's request, the data it carries is exposed to malicious nodes or hackers who can steal or change the client's confidential data. In this paper, we propose an integrated framework that is fault tolerant as well as secure against such malicious nodes or hackers. We apply an encryption algorithm for data security and a fault-tolerance mechanism based on cloning and checkpointing with location tracking. We have implemented the proposed approach and evaluated the time taken by the agent under various parameters with its fault-tolerance and security features enabled.
    Mobile agent
    This paper describes a generic architecture for intrusion tolerant Internet servers. It aims to build systems that are able to survive attacks in the context of an open network such as the Internet. To do so, the design is based on fault tolerance techniques, in particular redundancy and diversification. These techniques give a system the additional resources to continue delivering the correct service to its legitimate clients even when active attacks are corrupting parts of the system components.
    Intrusion tolerance
    Citations (8)
    Everyone agrees that data is more secure when kept locally than when it is outsourced far away from its owners. But the annual growth of local data implies extra charges for customers, which slows down their business. The cloud computing paradigm comes with new technologies that offer very economical and cost-effective solutions, but at the expense of security. Designing a lightweight system that balances cost and data security is therefore important. Several schemes and techniques have been proposed for securing, checking, and repairing data, but unfortunately most of them do not preserve the cost efficiency and profitability of cloud offers. In this paper we try to answer the question: how can we design a model that enables a high level of integrity checking at minimum cost? We also analyze a new threat model concerning the tracking of a file's fragments during a repair or download operation, which can cause the total loss of customers' data. The solution given in this paper is based on redistributing fragment locations after every data operation using a set of random values generated by a chaotic map. Finally, we provide a data loss (and data corruption) insurance approach based on the user's estimate of data importance, which helps reduce user concerns about data loss.
    Data integrity
    Data Security
    Download
    A mobile agent is a computer program that can move between different hosts in heterogeneous networks. This paradigm is advantageous for implementing distributed systems, especially in mobile computing applications characterized by low bandwidth, high latency, and unreliable network connections. Mobile agents are also attractive for distributed transaction applications. Although mobile agents have been studied for twenty years for good reasons, they are not widely used in developing distributed systems for a simple reason: important issues such as security and fault tolerance have not been solved effectively. In this paper we address the issue of fault tolerance and transactional support in mobile agent systems. We present the agent system design and describe the protocol of our approach, in which we handle infrastructure failures to prevent a partial or complete loss of the mobile agent and deal with semantic failures to ensure atomic execution and transactional support for the mobile agent.
    Mobile agent
    Replication
    Citations (7)
    The aim of the research is to investigate techniques that support the development of highly dependable applications in a distributed system environment. Techniques we are investigating include task allocation and fault-tolerant protocols supporting redundant task allocation, load balancing, fault-tolerant computing and communication, error detection and reconfiguration, test case generation, and fault injection. The highly dependable environment co-exists with the original communication and operating system. It is transparent to applications that do not need the highly dependable environment. Applications that wish to use the highly dependable environment need only specify the level of criticality of their tasks in order for the system to assign the level of redundancy and to activate the relevant fault-tolerant protocols. The application we intend to implement in the environment is the firewall application. The firewall is run in redundant mode. Each incoming or outgoing packet is checked by two or more copies of the firewall application. The packet can go through the firewall only when a majority of the firewall copies decide to accept it. Otherwise, the packet is rejected. Different decisions from the different firewall copies signify a possible hardware fault or a software error in the underlying system.
    Firewall (physics)
    Control reconfiguration
    Software fault tolerance
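The redundant firewall check described in the abstract above (two or more copies inspect each packet, a majority must accept, and disagreement hints at a fault) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the packet representation, the rule functions, and the names `majority_accepts`, `copies_disagree`, and `check_packet` are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

Packet = dict                             # hypothetical packet representation
FirewallCopy = Callable[[Packet], bool]   # returns True to accept the packet


def majority_accepts(decisions: Sequence[bool]) -> bool:
    """Accept only if a strict majority of firewall copies accepted."""
    if len(decisions) < 2:
        raise ValueError("redundant mode needs at least two firewall copies")
    accepts = sum(1 for d in decisions if d)
    return 2 * accepts > len(decisions)


def copies_disagree(decisions: Sequence[bool]) -> bool:
    """Differing decisions signal a possible hardware fault or software error."""
    return len(set(decisions)) > 1


def check_packet(packet: Packet, copies: Sequence[FirewallCopy]) -> Tuple[bool, bool]:
    """Run every firewall copy on the packet; return (accepted, fault_suspected)."""
    decisions = [bool(copy(packet)) for copy in copies]
    return majority_accepts(decisions), copies_disagree(decisions)


if __name__ == "__main__":
    # Two healthy copies allow port 443; the third is a faulty copy that rejects everything.
    allow_https = lambda p: p.get("port") == 443
    faulty = lambda p: False
    print(check_packet({"port": 443}, [allow_https, allow_https, faulty]))
```

With an even number of copies, a tie is not a majority, so the packet is rejected, which matches the "otherwise rejected" rule in the abstract.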
    Large-scale data management systems utilize State Machine Replication to provide fault tolerance and to enhance performance. Fault-tolerant protocols are extensively used in the distributed database infrastructure of large enterprises such as Google, Amazon, and Facebook. However, despite years of intensive research, existing fault-tolerant protocols do not adequately address hybrid cloud environments consisting of private and public clouds, which are widely used by enterprises. In this paper, we consider a private cloud consisting of nonmalicious nodes (crash-only failures) and a public cloud with possible malicious failures. We introduce SeeMoRe, a hybrid State Machine Replication protocol that uses the knowledge of where crash and malicious failures may occur in a public/private cloud environment to improve overall performance. SeeMoRe has three different modes that can be used depending on the private cloud load and the communication latency between the public and private clouds. SeeMoRe can dynamically transition from one mode to another. Furthermore, an extensive evaluation reveals that SeeMoRe's performance is close to that of state-of-the-art crash fault-tolerant protocols while tolerating malicious failures.