People use an increasing number of personal electronic devices like notebook computers, MP3 players and smart phones in their daily lives. Making sure that data on these devices is available where needed and backed up regularly is a time-consuming and error-prone burden on users. In this paper, we describe and evaluate PodBase, a system that automates storage management on personal devices. The system takes advantage of unused storage and incidental connectivity to propagate the system state and replicate files. PodBase ensures the durability of data despite device loss or failure; at the same time, it aims to make data available on devices where it is useful.
PodBase seeks to exploit available storage and pairwise device connections with little or no user attention. Towards this goal, it relies on a declarative specification of its replication goals and uses linear optimization to compute a replication plan that considers the current distribution of files, availability of storage, and history of device connections. Results from a user study in ten real households show that, under a wide range of conditions, PodBase transparently manages the durability and availability of data on personal devices.
Research on decentralized systems such as peer-to-peer overlays and ad hoc networks has been hampered by the fact that few systems of this type are in production use, and the space of possible applications is still poorly understood. As a consequence, new ideas have mostly been evaluated using common synthetic workloads, traces from a few existing systems, testbeds like PlanetLab, and simulators like ns-2. Some of these methods have, in fact, become the “gold standard” for evaluating new systems, and are often a prerequisite for getting papers accepted at top conferences in the
Recently, there has been significant research interest in leveraging social networks to defend against Sybil attacks. While much of this work may appear similar at first glance, existing social network-based Sybil defense schemes can be divided into two categories: Sybil detection and Sybil tolerance. These two categories of systems both leverage global properties of the underlying social graph, but they rely on different assumptions and provide different guarantees: Sybil detection schemes are application-independent and rely only on the graph structure to identify Sybil identities, while Sybil tolerance schemes rely on application-specific information and leverage the graph structure and transaction history to bound the leverage an attacker can gain from using multiple identities. In this paper, we take a closer look at the design goals, models, assumptions, guarantees, and limitations of both categories of social network-based Sybil defense systems.
Online social networking sites (OSNs) like Facebook and Orkut contain personal data of millions of users. Many OSNs view this data as a valuable asset that is at the core of their business model. Both OSN users and OSNs have strong incentives to restrict large scale crawls of this data. OSN users want to protect their privacy and OSNs their business interest. Traditional defenses against crawlers involve rate- limiting browsing activity per user account. These defense schemes, however, are vulnerable to Sybil attacks, where a crawler creates a large number of fake user accounts. In this paper, we propose Genie, a system that can be deployed by OSN operators to defend against Sybil crawlers. Genie is based on a simple yet powerful insight: the social network itself can be leveraged to defend against Sybil crawlers. We first present Genie's design and then discuss how Genie can limit crawlers while allowing browsing of user profiles by normal users.
Thwarting large-scale crawls of user profiles in online social networks (OSNs) like Facebook and Renren is in the interest of both the users and the operators of these sites. OSN users wish to maintain control over their personal information, and OSN operators wish to protect their business assets and reputation. Existing rate-limiting techniques are ineffective against crawlers with many accounts, be they fake accounts (also known as Sybils) or compromised accounts of real users obtained on the black market.
Peer-to-peer (p2p) technology can potentially be used to build highly reliable applications without a single point of failure. However, most of the existing applications, such as file sharing or web caching, have only moderate reliability demands. Without a challenging proving ground, it remains unclear whether the full potential of p2p systems can be realized.To provide such a proving ground, we have designed, deployed and operated a p2p-based email system. We chose email because users depend on it for their daily work and therefore place high demands on the availability and reliability of the service, as well as the durability, integrity, authenticity and privacy of their email. Our system, ePOST, has been actively used by a small group of participants for over two years.In this paper, we report the problems and pitfalls we encountered in this process. We were able to address some of them by applying known principles of system design, while others turned out to be novel and fundamental, requiring us to devise new solutions. Our findings can be used to guide the design of future reliable p2p systems and provide interesting new directions for future research.
To make storage management transparent to users, enterprises rely on expensive storage infrastructure, such as high end storage appliances, tape robots, and offsite storage facilities, maintained by full-time professional system administrators. From the user's perspective access to data is seamless regardless of location, backup requires no periodic, manual action by the user, and help is available to recover from storage problems. The equipment and administrators protect users from the loss of data due to failures, such as device crashes, user errors, or virii, as well as being inconvenienced by the unavailability of critical files.
Home users and small businesses must manage increasing amounts of important data distributed among an increasing number of storage devices. At the same time, expert system administration and specialized backup hardware are rarely available in these environments, due to their high cost. Users must make do with error-prone, manual, and time-consuming ad hoc solutions, such as periodically copying data to an external hard drive. Non-technical users are likely to make mistakes, which could result in the loss of a critical piece of data, such as a tax return, customer database, or an irreplaceable digital photograph.
In this thesis, we show how to provide transparent storage management for home and small business users We introduce two new systems: The first, PodBase, transparently ensures availability and durability for mobile, personal devices that are mostly disconnected. The second, SLStore, provides enterprise-level data safety (e.g. protection from user error, software faults, or virus infection) without requiring expert administration or expensive hardware. Experimental results show that both systems are feasible, perform well, require minimal user attention, and do not depend on expert administration during disaster-free operation.
PodBase relieves home users of many of the burdens of managing data on their personal devices. In the home environment, users typically have a large number of personal devices, many of them mobile devices, each of which contain storage, and which connect to each other intermittently. Each of these devices contain data that must be made durable, and available on other storage devices. Ensuring durability and availability is difficult and tiresome for non-expert users, as they must keep track of what data is stored on which devices. PodBase transparently ensures the durability of data despite the loss or failure of a subset of devices; at the same time, PodBase aims to make data available on all the devices appropriate for a given data type. PodBase takes advantage of storage resources and network bandwidth between devices that typically goes unused. The system uses an adaptive replication algorithm, which makes replication transparent to the user, even when complex replication strategies are necessary. Results from a prototype deployment in a small community of users show that PodBase can ensure the durability and availability of data stored on personal devices under a wide range of conditions with minimal user attention.
Our second system, SLStore, brings enterprise-level data protection to home office and small business computing. It ensures that data can be recovered despite incidents like accidental data deletion, data corruption resulting from software errors or security breaches, or even catastrophic storage failure. However, unlike enterprise solutions, SLStore does riot require professional system administrators, expensive backup hard- ware, or routine, manual actions on the part of the user. The system relies on storage leases, which ensure that data cannot be overwritten for a pre-determined period, and an adaptive storage management layer which automatically adapts the level of backup to the storage available. We show that this system is both practical, reliable and easy to manage, even in the presence of hardware and software faults.
Online social networking sites (OSNs) like Facebook and Orkut contain personal data of millions of users. Many OSNs view this data as a valuable asset that is at the core of their business model. Both OSN users and OSNs have strong incentives to restrict large scale crawls of this data. OSN users want to protect their privacy and OSNs their business interest. Traditional defenses against crawlers involve rate- limiting browsing activity per user account. These defense schemes, however, are vulnerable to Sybil attacks, where a crawler creates a large number of fake user accounts. In this paper, we propose Genie, a system that can be deployed by OSN operators to defend against Sybil crawlers. Genie is based on a simple yet powerful insight: the social network itself can be leveraged to defend against Sybil crawlers. We first present Genie's design and then discuss how Genie can limit crawlers while allowing browsing of user profiles by normal users.
Recently, there has been much excitement in the research community over using social networks to mitigate multiple identity, or Sybil, attacks. A number of schemes have been proposed, but they differ greatly in the algorithms they use and in the networks upon which they are evaluated. As a result, the research community lacks a clear understanding of how these schemes compare against each other, how well they would work on real-world social networks with different structural properties, or whether there exist other (potentially better) ways of Sybil defense. In this paper, we show that, despite their considerable differences, existing Sybil defense schemes work by detecting local communities (i.e., clusters of nodes more tightly knit than the rest of the graph) around a trusted node. Our finding has important implications for both existing and future designs of Sybil defense schemes. First, we show that there is an opportunity to leverage the substantial amount of prior work on general community detection algorithms in order to defend against Sybils. Second, our analysis reveals the fundamental limits of current social network-based Sybil defenses: We demonstrate that networks with well-defined community structure are inherently more vulnerable to Sybil attacks, and that, in such networks, Sybils can carefully target their links in order make their attacks more effective.