Supporting atomic blocks (e.g., Transactional Memory (TM)) can have far-reaching effects on language design and implementation. While much is known about the language-level semantics of TM and the performance of algorithms for implementing TM, little is known about how platform characteristics affect the manner in which a compiler should instrument code to achieve efficient transactional behavior. We explore the interaction between compiler instrumentation and the performance of transactions. Through evaluation on ARM/Android, SPARC/Solaris, IA32/Linux and IA32/MacOS, we show that the compiler must consider the platform when determining which analyses, transformations, and optimizations to perform. Implementation issues include how TM library code is reached, how per-thread TM metadata is stored and accessed, and how a library switches between modes of operation. We also show that different platforms favor different TM algorithms, through the introduction of a new TM algorithm for the ARM processor. Our findings will affect compiler and TM library designers: to achieve peak performance for transactions, the compiler must perform platform-dependent analysis, transformation, and optimization, and the interface to the TM library must differ according to platform.
The notion of “atomicity” implies that it is safe to rearrange memory accesses within a transaction. In this paper, we sketch a mechanism for postponing contentious transactional operations until commit time, where they become impervious to aborts. We then contemplate the interplay between such a mechanism and language-level semantics. Though preliminary, our algorithms and recommendations should prove useful to designers of transactional compilers and languages.
Time-based transactional memories typically rely on a shared memory counter to ensure consistency. Unfortunately, such a counter can become a bottleneck. In this article, we identify properties of hardware cycle counters that allow their use in place of a shared memory counter. We then devise algorithms that exploit the x86 cycle counter to enable bottleneck-free transactional memory runtime systems. We also consider the impact of privatization safety and hardware ordering constraints on the correctness, performance, and generality of our algorithms.
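The mechanism described above can be illustrated with a minimal sketch of a time-based STM read/commit path in which the global version clock is a hardware timestamp rather than a shared counter that every committer increments. This is an assumption-laden illustration, not the article's algorithm: on x86 the timestamp would come from RDTSCP, but here `std::chrono::steady_clock` stands in so the sketch stays portable, and the names (`VersionedCell`, `Transaction`) are invented for the example.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Stand-in for a hardware cycle counter (RDTSCP on x86). steady_clock is
// monotonic, which is the property the timestamp source must provide.
static uint64_t hardware_timestamp() {
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}

// A memory location guarded by a version stamp, TL2-style.
struct VersionedCell {
    std::atomic<uint64_t> version{0};
    std::atomic<int> value{0};
};

struct Transaction {
    uint64_t start_time = 0;

    // Beginning a transaction reads the timestamp instead of a shared
    // counter, so concurrent begins do not contend on one cache line.
    void begin() { start_time = hardware_timestamp(); }

    // A consistent read: the cell must not have been restamped after we
    // started; otherwise the caller must abort and retry.
    bool read(VersionedCell& c, int& out) {
        uint64_t v1 = c.version.load();
        out = c.value.load();
        uint64_t v2 = c.version.load();
        return v1 == v2 && v1 <= start_time;
    }

    // Writeback: stamp the cell with a fresh timestamp taken at commit time
    // (again, no shared counter is incremented).
    void commit_write(VersionedCell& c, int new_value) {
        c.value.store(new_value);
        c.version.store(hardware_timestamp());
    }
};
```

The key design point the article addresses is making this safe in practice: raw cycle counters are per-core, so the runtime must establish the cross-core ordering and privatization-safety properties that a single shared counter gives for free.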
The addition of transactional memory (TM) support to existing languages provides the opportunity to create new software from scratch using transactions, and also to simplify or extend legacy code by replacing existing synchronization with language-level transactions. In this paper, we describe our experiences transactionalizing the memcached application through the use of the GCC implementation of the Draft C++ TM Specification. We present experiences and recommendations that we hope will guide the effort to integrate TM into languages, and that may also contribute to the growing collective knowledge about how programmers can begin to exploit TM in existing production-quality software.
This work proposes VPsec, a novel hardware-only scheme that leverages value prediction to mitigate fault attacks in general-purpose microprocessors. VPsec augments the value prediction schemes of modern microprocessors with fault detection and reaction logic, mitigating fault attacks against both the datapath and the value predictor itself. VPsec requires minimal hardware changes (negligible area impact) with respect to a baseline processor supporting value prediction, incurs no software overheads (no increase in memory footprint), and, under common attack scenarios, retains most of the performance benefits of value prediction. Our evaluation of VPsec demonstrates its efficacy in countering fault attacks and retaining performance in modern microprocessors.
In this paper, we present the current state of a variety of software tools that we are making available to the broad research community. Our intent is to ensure that researchers in Transactional Memory (TM) and related fields have a common baseline that is both easy to use and appropriate for implementing new algorithms and testing hypotheses. The most significant contribution is a transactionalized C++ Standard Template Library. We also provide a proper and extensible lazy software TM implementation, a common build environment, a repackaging of several benchmarks, and a transactional thread-level speculation infrastructure. In total, we believe this creates a suitable baseline for researchers in this “third decade” of Transactional Memory.
The arrival of best-effort hardware transactional memory (TM) creates a challenge for designers of transactional memory runtime libraries. On the one hand, using hardware TM can dramatically reduce the latency of transactions. On the other, it is critical to create a fall-back path to handle the cases where hardware TM cannot complete a transaction, and this path ought to be scalable and reasonably fair to all transactions. Additionally, while the hardware-accelerated system is likely to have weaker safety guarantees than a pure hardware TM, it ought not to be weaker than what software TM guarantees. We propose a new hybrid TM algorithm based on the “Cohorts” software TM algorithm. Our algorithm guarantees opacity by preventing any transaction from observing the un-committed state of any other transaction. It does so via a novel state machine that maximizes the use of hardware TM, while affording opportunity to enforce fairness policies. We present an implementation of our Hybrid Cohorts that prioritizes transactions that fall back to software mode. In this manner, we ensure that long-running transactions do not starve, while still allowing concurrency among hardware and software transactions.
Language-level transactions are said to provide “atomicity,” implying that the order of operations within a transaction should be invisible to concurrent transactions and thus that independent operations within a transaction should be safe to execute in any order. In this article, we present a mechanism for dynamically reordering memory operations within a transaction so that read-modify-write operations on highly contended locations can be delayed until the very end of the transaction. When integrated with traditional transactional conflict detection mechanisms, our approach reduces aborts on hot memory locations, such as statistics counters, thereby improving throughput and reducing wasted work. We present three algorithms for delaying highly contended read-modify-write operations within transactions, and we evaluate their impact on throughput for eager and lazy transactional systems across multiple workloads. We also discuss complications that arise from the interaction between our mechanism and the need for strong language-level semantics, and we propose algorithmic extensions that prevent errors from occurring when accesses are aggressively reordered in a transactional memory implementation with weak semantics.
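The core idea of delaying contended read-modify-write operations can be sketched as follows. This is a hedged illustration under assumptions, not one of the article's three algorithms: the names (`DeferredTx`, `defer_add`) are invented, and a real STM would run the deferred operations inside its commit protocol, after validation, where the transaction can no longer abort.

```cpp
#include <atomic>
#include <functional>
#include <vector>

// Sketch: instead of reading a hot counter mid-transaction (which puts it in
// the read set and invites conflicts/aborts), the transaction logs the
// increment and applies it only at commit time.
struct DeferredTx {
    std::vector<std::function<void()>> deferred;  // commit-time operations

    // Record "counter += delta" without touching the counter now. The
    // transaction never observes the counter's value, so no conflict on it
    // can be detected against concurrent transactions.
    void defer_add(std::atomic<long>& counter, long delta) {
        deferred.push_back([&counter, delta] {
            counter.fetch_add(delta, std::memory_order_relaxed);
        });
    }

    // Runs after validation succeeds; the transaction is committed and
    // impervious to aborts, so each deferred op executes exactly once.
    void commit() {
        for (auto& op : deferred) op();
        deferred.clear();
    }

    // Aborting is free: the counter was never touched, so there is nothing
    // to undo.
    void abort() { deferred.clear(); }
};
```

Note the semantic wrinkle the article discusses: because the increment is invisible until commit, a transaction that later reads the same counter cannot simply defer, which is why reordering must interact carefully with language-level ordering guarantees.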