Coping with software defects that occur in the post-deployment stage is a challenging problem: bugs may occur only when the system uses a specific configuration and only under certain usage scenarios. Moreover, halting production systems until the bug is tracked down and fixed is often impossible. Thus, developers have to try to reproduce the bug in laboratory conditions. Often, reproducing the bug accounts for the lion's share of the debugging effort.
Current approaches to compiling aspect-oriented programs are inefficient. This inefficiency has negative effects on the productivity of the development process and is especially prohibitive for dynamic aspect deployment. In this work, we present how well-known virtual machine techniques can be used with only slight modifications to support fast aspect deployment while retaining runtime performance. Our implementation accelerates dynamic aspect deployment by several orders of magnitude relative to mainstream aspect-oriented environments. We also provide a detailed comparison of alternative implementations of execution environments with support for dynamic aspect deployment.
Due to the high dynamic frequency of virtual method calls in typical object-oriented programs, feedback-directed devirtualization and inlining is one of the most important optimizations performed by high-performance virtual machines. A critical input to effective feedback-directed inlining is an accurate dynamic call graph. In a virtual machine, the dynamic call graph is computed online during program execution. Therefore, to maximize overall system performance, the profiling mechanism must strike a balance between profile accuracy, the speed at which the profile becomes available to the optimizer, and profiling overhead. This paper introduces a new low-overhead sampling-based technique that rapidly converges on a high-accuracy dynamic call graph. We have implemented the technique in two high-performance virtual machines: Jikes RVM and J9. We empirically assess our profiling technique by reporting on the accuracy of the dynamic call graphs it computes and by demonstrating that increasing the accuracy of the dynamic call graph results in more effective feedback-directed inlining.
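To make the accuracy trade-off concrete, the following is a toy sketch and not the technique introduced in the paper. It assumes a hypothetical recorded call trace, builds one call graph profile from every call and another from every third call, and compares them with an overlap score (the sum, over all edges, of the smaller of the two normalized edge weights). The trace, the function names, and the fixed-stride sampling are all illustrative assumptions.

```python
from collections import Counter


def edge_profile(calls, stride=1, offset=0):
    """Count (caller, callee) edges, observing only every `stride`-th call."""
    return Counter(calls[offset::stride])


def overlap(profile_a, profile_b):
    """Sum over all edges of the smaller normalized weight (1.0 means identical)."""
    total_a = sum(profile_a.values())
    total_b = sum(profile_b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    edges = set(profile_a) | set(profile_b)
    return sum(
        min(profile_a[e] / total_a, profile_b[e] / total_b) for e in edges
    )


# Hypothetical call trace: 6 calls main->parse, 3 main->render, 1 parse->lex.
calls = (
    [("main", "parse")] * 6
    + [("main", "render")] * 3
    + [("parse", "lex")] * 1
)

exhaustive = edge_profile(calls)        # every call observed
sampled = edge_profile(calls, stride=3)  # only calls 0, 3, 6, 9 observed

accuracy = overlap(exhaustive, sampled)
```

In this toy trace the sampled profile over-represents the rare edge, so the overlap falls below 1.0. A real profiler has to weigh that loss of accuracy against the cost of observing more calls.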
Many large-scale Java applications suffer from runtime bloat. They execute large volumes of methods and create many temporary objects, all to execute relatively simple operations. There are large opportunities for performance optimizations in these applications, but most are being missed by existing optimization and tooling technology. While JIT optimizations struggle for a few percent improvement, performance experts analyze deployed applications and regularly find gains of 2× or more. Finding such big gains is difficult, for both humans and compilers, because of the diffuse nature of runtime bloat. Time is spread thinly across calling contexts, making it difficult to judge how to improve performance. Our experience shows that, in order to identify large performance bottlenecks in a program, it is more important to understand its dynamic dataflow than traditional performance metrics, such as running time. This article presents a general framework for designing and implementing scalable analysis algorithms to find causes of bloat in Java programs. At the heart of this framework is a generalized form of runtime dependence graph computed by abstract dynamic slicing, a semantics-aware technique that achieves high scalability by performing dynamic slicing over bounded abstract domains. The framework is instantiated to create two independent dynamic analyses, copy profiling and cost-benefit analysis, that help programmers identify performance bottlenecks by identifying, respectively, high-volume copy activities and data structures that have high construction cost but low benefit for the forward execution. We have successfully applied these analyses to large-scale and long-running Java applications. We show that both analyses are effective at detecting inefficient operations that can be optimized for better performance. We also demonstrate that the general framework is flexible enough to be instantiated for dynamic analyses in a variety of application domains.
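As a rough illustration of what a copy profile might report, the sketch below is a deliberately simplified stand-in, not the article's abstract dynamic slicing framework. It assumes a hypothetical event trace in which each event is tagged either "copy" (a value moved unchanged between two fields) or "compute", and it tallies copies per source/destination pair. The event format and all field names are invented for the example.

```python
from collections import Counter


def copy_profile(trace):
    """Count copy events per (source_field, dest_field) pair."""
    counts = Counter()
    for kind, src, dst in trace:
        if kind == "copy":
            counts[(src, dst)] += 1
    return counts


def copy_ratio(trace):
    """Fraction of all events in the trace that are pure copies."""
    if not trace:
        return 0.0
    copies = sum(1 for kind, _src, _dst in trace if kind == "copy")
    return copies / len(trace)


# Hypothetical trace: a value is passed along a chain of objects unchanged.
trace = (
    [("copy", "Request.name", "Dto.name")] * 3
    + [("compute", "Dto.name", "Dto.hash")] * 1
    + [("copy", "Dto.name", "Row.name")] * 2
)

profile = copy_profile(trace)
hottest_pair, hottest_count = profile.most_common(1)[0]
ratio = copy_ratio(trace)
```

A high copy ratio concentrated in a few field pairs is the kind of signal such a profile is meant to surface: much of the work moves data without transforming it.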
This paper describes the implementation of an online feedback-directed optimization system. The system is fully automatic; it requires no prior (offline) profiling run. It uses a previously developed low-overhead instrumentation sampling framework to collect control flow graph edge profiles. This profile information is used to drive several traditional optimizations, as well as a novel algorithm for performing feedback-directed control flow graph node splitting. We empirically evaluate this system and demonstrate improvements in peak performance of up to 17% while keeping overhead low, with no individual execution being degraded by more than 2% because of instrumentation.
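The way an edge profile can steer an optimizer can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described above. It takes hypothetical control flow graph edge counts, converts them to branch probabilities, and follows the most frequently taken successor from the entry block to obtain a hot path that a code layout or splitting heuristic could favor. The block names and counts are invented.

```python
def branch_probabilities(edge_counts):
    """Map each (src, dst) edge to the fraction of src's exits that took it."""
    totals = {}
    for (src, _dst), count in edge_counts.items():
        totals[src] = totals.get(src, 0) + count
    return {
        (src, dst): count / totals[src]
        for (src, dst), count in edge_counts.items()
        if totals[src] > 0
    }


def hot_path(edge_counts, entry):
    """Follow the most likely successor from `entry` until a dead end or a cycle."""
    probs = branch_probabilities(edge_counts)
    path, seen = [entry], {entry}
    while True:
        successors = {d: p for (s, d), p in probs.items() if s == path[-1]}
        if not successors:
            break
        nxt = max(successors, key=successors.get)
        if nxt in seen:  # stop instead of looping forever on a back edge
            break
        path.append(nxt)
        seen.add(nxt)
    return path


# Hypothetical edge profile for a simple if/else diamond.
edge_counts = {
    ("entry", "then"): 90,
    ("entry", "else"): 10,
    ("then", "join"): 90,
    ("else", "join"): 10,
    ("join", "exit"): 100,
}

probs = branch_probabilities(edge_counts)
path = hot_path(edge_counts, "entry")
```

The greedy walk is only one possible use of the profile. It is chosen here because it is short and shows how skewed edge counts single out one path through the graph.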