Abstract: In this paper, we describe an effective compile-time analysis for software prefetching in Java. Previous work in software data prefetching for pointer-based codes uses simple compiler algorithms and does not investigate prefetching for object-oriented language features that make compile-time analysis difficult. We develop a new data-flow analysis to detect regular accesses to linked data structures in Java programs. We use intra- and interprocedural analysis to identify profitable prefetching opportunities for greedy and jump-pointer prefetching, and we implement these techniques in a compiler for Java. Our results show that both prefetching techniques improve four of our ten programs. The largest performance improvement is 48% with jump-pointers, but consistent improvements are difficult to obtain.
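To make the two schemes concrete, the following sketch shows a linked-list traversal as a greedy-prefetching compiler might transform it (prefetch the successor as soon as its address is loaded) and as jump-pointer prefetching might (follow an extra pointer installed to reach several nodes ahead). The Node class, its jump field, and the prefetch() placeholder are illustrative assumptions, not the paper's actual implementation; a real compiler would lower prefetch() to a non-binding prefetch instruction.

    // Hypothetical illustration of greedy vs. jump-pointer prefetching on a list.
    final class LinkedPrefetchSketch {
        static final class Node {
            int value;
            Node next;
            Node jump;   // jump-pointer: installed to point several nodes ahead
        }

        // Stand-in for a non-binding prefetch of the node's cache line.
        static void prefetch(Node n) { /* no-op placeholder */ }

        // Greedy prefetching: issue a prefetch for n.next as soon as it is loaded,
        // overlapping its miss with the work on the current node.
        static long sumGreedy(Node head) {
            long s = 0;
            for (Node n = head; n != null; n = n.next) {
                if (n.next != null) prefetch(n.next);
                s += n.value;
            }
            return s;
        }

        // Jump-pointer prefetching: follow a pre-installed pointer that skips
        // ahead, giving the prefetch more time to complete than n.next would.
        static long sumJump(Node head) {
            long s = 0;
            for (Node n = head; n != null; n = n.next) {
                if (n.jump != null) prefetch(n.jump);
                s += n.value;
            }
            return s;
        }
    }

Greedy prefetching can only look one node ahead of the current iteration, which is why jump pointers, at the cost of creating and maintaining the extra field, can hide longer memory latencies.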
Brendon Cahoon and Kathryn S. McKinley (Department of Computer Science, University of Massachusetts, Amherst, MA). Performance evaluation of a distributed architecture for information retrieval. In SIGIR '96: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 1996, pages 110–118. https://doi.org/10.1145/243199.243238
Modern Java programs, such as middleware and application servers, include many complex software components. Improving the performance of these Java applications requires a better understanding of the interactions between the application, virtual machine, operating system, and architecture. Hardware performance monitors, which are available on most modern processors, provide facilities to obtain detailed performance measurements of long-running applications in real time. However, interpreting the data collected using hardware performance monitors is difficult because of the low-level nature of the data.
We have developed a system, consisting of two components, to alleviate the difficulty of interpreting results obtained using hardware performance monitors. The first component is an enhanced VM that generates traces of hardware performance monitor values while executing Java programs. This enhanced VM generates a separate trace for each Java thread and CPU combination and thus provides accurate results in a multithreaded and multiprocessor environment. The second component is a tool that allows users to interactively explore the traces using a graphical interface. We implemented our system in the context of Jikes RVM, an open-source Java VM, and evaluated it on a POWER4 multiprocessor. We demonstrate that our system is effective in uncovering previously unknown performance characteristics and is a first step in exploring the reasons behind the observed behavior of a Java program.
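As a rough illustration of the data such a tracing VM might emit, the sketch below models one hardware-counter sample attributed to a single Java thread on a single CPU, plus the per-(thread, CPU) trace a visualization tool could load offline. The field layout and class names are hypothetical and are not part of the Jikes RVM API.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: one sample of hardware performance monitor (HPM)
    // values attributed to a single Java thread running on a single CPU.
    final class HpmSample {
        final long timestamp;        // cycle count or tick when the sample was taken
        final int cpu;               // CPU the thread was running on
        final long threadId;         // Java thread the counters are attributed to
        final long[] counterValues;  // e.g. cycles, instructions, cache and TLB misses

        HpmSample(long timestamp, int cpu, long threadId, long[] counterValues) {
            this.timestamp = timestamp;
            this.cpu = cpu;
            this.threadId = threadId;
            this.counterValues = counterValues.clone();
        }
    }

    // A per-(thread, CPU) trace is then an append-only list of samples that an
    // interactive tool can load and explore after the program finishes.
    final class HpmTrace {
        private final List<HpmSample> samples = new ArrayList<>();
        void record(HpmSample s) { samples.add(s); }
        List<HpmSample> samples() { return samples; }
    }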
Java is becoming a viable choice for numerical algorithms due to the software engineering benefits of object-oriented programming. Because these programs still use large arrays that do not fit in the cache, they continue to suffer from poor memory performance. To hide memory latency, we describe a new unified compile-time analysis for software prefetching arrays and linked structures in Java. Our previous work uses data-flow analysis to discover linked data structure accesses, and here we present a more general version that also identifies loop induction variables used in array accesses. Our algorithm schedules prefetches for all array references that contain induction variables. We evaluate our technique using a simulator of an out-of-order superscalar processor running a set of array-based Java programs. Across all our programs, prefetching reduces execution time by a geometric mean of 23%, and the largest improvement is 58%. We also evaluate prefetching on a PowerPC processor, and we show that prefetching reduces execution time by a geometric mean of 17%. Traditional software prefetching algorithms for C and Fortran use locality analysis and sophisticated loop transformations. Because our analysis is much simpler and quicker, it is suitable for including in a just-in-time compiler. We further show that the additional loop transformations and careful scheduling of prefetches used in previous work are not always necessary for modern architectures and Java programs.
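As an illustration of the transformation, the sketch below shows the shape of code a compiler could generate for an induction-variable array access. The prefetch() method stands in for a hypothetical compiler intrinsic that would lower to a non-binding prefetch instruction (for example, dcbt on PowerPC), and the prefetch distance of 8 elements is illustrative rather than the value used in the paper.

    // Hypothetical sketch of compiler-inserted array prefetching.
    final class ArrayPrefetchSketch {
        // Stand-in for a compiler intrinsic that issues a non-binding prefetch
        // for the cache line containing a[i].
        static void prefetch(double[] a, int i) { /* no-op placeholder */ }

        static double sum(double[] a) {
            final int DIST = 8;            // illustrative prefetch distance in elements
            double s = 0.0;
            for (int i = 0; i < a.length; i++) {
                if (i + DIST < a.length) { // stay inside the array in this sketch
                    prefetch(a, i + DIST); // fetch a[i + DIST] ahead of its use
                }
                s += a[i];                 // the original induction-variable access
            }
            return s;
        }
    }

Because the address of a[i + DIST] is a simple linear function of the loop induction variable, scheduling the prefetch requires neither locality analysis nor loop transformations, which is what keeps the analysis cheap enough for a just-in-time compiler.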
The memory hierarchy in modern architectures continues to be a major performance bottleneck. Many existing techniques for improving memory performance focus on Fortran and C programs, but memory latency is also a barrier to achieving high performance in object-oriented languages. Existing software techniques are inadequate for exposing optimization opportunities in object-oriented programs. One key problem is the use of high-level programming abstractions, which make analysis difficult. Another challenge is that programmers use a variety of data structures, including arrays and linked structures, so optimizations must work on a broad range of programs. We develop a new unified data-flow analysis for identifying accesses to arrays and linked structures, called recurrence analysis. Prior approaches that identify these access patterns are ad hoc or treat arrays and linked structures independently. The data-flow analysis is intra- and interprocedural, which is important in Java programs that use encapsulation to hide implementation details.
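As a rough illustration of the access patterns recurrence analysis targets, the two loops below traverse an array through a loop induction variable and a linked structure through a recurrent field load; the Node class and field names are hypothetical.

    // Hypothetical examples of the two recurrence kinds a unified analysis looks for.
    final class RecurrenceExamples {
        static final class Node {        // illustrative singly linked list node
            int value;
            Node next;
        }

        // Array recurrence: the address of a[i] is a linear function of the
        // loop induction variable i.
        static long sumArray(int[] a) {
            long s = 0;
            for (int i = 0; i < a.length; i++) {
                s += a[i];
            }
            return s;
        }

        // Linked-structure recurrence: each iteration loads the next object
        // through the same field (n = n.next), a pointer-chasing pattern.
        static long sumList(Node head) {
            long s = 0;
            for (Node n = head; n != null; n = n.next) {
                s += n.value;
            }
            return s;
        }
    }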
We show that Java programs that traverse arrays and linked structures have poor memory performance. We use compiler-inserted data prefetching to improve memory performance in these types of programs. The goal of prefetching is to hide latency by bringing data into the cache prior to a program's use of the data. We use our recurrence analysis to identify prefetching opportunities in Java programs. We develop a new algorithm for prefetching arrays, and we evaluate several methods for prefetching objects in linked structures. Since garbage collection is an integral part of Java, we evaluate the impact of a copying garbage collector on prefetching, and we demonstrate how to improve the memory performance of the collector itself by using prefetching. This dissertation shows that a unified whole-program compiler analysis is effective in discovering prefetching opportunities in Java programs that traverse arrays and linked structures. Compiler-inserted data prefetching improves memory performance even in the presence of object-oriented features that complicate analysis.
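The sketch below gives one hedged example of how prefetching might be folded into a copying collector's scan phase: objects are prefetched when they are queued for scanning, so their cache misses overlap with the scanning of earlier objects. The queue-based structure and the prefetch() placeholder are simplified assumptions for illustration, not the dissertation's collector.

    import java.util.ArrayDeque;

    // Hypothetical, heavily simplified view of a copying collector's scan phase:
    // grey objects wait in a queue, and each is prefetched when enqueued so it is
    // likely to be in the cache by the time the collector scans it.
    final class GcScanSketch {
        static final class Obj {
            Obj[] fields = new Obj[0];   // outgoing references
            boolean copied;
        }

        // Stand-in for a non-binding prefetch of the object's header and fields.
        static void prefetch(Obj o) { /* no-op placeholder */ }

        static void scan(ArrayDeque<Obj> grey) {
            while (!grey.isEmpty()) {
                Obj current = grey.poll();
                for (Obj child : current.fields) {
                    if (child != null && !child.copied) {
                        child.copied = true;   // simulate copying the child to to-space
                        prefetch(child);       // hide the miss on the child's first scan
                        grey.add(child);
                    }
                }
            }
        }
    }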
The information explosion across the Internet and elsewhere offers access to an increasing number of document collections. In order for users to access these collections effectively, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this article, we explore how to achieve scalable performance in a distributed system for collection sizes ranging from 1GB to 128GB. We implement a fully functional distributed IR system based on a multithreaded version of the Inquery simulation model. We measure performance as a function of system parameters such as client command rate, number of document collections, terms per query, query term frequency, number of answers returned, and command mixture. Our results show that it is important to model both query and document commands because the heterogeneity of commands significantly impacts performance. Based on our results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However, under some realistic workloads, we demonstrate system organizations for which response time degrades gracefully as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.
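As a minimal sketch of the client-broker-server organization described above, the broker below fans a query out to the document servers it is connected to and merges their ranked answers; the interface, class names, and merge-by-score step are illustrative assumptions rather than the prototype's actual protocol.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Hypothetical sketch of the broker in a client/broker/server IR architecture:
    // a small number of brokers mediate between many clients and many servers.
    final class BrokerSketch {
        interface DocumentServer {
            List<ScoredDoc> evaluate(String query, int numAnswers);
        }

        record ScoredDoc(String docId, double score) {}

        private final List<DocumentServer> servers;

        BrokerSketch(List<DocumentServer> servers) { this.servers = servers; }

        // Fan the query out to every server's collection and merge the top answers.
        List<ScoredDoc> query(String query, int numAnswers) {
            List<ScoredDoc> merged = new ArrayList<>();
            for (DocumentServer s : servers) {
                merged.addAll(s.evaluate(query, numAnswers));
            }
            merged.sort(Comparator.comparingDouble(ScoredDoc::score).reversed());
            return merged.subList(0, Math.min(numAnswers, merged.size()));
        }
    }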