Feedback-directed optimization of TCMalloc

2014 
TCMalloc [9] is an open-source memory allocator. Its thread-local caches of free objects allow most allocations and deallocations to be satisfied from thread-local heaps without taking locks, which makes it highly scalable for multi-threaded applications. The TCMalloc code contains several parameters that control the thread-local caches. The values of these parameters have been carefully chosen to provide good performance in the common case. However, as we will show, their optimal values depend on application-specific memory allocation behavior, so no single configuration attains optimal performance in all applications. In light of this, this paper presents a feedback-directed optimization of TCMalloc. The proposed optimization method targets the batch sizes, which determine the aggressiveness and timing of the thread cache management mechanisms that move free objects between the central and thread-local caches. It aims to tailor the batch sizes to application behavior, so that prefetching from the central cache is aggressive enough to reduce unnecessary synchronization without causing other performance problems due to excessive garbage collection of free objects in the thread caches. To this end, the optimization method observes a target application during a profile run and uses an iterative algorithm to compute batch sizes. Empirical results show that the proposed optimization yields up to 10% performance improvement over the default configuration on Google-internal benchmark applications.
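The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm and does not use TCMalloc's real data structures; it assumes a single size class, a profile trace encoded as +1 (allocation) and -1 (deallocation), and a hypothetical cap `max_cached` on the thread cache length. The function names `central_cache_trips` and `choose_batch_size` are invented for illustration. The search simply doubles the batch size for as long as the number of simulated central cache accesses strictly decreases.

```python
from typing import Iterable, List


def central_cache_trips(trace: Iterable[int], batch_size: int, max_cached: int) -> int:
    """Count central cache accesses for one size class in a toy model.

    trace: +1 for an allocation, -1 for a deallocation (assumed encoding).
    """
    cached = 0  # free objects currently held by the thread-local cache
    trips = 0   # accesses to the (lock-protected) central cache
    for op in trace:
        if op > 0:  # allocation
            if cached == 0:
                cached += batch_size  # prefetch one batch from the central cache
                trips += 1
            cached -= 1
        else:  # deallocation
            cached += 1
            if cached > max_cached:
                cached -= min(batch_size, cached)  # return one batch
                trips += 1
    return trips


def choose_batch_size(trace: Iterable[int], max_cached: int, start: int = 1) -> int:
    """Double the batch size while central cache accesses strictly decrease."""
    ops: List[int] = list(trace)
    best = start
    best_trips = central_cache_trips(ops, best, max_cached)
    candidate = best * 2
    while candidate <= max_cached:
        trips = central_cache_trips(ops, candidate, max_cached)
        if trips >= best_trips:
            break  # a larger batch no longer saves synchronization
        best, best_trips = candidate, trips
        candidate *= 2
    return best
```

In this model, a burst of allocations favors a large batch size because each fetch amortizes one central cache access over more objects, while a strictly alternating allocate/free pattern gains nothing from larger batches and keeps the starting value.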