Palette: enabling scalable analytics for big-memory, multicore machines

Fei Chen, Tere Gonzalez, Jun Li, Manish Marwah, Jim Pruyne, Krishnamurthy Viswanathan, Mijung Kim

2014

Hadoop and its variants have been widely used for processing large-scale analytics tasks in a cluster environment. However, the use of a commodity cluster for analytics tasks needs to be reconsidered based on two key observations: (1) in recent years, large-memory, multicore machines have become more affordable; and (2) recent studies show that most analytics tasks in practice are smaller than 100 GB. Thus, replacing a commodity cluster with a large-memory, multicore machine can enable in-memory analytics at an affordable cost. However, programming on a big-memory, multicore machine is a challenge. Multi-threaded programming is notoriously difficult. Further, the memory design of most large-memory servers follows a non-uniform memory access (NUMA) architecture. While NUMA-aware programming often leads to high efficiency in analytics tasks, it is usually done in an ad hoc manner. In this demo, we present Palette, an analytics framework that exploits large memory to trade space for time while also addressing the challenges of multi-threaded, NUMA-aware programming. Palette manages multiple index-like data representations for input datasets. An operator may have multiple implementations, each of which uses a different data representation. Palette uses a cost-based approach to automatically select the fastest implementation for a given dataset. Palette addresses the challenges of multi-threaded and NUMA-aware programming by adapting Hadoop to a single multicore machine and modifying it to account for the characteristics of modern NUMA hardware. Users can write programs using exactly the same APIs as those used in traditional Hadoop, while transparently benefiting from the multi-threaded, NUMA-aware infrastructure. We have developed a research prototype of Palette. Specifically, at SIGMOD we will demonstrate how to (1) create an operator, such as time series similarity search, on Palette, (2) execute the operator with Palette's automatic implementation selection feature, and (3) monitor and compare different operator implementations.
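
The cost-based selection among operator implementations described above can be illustrated with a minimal sketch. This is not Palette's code or API: the class names, the `select_implementation` function, and the toy cost formulas are all hypothetical and chosen only to show the idea of estimating a cost per implementation from dataset statistics and picking the cheapest one whose data representation is available.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set


@dataclass(frozen=True)
class DatasetStats:
    """Hypothetical summary statistics of an input dataset."""
    num_series: int
    series_length: int


@dataclass(frozen=True)
class Implementation:
    """One implementation of an operator (all names are illustrative)."""
    name: str
    representation: str  # data representation this implementation reads
    estimate_cost: Callable[[DatasetStats], float]


def select_implementation(
    impls: Sequence[Implementation],
    stats: DatasetStats,
    available: Set[str],
) -> Implementation:
    """Return the implementation with the lowest estimated cost among
    those whose data representation has been built for this dataset."""
    candidates = [i for i in impls if i.representation in available]
    if not candidates:
        raise ValueError("no implementation matches the available representations")
    return min(candidates, key=lambda i: i.estimate_cost(stats))


# Toy cost models (assumptions, not measured): a full scan touches every
# value; an indexed search pays a fixed overhead but touches far fewer values.
def scan_cost(s: DatasetStats) -> float:
    return float(s.num_series * s.series_length)


def index_cost(s: DatasetStats) -> float:
    return 5000.0 + 0.05 * s.num_series * s.series_length


IMPLS = [
    Implementation("naive_scan", "raw", scan_cost),
    Implementation("indexed_search", "index", index_cost),
]

small = DatasetStats(num_series=10, series_length=100)
large = DatasetStats(num_series=100_000, series_length=100)

choice_small = select_implementation(IMPLS, small, {"raw", "index"})
choice_large = select_implementation(IMPLS, large, {"raw", "index"})
```

Under these assumed cost models the scan is chosen for the small dataset and the indexed search for the large one. A real system would presumably calibrate its cost estimates against the hardware and the stored representations rather than use fixed formulas.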


