SmartFetch: Efficient Support for Selective Queries

Manuel Ferreira, João Paiva, Manuel Bravo, Luís E. T. Rodrigues

2015

This paper proposes SmartFetch, a storage strategy that combines several techniques to efficiently support selective jobs, that is, jobs concerned with only a subset of the entire dataset, in systems such as Hadoop and Spark. SmartFetch pairs an appropriate data layout with data indexing tools to improve data access speed and significantly shorten total job execution time. An extensive experimental evaluation shows that, by avoiding reads of irrelevant blocks, SmartFetch provides significant speedups over the basic Hadoop and Spark implementations. It also outperforms other implementations that use variants of the techniques embedded in SmartFetch.
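
The idea of combining a data layout with an index so that irrelevant blocks are never read can be illustrated with a minimal sketch. It is not the SmartFetch implementation: the abstract does not specify the layout or the index structure, so the sketch assumes a layout sorted by key and a per-block min/max index, and every name in it (Block, build_blocks, build_index, selective_scan) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Record = Tuple[int, str]  # (key, payload); an assumed record shape


@dataclass
class Block:
    """A hypothetical storage block holding a run of records."""
    records: List[Record]


def build_blocks(records: List[Record], block_size: int) -> List[Block]:
    # Assumed "data layout" step: sort by key so that records with
    # nearby keys end up in the same block.
    ordered = sorted(records)
    return [Block(ordered[i:i + block_size])
            for i in range(0, len(ordered), block_size)]


def build_index(blocks: List[Block]) -> List[Tuple[int, int]]:
    # Assumed "indexing" step: keep the (min key, max key) of each block.
    return [(min(k for k, _ in b.records), max(k for k, _ in b.records))
            for b in blocks]


def selective_scan(blocks: List[Block], index: List[Tuple[int, int]],
                   lo: int, hi: int) -> Tuple[List[Record], int]:
    """Return records with lo <= key <= hi and the number of blocks read."""
    matches: List[Record] = []
    blocks_read = 0
    for block, (bmin, bmax) in zip(blocks, index):
        if bmax < lo or bmin > hi:
            continue  # the index proves this block holds no matching key
        blocks_read += 1
        matches.extend(r for r in block.records if lo <= r[0] <= hi)
    return matches, blocks_read


records = [(k, f"row-{k}") for k in range(100)]
blocks = build_blocks(records, block_size=10)
index = build_index(blocks)
matches, blocks_read = selective_scan(blocks, index, lo=42, hi=47)
print(len(blocks), blocks_read, len(matches))
```

In this toy setup the selective query touches one of the ten blocks, whereas a full scan would read all ten. The gain in a real system depends on how well the layout clusters the keys that queries select on.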

Keywords:

Big data
Computer science
Database
Data access
Distributed computing
Search engine indexing
Spark (mathematics)
Implementation
Execution time
