Brainwash: A Data System for Feature Engineering.

Michael R. Anderson, Dolan Antenucci, Victor Bittorf, Matthew Burgess, Michael J. Cafarella, Arun Kumar, Feng Niu, Yongjoo Park, Christopher Ré, Ce Zhang

2013

A new generation of data processing systems, including web search, Google’s Knowledge Graph, IBM’s Watson, and several different recommendation systems, combine rich databases with software driven by machine learning. The spectacular successes of these trained systems have been among the most notable in all of computing and have generated excitement in health care, finance, energy, and general business. But building them can be challenging, even for computer scientists with PhD-level training. If these systems are to have a truly broad impact, building them must become easier. We explore one crucial pain point in the construction of trained systems: feature engineering. Given the sheer size of modern datasets, feature developers must (1) write code with few effective clues about how their code will interact with the data and (2) repeatedly endure long system waits even though their code typically changes little from run to run. We propose Brainwash, a vision for a feature engineering data system that could dramatically ease the Explore-Extract-Evaluate interaction loop that characterizes many trained system projects.
