SciBORQ: Scientific data management with Bounds On Runtime and Quality

2011 
Data warehouses underlying virtual observatories stress the capabilities of database management systems in many ways. They are filled, on a daily basis, with large amounts of factual information derived from intensive data scrubbing and computational feature extraction pipelines. The predominant data processing techniques focus on parallel loads and map-reduce feature extraction algorithms. Querying these huge databases requires a sizable computing cluster, while ideally the initial investigation should run interactively, using as few resources as possible. In this paper, we explore a different route, one based on the observation that at any given time only a fraction of the data is of primary value for a specific task. This fraction becomes the focus of scientific reflection through an iterative process of ad-hoc query refinement. Steering through data to facilitate scientific discovery demands guarantees for the query execution time. In addition, strict bounds on errors are required to satisfy the demands of scientific use, such that query results can be used to test hypotheses reliably. We propose SciBORQ, a framework for scientific data exploration that gives precise control over runtime and quality of query answering. We present novel techniques to derive multiple interesting data samples, called impressions. An impression is selected such that the statistical error of a query answer remains low, while the result can be computed within strict time bounds. Impressions differ from previous sampling approaches in their bias towards the focal point of the scientific data exploration, their multi-layer design, and their adaptiveness to shifting query workloads. The ultimate goal is a complete system for scientific data exploration and discovery, capable of producing quality answers with strict error bounds in pre-defined time frames.
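To make the trade-off between runtime and quality concrete, the following is a minimal illustrative sketch, not the SciBORQ implementation. It assumes plain uniform random samples of increasing size as a stand-in for the biased, multi-layer impressions described above, uses a row budget as a crude proxy for a time bound, and estimates the error of a mean with a normal-approximation confidence interval. The names `estimate_mean`, `answer_with_bounds`, `max_error`, and `max_rows` are hypothetical.

```python
import math
import random
import statistics


def estimate_mean(sample, z=1.96):
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    mean = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, half_width


def answer_with_bounds(layers, max_error, max_rows):
    """Walk the sample layers from smallest to largest.

    Stop as soon as the estimated error is within max_error, or when the
    next layer would exceed max_rows (assumed proxy for a time budget).
    Returns the best (mean, error) seen, or None if no layer fits the budget.
    """
    best = None
    for sample in layers:
        if len(sample) > max_rows:
            break  # next layer is too expensive for the budget
        best = estimate_mean(sample)
        if best[1] <= max_error:
            break  # quality bound satisfied, no need to read more data
    return best


# Synthetic base data and three uniform sample layers (assumed sizes).
random.seed(0)
data = [random.gauss(100.0, 15.0) for _ in range(100_000)]
layers = [random.sample(data, n) for n in (100, 1_000, 10_000)]

# Generous budget: the largest layer is reachable and meets the error bound.
mean, err = answer_with_bounds(layers, max_error=0.5, max_rows=10_000)

# Tight budget: the answer comes from the 1,000-row layer with a wider error.
mean_fast, err_fast = answer_with_bounds(layers, max_error=0.5, max_rows=1_000)
```

With a generous budget the query escalates to the largest layer and meets the error bound; with a tight budget it returns the estimate from a smaller layer together with its wider error, so the caller can see that the quality bound was not met.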