Trident: scalable compute archives: workflows, visualization, and analysis
2016
The astronomy community has embraced Big Data processing challenges, e.g. those associated with time-domain astronomy, and has come up with a variety of novel and efficient data processing solutions. However, data processing is only a small part of the Big Data challenge. Efficient knowledge discovery and scientific advancement in the Big Data era require new and equally efficient tools: modern user interfaces for searching, identifying, and viewing data online without direct access to the data; tracking of data provenance; searching, plotting, and analyzing metadata; interactive visual analysis, especially of (time-dependent) image data; and the ability to execute pipelines on supercomputing and cloud resources with minimal overhead or expertise required, even for novice computing users. The Trident project at Indiana University offers a comprehensive web and cloud-based microservice software suite that enables the straightforward deployment of highly customized Scalable Compute Archive (SCA) systems, including extensive visualization and analysis capabilities, with a minimal amount of additional coding. Trident seamlessly scales up or down in terms of data volumes and computational needs, and allows feature sets within a web user interface to be quickly adapted to meet individual project requirements. Domain experts only have to provide the code or business logic for handling and visualizing their domain's data products and for executing their pipelines and application workflows. Trident's microservices architecture is made up of lightweight services connected by a REST API and/or a message bus; web interface elements are built using NodeJS, AngularJS, and HighCharts JavaScript libraries, among others, while backend services are written in NodeJS, PHP/Zend, and Python.
The software suite currently consists of (1) a simple workflow execution framework to integrate, deploy, and execute pipelines and applications, (2) a progress service to monitor workflows and sub-workflows, (3) ImageX, an interactive image visualization service, (4) an authentication and authorization service, (5) a data service that handles archival, staging, and serving of data products, and (6) a notification service that serves the statistical collation and reporting needs of various projects. Several additional components are under development. Trident is an umbrella project that evolved from the One Degree Imager, Portal, Pipeline, and Archive (ODI-PPA) project, which we had initially refactored toward (1) a powerful analysis/visualization portal for Globular Cluster System (GCS) survey data collected by IU researchers, (2) a data search and download portal for the IU Electron Microscopy Center's data (EMC-SCA), and (3) a prototype archive for the Ludwig Maximilian University's Wide Field Imager. The new Trident software has been used to deploy (1) a metadata quality control and analytics portal (RADY-SCA) for DICOM-formatted medical imaging data produced by the IU Radiology Center, (2) several prototype workflows for different domains, (3) a snapshot tool within IU's Karst Desktop environment, and (4) a limited component set to serve GIS data within the IU GIS web portal. Trident SCA systems leverage supercomputing and storage resources at Indiana University but can be configured to make use of any cloud/grid resource, from local workstations and servers to (inter)national supercomputing facilities such as XSEDE.