Workflows are a well-established means of capturing scientific methods as abstract graphs of interrelated processing tasks. The reproducibility of scientific workflows is therefore fundamental to reproducible e-Science. However, recording all of the details required to make a workflow fully reproducible is a long-standing problem that is very difficult to solve. In this paper, we introduce an approach that integrates system description, source control, container management and automatic deployment techniques to facilitate workflow reproducibility. We have developed a framework that leverages this integration to support workflow execution, re-execution and reproducibility in the cloud and in a personal computing environment. We demonstrate the effectiveness of our approach by examining various aspects of repeatability and reproducibility on real scientific workflows. The framework allows workflow and task images to be captured automatically, which improves not only repeatability but also runtime performance. It also gives workflows portability across different cloud environments. Finally, the framework can track changes in the development of tasks and workflows to protect them from unintentional failures.
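To illustrate the container-capture step central to this approach, the minimal Python sketch below builds and publishes a task image via the Docker command line; the function name, task directory and registry are hypothetical, and this is not the framework's actual implementation.

import subprocess

def capture_task_image(task_dir: str, image_tag: str, registry: str) -> str:
    """Build a container image for a workflow task and push it to a registry so the
    task can later be pulled and re-executed unchanged. All names are illustrative."""
    full_tag = f"{registry}/{image_tag}"
    # Build the image from the task directory (assumes a Dockerfile is present there).
    subprocess.run(["docker", "build", "-t", full_tag, task_dir], check=True)
    # Push the captured image so any cloud or local host can pull the identical bits.
    subprocess.run(["docker", "push", full_tag], check=True)
    return full_tag

# Example: capture a hypothetical 'align-reads' task, version 1.2.
# capture_task_image("./tasks/align-reads", "align-reads:1.2", "registry.example.org/workflows")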
Cloud computing has the potential to provide low-cost, scalable computing, but cloud security is a major area of concern. Many organizations are therefore considering using a combination of a secure internal cloud along with (what they perceive to be) less secure public clouds. However, this raises the issue of how to partition applications across a set of clouds while meeting security requirements. Currently, this is usually done on an ad-hoc basis, which is potentially error-prone, or, for simplicity, the whole application is deployed on a single cloud, thereby forgoing the potential performance and availability benefits of exploiting multiple clouds within a single application. This paper describes an alternative to ad-hoc approaches: a method that determines all the ways in which applications structured as workflows can be partitioned over the set of available clouds such that security requirements are met. The approach is based on a Multi-Level Security model that extends Bell-LaPadula to encompass cloud computing. This includes introducing workflow transformations that are needed where data is communicated between clouds. In specific cases these transformations can themselves result in security breaches, but the paper describes how these can be detected. Once a set of valid options has been generated, a cost model is used to rank them. The method has been implemented in a tool, which is described in the paper.
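To make the partitioning check concrete, the sketch below enumerates task-to-cloud assignments and keeps those in which each task's security level is dominated by the clearance of its cloud, in Bell-LaPadula fashion; the levels, tasks and clouds are hypothetical, and the real method additionally handles the inter-cloud transformations described above.

from itertools import product

# Hypothetical linear security levels (higher number = more sensitive).
LEVEL = {"public": 0, "confidential": 1, "secret": 2}

# Security level of each workflow task (or the data it handles) -- illustrative only.
tasks = {"ingest": "public", "anonymise": "secret", "analyse": "confidential"}

# Maximum level each cloud is cleared to host.
clouds = {"public-cloud": "public", "private-cloud": "secret"}

def valid(assignment):
    """A partition is valid if every task runs on a cloud cleared to its level."""
    return all(LEVEL[clouds[c]] >= LEVEL[tasks[t]] for t, c in assignment.items())

# Enumerate all task-to-cloud assignments and keep the valid ones,
# which a cost model would then rank.
options = [dict(zip(tasks, placement)) for placement in product(clouds, repeat=len(tasks))]
valid_options = [a for a in options if valid(a)]
print(valid_options)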
The use of cloud resources for processing and analysing medical data has the potential to revolutionise the treatment of a number of chronic conditions. For example, it has been shown that conditions such as diabetes, obesity and cardiovascular disease can be managed by increasing the right forms of physical activity for the patient. Typically, movement data is collected for a patient over a period of several weeks using a wrist-worn accelerometer. This data, however, is large and its analysis can require significant computational resources. Cloud computing offers a convenient solution, as it can be paid for as needed and is capable of scaling to store and process large numbers of data sets simultaneously. However, because the cloud charging model represents, to some extent, an unknown cost and therefore a risk to project managers, it is important to have an estimate of the likely data processing and storage costs required to analyse a set of data. This could take the form of data collected from a patient in clinic or of entire cohorts of data collected from large studies. If an accurate model were available that could predict the compute and storage requirements associated with a piece of analysis code, decisions could be made as to the scale of resources required to obtain results within a known timescale. This paper makes use of provenance and performance data collected as part of routine e-Science Central workflow executions to examine the feasibility of automatically generating predictive models of workflow execution times based solely on observed characteristics such as data volumes processed, algorithm settings and execution durations. The utility of this approach is demonstrated on a set of benchmarking examples before the models are used to analyse workflow executions performed as part of two large medical movement-analysis studies.
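As a hedged illustration of the kind of predictive model considered here, the sketch below fits a simple linear regression of execution time against observed characteristics using scikit-learn; the provenance fields and figures are invented for the example and are not taken from the studies.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical provenance records: input size (MB), an algorithm setting, observed runtime (s).
records = [
    (120,  5,  38.2),
    (240,  5,  71.9),
    (480, 10, 150.4),
    (960, 10, 301.7),
]
X = np.array([[size, setting] for size, setting, _ in records])
y = np.array([runtime for *_, runtime in records])

# Fit a simple linear model of runtime against the observed characteristics.
model = LinearRegression().fit(X, y)

# Predict the runtime (and hence cost) of analysing a new 2 GB dataset.
predicted = model.predict(np.array([[2048.0, 10.0]]))
print(f"predicted runtime: {predicted[0]:.1f} s")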
The ability to accurately predict the performance of software components executing within a Cloud environment is an area of intense interest to many researchers. An accurate prediction of the time taken for a piece of code to execute would be beneficial for both planning and cost-optimisation purposes. To that end, this paper proposes a performance data capture and modelling architecture that can be used to generate models of code execution time that are dynamically updated as additional performance data is collected. To demonstrate the utility of this approach, the workflow engine within the e-Science Central Cloud platform has been instrumented to capture execution data with a view to generating predictive models of workflow performance. Models have been generated for both simple and more complex workflow components operating on local hardware and within a virtualised Cloud environment, and the ability to generate accurate performance predictions, subject to a number of caveats, is demonstrated.
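The following sketch illustrates, under simplifying assumptions, how such a model could be updated dynamically as the instrumented workflow engine reports new observations, using scikit-learn's incremental SGDRegressor; the feature names and values are hypothetical and this is not the paper's actual modelling code.

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Online model that is refined each time the workflow engine reports another
# (features, runtime) observation. Feature names are illustrative only.
scaler = StandardScaler()
model = SGDRegressor(random_state=0)

def update(input_mb: float, cpu_count: int, runtime_s: float) -> None:
    """Incorporate one new performance observation into the model."""
    X = np.array([[input_mb, cpu_count]])
    scaler.partial_fit(X)
    model.partial_fit(scaler.transform(X), [runtime_s])

def predict(input_mb: float, cpu_count: int) -> float:
    """Predict execution time for a planned run using the current model."""
    X = scaler.transform(np.array([[input_mb, cpu_count]]))
    return float(model.predict(X)[0])

# Feed observations as they are captured, then query the current model.
for obs in [(100, 2, 40.0), (200, 2, 78.0), (400, 4, 80.0)]:
    update(*obs)
print(predict(300, 2))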
The IEEE International Conference on Cloud Engineering (IC2E) was initiated in December 2011, when the need became apparent for a new forum in which to present and discuss engineering issues related to cloud computing. IC2E 2013 has been organized by its dedicated and capable Program Committee Chairs, Roy Campbell, Hui Lei, and Volker Markl, with a group of outstanding Program Committee members from all over the world.
This paper describes a novel study management platform that is being used to collect, process and analyse data gathered from a large-scale pan-European digital healthcare study. The platform consists of two main components. The first is a secure, scalable, cloud-based platform that ingests and processes data uploaded from body-worn sensors, as well as from clinical evaluation forms. The second is a Data Warehouse, with a novel schema designed specifically for study data, into which features extracted from this data are loaded. This allows scientists to explore, analyse and visualise the data in a variety of different ways. A key aspect of the warehouse design is that it also stores metadata describing the types and formats of the data. This enables automatic report generation, exploratory data analysis and error checking. The overall result is a flexible, general-purpose system that is open source and uses the cloud for scalability. This paper describes the design of the integrated study data platform and its use in the large Mobilise-D study, which has collected and analysed both sensor and clinical data from over 3,000 participants.
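As an illustration of how stored metadata can drive automatic error checking, the minimal Python sketch below validates a feature record against a hypothetical metadata table; the field names and rules are invented and do not reflect the actual Mobilise-D schema.

# Hypothetical metadata describing the expected type and range of each warehouse field.
METADATA = {
    "walking_speed_mps": {"type": float, "min": 0.0, "max": 3.0},
    "step_count":        {"type": int,   "min": 0,   "max": 50000},
    "cohort":            {"type": str,   "allowed": {"groupA", "groupB"}},
}

def check_record(record: dict) -> list[str]:
    """Return error messages for one feature record, driven entirely by the
    metadata table rather than by hard-coded rules."""
    errors = []
    for field, rules in METADATA.items():
        value = record.get(field)
        if value is None:
            errors.append(f"missing field: {field}")
            continue
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}")
            continue
        if "allowed" in rules and value not in rules["allowed"]:
            errors.append(f"{field}: unexpected value {value!r}")
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: below minimum")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: above maximum")
    return errors

print(check_record({"walking_speed_mps": 1.2, "step_count": 8421, "cohort": "groupA"}))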
Business requirements for operational efficiency, customer responsiveness and rapid adaptability are driving the need for ever-increasing communication and integration capabilities among software assets. Enterprise Application Integration (EAI), the process of integrating enterprise systems with existing applications, and distributed computing more generally have produced diverse integration techniques and approaches to address these challenges. This has led to the development of Service-Oriented Architecture (SOA) variants, which are partly supported by commonly accepted standards that ensure interoperability, sharing and reusability. As a result, a safer and faster return on investment (ROI) can be generated, while inter-software communication and integration becomes ever easier. In this paper we discuss the Enterprise Service Bus (ESB) and evaluate the concept against existing broker architectures and paradigms.