Microarray techniques are among the main methods used to investigate thousands of gene expression profiles and to elucidate the complex biological processes responsible for serious diseases, with great scientific impact and a wide application area. Several standalone applications have been developed to analyze microarray data; two of the best-known free analysis software packages are the R-based Bioconductor and dChip. The part of the dChip software concerned with the computation and analysis of gene expression has been modified to permit its execution on both cluster environments (supercomputers) and Grid infrastructures (distributed computing). This work is not aimed at replacing existing tools; rather, it provides researchers with a method to analyze large datasets without hardware or software constraints. An application able to perform the computation and analysis of gene expression on large datasets has been developed using algorithms provided by dChip. Different tests have been carried out to validate the results and to compare the performance obtained on different infrastructures. Validation tests were performed using a small dataset comparing HUVEC (Human Umbilical Vein Endothelial Cells) and fibroblasts, derived from the same donors, treated with IFN-α. Performance tests were then executed to compare performance across environments, using a large dataset of about 1000 samples from breast cancer patients. In summary, a Grid-enabled software application for the analysis of large microarray datasets has been proposed. The dChip software has been ported to the Linux platform and modified, using appropriate parallelization strategies, to permit its execution on both cluster environments and Grid infrastructures. The added value provided by Grid technologies is the possibility of exploiting both computational and data Grid infrastructures to analyze large, distributed datasets. The software has been validated, and performance on cluster and Grid environments has been compared, with good scalability results.
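As an illustration of the kind of parallelization strategy referred to above, the sketch below shows sample-level parallelism in Python: each worker normalizes an independent chunk of arrays against a shared baseline, just as chunks could be dispatched as separate cluster or Grid jobs. This is a minimal sketch; median scaling is a simplified stand-in for dChip's actual normalization, and all function and variable names are illustrative, not dChip's API.

```python
# A minimal sketch of sample-level parallelism for a dChip-style pipeline,
# assuming each array can be normalized independently against a shared
# baseline. Median scaling stands in for dChip's actual normalization;
# on a Grid, each chunk would become a separate job whose outputs are merged.
import numpy as np
from multiprocessing import Pool

BASELINE_MEDIAN = 100.0  # assumed target intensity of the baseline array

def normalize_chunk(chunk):
    """Median-scale every array in the chunk to the shared baseline."""
    return [arr * (BASELINE_MEDIAN / np.median(arr)) for arr in chunk]

def parallel_normalize(arrays, n_workers=4):
    # One chunk per worker; chunks are independent, so they map directly
    # onto cluster nodes or Grid jobs.
    chunks = [arrays[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        results = pool.map(normalize_chunk, chunks)
    return [arr for chunk in results for arr in chunk]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = [rng.gamma(2.0, 50.0, size=10_000) for _ in range(100)]
    normalized = parallel_normalize(samples)
    print(len(normalized), float(np.median(normalized[0])))
```

Because the per-sample work is embarrassingly parallel, the same decomposition applies unchanged whether the chunks run on local cores, cluster nodes or Grid jobs.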
Several systems have been presented in recent years to manage the complexity of large microarray experiments. Although good results have been achieved, most systems fall short in one or more areas. A Grid-based approach may provide a shared, standardized and reliable solution for the storage and analysis of biological data, in order to maximize the results of experimental efforts. A Grid framework has therefore been adopted, driven by the need both to remotely access large amounts of distributed data and to scale computational performance to terabyte datasets. Two different biological studies have been planned in order to highlight the benefits that can emerge from our Grid-based platform. The described environment relies on the storage and computational services provided by the gLite Grid middleware. The Grid environment also exploits the added value of metadata, letting users better classify and search experiments. A state-of-the-art Grid portal has been implemented to hide the complexity of the framework from end users and to give them easy access to the available services and data. The functional architecture of the portal is described. As a first test of system performance, a gene expression analysis has been performed on a dataset of Affymetrix GeneChip® Rat Expression Array RAE230A from the ArrayExpress database. The analysis comprises three steps: (i) group opening and image set uploading, (ii) normalization, and (iii) model-based gene expression (based on the PM/MM difference model). Two Linux versions (sequential and parallel) of the dChip software have been developed to implement the analysis and have been tested on a cluster. The results show that parallelizing the analysis process and executing parallel jobs on distributed computational resources actually improves performance. Moreover, the Grid environment has been tested both for the possibility of uploading and accessing distributed datasets through the Grid middleware and for its ability to manage the execution of jobs on distributed computational resources. Results from the Grid tests will be discussed in a further paper.
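Step (iii) refers to dChip's PM/MM difference model (the Li-Wong model-based expression index), in which the perfect-match/mismatch difference for probe j on array i is modeled as theta_i * phi_j. The sketch below fits this model by alternating least squares; it is an illustration only and omits dChip's probe and array outlier handling.

```python
# A sketch of the PM/MM difference model named in step (iii): for one probe
# set, PM_ij - MM_ij is modeled as theta_i * phi_j (Li-Wong model-based
# expression index), fitted here by alternating least squares. This is an
# illustration, not dChip's code; outlier detection is omitted.
import numpy as np

def mbei(pm, mm, n_iter=50):
    """pm, mm: (n_arrays, n_probes) matrices for one probe set.
    Returns theta (per-array expression) and phi (per-probe affinity)."""
    y = pm - mm
    n_probes = y.shape[1]
    phi = np.ones(n_probes)
    for _ in range(n_iter):
        theta = y @ phi / (phi @ phi)      # expression, given affinities
        phi = theta @ y / (theta @ theta)  # affinities, given expression
        phi *= np.sqrt(n_probes) / np.linalg.norm(phi)  # sum(phi^2) = J
    return y @ phi / (phi @ phi), phi
```

The rescaling of phi resolves the scale ambiguity of the bilinear model (multiplying theta and dividing phi by the same constant leaves the fit unchanged). Because each probe set is fitted independently, this step is a natural unit for the parallel jobs mentioned above.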
Complex microarray gene expression datasets can be used for many independent analyses and are particularly interesting for the validation of potential biomarkers and multi-gene classifiers. This article presents a novel method to correlate microarray gene expression data with clinico-pathological data through a combination of available and newly developed processing tools. We developed Survival Online (available at http://ada.dist.unige.it:8080/enginframe/bioinf/bioinf.xml), a Web-based system that allows for the analysis of Affymetrix GeneChip microarrays using a parallel version of dChip. The user first selects pre-loaded datasets or single samples thereof, as well as single genes or lists of genes. Expression values of the selected genes are then correlated with sample annotation data by uni- or multi-variate Cox regression and survival analyses. The system was tested using publicly available breast cancer datasets and GO (Gene Ontology)-derived gene lists or single genes for survival analyses. The system can be used by biomedical researchers without specific computational skills to validate potential biomarkers or multi-gene classifiers. The design of the service, the parallelization of the pre-processing tasks and the implementation on an HPC (High Performance Computing) environment make this system a useful tool for validation on several independent datasets.
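A minimal sketch of the correlation step, assuming expression values have already been produced by the parallel dChip preprocessing: a univariate Cox proportional-hazards fit of survival against one gene's expression. The lifelines library stands in for the statistics back end, and the column names are illustrative, not those used by Survival Online.

```python
# Univariate Cox regression: does expression of one gene predict survival?
# Toy data; in Survival Online the durations/events come from the sample
# annotations and the expression column from the selected gene(s).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":       [5.2, 12.1, 3.4, 20.0, 8.8, 15.5],  # follow-up (months)
    "event":      [1, 0, 1, 0, 1, 0],                  # 1 = event observed
    "expression": [7.1, 4.2, 8.9, 3.5, 6.6, 4.0],      # log2 gene expression
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()  # hazard ratio and p-value for the gene's expression
```

The multivariate case described in the abstract would simply add further covariate columns (more genes, or clinico-pathological variables) to the same data frame before fitting.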
The Operating Room (OR) is a key resource of all major hospitals, but it also accounts for up to 40% of resource costs. Improving cost effectiveness while maintaining quality of care is a universal objective. These goals imply optimizing the planning and scheduling of the activities involved, which is highly challenging due to the inherently variable and unpredictable nature of surgery.

Business Process Model and Notation (BPMN 2.0) was used to represent the "OR process" (defined as the sequence of all elementary steps from "patient ready for surgery" to "patient operated upon") as a general pathway ("path"). The path was then standardized as much as possible while keeping all of the key elements needed to address the other steps of planning and the wide, inherent variability in terms of patient specificity. The path was used to schedule OR activity, room by room and day by day, feeding the process from a "waiting list database" and using a mathematical optimization model with the objective of producing an optimized plan.

The OR process was defined with special attention paid to flows, timing and resource involvement. Standardization defined an expected operating time for each operation. The optimization model has been implemented and tested on real clinical data. Comparison with the real data shows that the optimization model allows for the scheduling of about 30% more patients than in actual practice, and better exploits OR efficiency, increasing the average operating room utilization rate by up to 20%.

Optimizing OR activity planning is essential to manage the hospital's waiting list. Optimal planning is facilitated by defining the operation as a standard pathway in which all variables are taken into account. By allowing precise scheduling, it feeds the process of planning and, further upstream, the management of the waiting list in an interactive, bi-directional and dynamic process.
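A toy sketch of the scheduling core described above: assign waiting-list patients, each with an expected operating time from the standardized path, to OR sessions so as to maximize the number of patients scheduled. PuLP with the CBC solver stands in for the paper's actual model and solver; all durations and capacities are invented numbers.

```python
# Assignment model: maximize scheduled patients subject to session capacity.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

durations = {"p1": 90, "p2": 120, "p3": 45, "p4": 60, "p5": 150}  # minutes
sessions = {"OR1_Mon": 240, "OR1_Tue": 240}                       # capacity

x = {(p, s): LpVariable(f"x_{p}_{s}", cat=LpBinary)
     for p in durations for s in sessions}

model = LpProblem("or_scheduling", LpMaximize)
model += lpSum(x.values())                       # objective: patients scheduled
for p in durations:                              # each patient at most once
    model += lpSum(x[p, s] for s in sessions) <= 1
for s, cap in sessions.items():                  # session duration capacity
    model += lpSum(durations[p] * x[p, s] for p in durations) <= cap

model.solve(PULP_CBC_CMD(msg=False))
print(sorted((s, p) for (p, s), v in x.items() if v.value() == 1))
```

The real model would add the further constraints implied by the path (staff, equipment, priorities from the waiting list); the bin-packing structure above is only the skeleton that makes the 30% throughput and 20% utilization comparisons possible.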
This chapter describes a Grid-oriented platform, the Bio Med Portal, as a new tool to promote collaboration and cooperation among scientists and healthcare research groups, enabling the remote use of resources integrated in complex software platform services to form a virtual laboratory. Indeed, many biomedicine studies nowadays deal with large, distributed and heterogeneous repositories as well as with computationally demanding analyses, and complex integration techniques are increasingly required to handle this complexity. The Bio Med Portal is designed to host several medical services and is able to deploy several analysis algorithms. The scope of this chapter is both to present a Grid application with its own medical use case and to emphasize the benefits that a new Grid-based design paradigm could provide to research groups spread across geographically distributed sites.
Robust, extensible and distributed databases integrating clinical, imaging and molecular data represent a substantial challenge for modern neuroscience. It is even more difficult to provide extensible software environments able to effectively target the rapidly changing data requirements and structures of research experiments. There is increasing demand from the neuroscience community for software tools addressing technical challenges such as: (i) supporting researchers in the medical field in carrying out data analysis using integrated bioinformatics services and tools; (ii) handling multimodal/multiscale data and metadata, enabling the injection of several different data types according to structured schemas; (iii) providing high extensibility, in order to address the different requirements of a large variety of applications simply through user runtime configuration.

A dynamically extensible data structure supporting collaborative multidisciplinary research projects in neuroscience has been defined and implemented. We have considered extensibility from two different points of view. First, the improvement of data flexibility has been taken into account, through the development of a methodology for the dynamic creation and use of data types and related metadata, based on the definition of a "meta" data model. This way, users are not constrained to a set of predefined data types, and the model is easily extensible and applicable to different contexts. Second, users have been enabled to easily customize and extend the experimental procedures in order to track each step of acquisition or analysis. This has been achieved through a process-event data structure, a multipurpose taxonomic schema composed of two generic main objects: events and processes. A repository has then been built on this data model and structure, and deployed on distributed resources through a Grid-based approach. Finally, data integration has been addressed by providing the repository application with an efficient dynamic interface designed to let the user both easily query the data by defined data types and view all the data of each patient in an integrated and simple way.

The results of our work have been twofold. First, a dynamically extensible data model has been implemented and tested, based on a "meta" data model that enables users to define their own data types independently of the application context. This data model allows users to dynamically include additional data types without rebuilding the underlying database. On top of it, a complex process-event data structure has been built, describing patient-centered diagnostic processes and merging information from data and metadata. Second, a repository implementing this data structure has been deployed on a distributed Data Grid, providing scalability in terms of both data input and data storage and exploiting distributed data and computational approaches to share resources more efficiently. Data management has been made possible through a friendly web interface. The driving principle of not being forced into preconfigured data types has been satisfied: it is up to users to dynamically configure the data model for a given experiment or data acquisition program, making the system potentially suitable for customized applications.
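A minimal sketch of the "meta" data-model idea described above: a DataType describes its fields at runtime, and Data instances are validated against it, so new types can be added without rebuilding the underlying database schema. The class and field names are illustrative, not the actual implementation.

```python
# Types are data: a DataType is itself a record created at runtime, and
# every Data instance is checked against the DataType it refers to.
from dataclasses import dataclass

@dataclass
class DataType:
    name: str
    fields: dict  # field name -> expected Python type

@dataclass
class Data:
    data_type: DataType
    values: dict

    def __post_init__(self):
        for fname, ftype in self.data_type.fields.items():
            if not isinstance(self.values.get(fname), ftype):
                raise TypeError(f"field '{fname}' must be {ftype.__name__}")

# A user defines a new type at runtime: no schema migration is needed.
mri_scan = DataType("MRIScan", {"subject_id": str, "tesla": float, "slices": int})
scan = Data(mri_scan, {"subject_id": "S01", "tesla": 3.0, "slices": 176})
```

In a database this corresponds to storing the type definitions themselves as rows (or documents), which is what frees users from a fixed, precompiled schema.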
The XTENS (eXTensible Environment for NeuroScience) platform is a highly extensible environment for collaborative work that improves the repeatability of experiments and provides data storage and analysis capabilities. The platform is divided into repository and application domains, each branched into services with different purposes. The first domain is the central component of the platform and consists of a multimodal repository with a client-server architecture. The second provides remote tools for image and signal visualization and analysis. The main issue for such a platform is not only to provide an extensible collaborative environment, but also to build a development platform for testing models and algorithms in neuroscience. For these reasons a Grid approach has been adopted: both computational and data Grid infrastructures can be exploited to analyze and share large distributed datasets. The architecture has been deployed to support surgical planning for patients affected by drug-resistant epilepsy. In that scenario, a complex analysis of a fully multimodal dataset, including different image modalities, EEG and video, is required to localize the origin of the ictal discharge and the critical brain areas. As first results, prototype versions of both the repository and the application domain components are presented.
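A speculative sketch of how the process-event structure described earlier could represent this epilepsy use case: a presurgical-evaluation process groups acquisition events for each modality (MRI, EEG, video), each pointing to a file on Grid storage. All names, identifiers and the lfn:// URIs are illustrative assumptions, not the actual XTENS schema.

```python
# One Process per diagnostic workflow; one Event per acquisition or
# analysis step, with a URI into the data Grid's storage elements.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    modality: str       # e.g. "MRI", "EEG", "video"
    timestamp: datetime
    data_uri: str       # pointer to the stored file

@dataclass
class Process:
    name: str
    patient_id: str
    events: list = field(default_factory=list)

evaluation = Process("presurgical-evaluation", "patient-042")
evaluation.events.append(Event("MRI", datetime.now(), "lfn://grid/mri_t1.nii"))
evaluation.events.append(Event("EEG", datetime.now(), "lfn://grid/ictal.edf"))
evaluation.events.append(Event("video", datetime.now(), "lfn://grid/seizure.avi"))
```

Keeping the multimodal dataset grouped under one process is what lets the application domain's visualization and analysis tools retrieve MRI, EEG and video together when localizing the ictal discharge.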