Modern biological research is increasingly informed by computational simulation experiments, which necessitate the development of methods for annotating, archiving, sharing, and reproducing the conducted experiments. These simulations increasingly require extensive collaboration among modelers, experimentalists, and engineers. The Minimum Information About a Simulation Experiment (MIASE) guidelines outline the information needed to share simulation experiments. SED-ML is a computer-readable format for the information outlined by MIASE, created as a community project and supported by many investigators and software tools. Level 1 Version 5 of SED-ML expands the ability of modelers to define simulations in SED-ML using the Kinetic Simulation Algorithm Ontology (KiSAO). While it was possible in Version 4 to define a simulation entirely using KiSAO, Version 5 now allows users to define tasks, model changes, ranges, and outputs using the ontology as well. SED-ML is supported by a growing ecosystem of investigators, model languages, and software tools, including various languages for constraint-based, kinetic, qualitative, rule-based, and spatial models, and many simulation tools, visual editors, model repositories, and validators. Additional information about SED-ML is available at https://sed-ml.org/.
Summary Standards are essential to the advancement of science and technology. In systems and synthetic biology, numerous standards and associated tools have been developed over the last 16 years. This special issue of the Journal of Integrative Bioinformatics aims to support the exchange, distribution and archiving of these standards, as well as to provide centralised and easily citable access to them.
Motivation: Open model repositories provide ready-to-reuse computational models of biological systems. Models within those repositories evolve over time, leading to many alternative and subsequent versions. Taken together, the underlying changes reflect a model’s provenance and thus can give valuable insights into the studied biology. Currently, however, changes cannot be semantically interpreted. To improve this situation, we developed an ontology of terms describing changes in computational biology models. The ontology can be used by scientists and within software to characterise model updates at the level of single changes. When studying or reusing a model, these annotations help with determining the relevance of a change in a given context. Methods: We manually studied changes in selected models from BioModels and the Physiome Model Repository. Using the BiVeS tool for difference detection, we then performed an automatic analysis of changes in all models published in these repositories. The resulting set of concepts led us to define candidate terms for the ontology. In a final step, we aggregated and classified these terms and built the first version of the ontology. Results: We present COMODI, an ontology needed because COmputational MOdels DIffer. It empowers users and software to describe changes in a model on the semantic level. COMODI also enables software to implement user-specific filter options for the display of model changes. Finally, COMODI is the next step towards predicting how a change in a model influences the simulation study. Conclusion: COMODI, coupled with our algorithm for difference detection, ensures the transparency of a model’s evolution and it enhances the traceability of updates and error corrections. Availability: COMODI is encoded in OWL. It is openly available at http://comodi.sems.uni-rostock.de/.
The Simulation Experiment Description Markup Language (SED-ML) is an XML-based format for encoding simulation experiments, following the requirements defined in the MIASE guidelines. SED-ML allows one to define the model to use, the experimental task to run, and which result to produce.
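The three parts named above (model, task, and result) can be illustrated with a minimal sketch that assembles a skeleton SED-ML document using Python's standard library. The element and attribute names follow SED-ML Level 1 Version 3 as we understand it; the identifiers (`model1`, `sim1`, `task1`, `report1`), the file name, and the time-course settings are hypothetical, and the output has not been validated against the official schema.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for SED-ML Level 1 Version 3 and for MathML.
SEDML_NS = "http://sed-ml.org/sed-ml/level1/version3"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def build_minimal_sedml(model_source: str) -> ET.Element:
    """Build a skeleton document: one model, one simulation, one task, one report."""
    root = ET.Element("sedML", {"xmlns": SEDML_NS, "level": "1", "version": "3"})

    # 1. The model to use (here assumed to be an SBML file).
    models = ET.SubElement(root, "listOfModels")
    ET.SubElement(models, "model", {
        "id": "model1",
        "language": "urn:sedml:language:sbml",
        "source": model_source,
    })

    # 2. The simulation settings; the algorithm is identified by a KiSAO term
    #    (KISAO:0000019 denotes CVODE).
    sims = ET.SubElement(root, "listOfSimulations")
    sim = ET.SubElement(sims, "uniformTimeCourse", {
        "id": "sim1",
        "initialTime": "0",
        "outputStartTime": "0",
        "outputEndTime": "10",
        "numberOfPoints": "100",
    })
    ET.SubElement(sim, "algorithm", {"kisaoID": "KISAO:0000019"})

    # 3. The task: apply simulation sim1 to model1.
    tasks = ET.SubElement(root, "listOfTasks")
    ET.SubElement(tasks, "task", {
        "id": "task1",
        "modelReference": "model1",
        "simulationReference": "sim1",
    })

    # 4. A data generator that exposes simulation time from task1.
    generators = ET.SubElement(root, "listOfDataGenerators")
    dg = ET.SubElement(generators, "dataGenerator", {"id": "dg_time", "name": "time"})
    variables = ET.SubElement(dg, "listOfVariables")
    ET.SubElement(variables, "variable", {
        "id": "time",
        "taskReference": "task1",
        "symbol": "urn:sedml:symbol:time",
    })
    math = ET.SubElement(dg, "math", {"xmlns": MATHML_NS})
    ET.SubElement(math, "ci").text = "time"

    # 5. The result to produce: a report with a single data set.
    outputs = ET.SubElement(root, "listOfOutputs")
    report = ET.SubElement(outputs, "report", {"id": "report1"})
    datasets = ET.SubElement(report, "listOfDataSets")
    ET.SubElement(datasets, "dataSet", {
        "id": "ds_time",
        "label": "time",
        "dataReference": "dg_time",
    })
    return root


if __name__ == "__main__":
    document = build_minimal_sedml("model.xml")
    print(ET.tostring(document, encoding="unicode"))
```

In practice a dedicated library such as libSEDML would normally be preferred over hand-built XML; the sketch only shows how the pieces reference one another by identifier.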
Standards shape our everyday life. From nuts and bolts to electronic devices and technological processes, standardised products and processes are all around us. Standards have technological and economic benefits, such as making information exchange, production, and services more efficient. However, novel, innovative areas often either lack proper standards, or documents about standards in these areas are not available from a centralised platform or formal body (such as the International Organization for Standardization). Systems and synthetic biology is a relatively novel area, and it is only in the last decade that the standardisation of data, information, and models related to systems and synthetic biology has become a community-wide effort. Several open standards have been established and are under continuous development as a community initiative. COMBINE, the ‘COmputational Modeling in BIology’ NEtwork, has been established as an umbrella initiative to coordinate and promote the development of the various community standards and formats for computational models. Two meetings are held each year: HARMONY (Hackathons on Resources for Modeling in Biology), a hackathon-type meeting focused on developing support for the standards, and the COMBINE forum, a workshop-style event with oral presentations, discussions, posters, and breakout sessions for further developing the standards. For more information see http://co.mbine.org/. So far, the different standards have been published and made accessible through the standards’ webpages or preprint services. The aim of this special issue is to provide a single, easily accessible and citable platform for the publication of standards in systems and synthetic biology. This special issue is intended to serve as a central access point to standards and related initiatives in systems and synthetic biology; it will be published annually to provide an opportunity for standard development groups to communicate updated specifications.
Abstract The EyeMatics project, embedded as a clinical use case in Germany’s Medical Informatics Initiative, is a large digital health initiative in ophthalmology. The objective is to improve the understanding of the treatment effects of intravitreal injections, the most frequent procedure to treat eye diseases. To achieve this, valuable patient data will be meaningfully integrated and visualized from different IT systems and hospital sites. EyeMatics emphasizes a governance framework that actively involves patient representatives, strictly implements interoperability standards, and employs artificial intelligence methods to extract biomarkers from tabular and clinical data as well as raw retinal scans. In this perspective paper, we delineate the strategies for user-centered implementation and health care–based evaluation in a multisite observational technology study.
Abstract The Simulation Experiment Description Markup Language (SED-ML) is an XML-based format for encoding simulation experiments, following the requirements defined in the MIASE guidelines. SED-ML allows one to define the model to use, the experimental task to run, and which result to produce.
BACKGROUND Secondary investigations into digital health records, including electronic patient data from German medical data integration centers (DICs), pave the way for enhanced future patient care. However, only limited information is captured regarding the integrity, traceability, and quality of the (sensitive) data elements. This lack of detail diminishes trust in the validity of the collected data. From a technical standpoint, adhering to the widely accepted FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship necessitates enriching data with provenance-related metadata. Provenance offers insights into the readiness for the reuse of a data element and serves as a supplier of data governance. OBJECTIVE The primary goal of this study is to augment the reusability of clinical routine data within a medical DIC for secondary utilization in clinical research. Our aim is to establish provenance traces that underpin the status of data integrity, reliability, and consequently, trust in electronic health records, thereby enhancing the accountability of the medical DIC. We present the implementation of a proof-of-concept provenance library integrating international standards as an initial step. METHODS We adhered to a customized road map for a provenance framework, and examined the data integration steps across the ETL (extract, transform, and load) phases. Following a maturity model, we derived requirements for a provenance library. Using this research approach, we formulated a provenance model with associated metadata and implemented a proof-of-concept provenance class. Furthermore, we seamlessly incorporated the internationally recognized World Wide Web Consortium (W3C) provenance standard, aligned the resultant provenance records with the interoperable health care standard Fast Healthcare Interoperability Resources, and presented them in various representation formats.
Ultimately, we conducted a thorough assessment of provenance trace measurements. RESULTS This study marks the inaugural implementation of integrated provenance traces at the data element level within a German medical DIC. We devised and executed a practical method that synergizes the robustness of quality- and health standard–guided (meta)data management practices. Our measurements indicate commendable pipeline execution times, attaining notable levels of accuracy and reliability in processing clinical routine data, thereby ensuring accountability in the medical DIC. These findings should inspire the development of additional tools aimed at providing evidence-based and reliable electronic health record services for secondary use. CONCLUSIONS The research method outlined for the proof-of-concept provenance class has been crafted to promote effective and reliable core data management practices. It aims to enhance biomedical data by imbuing it with meaningful provenance, thereby bolstering the benefits for both research and society. Additionally, it facilitates the streamlined reuse of biomedical data. As a result, the system mitigates risks, as data analysis without knowledge of the origin and quality of all data elements is rendered futile. While the approach was initially developed for the medical DIC use case, these principles can be universally applied throughout the scientific domain.
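To make the idea of a provenance trace at the data element level more concrete, the following minimal sketch records one transform step using relation names borrowed from the W3C PROV data model (`used`, `wasGeneratedBy`, `wasDerivedFrom`). It is not the provenance library described in this abstract: the class `ProvenanceTrace`, its methods, and the identifier scheme are hypothetical and chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceTrace:
    """Hypothetical container for PROV-style statements about one pipeline run."""

    entities: dict = field(default_factory=dict)
    activities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_entity(self, name: str, value: bytes) -> str:
        """Register a data element and store a checksum as integrity evidence."""
        entity_id = f"entity:{name}"
        self.entities[entity_id] = {"sha256": hashlib.sha256(value).hexdigest()}
        return entity_id

    def record_step(self, step: str, source_id: str,
                    output_name: str, output_value: bytes) -> str:
        """Record that an ETL step consumed one entity and produced another."""
        activity_id = f"activity:{step}"
        self.activities[activity_id] = {
            "endedAt": datetime.now(timezone.utc).isoformat(),
        }
        output_id = self.add_entity(output_name, output_value)
        self.relations += [
            ("used", activity_id, source_id),
            ("wasGeneratedBy", output_id, activity_id),
            ("wasDerivedFrom", output_id, source_id),
        ]
        return output_id


if __name__ == "__main__":
    trace = ProvenanceTrace()
    raw_id = trace.add_entity("raw_lab_value", b"Hb=13,5 g/dl")
    trace.record_step("transform", raw_id, "normalized_lab_value", b"Hb=13.5 g/dL")
    for relation in trace.relations:
        print(relation)
```

A real implementation would additionally serialize such statements to a standard PROV representation and map them to FHIR resources, which this sketch does not attempt.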
A major challenge for the dissemination, replication, and reuse of epidemiological forecasting studies during the COVID-19 pandemic is the lack of clear guidelines and platforms to exchange models in a Findable, Accessible, Interoperable, and Reusable (FAIR) manner, facilitating reproducibility of research outcomes. At the beginning of the pandemic, models were developed in diverse tools that were not interoperable, were opaque (lacking traceability and semantics), and were scattered across various platforms, making them hard to locate, interpret, and reuse. In this work, we demonstrate that implementing the standards developed by the systems biology community to encode and share COVID-19 epidemiological models can serve as a roadmap for implementing models as a tool in medical informatics in general. As a proof-of-concept, we encoded and shared 24 epidemiological models using the standard format for model exchange in systems biology, annotated them with cross-references to data resources, packed up all associated files in COMBINE archives for easy sharing, and finally, disseminated the models through the BioModels repository to significantly enhance their reproducibility and repurposing potential. We recommend the use of systems biology standards to encode and share models of epidemic and pandemic forecasts to improve their findability, accessibility, interoperability, reusability, and reproducibility.
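The packaging step mentioned above can be sketched as follows. A COMBINE archive is, to our understanding, a ZIP file containing a `manifest.xml` that lists each file together with a format identifier; the sketch below builds such an archive in memory with Python's standard library. The function names and the example file are hypothetical, the format URIs reflect the COMBINE archive specification as we recall it, and the result has not been checked with an archive validator.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Identifiers assumed from the COMBINE archive (OMEX) specification.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
OMEX_FORMAT = "http://identifiers.org/combine.specifications/omex"
SBML_FORMAT = "http://identifiers.org/combine.specifications/sbml"


def build_manifest(formats: dict) -> bytes:
    """Create manifest.xml content from a mapping of archive path -> format URI."""
    root = ET.Element("omexManifest", {"xmlns": OMEX_NS})
    # The archive itself and the manifest are listed first.
    ET.SubElement(root, "content", {"location": ".", "format": OMEX_FORMAT})
    ET.SubElement(root, "content", {"location": "./manifest.xml", "format": OMEX_NS})
    for path, format_uri in formats.items():
        ET.SubElement(root, "content", {"location": f"./{path}", "format": format_uri})
    return ET.tostring(root, encoding="utf-8")


def write_archive(files: dict) -> bytes:
    """Pack files (path -> (content bytes, format URI)) into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        formats = {path: fmt for path, (_, fmt) in files.items()}
        archive.writestr("manifest.xml", build_manifest(formats))
        for path, (content, _) in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder model content; a real archive would hold a complete SBML file.
    data = write_archive({"model.xml": (b"<sbml/>", SBML_FORMAT)})
    with open("example.omex", "wb") as handle:
        handle.write(data)
```

Writing the manifest from the same mapping that drives the file loop keeps the listed entries and the packed files in sync, which is the property the archive format relies on.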
BACKGROUND Thorough data stewardship is a key enabler of comprehensive health research. Processes such as data collection, storage, access, sharing, and analytics require researchers to follow elaborate data management strategies properly and consistently. Studies have shown that findable, accessible, interoperable, and reusable (FAIR) data leads to improved data sharing in different scientific domains. OBJECTIVE This scoping review identifies and discusses concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in health research data. METHODS The Arksey and O’Malley stage-based methodological framework for scoping reviews was applied. PubMed, Web of Science, and Google Scholar were searched to access relevant publications. Articles written in English, published between 2014 and 2020, and addressing FAIR concepts or practices in the health domain were included. The 3 data sources were deduplicated using a reference management software. In total, 2 independent authors reviewed the eligibility of each article based on defined inclusion and exclusion criteria. A charting tool was used to extract information from the full-text papers. The results were reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. RESULTS A total of 2.18% (34/1561) of the screened articles were included in the final review. The authors reported FAIRification approaches, which include interpolation, inclusion of comprehensive data dictionaries, repository design, semantic interoperability, ontologies, data quality, linked data, and requirement gathering for FAIRification tools. Challenges and mitigation strategies associated with FAIRification, such as high setup costs, data politics, technical and administrative issues, privacy concerns, and difficulties encountered in sharing health data despite its sensitive nature were also reported. 
We found various workflows, tools, and infrastructures designed by different groups worldwide to facilitate the FAIRification of health research data. We also uncovered a wide range of problems and questions that researchers are trying to address by using the different workflows, tools, and infrastructures. Although the concept of FAIR data stewardship in the health research domain is relatively new, almost all continents have been reached by at least one network trying to achieve health data FAIRness. Documented outcomes of FAIRification efforts include peer-reviewed publications, improved data sharing, facilitated data reuse, return on investment, and new treatments. Successful FAIRification of data has informed the management and prognosis of various diseases such as cancer, cardiovascular diseases, and neurological diseases. Efforts to FAIRify data on a wider variety of diseases have been ongoing since the COVID-19 pandemic. CONCLUSIONS This work summarises projects, tools, and workflows for the FAIRification of health research data. The comprehensive review shows that implementing the FAIR concept in health data stewardship carries the promise of improved research data management and transparency in the era of big data and open research publishing. INTERNATIONAL REGISTERED REPORT RR2-10.2196/22505