Publishing, analysing or properly accessing the abundant information resulting largely from experimental studies in the biomedical domain are current challenges for the research community. Problems with the extraction of relevant information, redundant data, and lack of associations or provenance are good examples of the main concerns. The innovative nanopublication publishing strategy tries to overcome these issues by representing the essential pieces of publishable information on the Semantic Web. However, existing methods to create these Resource Description Framework-based data snippets are based on complex scripting procedures, hindering their use by the community. Therefore, novel and automated strategies are needed to explore the evident value of nanopublications and to enable data attribution mechanisms, an important feature for data owners. To solve these challenges, the authors introduce the second generation of the COEUS open-source application framework (http://bioinformatics.ua.pt/coeus/), an automated platform to integrate heterogeneous scientific outcomes into nanopublications. This results in seamless integration, making data accessible and citable at the same time. No additional scripting methods are needed. A validation of a nanopublishing pipeline is described to demonstrate the system functionalities, integrating and publishing common biomedical achievements into the Semantic Web ecosystem.
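The nanopublication model groups a single assertion with its provenance and publication information into named graphs linked from a head graph. A minimal sketch of assembling one such RDF snippet as TriG follows; the `np:` and `prov:` namespaces are the standard ones, while every `example.org` URI is an illustrative assumption, not COEUS's actual output.

```python
# Minimal sketch of a single-assertion nanopublication serialised as TriG.
# The four graphs (head, assertion, provenance, publication info) follow the
# nanopublication schema; all example.org URIs are illustrative placeholders.
NP = "http://example.org/np1"

def build_nanopub(subject, predicate, obj, author):
    """Return a TriG string with the four nanopublication graphs."""
    return f"""@prefix np: <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

<{NP}#head> {{
    <{NP}> a np:Nanopublication ;
        np:hasAssertion <{NP}#assertion> ;
        np:hasProvenance <{NP}#provenance> ;
        np:hasPublicationInfo <{NP}#pubinfo> .
}}
<{NP}#assertion> {{ <{subject}> <{predicate}> <{obj}> . }}
<{NP}#provenance> {{ <{NP}#assertion> prov:wasDerivedFrom <http://example.org/source> . }}
<{NP}#pubinfo> {{ <{NP}> prov:wasAttributedTo <{author}> . }}
"""
```

Because the publication-info graph names the author, the snippet is citable and attributable on its own, which is precisely the attribution mechanism the abstract highlights.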
Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries also brings new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web based solution that connects distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system allowed us to answer challenging questions across dispersed rare disease patient registries. Interconnecting these registries through Semantic Web technologies also means that single or multiple instances can be queried according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries.
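Federation of this kind is typically expressed with the SPARQL 1.1 `SERVICE` keyword, which joins sub-queries evaluated at remote endpoints. The sketch below illustrates the idea as a query string; the endpoint URLs and the `ex:` vocabulary are assumptions for illustration, not the deployed registries' actual schema.

```python
# Illustrative federated SPARQL query over two registry endpoints.
# The shared ?patient variable joins results from both SERVICE blocks,
# as defined by SPARQL 1.1 Federated Query semantics.
FEDERATED_QUERY = """
PREFIX ex: <http://example.org/registry-vocab#>
SELECT ?patient ?diagnosis ?sample
WHERE {
  SERVICE <http://registry-a.example.org/sparql> {
    ?patient ex:hasDiagnosis ?diagnosis .
  }
  SERVICE <http://registry-b.example.org/sparql> {
    ?patient ex:hasBiobankSample ?sample .
  }
}
"""
```

Because both sub-patterns bind the same `?patient` variable, the federating engine performs the cross-registry join, so no registry ever has to export its full dataset.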
The tremendous quantity of data stored daily in healthcare institutions demands the development of new methods to summarize and reuse available information in clinical practice. In order to leverage modern healthcare information systems, new strategies must be developed that address challenges such as extraction of relevant information, data redundancy, and the lack of associations within the data. This article proposes a pipeline to overcome these challenges in the context of medical imaging reports, by automatically extracting and linking information, and summarizing natural language reports into an ontology model. Using data from the PhysioNet MIMIC II database, we created a semantic knowledge base with more than 6.5 million triples obtained from a collection of 16,000 radiology reports.
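The extract-and-link step can be pictured as turning a free-text sentence into subject-predicate-object triples. The toy sketch below does this with a single regular expression; the pattern, the `report:`/`radlex:`/`finding:` prefixes, and the example sentence are all illustrative assumptions, not the article's actual pipeline.

```python
import re

# Toy sketch of the report-to-triples step: detect a reported finding in a
# free-text radiology sentence and emit (subject, predicate, object) triples.
# The pattern and vocabulary prefixes are illustrative assumptions.
FINDING_RE = re.compile(
    r"(?:shows|demonstrates|reveals)\s+(?:a\s+)?([a-z ]+?)(?:\.|,|$)", re.I)

def report_to_triples(report_id, text):
    triples = []
    for match in FINDING_RE.finditer(text):
        finding = match.group(1).strip().lower().replace(" ", "_")
        triples.append((f"report:{report_id}",
                        "radlex:hasFinding",
                        f"finding:{finding}"))
    return triples
```

Run over 16,000 reports, an extractor of this shape (with a real NLP model in place of the regex) is what populates the knowledge base with linked triples instead of opaque prose.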
Computational annotation of textual information has taken on an important role in knowledge extraction from the biomedical literature, since most of the relevant information from scientific findings is still maintained in text format. In this endeavour, annotation tools can assist in the identification of biomedical concepts and their relationships, providing faster reading and curation processes, with reduced costs. However, the separate usage of distinct annotation systems results in highly heterogeneous data, which are difficult to combine and exchange efficiently. Moreover, despite the existence of several annotation formats, there is no unified way to integrate miscellaneous annotation outcomes into a reusable, sharable and searchable structure. Taking up this challenge, we present a modular architecture for textual information integration using semantic web features and services. The solution described allows the migration of curation data into a common model, providing a suitable transition process in which multiple annotation data can be integrated and enriched, with the possibility of being shared, compared and reused across semantic knowledge bases.
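Migrating heterogeneous annotations into a common model boils down to mapping each tool's output onto one shared record shape. In the sketch below, the tab-separated line follows the real BRAT A1 standoff layout, while the JSON shape and its field names are an invented stand-in for "some other tool"; the common model itself is a simplifying assumption.

```python
# Sketch of migrating two heterogeneous annotation outputs into one common
# model (a plain dict with type/start/end/text). The first input follows the
# BRAT A1 standoff layout; the second is a hypothetical JSON-producing tool.
def from_brat(line):
    # e.g. "T1\tDisease 0 8\tdiabetes"
    tag, info, surface = line.split("\t")
    etype, start, end = info.split()
    return {"type": etype, "start": int(start), "end": int(end), "text": surface}

def from_other_tool(record):
    # Hypothetical shape: {"label": ..., "span": [start, end], "surface": ...}
    return {"type": record["label"], "start": record["span"][0],
            "end": record["span"][1], "text": record["surface"]}
```

Once both outputs normalise to the same records, they can be compared, deduplicated, and lifted into RDF for the semantic knowledge base.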
Rare disease patient registries are now an essential tool for all clinical stakeholders. These systems aim to improve patient treatments by collecting comprehensive electronic patient records. Understanding these data is a vital step towards personalized medicine. Yet, the growing number of disease-specific patient registries brings new challenges for life sciences developers. These systems are closed data silos, with independent formats and data models. As they were built with security and privacy in mind, available tools lack comprehensive data access mechanisms, thus making data sharing a complex process. However, exchanging knowledge is essential to a better understanding of studied diseases. To tackle these challenges we introduce a semantic web-based architecture to connect distributed and heterogeneous registries. This enables the federation of knowledge between multiple independent environments. The semantic web paradigm enhances the ways we deal with data, optimising how we can create, infer and publish knowledge. Hence, we adopt these modern standards to deploy patient registry add-ons. These can extract anonymised data and elevate them to a knowledge-oriented format, common to all registries. The outcome is a unique semantic layer, connecting miscellaneous registries, which we access using federated querying. Ultimately, this strategy empowers a holistic view through connected registries, enabling state-of-the-art semantic data sharing and access.
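The extract-and-elevate step of such an add-on can be sketched as pseudonymising the local identifier and mapping the remaining fields onto a shared vocabulary. Everything below is an illustrative assumption: the field names, the `rdv:` predicates, and the hashing scheme are not the deployed add-ons' actual design.

```python
import hashlib

# Sketch of a registry add-on step: drop direct identifiers from a local
# record and elevate the rest to triples in a shared vocabulary. Field and
# predicate names ("rdv:" prefix) are illustrative assumptions.
SHARED_VOCAB = {"diagnosis": "rdv:hasDiagnosis", "sex": "rdv:hasSex"}

def elevate(record):
    # Pseudonymise the local ID: records stay linkable, not identifiable.
    pid = "patient:" + hashlib.sha256(record["id"].encode()).hexdigest()[:12]
    return [(pid, pred, record[field])
            for field, pred in SHARED_VOCAB.items() if field in record]
```

Only whitelisted fields survive the elevation, so anything outside the shared vocabulary (names, contact details) never reaches the semantic layer.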
The continuous growth of unstructured information resulting from biomedical research is a trending challenge for the scientific community. In this way, novel methods for information management are emerging to improve knowledge distribution and access. The concept of nanopublications illustrates one of these recent strategies to implement machine-readable knowledge assertions, aiming to overcome the inconsistency, ambiguity and redundancy of traditional publications. The premise is that nanopublications are better suited than traditional papers to represent the relationships that exist between research data, providing an efficient mechanism for knowledge exchange. Despite the evident benefits of these RDF-based snippets, their adoption remains challenging due to the lack of extraction and publication methods. To solve that issue, we propose an automated workflow for generating nanopublications from biomedical literature. The proposed method uses an automated information extraction tool to detect relevant information in published documents, and then standardises the mined information according to Semantic Web recommendations for further exploration.
The continuous growth in the quantity and diversity of life sciences data raises several bioinformatics challenges regarding the integration and selection of desired information for later study. The majority of these data are scattered across independent systems that disregard interoperability features, which makes data integration a non-trivial task. Consequently, several ETL (Extract-Transform-Load) frameworks have been developed to make data integration tasks suitable for later exploration studies, providing better solutions for data heterogeneity, diversity and distribution. However, current advanced data integration tasks depend on large and heterogeneous data sources that must be modelled according to the source specifications and network conditions. Furthermore, these automated tasks depend significantly on sequential processes that dramatically increase the overall request and processing time. Without an estimate of the task completion time, the whole research workflow becomes even more challenging. This paper presents DISim, an ontology for data integration simulation, which estimates the completion time of large and heterogeneous data integration jobs in order to provide valuable outputs to enhance decision-making scenarios.
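The kind of estimate such a simulation can feed into decision-making is sketched below: completion time of a sequential ETL job derived from per-source size, network throughput, and a per-megabyte transform cost. The formula and all numbers are simplifying assumptions for illustration, not DISim's actual model.

```python
# Toy sketch of a DISim-style estimate: completion time of a sequential ETL
# job from per-source size and network conditions. The linear cost model and
# the default transform cost are illustrative assumptions.
def estimate_seconds(sources, transform_cost_per_mb=0.5):
    """sources: list of (size_mb, network_mbps) pairs, processed sequentially."""
    total = 0.0
    for size_mb, mbps in sources:
        download = size_mb * 8 / mbps            # transfer time in seconds
        transform = size_mb * transform_cost_per_mb
        total += download + transform            # sequential: times simply add
    return total
```

Even this crude model captures the abstract's point: because the stages run sequentially, every slow source delays the whole job, so an up-front estimate lets a researcher reorder, parallelise, or drop sources before launching the integration.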