Abstract: Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM), which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining a simple, easy-to-use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec.
Database URL: http://w3id.org/sssom/spec
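To make the table-based format concrete, the following is a minimal sketch of how a mapping table could be read with only the Python standard library and filtered down to the precise matches the abstract distinguishes from merely related ones. It assumes tab-separated rows with the columns subject_id, predicate_id, object_id, mapping_justification and confidence, and metadata lines prefixed with "#"; these details should be checked against the specification. The identifiers in the sample are invented, and the helper names read_mappings and exact_matches are hypothetical.

```python
import csv

# Hypothetical mapping rows; all identifiers are invented for illustration.
SAMPLE_TSV = (
    "# mapping_set_id: https://example.org/mappings/demo\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "EX:0001\tskos:exactMatch\tOTHER:A1\tsemapv:ManualMappingCuration\t1.0\n"
    "EX:0002\tskos:broadMatch\tOTHER:B7\tsemapv:LexicalMatching\t0.8\n"
    "EX:0003\tskos:relatedMatch\tOTHER:C3\tsemapv:LexicalMatching\t0.6\n"
)


def read_mappings(text):
    """Parse tab-separated mapping rows into dicts, skipping '#' metadata lines."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def exact_matches(rows):
    """Keep only rows whose predicate asserts an exact match."""
    return [row for row in rows if row["predicate_id"] == "skos:exactMatch"]


rows = read_mappings(SAMPLE_TSV)
precise = exact_matches(rows)
print([row["subject_id"] for row in precise])
```

Because the predicate is an explicit column, a precision-sensitive consumer can discard broad or related matches without parsing or querying any ontology.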
The COVID-19 pandemic has challenged healthcare systems and research worldwide. Data are collected all over the world and need to be integrated and made available to other researchers quickly. However, the heterogeneous information systems used in hospitals can fragment health data across multiple data 'silos' that are not interoperable for analysis. Consequently, clinical observations in hospitalised patients are not prepared to be reused efficiently and in a timely manner. There is a need to adapt research data management in hospitals to make COVID-19 observational patient data machine actionable, i.e. more Findable, Accessible, Interoperable and Reusable (FAIR) for humans and machines. We therefore applied the FAIR principles in the hospital to make patient data more FAIR.
The first subset was created by digitizing and curating the seminal report of Amerine and Winkler (1944), which provided grape harvest dates (GHDs), the quality of musts and wines, and wine tasting notes for 148 cultivars from 1935-1941 across five contrasting climatic regions of California. To put this dataset into a climate change context, we collected GHDs and must °Brix records from 1994 to 2018 for four representative cultivars in one of the five studied regions (Napa). Finally, we integrated meteorological data of the five regions during 1911-2018 and calculated bioclimatic indices important for grape. The resulting database is unique and valuable for assessing the fitness between cultivars across environments in order to mitigate the effects of climate change.
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Are they associated in some other way? Such relationships between the mapped terms are often not documented, leading to incorrect assumptions and making them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Also, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. The Simple Standard for Sharing Ontological Mappings (SSSOM) addresses these problems by: 1. Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. 2. Defining an easy-to-use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data standards. 3. Implementing open and community-driven collaborative workflows designed to evolve the standard continuously to address changing requirements and mapping practices. 4. Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). The SSSOM specification is at http://w3id.org/sssom/spec.
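The point about combining curated and automated mappings can be illustrated with a small sketch. It assumes each mapping records how it was produced in a mapping_justification field, and that the value semapv:ManualMappingCuration marks curated rows; both are assumptions to verify against the specification. The policy shown (curated beats automated, then higher confidence wins) is only one possible reconciliation rule, not one prescribed by the abstract, and the helper names rank and reconcile and all identifiers are hypothetical.

```python
CURATED = "semapv:ManualMappingCuration"  # assumed label for curated rows


def rank(mapping):
    """Sort key: curated rows beat automated ones; ties fall back to confidence."""
    is_curated = mapping.get("mapping_justification") == CURATED
    confidence = float(mapping.get("confidence") or 0.0)
    return (is_curated, confidence)


def reconcile(mappings):
    """Keep the best-ranked mapping for each (subject, object) pair."""
    best = {}
    for m in mappings:
        key = (m["subject_id"], m["object_id"])
        if key not in best or rank(m) > rank(best[key]):
            best[key] = m
    return list(best.values())


# Invented example: an automated and a curated row disagree on the same pair.
mappings = [
    {"subject_id": "EX:0001", "predicate_id": "skos:exactMatch", "object_id": "OTHER:A1",
     "mapping_justification": "semapv:LexicalMatching", "confidence": "0.9"},
    {"subject_id": "EX:0001", "predicate_id": "skos:broadMatch", "object_id": "OTHER:A1",
     "mapping_justification": CURATED, "confidence": "0.7"},
    {"subject_id": "EX:0002", "predicate_id": "skos:exactMatch", "object_id": "OTHER:B7",
     "mapping_justification": "semapv:LexicalMatching", "confidence": "0.8"},
]
resolved = reconcile(mappings)
print([(m["subject_id"], m["predicate_id"]) for m in resolved])
```

Without the justification metadata, the two conflicting rows for EX:0001 would be indistinguishable, which is the reconciliation problem the abstract describes.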
Plant scientists use Functional and Structural Plant (FSP) models to model the interactions between biological functions and physical structures within a limited space-time range. Integrating different FSP models is one possible way to overcome this limitation. However, the integration involves many technical dimensions, and a single generic software infrastructure that covers all integration cases is not feasible. In this dissertation, we analyze the requirements of the integration across all of these technical dimensions. Instead of an infrastructure, we propose a generic technical framework consisting of three technologies that allows FSP models hosted on the same or on different FSP modeling platforms to be integrated flexibly. We demonstrate the usability of the framework by implementing a full infrastructure for the integration of two specific FSP models, and we illustrate the effectiveness of the infrastructure with different integrative scenarios.
Subset1 was created by digitizing and curating the seminal report of Amerine and Winkler (1944), which provided grape harvest dates (GHDs), the quality of musts and wines, and wine tasting notes for 148 cultivars from 1935-1941 across five contrasting climatic regions of California. To put this dataset into a climate change context, we collected GHDs and must sugar content (°Brix) records from 1991 to 2018 for four representative cultivars in one of the five studied regions (Napa) in subset2. Finally, we integrated meteorological data of the five regions during 1911-2018 and calculated bioclimatic indices important for grape in subset3. The resulting database is unique and valuable for assessing the fitness between cultivars across environments in order to mitigate the effects of climate change.
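The abstract does not say which bioclimatic indices were calculated. As one illustration of the kind of computation involved, the sketch below accumulates growing degree days above a 10 °C base from daily minimum and maximum temperatures, a heat-summation index commonly used for grapevine (the Winkler index has this form when summed over the growing season). The function name growing_degree_days and the temperature values are invented for the example.

```python
def growing_degree_days(daily_min_max, base_c=10.0):
    """Sum the daily mean temperature excess over base_c; days below the base add zero."""
    total = 0.0
    for tmin, tmax in daily_min_max:
        mean = (tmin + tmax) / 2.0
        total += max(0.0, mean - base_c)
    return total


# Four invented days of (min, max) temperatures in degrees Celsius.
days = [(8.0, 20.0), (10.0, 24.0), (5.0, 13.0), (12.0, 28.0)]
gdd = growing_degree_days(days)
print(gdd)  # daily means 14, 17, 9, 20 give 4 + 7 + 0 + 10 = 21.0
```

Applied to each region and year of a meteorological series, such a sum gives one number per season that can be compared against harvest dates.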
Introduction - Within the FSPM community, different teams of researchers have specialized in different processes. Thus there is an increasing wish to reuse the diverse simulation packages that already exist but are usually implemented within different software environments, often not directly compatible with each other. The OpenAlea platform (Pradal et al., 2008) was developed to connect and reuse components with specific functionality in a scientific workflow environment. However, not all widely used FSPM-related tools are available from OpenAlea yet. In our work, we created an interface between OpenAlea and the FSPM platform GroIMP (Kniemeyer, 2008). The latter contains some dedicated tools, among them a simulator for light distribution and interception based on stochastic path tracing. This radiation model is interesting because of its accuracy, its spectral capabilities and its existing use in different applications. To demonstrate the technical usability of our interface, we took an established simulator for the growth and structural development of apple, MAppleT (Costes et al., 2008), which is already accessible from OpenAlea but does not include a radiation model of its own. By exporting the generated tree structures from MAppleT via OpenAlea to GroIMP, we were able to apply GroIMP's radiation model to them and to reimport the structures with added information on absorbed light at phytomer level. Within OpenAlea, photosynthesis was then calculated, and tentatively assumed effects on organ sizes could be visualized. Our conceptual contributions are a generic web architecture and the bidirectional matching between two different multiscale formalisms for topology and geometry in FSPMs. OpenAlea - OpenAlea emphasizes modularity and reuse by using a central data structure, the MTG (Godin and Caraglio, 1998).
This enables indirect communication between the components that are integrated in the platform, using a blackboard architecture. The MTG captures the multiscale organization of plant canopies, particularly its topology. Various properties can also be stored at the different scales. MTG vertices are topological elements that represent modular parts of a plant (e.g., axis, phytomer, organ). The neighborhood of each element is stored in the MTG together with its associated properties. Geometrical elements are stored separately in an external scene graph for efficiency but are available from a property of the MTG. GroIMP - In GroIMP, a scene, including virtual plants, is represented as a rooted graph which can be an MTG in the sense of Godin and Caraglio (1998). At the same time, it has the semantics of a scene graph (a well-known data model in computer graphics). In contrast to the MTG in OpenAlea, it contains all information about the scene, including geometry. Its nodes can represent geometrical objects (e.g., standing for plant organs), light sources or spatial transformations (e.g., rotations), or they are abstract nodes used purely for replacement purposes during development. The development of scenes, including plants, is modelled by parallel graph rewriting: in every timestep, rules are applied by substituting all instances of graphs that occur as the left-hand side of a rule with the corresponding right-hand side. L-systems, operating on strings, can be subsumed as special cases under this formalism. The Interface - Although the data models of OpenAlea and GroIMP were both derived from the same mathematical concept, the implemented data structures of both platforms differ in several aspects.
To bridge the gap between them, a data extractor from OpenAlea to GroIMP must first combine the topological information (MTG) with the geometrical information and build a scene graph in which the global positional information of each object is split into the transformation matrices of its predecessors (in the graph) and of itself. Furthermore, the scale information, represented by an indexing of nodes in OpenAlea, must be evaluated to build decomposition edges between all node pairs for which a direct is-part-of relationship shall exist in the GroIMP graph. An extractor for the reverse data flow, from GroIMP to OpenAlea, faces another problem: since the GroIMP graph can contain cycles in the general case, a spanning tree must first be derived within each scale level to form a valid MTG in OpenAlea. Our graph model for data exchange is a canonical data model that makes the interoperability infrastructure independent of any specific FSPM. It is a rooted, directed graph with typed nodes and thus more generic than an MTG. Technically, our connecting software tool consists of a client-side interface on top of OpenAlea and a server-side interface on top of GroIMP. An XML-based data exchange format called XEG, specified from the generic data exchange graph model, is provided for the integration. Details are given by Long (2019). Results and Discussion - In a case study, we applied our interface to integrate the MAppleT model (Costes et al., 2008), which simulates apple tree growth and development based on stochastics and biomechanics and is accessible via OpenAlea, with a light interception model based on stochastic path tracing implemented within GroIMP. The objective was to obtain a bi-platform FSPM that simulates growth by taking local light interception into account.
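The spanning-tree step mentioned above for the GroIMP-to-OpenAlea direction can be sketched as follows for a single scale level. This is a minimal illustration that assumes a breadth-first traversal from the root; the text does not state which traversal the authors use, and the function name spanning_tree and the example graph are hypothetical.

```python
from collections import deque


def spanning_tree(root, successors):
    """Return {node: parent} for a breadth-first spanning tree of the nodes reachable from root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            # Skip edges that would close a cycle or give a node a second parent.
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


# Invented directed graph with a cycle (a -> c -> a) and a doubly reached node (c).
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
tree = spanning_tree("a", graph)
print(tree)  # {'a': None, 'b': 'a', 'c': 'a', 'd': 'b'}
```

Every node ends up with exactly one parent, which is the tree property a valid MTG needs within a scale; the dropped edges would have to be discarded or stored as properties, a choice this sketch leaves open.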
The workflow is as follows: Through the client-side interface, the MTG generated by MAppleT is translated to an XEG graph, which is then packed into a message for transmission to the light interception model, which resides remotely (on GroIMP). Through the server-side interface, the message is received, unpacked and translated into a GroIMP graph, forming the input of the light interception model. Then, update rules are applied which change a property, absorbed light, of the nodes representing geometrical objects according to the ray-tracing results. Through the server-side interface, the result is translated to a data frame in XEG and packed to be sent back to OpenAlea (respectively, MAppleT) to complete the cross-platform simulation. Through the client-side interface, the data is unpacked and translated into an MTG in OpenAlea. Here, as growth in MAppleT is originally not based on light, we have as a first attempt applied an ad hoc computation of biomass based directly on the intercepted light. The growth of an apple fruit then depends on the new biomass and thus on the values from GroIMP. Botanically, this scenario is certainly not realistic, since it disregards any translocation of assimilates, but it proves the technical usability of the interface. Acknowledgements - Parts of this work were funded by DFG and ANR in the joint project Multiscale functional-structural plant modelling at the example of apple trees, DFG grant number KU 847/11-1.