Model-to-model (M2M) transformation is a key ingredient in a typical Model-Driven Engineering workflow and there are several tailored high-level interpreted languages for capturing and executing such transformations. While these languages enable the specification of concise transformations through task-specific constructs (rules/mappings, bindings), their use can pose scalability challenges when it comes to very large models. In this paper, we present an architecture for optimising the execution of model-to-model transformations written in such a language, by leveraging static analysis and automated program rewriting techniques. We demonstrate how static analysis and dependency information between rules can be used to reduce the size of the transformation trace and to optimise certain classes of transformations. Finally, we detail the performance benefits that can be delivered by this form of optimisation, through a series of benchmarks performed with an existing transformation language (Epsilon Transformation Language - ETL) and EMF-based models. Our experiments have shown considerable performance improvements compared to the existing ETL execution engine, without sacrificing any features of the language.
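To illustrate the kind of optimisation described above, the following sketch (hypothetical Java, not the actual ETL engine code) shows a transformation trace that only records links for rules whose target elements are, according to static analysis, resolved by other rules; the rule and element names are invented for the example.

```java
import java.util.*;

/**
 * Illustrative sketch (not the actual ETL engine): a transformation trace that
 * only stores links for rules whose targets may later be resolved by another
 * rule's bindings, as determined by an up-front static analysis.
 */
public class SelectiveTrace {

    /** Names of rules whose targets are resolved elsewhere (from static analysis). */
    private final Set<String> resolvedRules;

    /** Source element -> target elements, kept only when actually needed. */
    private final Map<Object, List<Object>> links = new HashMap<>();

    public SelectiveTrace(Set<String> resolvedRules) {
        this.resolvedRules = resolvedRules;
    }

    /** Record a trace link only if some other rule may later resolve it. */
    public void record(String ruleName, Object source, Object target) {
        if (resolvedRules.contains(ruleName)) {
            links.computeIfAbsent(source, s -> new ArrayList<>()).add(target);
        }
    }

    /** Resolve the targets created from a source element. */
    public List<Object> resolve(Object source) {
        return links.getOrDefault(source, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Suppose static analysis found that only Class2Table's targets are
        // resolved elsewhere (e.g. by a hypothetical Attribute2Column rule).
        SelectiveTrace trace = new SelectiveTrace(Set.of("Class2Table"));
        trace.record("Class2Table", "Customer", "CustomerTable");
        trace.record("Package2Schema", "crm", "crmSchema"); // never resolved, so not stored
        System.out.println(trace.resolve("Customer")); // [CustomerTable]
        System.out.println(trace.resolve("crm"));      // []
    }
}
```

In this sketch, trace entries for rules that no other rule depends on are never allocated, which is one way the trace size (and therefore memory footprint) of a large transformation can be reduced.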
Adoption of low-code engineering in complex enterprise applications also increases the size of the underlying models. In such cases, due to the increasing complexity of the applications and the growing size of the underlying artefacts, various scalability challenges can arise for low-code platforms. Task-specific programming languages, such as OCL and EOL, are tailored to managing the underlying models; however, existing model management languages incur a significant performance penalty when complex queries operate over large-scale models reaching magnitudes of millions of elements in size. We propose an approach for automatically mapping expressions in Epsilon validation programs to VIATRA graph patterns, making the validation of large-scale low-code system models scalable by leveraging VIATRA's incremental execution engine. Finally, we evaluate the performance of the proposed approach on large Java models of the Eclipse source code. Our results show a performance speed-up of up to 1481x compared to sequential execution in Epsilon.
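The following conceptual Java sketch (illustrative only, not the actual Epsilon-to-VIATRA mapping) shows why incremental evaluation pays off: constraint results are cached per element and only changed elements are re-checked, instead of re-evaluating the whole model on every validation run.

```java
import java.util.*;
import java.util.function.Predicate;

/**
 * Conceptual sketch of incremental constraint checking (not the actual
 * Epsilon-to-VIATRA mapping): results are cached per element and only
 * re-computed for elements reported as changed.
 */
public class IncrementalValidator<E> {

    private final Predicate<E> constraint;          // e.g. "a method has fewer than 50 statements"
    private final Map<E, Boolean> resultCache = new HashMap<>();

    public IncrementalValidator(Predicate<E> constraint) {
        this.constraint = constraint;
    }

    /** Full pass: what a sequential validation run does every time. */
    public long validateAll(Collection<E> model) {
        return model.stream().filter(e -> !constraint.test(e)).count();
    }

    /** Incremental pass: only changed or previously unseen elements are re-checked. */
    public long revalidate(Collection<E> model, Collection<E> changed) {
        for (E e : changed) {
            resultCache.put(e, constraint.test(e));
        }
        for (E e : model) {
            resultCache.computeIfAbsent(e, constraint::test);
        }
        return resultCache.values().stream().filter(ok -> !ok).count();
    }

    public static void main(String[] args) {
        IncrementalValidator<String> v = new IncrementalValidator<>(s -> s.length() < 10);
        List<String> model = List.of("ok", "short", "a rather long element name");
        System.out.println(v.validateAll(model));               // sequential baseline: checks all elements
        System.out.println(v.revalidate(model, List.of()));     // first run fills the cache: 1 violation
        System.out.println(v.revalidate(model, List.of("ok"))); // later runs only re-check changed elements
    }
}
```

Engines such as VIATRA go further by maintaining graph-pattern match sets incrementally, but the cache-and-invalidate idea above is the essence of why repeated validations become much cheaper than full sequential re-evaluation.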
XML Metadata Interchange (XMI) is an OMG-standardised model exchange format, which is natively supported by the Eclipse Modeling Framework (EMF) and by the majority of modelling and model management languages and tools. Whilst XMI is widely supported, the XMI parser provided by EMF is inefficient in cases where models are read-only (such as the input models of model queries or model-to-model transformations), as it always loads the entire model into memory. In this paper we present a novel algorithm, and a prototype implementation (SmartSAX), capable of partially loading models persisted in XMI. SmartSAX offers improved performance, in terms of loading time and memory footprint, over the default EMF XMI parser. We describe the algorithm in detail, and present benchmarking results that demonstrate the substantial improvements of the prototype implementation over the parser provided by EMF.
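The sketch below (plain Java SAX, not the SmartSAX implementation itself) illustrates the underlying idea of partial loading: a streaming pass over an XMI file that only materialises elements of requested types, so memory use grows with the slice of interest rather than with the whole model. The Element record and the type-matching rule are simplifications for illustration.

```java
import java.io.File;
import java.util.*;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Minimal sketch of partial XMI loading (not SmartSAX itself): a single SAX
 * pass that keeps only elements of the requested types and skips the rest.
 */
public class PartialXmiLoader {

    /** Lightweight stand-in for a model element: its type plus its attributes. */
    public record Element(String type, Map<String, String> attributes) {}

    public static List<Element> load(String xmiPath, Set<String> wantedTypes) throws Exception {
        List<Element> loaded = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File(xmiPath), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Use the xsi:type attribute if present, otherwise the element tag.
                String type = atts.getValue("xsi:type") != null ? atts.getValue("xsi:type") : qName;
                if (!wantedTypes.contains(type)) {
                    return; // skipped: never allocated, unlike a full in-memory load
                }
                Map<String, String> attrs = new HashMap<>();
                for (int i = 0; i < atts.getLength(); i++) {
                    attrs.put(atts.getQName(i), atts.getValue(i));
                }
                loaded.add(new Element(type, attrs));
            }
        });
        return loaded;
    }
}
```

A call such as load("library.xmi", Set.of("Book")) (file and type names hypothetical) would then return only the elements a read-only query actually needs, which is the effect SmartSAX achieves while additionally resolving cross-references between the loaded elements.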
Large-scale software repository mining typically requires substantial storage and computational resources, and often involves a large number of calls to (rate-limited) APIs such as those of GitHub and StackOverflow. This creates a growing need for distributed execution of repository mining programs, to which remote collaborators can contribute computational and storage resources as well as API quotas (ideally without sharing API access tokens or credentials). In this paper we introduce Crossflow, a novel framework for building distributed repository mining programs. We demonstrate how Crossflow can delegate mining jobs to remote workers and cache their results, and how workers can implement advanced behaviour such as load balancing and rejecting jobs they cannot perform (e.g. due to a lack of disk space or of credentials for a specific API).
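The following hypothetical Java sketch (the MiningJob and worker names are illustrative and not the actual Crossflow API) outlines the worker-side behaviour described above: rejecting jobs the worker cannot serve and caching results so that repeated jobs do not consume API quota again.

```java
import java.io.File;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a repository-mining worker (not the Crossflow API):
 * a worker advertises which jobs it can accept and caches completed results.
 */
public class RepositoryMiningWorker {

    /** A unit of work: which repository to mine and which API it needs. */
    public record MiningJob(String repositoryUrl, String api) {}

    private final Set<String> availableApiCredentials;   // e.g. "github", "stackoverflow"
    private final long minFreeDiskBytes;
    private final Map<MiningJob, String> resultCache = new ConcurrentHashMap<>();

    public RepositoryMiningWorker(Set<String> credentials, long minFreeDiskBytes) {
        this.availableApiCredentials = credentials;
        this.minFreeDiskBytes = minFreeDiskBytes;
    }

    /** A worker may reject jobs it cannot serve; the coordinator re-queues them elsewhere. */
    public boolean accepts(MiningJob job) {
        boolean hasCredentials = availableApiCredentials.contains(job.api());
        boolean hasSpace = new File(".").getUsableSpace() > minFreeDiskBytes;
        return hasCredentials && hasSpace;
    }

    /** Results are cached so that repeated jobs do not consume API quota again. */
    public String process(MiningJob job) {
        return resultCache.computeIfAbsent(job, this::mine);
    }

    private String mine(MiningJob job) {
        // Placeholder for the actual (rate-limited) API calls and analysis.
        return "metrics for " + job.repositoryUrl();
    }
}
```

Keeping credentials on the worker side, as in this sketch, is what allows collaborators to contribute their API quotas without ever sharing access tokens with the coordinator.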
Softeam has over 20 years of experience providing UML-based modelling solutions, such as its Modelio modelling tool and its Constellation enterprise model management and collaboration environment. Due to the increasing number and size of the models used by its clients, Softeam joined the MONDO FP7 EU research project, which worked on solutions to the resulting scalability challenges and produced, among other results, the Hawk model indexer. This paper presents the technical details of the integration of Hawk into Softeam's toolset, together with several case studies. The first case study measured the performance of Hawk's Modelio support using varying amounts of memory for the Neo4j backend. In another case study, Hawk was integrated into Constellation to provide scalable global querying of model repositories. Finally, the combination of Hawk and the Epsilon Generation Language was compared against Modelio for document generation: for the largest model, Hawk was two orders of magnitude faster.
Recent advances in scalable model-driven engineering allow very large models to be stored and queried. Due to their size, rather than transferring such models over the network in their entirety, it is typically more efficient to access them remotely through networked services (e.g. model repositories, model indexes). Little attention has been paid so far to the nature of these services, and to whether they remain responsive with an increasing number of concurrent clients. This paper extends a previous empirical study on the impact of certain key decisions on the scalability of concurrent model queries, across two domains, using an Eclipse Connected Data Objects (CDO) model repository, four configurations of the Hawk model index and a Neo4j-based configuration of the NeoEMF model store. The study evaluates the impact of the network protocol, the API design, the caching layer, the query language and the type of database, and analyses the reasons for their varying levels of performance. The design of the API was shown to make a bigger difference than the network protocol (HTTP/TCP) used. Where available, the query-specific indexed and derived attributes of Hawk outperformed the comprehensive generic caching of CDO. Finally, the results illustrate the still ongoing evolution of graph databases: two tools using different versions of the same backend showed very different performance, with one slower and the other faster than CDO.
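As a rough illustration of the experimental setup (not the benchmark harness used in the study), the sketch below spawns a fixed number of concurrent clients that repeatedly issue the same query against a remote service and records per-request latencies; the client count, request count and query supplier are placeholders.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Supplier;

/**
 * Illustrative harness for measuring query responsiveness under an increasing
 * number of concurrent clients (not the benchmark used in the study).
 */
public class ConcurrentQueryBenchmark {

    public static List<Long> run(int clients, int requestsPerClient, Supplier<?> query)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        List<Long> latenciesNanos = Collections.synchronizedList(new ArrayList<>());
        for (int c = 0; c < clients; c++) {
            pool.submit(() -> {
                for (int i = 0; i < requestsPerClient; i++) {
                    long start = System.nanoTime();
                    query.get();                       // e.g. an HTTP or TCP call to the model service
                    latenciesNanos.add(System.nanoTime() - start);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return latenciesNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in query; in such a study this would hit a CDO, Hawk or NeoEMF endpoint remotely.
        List<Long> latencies = run(16, 100, () -> "query result");
        latencies.sort(null);
        System.out.printf("requests: %d, median latency: %d ns%n",
                latencies.size(), latencies.get(latencies.size() / 2));
    }
}
```

Repeating such runs while varying the number of clients, the access protocol and the backend configuration is the kind of experiment that exposes the API-design and caching effects reported above.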