To assess the progress of open science in France, the French Ministry of Higher Education, Research and Innovation published the French Open Science Monitor in 2019. The tool has a known bias: only publications with a DOI can be considered, which favors research communities that publish mainly in articles. Its indicators are nonetheless trustworthy and reliable. In 2020, the University of Lorraine became the first institution to reuse the National Monitor to create a version at the scale of a single university. Since its release, the Lorraine Open Science Monitor has been reused by many other institutions. In 2022, the French Open Science Monitor evolved further, enabling new insights into open science. The Lorraine Open Science Monitor has also evolved since its launch. This paper details how the initial code for the Lorraine Open Science Monitor was developed and disseminated. It then outlines plans for development over the next few years.
Since 2018, the French Open Science Monitor (BSO) has assessed the effectiveness of the national public policy on open science. This steering tool, developed by the French Ministry of Higher Education and Research, the University of Lorraine and Inria, measures the evolution of open science in France using reliable, open and controlled data updated every year. The result is a website presenting different dashboards that track, for example, the ratio of open access scientific publications by year, discipline or publisher. Since its last release in March 2023, the BSO also tracks, on a national scale, the production and openness of research datasets and software mentioned in scientific publications. To ensure realistic coverage, our platform relies on large-scale, open-source deep learning techniques applied to the full texts of publications with at least one co-author with a French affiliation. DataStet identifies every mention of datasets in scholarly publications, including both implicit mentions and explicitly named datasets. SoftCite recognizes any software mention in scientific publications, using the Softcite Dataset as training data. Dataset and software mentions are then automatically characterized as used, created or shared by the research work described in the scientific document. These characterizations can be cumulative: a single mention may carry more than one of them. Among the 1,608,839 publications in our corpus, we were able to analyze 655,954 with our DataStet tool. For this subset, we found 6,511,998 mentions of datasets characterized as used, 330,062 mentions characterized as created, and 78,178 mentions characterized as shared. With this methodology, the BSO can offer new indicators on the proportion of French publications mentioning the usage, creation and sharing of data, as well as the proportion of publications in France that include a "Data Availability Statement". Similar indicators are dedicated to code and software.
In addition, these indicators are further broken down by discipline, publisher and institution. The project addresses major technical and organizational challenges: identifying French datasets and software with artificial intelligence, since no reference registries exist for them as they do for publications; and producing indicators that are relevant to the different scientific communities. Deep learning plays a crucial role as an enabling technology for identifying research datasets and software. This presentation will share the latest results of the project, detail the methodology, and underline the reusability of the project results.
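As an illustration only, the step from mention-level characterizations to publication-level indicators can be sketched as below. This is a minimal sketch under stated assumptions, not the BSO's actual code: the `Mention` record, the `publication_indicators` function and the toy identifiers are all hypothetical. It assumes a publication counts toward an indicator as soon as at least one of its mentions carries the corresponding characterization, and that the denominator is the set of analyzed publications. The last lines compute the share of the corpus that could be analyzed, using the two counts given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical record for one dataset (or software) mention."""
    publication_id: str  # assumed identifier, e.g. a DOI
    used: bool
    created: bool
    shared: bool  # the three flags are cumulative, not exclusive


def publication_indicators(mentions, analyzed_ids):
    """Return, per characterization, the share of analyzed publications
    that have at least one mention flagged with it."""
    analyzed = set(analyzed_ids)
    flagged = {"used": set(), "created": set(), "shared": set()}
    for mention in mentions:
        if mention.publication_id not in analyzed:
            continue  # ignore mentions from publications outside the subset
        for label, publication_ids in flagged.items():
            if getattr(mention, label):
                publication_ids.add(mention.publication_id)
    total = len(analyzed)
    return {
        label: (len(publication_ids) / total if total else 0.0)
        for label, publication_ids in flagged.items()
    }


# Toy data (invented for the example, not taken from the BSO corpus).
mentions = [
    Mention("pub-1", used=True, created=False, shared=False),
    Mention("pub-1", used=True, created=True, shared=True),
    Mention("pub-2", used=True, created=False, shared=False),
]
indicators = publication_indicators(mentions, ["pub-1", "pub-2", "pub-3", "pub-4"])
print(indicators)

# Share of the corpus analyzed with DataStet, from the counts quoted above.
coverage = 655_954 / 1_608_839
print(f"analyzed share of corpus: {coverage:.1%}")
```

Counting publications rather than mentions matters here: the mention totals quoted above can exceed the number of publications because one publication may contain many mentions.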