An Academic Publishers’ GO FAIR Implementation Network (APIN)
by Jan Velterop and Erik Schultes
Information Services & Use, Vol. 40, Iss. 4, pp. 333–341
Communication of scientific research results is one of the pillars on which scientific progress stands. Traditionally, the published (narrative) literature was the main material from which this pillar was constructed. Data used to be supplementary to published articles, but in recent decades this relationship has been inverted: articles have increasingly become supplementary to the core research results, the data, as they describe the latter's interpretation, provenance, and significance. Under such an inversion, we might start to refer to "data publications" with "supplementary articles". An article can then be seen as belonging to the set of "rich provenance metadata" (not machine-readable itself, but enabling humans to determine a dataset's actual "reusability" and "fitness for purpose").
Furthermore, data repositories are playing an increasingly important role in the scientific communication and discovery process, as data availability and reusability become crucial to science's efficiency and efficacy. The old method, which relied on "data are available upon request" (or even "all data are available from the corresponding author upon reasonable request"), is no longer fit for purpose, even when, or rather, if, data were actually made available and usable that way. More and more science also relies on data sets that are too large and complex to simply be made available on request, reasonable or not, through the authors' idiosyncratic methods of exchange.
In many disciplines, such as the medical and psychological sciences, data are often too privacy-sensitive to be "shared" in the classical sense. They need to be anonymized, and even then, the levels of trust necessary to guarantee privacy cannot always be achieved. One solution is not to share data by making them available in the traditional way, but instead to make it possible to analyze data by "visiting" them in their privacy-secure silos. A good example is the Personal Health Train. The key concept here is to bring algorithms to the data, wherever they happen to be, rather than bringing all data to a central place. The algorithms just "visit" the data, as it were, which means that no data need to be duplicated or downloaded to another location. The Personal Health Train is designed to give controlled access to heterogeneous data sources, while ensuring privacy protection and maximum engagement of individual patients, citizens, and other entities with a strong interest in the protection of data privacy. The data train concept is generic, and it is clear that it can also be applied in areas other than the health sciences. For the concept to work, data need to be FAIR. FAIR has a distinctive and precise meaning in this context: it stands for data and services that are Findable, Accessible, Interoperable, and Reusable by machines.
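To make the pattern concrete, the sketch below simulates two data stations that accept a visiting algorithm and return only aggregate results, so raw records never leave their silos. It is a minimal illustration of the concept, not actual Personal Health Train code: all names in it (Silo, visit, mean_age) are hypothetical, and a real implementation adds authentication, containerized algorithms, and auditing.

```python
# A minimal sketch of the "algorithms visit the data" pattern.
# All names here are illustrative; this is not Personal Health Train code.

from statistics import mean

class Silo:
    """A privacy-secure data station: raw records never leave this object."""

    def __init__(self, records):
        self._records = records  # stays inside the silo

    def visit(self, algorithm):
        # The algorithm travels to the data; only its (aggregate)
        # result travels back out of the silo.
        return algorithm(self._records)

def mean_age(records):
    # An aggregate-only "train" payload: a summary statistic, not raw data.
    return mean(r["age"] for r in records)

stations = [
    Silo([{"age": 34}, {"age": 61}]),               # e.g. hospital A
    Silo([{"age": 47}, {"age": 52}, {"age": 58}]),  # e.g. hospital B
]

# The "train" visits each station in turn.
print([station.visit(mean_age) for station in stations])
```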
Metadata
In the new models supporting FAIR data publications, metadata are key. The most important role of publishers and preprint platforms is to ensure that detailed, domain-specific, and machine-actionable metadata are provided with all publications, including text, datasets, tabular material, and images; in other words, with all 'research objects' that they publish. These metadata need to comply with standards that are relevant for the domains and disciplines involved, and they need to be properly applied, drawing on controlled vocabularies created by, or in close consultation with, practicing domain experts. Some of the required metadata components are already (almost) universally used, such as digital object identifiers (DOIs), although not always in a way that makes them useful for machine reading. Often enough, a DOI links to a landing page or an abstract designed to be seen by humans, where the human eye and finger are relied upon to perform the final link through to the object itself, e.g. a PDF. Clearly, that approach is not satisfactory for machine analysis of larger amounts of data and information.
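By contrast, a machine agent can obtain structured metadata directly through HTTP content negotiation at doi.org, a feature supported by the Crossref and DataCite registration agencies. Below is a minimal sketch in Python, using the DOI of the FAIR Guiding Principles paper; the exact fields returned depend on the registration agency.

```python
# A minimal sketch: fetching machine-readable metadata for a DOI via
# HTTP content negotiation at doi.org (supported by Crossref and DataCite).
# The DOI below is the FAIR Guiding Principles paper (Wilkinson et al., 2016).

import requests

doi = "10.1038/sdata.2016.18"
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
resp.raise_for_status()

metadata = resp.json()
# Structured, machine-actionable fields instead of a landing page:
print(metadata["title"])
print([author.get("family") for author in metadata.get("author", [])])
```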
Given the information and data density of the text in articles, it makes sense to convert assertions into semantically rich, machine-actionable objects as well, according to the Resource Description Framework (RDF), a family of World Wide Web Consortium (W3C) specifications. Where possible, assertions that leverage domain-specific vocabularies and RDF markup could be presented as semantic triples along with their appropriate metadata ("nanopublications"). If all is well, the assertions and claims mentioned in an article's abstract capture the gist of the article (at least to the author's satisfaction), but tabular data could be treated in this way as well. The result of all this should be a FAIR Digital Object (FDO): a digital "twin" of the originally submitted research object. It is not suggested that the onus of doing all this should fall exclusively on the shoulders of publishers; it should instead be a collaboration of publishers and (prospective) authors and, increasingly, their data stewards.
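As a simplified illustration, the sketch below uses the rdflib library to build a nanopublication-style structure: one named graph holding a single assertion as a semantic triple, and a second holding provenance about that assertion. A conformant nanopublication also carries a head graph and a publication-info graph, which are omitted here; all example.org URIs and the example claim are illustrative assumptions.

```python
# A simplified nanopublication-style structure with rdflib: an assertion
# graph plus a provenance graph. A conformant nanopublication also has a
# head graph and a publication-info graph (omitted here); all example.org
# URIs and the example claim are illustrative assumptions.

from rdflib import Dataset, Namespace
from rdflib.namespace import PROV

EX = Namespace("http://example.org/")

ds = Dataset()
ds.bind("ex", EX)
ds.bind("prov", PROV)

# Named graph 1: the assertion itself, a single semantic triple.
assertion = ds.graph(EX["np1-assertion"])
assertion.add((EX["gene-X"], EX["associatedWith"], EX["disease-Y"]))

# Named graph 2: provenance metadata about that assertion.
provenance = ds.graph(EX["np1-provenance"])
provenance.add((EX["np1-assertion"], PROV.wasAttributedTo, EX["author-1"]))

# TriG keeps the named-graph structure in the serialization.
print(ds.serialize(format="trig"))
```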
Publishers would provide some of the appropriate tools and, where necessary, train (or constrain) authors to indicate what the significant assertions and claims in their articles are. Examples of useful RDF metadata tools that make FAIR publication easy (or at least easier) are available from the Center for Expanded Data Annotation and Retrieval (CEDAR) and the BioPortal repository, which holds more than 800 controlled vocabularies, many from specialized domains. Increasingly, CEDAR has been deployed as the primary tooling for the so-called Metadata for Machines workshops, where FAIR metadata experts team up with domain experts to create appropriate FAIR metadata artefacts.
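For example, a publisher's submission tool could look up candidate vocabulary terms through BioPortal's public REST API (data.bioontology.org). The sketch below assumes a valid API key (issued free with a BioPortal account) and should be read as an outline rather than production code.

```python
# A hedged sketch: searching BioPortal's REST API for a vocabulary term.
# "YOUR_API_KEY" is a placeholder; endpoint and header follow BioPortal's
# documented conventions, but verify the field names against the live API.

import requests

API_KEY = "YOUR_API_KEY"  # placeholder: issued free with a BioPortal account

resp = requests.get(
    "https://data.bioontology.org/search",
    params={"q": "natural language processing"},
    headers={"Authorization": f"apikey token={API_KEY}"},
    timeout=30,
)
resp.raise_for_status()

for hit in resp.json().get("collection", [])[:3]:
    # Each hit carries a resolvable concept URI usable as FAIR metadata.
    print(hit.get("prefLabel"), "->", hit.get("@id"))
```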
What does all this mean for academic publishers?
Much of what is proposed should be possible to incorporate into current processes and workflows. Publishers already add metadata to whatever they publish; extending these metadata and increasing their granularity is unlikely to be a fundamental change for most publishers. It goes without saying that authors will need to be involved in determining what the 'significant assertions' in the text of their articles are, and what any terms used mean. If an author uses the term NLP, for instance, it should not be for the publisher to decide whether it means "neuro-linguistic programming" or "natural language processing". Various technologies are available that would enable simple and efficient tools for authors to add semantically unequivocal assertions to their manuscripts, much as they currently add keywords (although the meaning of keywords is often enough quite ambiguous, which must be avoided for key assertions).
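A sketch of what such a tool might record: the ambiguous keyword is paired with a resolvable concept URI chosen by the author. The DBpedia URIs below are real, but the simple record format is a hypothetical illustration, not an established publisher schema.

```python
# A hypothetical keyword-disambiguation record; the dict format is an
# illustrative assumption, while the DBpedia concept URIs are real.

candidates = {
    "natural language processing":
        "http://dbpedia.org/resource/Natural_language_processing",
    "neuro-linguistic programming":
        "http://dbpedia.org/resource/Neuro-linguistic_programming",
}

# The author, not the publisher, selects the intended sense of "NLP":
keyword = {"label": "NLP", "concept": candidates["natural language processing"]}
print(keyword)
```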
The publishing process should not differ much from the one currently used, as Figure 1 below shows. The main thing is to create, or arrange for the creation of, a FAIR "twin" of the original research object as submitted.
Figure 1: The flow of research objects through the publishing process (source: ISU article; credit: Jan Velterop and Erik Schultes)
Proposed manifesto
For an Academic Publishers’ Implementation Network (APIN) to be recognized as a GO FAIR Implementation Network, a “manifesto” is required. Such a manifesto is meant to articulate critical issues that are of generic importance to the objectives of FAIR, and on which the APIN partners have reached consensus. The proposed manifesto is as follows (see Box 1), and at this stage is subject to agreement, modification, and acceptance by the initial APIN partners.
The APIN is envisaged to consist of academic publishing outfits of any stripe, be they commercial publishers, society publishers, or preprint platforms. Its main purpose is to develop and promote best practices to "publish for machines" in alignment with the FAIR principles. These best practices would still apply to traditional scholarly communication, properly augmented with machine-readable renditions and components.
All APIN participants commit to comply with the GO FAIR Implementation Networks' "Rules of Engagement", essentially committing to the FAIR Principles and embracing zero tolerance for vendor lock-in, formal or otherwise.
Initial Primary Tasks
As soon as the process of becoming a GO FAIR Implementation Network has started, the next primary tasks will be to coordinate with the APIN partners a detailed plan of execution and a roadmap for further development, and to develop rules for authors that stimulate proper FAIR data publishing in trusted repositories with a FAIR Data Point, to support maximum machine Findability, Accessibility, and Interoperability for Reuse.