What powers LD Connect?
As part of our continued commitment to linked data, we have added a new section to the IOS Press website that provides background on this initiative and explains the capabilities of LD Connect. We hope it will help you understand what you can gain from using it. You can also explore the simple search, as highlighted later in this post.
First, we look at what powers our linked data portal. At its core, LD Connect is a database of all IOS Press metadata, i.e., information about a given paper (journal article or book chapter) – such as title, publication date, abstract, keywords, authors and their affiliations, and more.
It is a powerful database in which all the information is stored in a structured way. We describe our data using a custom vocabulary and web standards, which makes that data more discoverable, accessible, linkable, and interoperable with other datasets. Affiliations are geocoded, and we are working towards disambiguating both authors and affiliations with our co-reference resolution script. With the help of machine learning techniques, the data conversion pipeline continues to improve as more data are added.
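The post does not name the web standards involved, but the "RDF" in SPARQL's name indicates that the data is modelled in RDF, where every fact is a subject-predicate-object triple. The minimal sketch below shows what "structured" can look like for one paper, serialized in the N-Triples format. The namespaces and the predicate names (`title`, `partOf`, `hasAuthor`, `affiliatedWith`) are hypothetical placeholders, not LD Connect's actual custom vocabulary.

```python
from dataclasses import dataclass

VOCAB = "https://example.org/vocab#"    # hypothetical vocabulary namespace
RES = "https://example.org/resource/"   # hypothetical namespace for papers, issues, authors


@dataclass(frozen=True)
class IRI:
    value: str

    def n3(self) -> str:
        # N-Triples writes IRIs in angle brackets.
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    value: str

    def n3(self) -> str:
        # Escape characters that may not appear raw inside an N-Triples string literal
        # (backslash first, so later escapes are not doubled).
        escaped = (self.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'


def describe_paper(paper_id: str, title: str, issue_id: str, author_id: str,
                   affiliation_id: str) -> list[tuple]:
    """Return subject-predicate-object triples for one paper's metadata."""
    paper = IRI(f"{RES}paper/{paper_id}")
    author = IRI(f"{RES}author/{author_id}")
    return [
        (paper, IRI(VOCAB + "title"), Literal(title)),
        (paper, IRI(VOCAB + "partOf"), IRI(f"{RES}issue/{issue_id}")),
        (paper, IRI(VOCAB + "hasAuthor"), author),
        (author, IRI(VOCAB + "affiliatedWith"), IRI(f"{RES}affiliation/{affiliation_id}")),
    ]


def to_ntriples(triples: list[tuple]) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s.n3()} {p.n3()} {o.n3()} ." for s, p, o in triples)


print(to_ntriples(describe_paper("p1", "An example title", "i1", "a1", "aff1")))
```

Because each fact links identifiers rather than copying values, the same author or affiliation IRI can be reused across many papers, which is what makes the data linkable to other datasets.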
Put simply, this structure allows us to ask the database questions such as:
- How many papers were published in the Statistical Journal of the IAOS in 2019?
- What is the proportion of open access papers published in the Journal of Alzheimer's Disease over time?
- How many authors who published papers on biology are from Canada?
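Answering the first question amounts to following links from a journal down through its volumes and issues to its papers, then counting the papers with the right year. The sketch below illustrates that idea over a toy, hand-made set of triples; the records, identifiers, and the `partOf`/`year` predicate names are hypothetical, and LD Connect itself answers such questions with SPARQL rather than with code like this.

```python
# Toy triples made up for illustration; they are not real IOS Press records,
# and "partOf"/"year" are hypothetical predicate names.
TRIPLES = {
    ("paper:1", "partOf", "issue:A"), ("paper:1", "year", "2019"),
    ("paper:2", "partOf", "issue:A"), ("paper:2", "year", "2019"),
    ("paper:3", "partOf", "issue:B"), ("paper:3", "year", "2018"),
    ("issue:A", "partOf", "volume:X"),
    ("issue:B", "partOf", "volume:Y"),
    ("volume:X", "partOf", "journal:sji"),
    ("volume:Y", "partOf", "journal:sji"),
}


def subjects(predicate: str, obj: str) -> set[str]:
    """All subjects s such that (s, predicate, obj) is in the store."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def count_papers(journal: str, year: str) -> int:
    """Walk journal -> volumes -> issues -> papers, then count papers from `year`."""
    volumes = subjects("partOf", journal)
    issues = {i for v in volumes for i in subjects("partOf", v)}
    papers = {p for i in issues for p in subjects("partOf", i)}
    return sum(1 for p in papers if year in objects(p, "year"))


print(count_papers("journal:sji", "2019"))
```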
LD Connect’s technology bridges human-readable questions and machine-readable data. Questions put to a database are called “queries,” and they must be written in a language the database can understand. For most relational databases, that language is SQL (Structured Query Language). Our database instead uses SPARQL (SPARQL Protocol and RDF Query Language) to retrieve information. Here is the SPARQL version of the first query (Figure 1).
Figure 1: Options for an expert-level semantic search (searches can be entered via the LD Connect website)
This can be explained as follows:
- We tell LD Connect what we want to see in the output – in this case the paper (i.e., the article identifier), the title, and the journal in which it is published
- We then explain where to find that information, i.e., that the paper is part of an issue, the issue is part of a volume, and the volume is part of a journal
- Finally, we filter the results to the specific journal using its three-letter identifier, “sji”
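The three steps above can be sketched as a query string, together with the HTTP request that the SPARQL 1.1 Protocol defines for sending it (a GET with a `query` parameter). This is an illustration under stated assumptions, not the query from Figure 1: the `v:` prefix, the terms `title`, `partOf`, and `identifier`, and the endpoint URL are hypothetical placeholders, and the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: the post does not show LD Connect's vocabulary IRI or endpoint URL.
VOCAB = "https://example.org/vocab#"
ENDPOINT = "https://example.org/sparql"


def journal_papers_query(journal_id: str) -> str:
    """Build a SELECT query mirroring the three steps: output, path, filter."""
    if not journal_id.isalnum():
        # Refuse anything that could break out of the quoted string literal.
        raise ValueError(f"unexpected journal identifier: {journal_id!r}")
    return f"""PREFIX v: <{VOCAB}>
SELECT ?paper ?title ?journal WHERE {{
  ?paper   v:title      ?title ;
           v:partOf     ?issue .
  ?issue   v:partOf     ?volume .
  ?volume  v:partOf     ?journal .
  ?journal v:identifier ?id .
  FILTER(?id = "{journal_id}")
}}"""


def build_request(query: str) -> Request:
    """Prepare (but do not send) a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(journal_papers_query("sji"))
print(request.full_url)
```

The `SELECT` line corresponds to the first step, the chain of `v:partOf` patterns to the second, and the `FILTER` to the third. Sending the request (for example with `urllib.request.urlopen`) would only make sense against a real endpoint.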
The result is a list like the one shown in Figure 2.
Figure 2: Example of the behind-the-scenes steps showing the results of a semantic search
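Behind a results list like this, a SPARQL endpoint returns its answer in a standard serialization. Assuming the endpoint is asked for the W3C SPARQL 1.1 Query Results JSON Format, the sketch below flattens that document into one row per result. The sample document is a made-up placeholder in that format, not real LD Connect output.

```python
import json
from typing import Optional

# Placeholder document in the SPARQL 1.1 Query Results JSON Format (not real data).
SAMPLE = """{
  "head": {"vars": ["paper", "title", "journal"]},
  "results": {"bindings": [
    {"paper":   {"type": "uri", "value": "https://example.org/resource/paper/1"},
     "title":   {"type": "literal", "value": "Placeholder title"},
     "journal": {"type": "uri", "value": "https://example.org/resource/journal/sji"}}
  ]}
}"""


def bindings_to_rows(results_json: str) -> list[dict[str, Optional[str]]]:
    """Flatten SPARQL JSON results into one dict per solution.

    A variable left unbound in a solution is absent from its bindings object,
    so it is filled with None here to keep every row the same shape.
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in doc["results"]["bindings"]
    ]


for row in bindings_to_rows(SAMPLE):
    print(row["paper"], row["title"], row["journal"])
```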