UtrechtUniversity/ricgraph

Project status: Active. The project has reached a stable, usable state and is being actively developed.
Technology Readiness Level 6/9: Late Prototype. Technology demonstrated in target setting, end-users adopt it for testing purposes.

Ricgraph - Research in context graph

What is Ricgraph?

Ricgraph, also known as Research in context graph, enables the exploration of researchers, teams, their results, collaborations, skills, projects, and the relations between these items.

Ricgraph can store many types of items into a single graph. These items can be obtained from various systems and from multiple organizations. Ricgraph facilitates reasoning about these items because it infers new relations between items, relations that are not present in any of the separate source systems. It is flexible and extensible, and can be adapted to new application areas.

Throughout this documentation, we illustrate how Ricgraph works by applying it to the application area research information.

Motivation

Ricgraph, also known as Research in context graph, is software for storing and exploring relations between items. These items can be collected from various source systems and from multiple organizations. We explain how Ricgraph works by applying it to the application area research information. We show the insights that can be obtained by combining information from various source systems: insights that arise from new relations that are not present in any of the separate source systems.

Research information is about anything related to research: research results, the persons in a research team, their collaborations, their skills, projects in which they have participated, as well as the relations between these entities. Examples of research results are publications, data sets, and software.

Example use cases from the application area research information are:

  • As a journalist, I want to find researchers with a certain skill and their publications, so that I can interview them for a newspaper article.
  • As a librarian, I want to enrich my local research information system with research results that are in other systems but not in ours, so that we have a more complete view of research at our university.
  • As a researcher, I want to find researchers from other universities that have co-authored publications written by the co-authors of my own publications, so that I can read their publications to find out if we share common research interests.

These use cases combine different types of information (called items): researchers, skills, publications, etc. Most often, these types of information are not stored in one system, so the questions behind the use cases may be difficult or time-consuming to answer. However, by using Ricgraph, these questions (and many others) are easy to answer, as will be explained throughout this documentation.

Although this documentation illustrates Ricgraph in the application area research information, the principle “relations between items from various source systems” is general, so Ricgraph can be used in other application areas.

Main contributions of Ricgraph

  • Ricgraph can store many types of items in a single graph.
  • Ricgraph harvests multiple source systems into a single graph.
  • Ricgraph Explorer is the exploration tool for Ricgraph.
  • Ricgraph facilitates reasoning about items because it infers new relations between items.
  • Ricgraph can be tailored for an application area.

How to use Ricgraph (very short)

For more details, read the remainder of this documentation.

Read more about Ricgraph

For a gentle introduction to Ricgraph, read the reference publication: Rik D.T. Janssen (2024). Ricgraph: A flexible and extensible graph to explore research in context from various systems. SoftwareX, 26(101736). https://doi.org/10.1016/j.softx.2024.101736.

More details can be found in section Example use cases in Ricgraph, which shows how the use cases above work out using Ricgraph. To learn more about Ricgraph, read why Ricgraph has been developed, followed by Ricgraph in bullet points. There is also a section with next steps you might want to take, including further information about Ricgraph, an explanation of how to install Ricgraph and harvest data, an explanation of how to use Ricgraph, and information about extending Ricgraph. There are also videos that demonstrate Ricgraph, and an overview of the publications, presentations, use and mentions of Ricgraph.

Why Ricgraph?

Ricgraph can answer questions like:

  • Which researcher has contributed to which publication, dataset, software package, project, etc.?
  • Given e.g. a dataset, software package, or project, who has contributed to it?
  • What identifiers does a researcher have (e.g. ORCID, ISNI, organization employee ID, email address)?
  • What skills does a researcher have?
  • Which researchers have worked together, shown as a network?
  • Which organizations have worked together?

Also, more elaborate information can be found using Ricgraph and Ricgraph Explorer, the exploration tool for Ricgraph:

  • You can find information about persons or their results in a (child) organization (unit, department, faculty, university). For example, you can find out what data sets or software are produced in your faculty, or the skills of all persons in your department. This is only possible if you have harvested that information.
  • You can find out with whom a person shares research output types. For example, you can find out with whom someone shares software or data sets.
  • You can get tables showing how you can enrich a source system based on other systems you have harvested. For example, suppose you have harvested both Pure and OpenAlex. Using this feature, you can find out which publications in OpenAlex are not in Pure, and you might want to add those to Pure.
  • You can get a table that shows the overlap in harvests from different source systems. For example, after a query to show all ORCID nodes, the table summarizes the number of ORCID nodes that were found in only one source and the number that were found in multiple sources. Another table gives a detailed overview of how many nodes originate from which source systems. Then, you can drill down by clicking on a number in one of these two tables to find the nodes corresponding to that number.
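
The overlap tables from the last bullet can be pictured with a small sketch. This is not Ricgraph's implementation: the `harvested` mapping, its made-up ORCID values, and both function names are hypothetical, and the sketch assumes that each node records the set of source systems it was found in.

```python
from collections import Counter

# Hypothetical harvest result: ORCID value -> source systems it was found in.
# The ORCID values are made up for illustration.
harvested = {
    "0000-0001-0000-0001": {"Pure"},
    "0000-0001-0000-0002": {"Pure", "OpenAlex"},
    "0000-0001-0000-0003": {"OpenAlex"},
    "0000-0001-0000-0004": {"Pure", "OpenAlex", "Yoda"},
}


def overlap_summary(nodes):
    """Count nodes found in exactly one source versus in multiple sources."""
    summary = Counter()
    for sources in nodes.values():
        key = "one source" if len(sources) == 1 else "multiple sources"
        summary[key] += 1
    return dict(summary)


def per_source_combination(nodes):
    """Count nodes for each distinct combination of source systems."""
    return Counter(tuple(sorted(sources)) for sources in nodes.values())


print(overlap_summary(harvested))
print(per_source_combination(harvested))
```

Each count in the second table corresponds to a group of nodes, which is what makes a drill-down from a number to its nodes possible.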

With Ricgraph, you can get metadata from objects from any source system you’d like. You run the harvest script for that system, and the data will be imported into Ricgraph and combined automatically with the data that is already there. Ricgraph provides harvest scripts for the systems mentioned above. Scripts for other sources can be written easily.

In the remainder of this text, Ricgraph is described in the use case of showing people, organizations and research outputs in relation to each other in a university context.

Example use cases in Ricgraph

Use case 1, as a journalist...

As a journalist, I want to find researchers with a certain skill S and their publications, so that I can interview them for a newspaper article. Example skills can be: climate change or stem cells.

Use case 2, as a librarian...

As a librarian, I want to enrich my local research information system with research results from person A that are in other systems (in orange, RIS2) but not in ours (in green, RIS1), so that we have a more complete view of research at our university.
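
At its core, use case 2 is a set difference on identifiers. The following is a minimal sketch, not Ricgraph code: the dictionaries `ris1` and `ris2`, the made-up DOI-like keys, and the function `missing_in` are hypothetical, and the sketch assumes that both systems identify a research result by the same identifier.

```python
# Hypothetical harvests for person A: identifier of a research result -> title.
# The identifiers are made up for illustration.
ris1 = {"10.1234/a": "Article one", "10.1234/b": "Data set one"}
ris2 = {"10.1234/b": "Data set one", "10.1234/c": "Article two"}


def missing_in(local, other):
    """Return the items present in `other` but absent from `local`."""
    return {key: other[key] for key in other.keys() - local.keys()}


# Candidates to add to the local system RIS1.
candidates = missing_in(ris1, ris2)
print(candidates)
```

Once both systems are harvested into one graph, the comparison reduces to checking which source systems each research result node came from.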

Use case 3, as a researcher...

As a researcher A, I want to find researchers from other universities that have co-authored publications written by the co-authors of my own publications, so that I can read their publications to find out if we share common research interests.
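
Use case 3 amounts to walking a few edges in the graph: from a person to their publications, to the co-authors, and then one step further. The sketch below illustrates that walk in plain Python. It is not how Ricgraph is implemented: the data, the organization labels, and all function names are hypothetical, and excluding direct co-authors from the result is a choice made only for this example.

```python
# Hypothetical data: publication -> authors, and person -> organization.
authors_of = {
    "pub1": {"A", "B"},
    "pub2": {"B", "C"},
    "pub3": {"B", "D"},
    "pub4": {"E", "F"},
}
organization_of = {
    "A": "UU", "B": "UU", "C": "VU", "D": "UU", "E": "VU", "F": "VU",
}


def publications_of(person):
    """Return all publications the person has contributed to."""
    return {pub for pub, authors in authors_of.items() if person in authors}


def coauthors_of(person):
    """Return everyone who shares at least one publication with the person."""
    result = set()
    for pub in publications_of(person):
        result |= authors_of[pub]
    result.discard(person)
    return result


def second_degree_elsewhere(person):
    """Return co-authors of co-authors who work at another organization."""
    direct = coauthors_of(person)
    found = set()
    for coauthor in direct:
        found |= coauthors_of(coauthor)
    found -= direct | {person}  # keep only persons not yet worked with
    home = organization_of[person]
    return {p for p in found if organization_of[p] != home}


print(second_degree_elsewhere("A"))
```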

Examples

See the figures below for example graphs that show how Ricgraph works.

[Figures: one person with several research outputs; symbols for type of object; colors for source system]

This figure shows one person A using a person-root node, as it is called in Ricgraph: a node which "represents" a person. This person has contributed to three articles, two data sets and one software package. Two articles and one data set are from the Research Information System Pure (their color is green), one data set is from the data repository Yoda (in orange), one article is from OpenAlex (in purple), and the software package is from the Research Software Directory (in blue).

[Figures: several persons with several research outputs; one person with several identifiers and research outputs]

The left part of this figure shows several persons having several research outputs (the symbols) and how these are related (i.e. which person contributed to which research output). It also shows from which source system these research outputs have originated (using different colors). The right part shows one person having several identifiers and several research outputs. This person has two different ORCIDs, one ISNI, one SCOPUS_AUTHOR_ID, and two FULL_NAMEs (which differ in spelling). These identifiers have also been obtained from different source systems, as their color indicates.
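
The right part of the figure can be mimicked with a small undirected graph in which all identifiers of a person are connected to one person-root node. This is a sketch under assumptions, not Ricgraph's API: representing a node as a `(name, value)` tuple, the helper names, and all identifier values are made up for illustration.

```python
from collections import defaultdict

# Undirected graph: node -> set of neighboring nodes.
# A node is a hypothetical (name, value) tuple.
edges = defaultdict(set)


def connect(node_a, node_b):
    """Add an undirected edge between two nodes."""
    edges[node_a].add(node_b)
    edges[node_b].add(node_a)


person = ("person-root", "example-person")
identifiers = [
    ("ORCID", "0000-0000-0000-0001"),
    ("ORCID", "0000-0000-0000-0002"),
    ("ISNI", "0000000000000001"),
    ("SCOPUS_AUTHOR_ID", "12345678"),
    ("FULL_NAME", "John Doe"),
    ("FULL_NAME", "J. Doe"),
]
for identifier in identifiers:
    connect(person, identifier)


def other_identifiers(identifier):
    """Find all other identifiers of the person(s) this identifier belongs to."""
    found = set()
    for neighbor in edges.get(identifier, set()):
        if neighbor[0] == "person-root":
            found |= edges[neighbor]
    found.discard(identifier)
    return found


print(sorted(other_identifiers(("ORCID", "0000-0000-0000-0001"))))
```

Every identifier is two edges away from every other identifier of the same person, which is why a lookup like this stays cheap.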

More examples can be found in Ricgraph details.

Ricgraph in bullet points

  • The philosophy of Ricgraph is that it stores metadata, not the objects the metadata refer to. To access an object, a node has a link to that object in the system it was obtained from.
  • We have chosen a graph as the data structure, since it gives logical and efficient access to objects that are closely related to each other. For example, starting with a person, their research outputs are only one step away by following one edge, and other contributors to those research outputs are again one step (edge) away.
  • Ricgraph can be used to store, manipulate and read metadata of any object that has a relation to another object, as long as every object can be "represented" by at least a name and a value. In Ricgraph, one node represents one object, and an edge represents the relation between two objects.
  • Ricgraph and Ricgraph Explorer are written in Python. You can use two different graph database backends:
    • Neo4j (either Neo4j Desktop or Neo4j Community Edition);
    • Memgraph.
  • Metadata of an object are stored as "properties" in a node, i.e. as information associated with a node. For example, a node may store two properties, name = PET and value = cat. Another node may store name = FULL_NAME and value = John Doe. Then the edge between those two nodes means that the person with FULL_NAME John Doe has a PET which is a cat. Ricgraph can store any number of properties in a node.
  • The objective of Ricgraph is to get metadata from objects from a source system in a process called "harvesting". That means that e.g. persons and publications can be harvested from one system, data sets from another system, and software from a third system. Everything found will be combined into one graph.
  • Ricgraph can harvest from many sources, and you can write your own harvesting scripts. Example scripts are included to harvest from OpenAlex, the Research Information System Pure, the data repository Yoda, the Research Software Directory, and the Utrecht University staff pages.
  • Ricgraph can be used as an ID resolver. Given an identifier of a person, it can easily find other identifiers of that person. When new identifiers are found while harvesting from new systems, they will be added automatically.
  • Ricgraph can check the consistency of information harvested. For example, ORCIDs and ISNIs are supposed to refer to one person, so every node representing such an identifier should have only one edge. This can be checked easily. An example script is included.
  • Ricgraph can enrich information in its own graph by using information from other systems. For example, if a person has an ORCID, but not a Scopus Author ID, OpenAlex can be used to find the missing Scopus Author ID. An example script is included.
  • Ricgraph can enrich a source system based on information that is present in one source system, but not in another source system. See use case 2 above.
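
Two of the bullet points above, the name and value properties of a node and the consistency check on identifiers, can be illustrated with a minimal sketch. It does not use Ricgraph or a graph database backend: the tuple representation of a node, the helper names, and the identifier values are hypothetical.

```python
from collections import defaultdict

# Undirected graph: node -> set of neighboring nodes.
# A node is a hypothetical (name, value) tuple, mirroring its two properties.
edges = defaultdict(set)


def connect(node_a, node_b):
    """Add an undirected edge between two nodes."""
    edges[node_a].add(node_b)
    edges[node_b].add(node_a)


# The example from the text: John Doe has a PET which is a cat.
connect(("FULL_NAME", "John Doe"), ("PET", "cat"))

# Two persons that (suspiciously) share one made-up ORCID.
person_1 = ("person-root", "p1")
person_2 = ("person-root", "p2")
connect(person_1, ("ORCID", "0000-0000-0000-0001"))
connect(person_2, ("ORCID", "0000-0000-0000-0001"))
connect(person_2, ("ISNI", "0000000000000002"))


def inconsistent_identifiers(graph, unique_names=("ORCID", "ISNI")):
    """Return identifier nodes that have more than one edge."""
    return {
        node: neighbors
        for node, neighbors in graph.items()
        if node[0] in unique_names and len(neighbors) > 1
    }


for node, neighbors in inconsistent_identifiers(edges).items():
    print(node, "is connected to", sorted(neighbors))
```

A node flagged this way is not necessarily wrong in the graph. It may also point to an error in one of the source systems, which is worth checking there.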

Next steps

Further information about Ricgraph

Steps to take if you would like to install Ricgraph and harvest data

  • Install and configure Ricgraph.
  • Start harvesting data, see Ricgraph harvest scripts, e.g. by doing a harvest for Utrecht University data sets and software. You will observe that the information from two sources is neatly combined into one graph.
  • Unfortunately, there is a bug, see known bugs. This bug may occur if you start a harvest script and, as the first step in the script, you want to empty Ricgraph. In that case, a Python error might occur while emptying Ricgraph. Follow the link to read more and find out how to repair that.

Steps to take if you would like to use Ricgraph

Read this in case you would like to extend Ricgraph