Ricgraph - Research in context graph

What is Ricgraph?

Ricgraph, also known as Research in context graph, enables the exploration of researchers, teams, their results, collaborations, skills, projects, and the relations between these items.

Ricgraph can store many types of items in a single graph. These items can be obtained from various systems and from multiple organizations. Ricgraph facilitates reasoning about these items because it infers new relations between items, relations that are not present in any of the separate source systems. It is flexible and extensible, and can be adapted to new application areas.

Throughout this documentation, we illustrate how Ricgraph works by applying it to the application area of research information.

Motivation

Ricgraph, also known as Research in context graph, is software about relations between items. These items can be collected from various source systems and from multiple organizations. We explain how Ricgraph works by applying it to the application area of research information. We show the insights that can be obtained by combining information from various source systems: insights arising from new relations that are not present in any of the separate source systems.

Research information is about anything related to research: research results, the persons in a research team, their collaborations, their skills, projects in which they have participated, as well as the relations between these entities. Examples of research results are publications, data sets, and software.

Example use cases from the application area of research information are:

As a journalist, I want to find researchers with a certain skill and their publications, so that I can interview them for a newspaper article.
As a librarian, I want to enrich my local research information system with research results that are in other systems but not in ours, so that we have a more complete view of research at our university.
As a researcher, I want to find researchers from other universities that have co-authored publications written by the co-authors of my own publications, so that I can read their publications to find out if we share common research interests.

These use cases use different types of information (called items): researchers, skills, publications, etc. Most often, these types of information are not stored in one system, so the questions behind these use cases may be difficult or time-consuming to answer. With Ricgraph, these questions (and many others) become easy to answer, as explained throughout this documentation.

Although this documentation illustrates Ricgraph in the application area of research information, the principle “relations between items from various source systems” is general, so Ricgraph can also be used in other application areas.

Main contributions of Ricgraph

Ricgraph can store many types of items in a single graph.
Ricgraph harvests multiple source systems into a single graph.
Ricgraph Explorer is the exploration tool for Ricgraph.
Ricgraph facilitates reasoning about items because it infers new relations between items.
Ricgraph can be tailored for an application area.

How to use Ricgraph (very short)

Install and configure Ricgraph.
Start harvesting data, see Ricgraph harvest scripts.
Use Ricgraph Explorer, the exploration tool for Ricgraph.
Use the Ricgraph REST API, the REST API for Ricgraph.
Optional: modify code to fit Ricgraph to your specific use case.

For more details, read the remainder of this documentation.

Why Ricgraph?

Ricgraph can answer questions like:

Which researcher has contributed to which publication, dataset, software package, project, etc.?
Given e.g. a dataset, software package, or project, who has contributed to it?
What identifiers does a researcher have (e.g. ORCID, ISNI, organization employee ID, email address)?
What skills does a researcher have?
Which researchers have worked together (shown as a network)?
Which organizations have worked together?
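To illustrate how a question such as "What identifiers does a researcher have?" maps onto a graph, here is a minimal Python sketch. It assumes a toy model of our own, not Ricgraph's actual data model or API: a node is a (name, value) pair, edges are undirected adjacency sets, and all identifiers of one person are connected to a single person-root node. The function names, the IDENTIFIER_NAMES set (including EMAIL) and the example values are hypothetical.

```python
from collections import defaultdict

Node = tuple[str, str]  # (name, value): a simplified stand-in for a Ricgraph node

# Hypothetical set of property names that count as person identifiers.
IDENTIFIER_NAMES = {"ORCID", "ISNI", "SCOPUS_AUTHOR_ID", "FULL_NAME", "EMAIL"}


def add_edge(graph: dict[Node, set[Node]], a: Node, b: Node) -> None:
    """Store an undirected edge between nodes a and b."""
    graph[a].add(b)
    graph[b].add(a)


def other_identifiers(graph: dict[Node, set[Node]], identifier: Node) -> set[Node]:
    """Return the other identifiers of the person that `identifier` belongs to.

    In this toy model, identifiers are not linked to each other directly; they
    are all linked to the person-root node, so the answer is two edges away.
    """
    found: set[Node] = set()
    for root in graph.get(identifier, set()):
        if root[0] == "person-root":
            found |= {n for n in graph[root] if n[0] in IDENTIFIER_NAMES}
    found.discard(identifier)
    return found


graph: dict[Node, set[Node]] = defaultdict(set)
root = ("person-root", "person-1")
for node in [("ORCID", "orcid-1"), ("ISNI", "isni-1"),
             ("FULL_NAME", "John Doe"), ("DOI", "doi-1")]:
    add_edge(graph, root, node)

print(sorted(other_identifiers(graph, ("ORCID", "orcid-1"))))
# prints [('FULL_NAME', 'John Doe'), ('ISNI', 'isni-1')]
```

The DOI node is also linked to the person-root node, but it is a research output rather than an identifier, so the sketch filters it out by name.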

Also, more elaborate information can be found using Ricgraph and Ricgraph Explorer, the exploration tool for Ricgraph:

You can find information about persons or their results in a (child) organization (unit, department, faculty, university). For example, you can find out which data sets or software have been produced in your faculty, or the skills of all persons in your department. Of course, this is only possible if you have harvested this information.
You can find out with whom a person shares research output types. For example, you can find out with whom someone shares software or data sets.
You can get tables showing how you can enrich a source system based on other systems you have harvested. For example, if you have harvested both Pure and OpenAlex, you can use this feature to find out which publications in OpenAlex are not in Pure. You might want to add those to Pure.
You can get a table that shows the overlap in harvests from different source systems. For example, after a query to show all ORCID nodes, the table summarizes the number of ORCID nodes that were found in only one source system and the number that were found in multiple source systems. Another table gives a detailed overview of how many nodes originate from which source systems. You can then drill down by clicking on a number in either table to find the nodes corresponding to that number.
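The two overlap tables described above boil down to counting, per node, the set of source systems it was harvested from. The Python sketch below shows that counting under our own assumptions: the input is a hypothetical mapping from node value to source-system names, and overlap_tables() and drill_down() are illustrative helpers, not functions of Ricgraph Explorer.

```python
from collections import Counter


def overlap_tables(sources_per_node: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Summarize from how many source systems each node was harvested.

    Returns two tables: `summary` counts nodes found in one vs. multiple
    sources; `detail` counts nodes per exact combination of source systems.
    """
    summary: Counter = Counter()
    detail: Counter = Counter()
    for sources in sources_per_node.values():
        summary["one source" if len(sources) == 1 else "multiple sources"] += 1
        detail[" + ".join(sorted(sources))] += 1
    return summary, detail


def drill_down(sources_per_node: dict[str, set[str]], combination: str) -> list[str]:
    """Return the nodes behind one number in the detail table."""
    return sorted(node for node, sources in sources_per_node.items()
                  if " + ".join(sorted(sources)) == combination)


# Hypothetical ORCID nodes and the source systems they were harvested from.
orcid_sources = {
    "orcid-1": {"Pure", "OpenAlex"},
    "orcid-2": {"Pure"},
    "orcid-3": {"OpenAlex", "Pure"},
}
summary, detail = overlap_tables(orcid_sources)
print(summary)  # prints Counter({'multiple sources': 2, 'one source': 1})
print(drill_down(orcid_sources, "OpenAlex + Pure"))  # prints ['orcid-1', 'orcid-3']
```

Sorting the source names makes each combination a stable table key, so {"Pure", "OpenAlex"} and {"OpenAlex", "Pure"} end up in the same row.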

With Ricgraph, you can get metadata of objects from any source system you like. You run the harvest script for that system, and its data will be imported into Ricgraph and combined automatically with the data that is already there. Ricgraph provides harvest scripts for several source systems, such as Pure and OpenAlex. Scripts for other sources can be written easily.

In the remainder of this text, Ricgraph is described in the context of showing people, organizations, and research outputs in relation to each other at a university.

Example use cases in Ricgraph

Use case 1, as a journalist...

As a journalist, I want to find researchers with a certain skill S and their publications, so that I can interview them for a newspaper article. Example skills can be: climate change or stem cells.
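Use case 1 is a two-step walk through the graph: from the skill node to the persons who have that skill, and from each person to their publications. The Python sketch below shows this walk on a toy model of our own (nodes as (name, value) pairs, a person-root node per person). The SKILL and DOI node names, the helper functions and the example persons are assumptions for illustration, not Ricgraph's actual schema or API.

```python
from collections import defaultdict

Node = tuple[str, str]  # (name, value): a simplified stand-in for a Ricgraph node


def add_edge(graph: dict[Node, set[Node]], a: Node, b: Node) -> None:
    """Store an undirected edge between nodes a and b."""
    graph[a].add(b)
    graph[b].add(a)


def values(graph: dict[Node, set[Node]], node: Node, name: str) -> list[str]:
    """Sorted values of the neighbors of `node` that have property name `name`."""
    return sorted(v for n, v in graph.get(node, set()) if n == name)


def researchers_with_skill(graph: dict[Node, set[Node]], skill: str) -> dict[str, list[str]]:
    """Map each researcher with `skill` to the DOIs of their publications.

    Keyed by full name, a simplification: full names need not be unique.
    """
    result: dict[str, list[str]] = {}
    # Step 1: from the skill node to the person-root nodes linked to it.
    for root in graph.get(("SKILL", skill), set()):
        if root[0] != "person-root":
            continue
        # Step 2: from each person-root node to the person's name and publications.
        names = values(graph, root, "FULL_NAME")
        result[names[0] if names else root[1]] = values(graph, root, "DOI")
    return result


graph: dict[Node, set[Node]] = defaultdict(set)
alice, bob = ("person-root", "p1"), ("person-root", "p2")
for node in [("FULL_NAME", "Alice"), ("SKILL", "climate change"),
             ("DOI", "doi-1"), ("DOI", "doi-2")]:
    add_edge(graph, alice, node)
for node in [("FULL_NAME", "Bob"), ("SKILL", "stem cells"), ("DOI", "doi-3")]:
    add_edge(graph, bob, node)

print(researchers_with_skill(graph, "climate change"))  # prints {'Alice': ['doi-1', 'doi-2']}
```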

Use case 2, as a librarian...

As a librarian, I want to enrich my local research information system with research results from person A that are in other systems (in orange, RIS2) but not in ours (in green, RIS1), so that we have a more complete view of research at our university.
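Use case 2 amounts to a filter on the research outputs of person A: keep those harvested from RIS2 but not from RIS1. The Python sketch below assumes a toy model of our own in which every node records the set of source systems it was harvested from; outputs_to_add(), harvested() and the example values are hypothetical, and only DOI nodes are treated as research outputs.

```python
from collections import defaultdict

Node = tuple[str, str]  # (name, value): a simplified stand-in for a Ricgraph node


def outputs_to_add(edges: dict[Node, set[Node]], sources: dict[Node, set[str]],
                   person_root: Node, own_system: str, other_system: str,
                   output_names: frozenset[str] = frozenset({"DOI"})) -> list[Node]:
    """Return research outputs of a person that were harvested from
    `other_system` but not from `own_system`."""
    return sorted(
        node for node in edges.get(person_root, set())
        if node[0] in output_names
        and other_system in sources.get(node, set())
        and own_system not in sources.get(node, set())
    )


# Hypothetical example: person A with outputs harvested from RIS1 and/or RIS2.
edges: dict[Node, set[Node]] = defaultdict(set)
sources: dict[Node, set[str]] = defaultdict(set)
person_a = ("person-root", "A")


def harvested(node: Node, *systems: str) -> None:
    """Link `node` to person A and record the source systems it came from."""
    edges[person_a].add(node)
    edges[node].add(person_a)
    sources[node].update(systems)


harvested(("ORCID", "orcid-A"), "RIS2")  # an identifier, not a research output
harvested(("DOI", "doi-1"), "RIS1", "RIS2")
harvested(("DOI", "doi-2"), "RIS2")
harvested(("DOI", "doi-3"), "RIS1")

print(outputs_to_add(edges, sources, person_a, own_system="RIS1", other_system="RIS2"))
# prints [('DOI', 'doi-2')]
```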

Use case 3, as a researcher...

As a researcher A, I want to find researchers from other universities that have co-authored publications written by the co-authors of my own publications, so that I can read their publications to find out if we share common research interests.
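Use case 3 is a longer walk: from researcher A to A's publications, to the co-authors, to the co-authors' other publications, to the authors of those, and finally a filter on organization. The Python sketch below is one interpretation under our own assumptions: it skips A's own publications and A's direct co-authors (to surface researchers A has not worked with yet), and treats "other university" as "no organization in common with A". The ORGANIZATION_NAME and DOI node names, the helpers and the example data are hypothetical.

```python
from collections import defaultdict

Node = tuple[str, str]  # (name, value): a simplified stand-in for a Ricgraph node


def add_edge(graph: dict[Node, set[Node]], a: Node, b: Node) -> None:
    """Store an undirected edge between nodes a and b."""
    graph[a].add(b)
    graph[b].add(a)


def neighbors(graph: dict[Node, set[Node]], node: Node, name: str) -> set[Node]:
    """Neighbors of `node` whose property name is `name`."""
    return {n for n in graph.get(node, set()) if n[0] == name}


def collect(graph: dict[Node, set[Node]], nodes: set[Node], name: str) -> set[Node]:
    """Union of the `name` neighbors of all `nodes`."""
    return set().union(*(neighbors(graph, n, name) for n in nodes))


def researchers_via_coauthors(graph: dict[Node, set[Node]], me: Node) -> set[Node]:
    my_pubs = neighbors(graph, me, "DOI")
    coauthors = collect(graph, my_pubs, "person-root") - {me}
    # Publications of my co-authors that I am not an author of.
    their_pubs = collect(graph, coauthors, "DOI") - my_pubs
    # Authors of those publications, excluding me and my direct co-authors.
    candidates = collect(graph, their_pubs, "person-root") - {me} - coauthors
    my_orgs = neighbors(graph, me, "ORGANIZATION_NAME")
    result = set()
    for candidate in candidates:
        orgs = neighbors(graph, candidate, "ORGANIZATION_NAME")
        # Keep only candidates with a known organization that differs from mine.
        if orgs and not (orgs & my_orgs):
            result.add(candidate)
    return result


graph: dict[Node, set[Node]] = defaultdict(set)
uu = ("ORGANIZATION_NAME", "Utrecht University")
other_univ = ("ORGANIZATION_NAME", "Other University")
a, b, c, d, e = (("person-root", x) for x in "ABCDE")
for person, org in [(a, uu), (b, uu), (c, other_univ), (d, uu), (e, other_univ)]:
    add_edge(graph, person, org)
for person in (a, b, e):
    add_edge(graph, person, ("DOI", "doi-1"))
for person in (b, c, d):
    add_edge(graph, person, ("DOI", "doi-2"))

print(researchers_via_coauthors(graph, a))  # prints {('person-root', 'C')}
```

Here B and E are A's co-authors on doi-1; B also wrote doi-2 with C and D, and of those only C works at another university.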

Examples

See the figures below for example graphs that show how Ricgraph works.

Figures: one person with several research outputs; symbols for type of object; colors for source system.

This figure shows one person A, represented by a person-root node (the node that "represents" a person in Ricgraph). This person has contributed to three articles, two data sets, and one software package. Two articles and one data set are from the Research Information System Pure (in green), one data set is from the data repository Yoda (in orange), one article is from OpenAlex (in purple), and the software package is from the Research Software Directory (in blue).

Figures: several persons with several research outputs; one person with several identifiers and research outputs.

The left part of this figure shows several persons having several research outputs (the symbols) and how these are related (i.e. which person contributed to which research output). It also shows from which source system these research outputs have originated (using different colors). The right part shows one person having several identifiers and several research outputs. This person has two different ORCIDs, one ISNI, one SCOPUS_AUTHOR_ID, and two FULL_NAMEs (which differ in spelling). These identifiers have also been obtained from different source systems, as their color indicates.

More examples can be found in Ricgraph details.

Ricgraph in bullet points

The philosophy of Ricgraph is that it stores metadata, not the objects the metadata refer to. To give access to an object, a node contains a link to that object in the system it was obtained from.
We have chosen a graph as the data structure, since it is a logical and efficient way to get from an object to the objects it is related to. For example, starting with a person, their research outputs are only one step (edge) away, and the other contributors to such a research output are again one step (edge) away.
Ricgraph can be used to store, manipulate and read metadata of any object that has a relation to another object, as long as every object can be "represented" by at least a name and a value. In Ricgraph, one node represents one object, and an edge represents the relation between two objects.
Ricgraph and Ricgraph Explorer are written in Python. You can use two different graph database backends:
- Neo4j (either Neo4j Desktop or Neo4j Community Edition);
- Memgraph.
Metadata of an object are stored as "properties" in a node, i.e. as information associated with a node. For example, a node may store two properties, name = PET and value = cat. Another node may store name = FULL_NAME and value = John Doe. Then the edge between those two nodes means that the person with FULL_NAME John Doe has a PET which is a cat. Ricgraph can store any number of properties in a node.
Ricgraph obtains metadata of objects from source systems in a process called "harvesting". For example, persons and publications can be harvested from one system, data sets from another system, and software from a third system. Everything found is combined into one graph.
Ricgraph can harvest from many sources, and you can write your own harvest scripts. Example scripts are included to harvest from OpenAlex, the Research Information System Pure, the data repository Yoda, the Research Software Directory, and the Utrecht University staff pages.
Ricgraph can be used as an ID resolver: given an identifier of a person, it can easily find the other identifiers of that person. If harvesting from new systems yields new identifiers, they are added automatically.
Ricgraph can check the consistency of information harvested. For example, ORCIDs and ISNIs are supposed to refer to one person, so every node representing such an identifier should have only one edge. This can be checked easily. An example script is included.
Ricgraph can enrich information in its own graph by using information from other systems. For example, if a person has an ORCID, but not a Scopus Author ID, OpenAlex can be used to find the missing Scopus Author ID. An example script is included.
Ricgraph can help enrich a source system with information that is present in another source system but not in that one. See use case 2 above.
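The Python sketch below illustrates three of the bullet points above: metadata stored as name/value pairs, combining harvests from several source systems into one graph, and the consistency check that an ORCID or ISNI points to only one person. It is a toy model under our own assumptions; MiniGraph, harvest(), inconsistent() and the rule "reuse the person-root node already linked to a known identifier" are hypothetical and do not describe how Ricgraph's harvest scripts actually work.

```python
from collections import defaultdict

Node = tuple[str, str]  # (name, value), e.g. ("FULL_NAME", "John Doe")


class MiniGraph:
    """Toy graph illustrating the ideas above; NOT Ricgraph's data model or API."""

    def __init__(self) -> None:
        self.edges: dict[Node, set[Node]] = defaultdict(set)
        self.sources: dict[Node, set[str]] = defaultdict(set)
        self.next_root = 0

    def _link(self, a: Node, b: Node) -> None:
        self.edges[a].add(b)
        self.edges[b].add(a)

    def harvest(self, source: str, person_ids: list[Node], outputs: list[Node]) -> Node:
        """Merge one harvested person record (identifiers + research outputs)."""
        # Reuse a person-root node already linked to one of the identifiers, so
        # that data from different source systems ends up around the same person.
        roots = sorted(n for i in person_ids for n in self.edges.get(i, ())
                       if n[0] == "person-root")
        if roots:
            root = roots[0]  # simplification: a real implementation would merge all roots
        else:
            root = ("person-root", f"root-{self.next_root}")
            self.next_root += 1
        for node in person_ids + outputs:
            self.sources[node].add(source)  # remember where each node came from
            self._link(root, node)
        return root

    def inconsistent(self, unique_names: set[str]) -> list[Node]:
        """Identifiers such as ORCID or ISNI should link to exactly one person-root."""
        return sorted(n for n, nbrs in self.edges.items()
                      if n[0] in unique_names
                      and sum(1 for m in nbrs if m[0] == "person-root") > 1)


g = MiniGraph()
r1 = g.harvest("Pure", [("ORCID", "orcid-1"), ("FULL_NAME", "John Doe")], [("DOI", "doi-1")])
r2 = g.harvest("OpenAlex", [("ORCID", "orcid-1")], [("DOI", "doi-1"), ("DOI", "doi-2")])
g.harvest("Yoda", [("ORCID", "orcid-2")], [])
g.harvest("RSD", [("ORCID", "orcid-1"), ("ORCID", "orcid-2")], [])

print(r1 == r2)                            # prints True: both harvests meet at one person
print(sorted(g.sources[("DOI", "doi-1")]))  # prints ['OpenAlex', 'Pure']
print(g.inconsistent({"ORCID", "ISNI"}))    # prints [('ORCID', 'orcid-2')]
```

Because the last harvest links orcid-2 to a second person-root node, the check flags it. In practice, such a finding points either to a harvesting error or to two person-root nodes that should be merged.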

Next steps

Further information about Ricgraph

For a gentle introduction to Ricgraph, read the reference publication: Rik D.T. Janssen (2024). Ricgraph: A flexible and extensible graph to explore research in context from various systems. SoftwareX, 26(101736). https://doi.org/10.1016/j.softx.2024.101736.
Read more about publications, presentations, use and mentions of Ricgraph.
Look at the videos we have made to demonstrate Ricgraph.
Read more about Ricgraph details, such as example graphs, person identifiers and the person-root node.
You might want to compare Ricgraph to other systems.

Steps to take if you would like to install Ricgraph and harvest data

Install and configure Ricgraph.
Start harvesting data, see Ricgraph harvest scripts, e.g. by doing a harvest for Utrecht University data sets and software. You will observe that the information from two sources is neatly combined into one graph.
Unfortunately, there is a known bug (see known bugs). It may occur if you start a harvest script that empties Ricgraph as its first step: in that case, a Python error might occur while emptying Ricgraph. Follow the link to read more and to find out how to repair it.

Steps to take if you would like to use Ricgraph

First, install Ricgraph (see above).
Use Ricgraph Explorer, the exploration tool for Ricgraph.
Use the Ricgraph REST API, the REST API for Ricgraph.
Alternatively, you might want to read Query and visualize Ricgraph.

Read this in case you would like to extend Ricgraph

Start writing scripts, see Ricgraph script writing.
Of course, there is future work to do. Please let me know if you'd like to help.
