
Reconcilable Data Sources

Michael Mior edited this page Aug 26, 2022 · 11 revisions

Listing of Reconcilable Data Sources

With OpenRefine you can perform reconciliation against any web service that supports the Reconciliation Service API.
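As a rough illustration of what such a service expects, the sketch below builds the form-encoded body of a batch reconciliation request. It assumes the `queries` parameter convention of the Reconciliation Service API (a JSON object keyed by arbitrary query IDs). The helper name `build_reconciliation_body`, the sample names, and the sample type are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_reconciliation_body(names, entity_type=None, limit=3):
    """Return a form-encoded body for a batch reconciliation request.

    Each name becomes one query keyed q0, q1, ... (the keys are arbitrary
    identifiers that the service echoes back in its response).
    """
    queries = {}
    for index, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if entity_type is not None:
            # Optional: restrict candidates to one entity type.
            query["type"] = entity_type
        queries[f"q{index}"] = query
    return urlencode({"queries": json.dumps(queries)})


# Hypothetical input values, for illustration only.
body = build_reconciliation_body(["Douglas Adams", "Ada Lovelace"], entity_type="Q5")

# Decode again to show what the service would receive.
decoded = json.loads(parse_qs(body)["queries"][0])
print(decoded["q0"])
```

The resulting string would typically be sent as the body of an HTTP POST to the service URL. Batching several queries into one request keeps the number of round trips low.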

The Wikidata reconciliation service is available by default in OpenRefine.

You can alternatively extend your data by calling web services.

Hosted services

These services can be added directly to OpenRefine by URL: click Reconcile -> Add Standard Service and enter the service URL.

A more comprehensive list is maintained on Wikidata.

Wikidata

See our dedicated Wiki page: Reconciliation with Wikidata

VIAF

The VIAF® (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.

VIAF itself does not provide a reconciliation service but third-party endpoints have been implemented:

VIVO Scientific Collaboration Platform

VIVO is a U.S. national interdisciplinary open source scientific collaboration platform funded by the NIH, with development led by Cornell. Their reconciliation service allows reconciling against VIVO entities (faculty members, journals, etc.) in any VIVO installation. See Extending Google Refine for VIVO.

Here are a few examples of such reconciliation endpoints:

OpenCorporates

OpenCorporates makes 171 million corporate entities (as of July 2019) available for reconciliation through its service.

Taxonomic Databases

Taxonomic databases (GBIF, NCBI, Global Names Index, uBio, WoRMS), as documented here.

Taxonomic names from IPNI via the IPNI Names Reconciliation Service (source code).

Taxonomic names from EOL via the EOL Names Reconciliation Service. Described here.

Organized Crime and Corruption Reporting Project

OCCRP provides a public reconciliation API endpoint which allows reconciliation of data against a comprehensive list of sanctioned persons and companies, politically exposed persons, and other persons of journalistic interest. The service is intended as a first-level "check for interesting entries" for government or private data.

Nomisma

Nomisma provides data about numismatics:

Ordnance Survey

Ordnance Survey is a national cartographic institution in the UK, which provides reconciliation endpoints for various datasets.

FundRef

The FundRef Reconciliation Service is designed to help publishers (or anybody) more easily clean-up their funder data and map it to the FundRef Registry. It was built on FundRef Metadata Search.

Integrated Authority File (GND) via lobid-gnd

The Integrated Authority File (GND) contains more than 8 million authority records. It is used for cataloging in libraries as well as in archives, museums and other contexts.

The GND contains authority records for persons, corporate bodies, congresses, places, subject headings and works. It is maintained cooperatively by the German National Library (DNB), German-speaking library networks, the German Union Catalogue of Serials (ZDB) and many other institutions.

lobid-gnd provides a search interface for exploring GND, an integration in OpenRefine, and a general web API based on JSON-LD to enable use of the data in different contexts. The data is based on the RDF version of the GND (updated daily) and EntityFacts (updated quarterly).

Reconciliation endpoint with documentation: http://lobid.org/gnd/reconcile/
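To give a sense of what comes back from an endpoint like this, the following sketch picks the top candidate for each query from a reconciliation response. It assumes the usual response shape of the Reconciliation Service API (one `result` list per query ID, with `id`, `name`, `score` and `match` fields). The function name `best_candidates` and the sample data are hypothetical and do not come from lobid-gnd.

```python
def best_candidates(response, min_score=0.0):
    """Map each query ID to its top candidate, or None if nothing qualifies."""
    best = {}
    for query_id, payload in response.items():
        candidates = [
            c for c in payload.get("result", []) if c.get("score", 0) >= min_score
        ]
        # Prefer candidates the service flagged as a match, then the higher score.
        candidates.sort(
            key=lambda c: (bool(c.get("match")), c.get("score", 0)), reverse=True
        )
        best[query_id] = candidates[0] if candidates else None
    return best


# Hypothetical response, shaped like a reconciliation service reply.
sample = {
    "q0": {"result": [
        {"id": "example-1", "name": "Example Person", "score": 72.5, "match": False},
        {"id": "example-2", "name": "Example Person (author)", "score": 98.0, "match": True},
    ]},
    "q1": {"result": []},
}

top = best_candidates(sample)
print(top["q0"]["id"], top["q1"])
```

OpenRefine performs this kind of selection itself when it auto-matches cells. A script like this is mainly useful when calling a service outside OpenRefine.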

Local services

You can run software alongside OpenRefine to provide other reconciliation services, using data stored in various formats.

Reconcile-csv

Reconcile-csv is a reconciliation service for OpenRefine running from a CSV file. It uses fuzzy matching to match entries in one dataset to entries in another dataset, helping to introduce unique IDs into the system - so they can be used to join your data painlessly.
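The general idea of fuzzy matching against a reference list can be sketched in a few lines of Python. This is only an illustration built on the standard library's `difflib`, not the algorithm reconcile-csv itself uses. The function name `fuzzy_match`, the threshold, and the sample IDs are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_match(value, candidates, threshold=0.8):
    """Return (row_id, name, score) for the closest candidate, or None."""
    best = None
    for row_id, name in candidates.items():
        # Case-insensitive similarity ratio between 0.0 and 1.0.
        score = SequenceMatcher(None, value.lower(), name.lower()).ratio()
        if best is None or score > best[2]:
            best = (row_id, name, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Hypothetical reference data: unique ID -> name, as it might be loaded from a CSV file.
reference = {"id-001": "Acme Widgets", "id-002": "Globex Corporation"}

match = fuzzy_match("ACME Widgts", reference)
print(match)
```

Once a messy value is matched, its row can carry the reference ID (`id-001` here), which is what makes a later join between the two datasets straightforward.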

csv-reconcile

Similar to reconcile-csv, the csv-reconcile package provides reconciliation services for local CSV files. It is written in Python and offers extensive configuration options.

SPARQL endpoints

The RDF Extension by DERI at NUI Galway includes reconciliation against any SPARQL endpoint or RDF dump file and publishing of the results in RDF. See the documentation for details.

For instance, you can use this method to reconcile against the Library of Congress Subject Headings (LCSH), as described by the Free Your Metadata group.

conciliator

conciliator is a Java framework for creating OpenRefine reconciliation services. It currently offers out-of-the-box support for VIAF, ORCID, Open Library, and any Apache Solr collection. Run your own service, or use the public server at http://refine.codefork.com

JournalTOCs

Use the JournalTOCs API to create your own web applications that integrate content from freely available journal TOCs. Most JournalTOCs API calls are free and don't require any registration process.

FAST (Faceted Application of Subject Terminology)

FAST is derived from the Library of Congress Subject Headings (LCSH), one of the library domain’s most widely-used subject terminology schemas. The development of FAST has been a collaboration of OCLC Research and the Library of Congress. Work on FAST began in late 1998.

Nomenklatura

Nomenklatura is a simple service that makes it easy to maintain a canonical list of entities such as persons, companies or even streets, and to match messy input, such as their names, against that canonical list – for example, matching Acme Widgets, Acme Widgets Inc and Acme Widgets Incorporated to the canonical “Acme Widgets”.

With Nomenklatura it is a matter of minutes to set up your own set of master data to match against. It provides a simple user interface and an API which you can then use to do matching (the API is compatible with OpenRefine’s reconciliation function).
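The idea of matching messy names against a canonical list can be sketched as a small alias table. This is a toy illustration, not Nomenklatura's actual API or data model. The class `CanonicalList` and the helper `normalize` are hypothetical names.

```python
import re


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


class CanonicalList:
    """A toy master list mapping known aliases to one canonical name."""

    def __init__(self):
        self._aliases = {}

    def add(self, canonical, aliases=()):
        # The canonical name is registered as an alias of itself.
        for alias in (canonical, *aliases):
            self._aliases[normalize(alias)] = canonical

    def lookup(self, messy):
        """Return the canonical name for a messy input, or None if unknown."""
        return self._aliases.get(normalize(messy))


entities = CanonicalList()
entities.add("Acme Widgets", aliases=["Acme Widgets Inc", "Acme Widgets Incorporated"])

print(entities.lookup("ACME  Widgets, Inc."))
```

Exact lookup after normalization only catches variants that have already been recorded as aliases. A real service would add fuzzy matching and a review step for inputs it has not seen before.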

The project was initially developed at Open Knowledge Labs. After being abandoned, it found a new home with UNICEF, thereby becoming a neat parallel of their work in the real world.

Deprecated or defunct

Freebase

The Freebase Reconciliation Service was deprecated in June 2015 and was later shut down with the rest of Freebase.

http://reconcile.freebaseapps.com/

https://developers.google.com/freebase/v1/reconciliation-overview?hl=en

Talis Kasabi

The Kasabi reconciliation services used to provide reconciliation against any database published on the Kasabi platform. See the former documentation. They suggest some alternatives.

Wish List

The following are data sources that could provide useful reconciling within OpenRefine. If you would like to help with coding a reconciling extension for any, please contact our mailing list. We would love to see some of these happen!

  • Historical Newspapers
    • Library of Congress' Chronicling America provides JSON, RDF, XML & Linked Data with an easy to use API.
  • Chemical Identifier Resolver
    • Over 96 million chemical structures hosted by NCI/NIH; provides names, conversions, and various formats, including XML output.
  • Global Research Identifier Database (GRID)
    • Catalog of the world's research organisations, provides names, geographic data, id links to other databases, and inter-organizational relationships.
  • OpenStreetMap - collaborative map of the world. The reconciliation service could match against nodes, ways and relations. The data extension API could return coordinates and other attributes, so this would provide a way of doing geocoding. The extension could be built on top of existing services (such as Nominatim) and/or using its own index.