TheScienceMuseum/heritage-connector

Heritage Connector

Transforming text into data to extract meaning and make connections. In development.

See also our paper, Heritage connector: A machine learning framework for building linked open data from museum collections, at https://doi.org/10.1002/ail2.23.

A set of tools to:

  • load tabular collection data to a knowledge graph
  • find links between collection entities and Wikidata
  • perform NLP to create more links in the graph (also see hc-nlp)
  • explore and analyse a collection graph in ways that aren't possible in existing collections systems

Diagram: relational database vs knowledge graph. Collections as tabular data (left) vs knowledge graphs (right).
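
To make the tabular-vs-graph idea concrete, here is a minimal sketch using only the Python standard library. It is not the heritageconnector API: the table, the column names and the helper functions `rows_to_triples` and `neighbours` are all hypothetical and exist only for illustration. Each non-empty cell becomes a (subject, predicate, object) triple, which lets records that share a value be found by following links rather than by joining tables.

```python
import csv
import io

# Hypothetical collection table; IDs, names and columns are made up.
TABLE = """id,name,maker,place_made
obj1,Example object A,person1,Place X
obj2,Example object B,person1,Place Y
"""


def rows_to_triples(rows, id_column="id"):
    """Turn each non-empty cell into a (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        subject = row[id_column]
        for column, value in row.items():
            if column != id_column and value:
                triples.append((subject, column, value))
    return triples


def neighbours(triples, node):
    """Return every triple in which `node` appears as subject or object."""
    return [t for t in triples if node in (t[0], t[2])]


rows = list(csv.DictReader(io.StringIO(TABLE)))
triples = rows_to_triples(rows)

# Both objects are reachable from the shared maker node.
linked_to_maker = neighbours(triples, "person1")
print(linked_to_maker)
```

In a real graph the maker and place values would be entities in their own right (for example, linked to Wikidata), so the same traversal could continue outward from them.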

Further Reading

The main project page is here. We're also writing about our research on the project blog as we develop these tools and methods.

Some blog highlights:

For Developers (TODO: put in docs)

  • Python 3
  • Create a new branch / Pull Request for each new feature / unit of functionality

Installation

We use pipenv for dependency management. You can also install dependencies from requirements.txt and dev dependencies from requirements_dev.txt.

Optional dependencies (for experimental features):

  • torch, dgl, dgl-ke: KG embeddings
  • spacy-nightly: export to spaCy KnowledgeBase for Named Entity Linking

Running tests

Run python -m pytest with optional --cov=heritageconnector for a coverage report.

We use pytest for tests, and all tests are in ./test.

Running

To run the web app (in development): python -m heritageconnector.web.app

Citation

Cite as:

Dutia, K, Stack, J. Heritage connector: A machine learning framework for building linked open data from museum collections. Applied AI Letters. 2021;e23. https://doi.org/10.1002/ail2.23