
Prepare the 'GND' authority records to expand them in search applications like VuFind



HeBIS-VZ/GndAuthorityRecords


Motivation

A common requirement for library search applications is support for synonyms from authority records. For example, the query "author:Blair, Eric" should also find the better-known pseudonym "George Orwell".

In a traditional OPAC based on an SQL database, this can be solved generically with a join. Modern bibliographic search systems are mostly based on full-text retrieval systems like Lucene/Solr/Elasticsearch/... While some of these back ends can emulate a 'join', they are still essentially key-value stores. For this reason it is in most cases better to expand the authority records externally.

Expanding authority records while indexing vs. while searching

  • Expanding the synonyms while searching is a straightforward strategy, but it is hard to handle complex synonyms like "big apple" to "new york city". It may also increase the response time of the system.
  • Expanding the synonyms while indexing is less flexible, but at index time the kind of the authority record (topic term, personal name, ...) is known, so complex synonyms are easy to handle.
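A minimal sketch of index-time expansion, assuming each bibliographic record carries the IDs of the authority records it references and that the names per ID (preferred form first, then variants) have already been loaded into a map. `SynonymExpander` is a hypothetical name, not a class of this repository.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of GndAuthorityRecords: expands authority IDs at index time.
final class SynonymExpander {
    // Authority ID -> preferred name followed by all variant names
    private final Map<String, List<String>> namesById;

    SynonymExpander(Map<String, List<String>> namesById) {
        this.namesById = namesById;
    }

    // Returns every name to write into the search field, without duplicates, in first-seen order
    Set<String> expand(Collection<String> authorityIds) {
        Set<String> result = new LinkedHashSet<>();
        for (String id : authorityIds) {
            // Unknown IDs contribute nothing instead of failing the whole record
            result.addAll(namesById.getOrDefault(id, Collections.<String>emptyList()));
        }
        return result;
    }
}
```

With this, a record that references the authority record of Eric Blair gets both "Blair, Eric" and "Orwell, George" in its author field, so a query for either name matches without any expansion at search time.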

Static file vs. service for synonyms

  • A static file is easy to handle, but for a large collection of authority records it may grow to several GB. This doesn't matter for a complete build of the index, but loading such a big file for every update of a bibliographic record is inefficient.
  • A background service is slightly more complex, but does not slow down the startup of the indexing process or individual updates. On the other hand, a service may increase the time needed to build a new index; this disadvantage can be avoided with a cache.
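The cache mentioned in the last point can be sketched as a small LRU cache in front of the lookup. This is a minimal sketch, assuming the call to the synonym service is available as a `Function` from authority ID to names; `CachingSynonymLookup` and `maxEntries` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache in front of a slow synonym lookup (e.g. an HTTP call to the Solr service).
final class CachingSynonymLookup implements Function<String, List<String>> {
    private final Function<String, List<String>> backend;
    private final Map<String, List<String>> cache;

    CachingSynonymLookup(Function<String, List<String>> backend, final int maxEntries) {
        this.backend = backend;
        // accessOrder = true keeps the least recently used entry first
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                // Evict the least recently used entry once the limit is exceeded
                return size() > maxEntries;
            }
        };
    }

    // synchronized because LinkedHashMap is not thread-safe; the backend is only called on a miss
    @Override
    public synchronized List<String> apply(String authorityId) {
        return cache.computeIfAbsent(authorityId, backend);
    }
}
```

A `LinkedHashMap` in access order combined with `removeEldestEntry` is a JDK-only way to get LRU eviction; a dedicated caching library would be an alternative if statistics or time-based expiry are needed.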

Description

This project contains a complete service to expand the 'GND' (authority records provided by the German National Library).

The service has three parts:

  • Code to parse the authority records (provided in MarcXML) and load them into a simple Solr index.
  • A minimal configuration for the Solr index
  • Exemplary code to integrate the preprocessed synonyms into your own indexing process, e.g. SolrMarc
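The parsing step of the first part can be sketched with the JDK's StAX API. This is a minimal sketch, not the repository's parser: it assumes MARC 21 authority records in MarcXML, uses the 001 control number as key, and keeps only subfield $a of the 1XX (preferred heading) and 4XX (variant heading) fields, ignoring qualifiers such as dates. `MarcXmlNames` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical minimal MARCXML reader: control number (001) -> $a of all 1XX and 4XX fields.
final class MarcXmlNames {
    private MarcXmlNames() {}

    static Map<String, List<String>> parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Map<String, List<String>> namesById = new LinkedHashMap<>();
        String id = null;
        String tag = null; // tag of the datafield currently being read
        List<String> names = new ArrayList<>();
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String element = r.getLocalName();
                    if ("record".equals(element)) {
                        id = null;
                        names = new ArrayList<>();
                    } else if ("controlfield".equals(element)) {
                        // Read attributes first: getElementText() moves the reader to the end tag
                        boolean isId = "001".equals(r.getAttributeValue(null, "tag"));
                        String text = r.getElementText();
                        if (isId) {
                            id = text.trim();
                        }
                    } else if ("datafield".equals(element)) {
                        tag = r.getAttributeValue(null, "tag");
                    } else if ("subfield".equals(element)) {
                        boolean isName = "a".equals(r.getAttributeValue(null, "code"))
                                && tag != null && (tag.startsWith("1") || tag.startsWith("4"));
                        String text = r.getElementText();
                        if (isName) {
                            names.add(text.trim());
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String element = r.getLocalName();
                    if ("datafield".equals(element)) {
                        tag = null;
                    } else if ("record".equals(element) && id != null) {
                        namesById.put(id, names);
                    }
                }
            }
        } finally {
            r.close();
        }
        return namesById;
    }
}
```

Collecting everything in one map only suits small inputs; for the full GND files one would rather hand each record to the Solr loader as soon as its closing `record` tag is reached.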

Status

The main skeleton is quite stable, but the processing of the data is still in progress.

Initial data

The offline package of the GND is separated into disjoint files:

  • T_umlenk_loesch1701.mrc.xml - Deletions and redirections (todo)
  • Tbgesamt1701gnd.mrc.xml - Organisations
  • Tfgesamt1701gnd.mrc.xml - Meetings
  • Tggesamt1701gnd.mrc.xml - Geographic
  • Tngesamt1701gnd.mrc.xml - Personal names (non individualized)
  • Tpgesamt1701gnd.mrc.xml - Personal Names (individualized)
  • Tsgesamt1701gnd.mrc.xml - Topic Terms
  • Tugesamt1701gnd.mrc.xml - Work/Title
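Because each file holds exactly one kind of authority record, the kind can be derived from the file name prefix before parsing. A hypothetical sketch; the enum and its constant names are illustrative, only the prefixes are taken from the list above.

```java
import java.util.Optional;

// Hypothetical mapping from the prefix of a GND offline file to the kind of authority record.
enum GndFileType {
    DELETIONS_AND_REDIRECTIONS("T_"),
    ORGANISATIONS("Tb"),
    MEETINGS("Tf"),
    GEOGRAPHIC("Tg"),
    PERSONAL_NAMES_NON_INDIVIDUALIZED("Tn"),
    PERSONAL_NAMES_INDIVIDUALIZED("Tp"),
    TOPIC_TERMS("Ts"),
    WORKS("Tu");

    private final String prefix;

    GndFileType(String prefix) {
        this.prefix = prefix;
    }

    // Expects a bare file name (e.g. path.getFileName().toString()), not a full path
    static Optional<GndFileType> fromFileName(String fileName) {
        for (GndFileType type : values()) {
            if (fileName.startsWith(type.prefix)) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}
```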

Online update

Changes in the GND are available via OAI:

  • OaiUpdates - All kinds (todo)
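Since the OAI update is still open, here is a sketch of the request side of an OAI-PMH 2.0 harvest: the first `ListRecords` request passes `metadataPrefix` and optionally `from` and `set`; an incomplete response carries a `resumptionToken`, which must then be sent as the only argument besides the verb. The base URL, metadata prefix and set name used with it are placeholders; the actual values should be taken from the endpoint's `Identify`, `ListMetadataFormats` and `ListSets` responses. `OaiRequests` is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.time.LocalDate;

// Hypothetical helper that builds OAI-PMH 2.0 ListRecords request URLs for incremental updates.
final class OaiRequests {
    private OaiRequests() {}

    // First request: all records changed since the given day (set may be null)
    static String listRecords(String baseUrl, String metadataPrefix, String set, LocalDate from) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?verb=ListRecords")
                .append("&metadataPrefix=").append(enc(metadataPrefix))
                // LocalDate.toString() yields YYYY-MM-DD, the OAI-PMH day granularity
                .append("&from=").append(from);
        if (set != null) {
            url.append("&set=").append(enc(set));
        }
        return url.toString();
    }

    // Follow-up request: the resumptionToken is an exclusive argument
    static String resume(String baseUrl, String resumptionToken) {
        return baseUrl + "?verb=ListRecords&resumptionToken=" + enc(resumptionToken);
    }

    // Java 8 compatible URL encoding
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```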

Local authority records

  • Tw: Libraries (todo)
  • Tk: RVK Notations (todo)
  • Tr: other (todo)

Notes

  1. The code and the config for Solr contain some optional features besides the synonyms.
  2. The approach can easily be extended to authority records from additional or other sources.
  3. The source contains a URL to a local installation of Solr. This resource is not publicly available.
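Because the hard-coded URL points to a non-public installation, anyone reusing the code has to change it. One way to make it configurable is sketched below; the property name, environment variable and default URL (including the core name `gnd`) are illustrative assumptions, not settings the project reads. Port 8983 is Solr's default.

```java
// Hypothetical lookup of the Solr base URL: system property, then environment variable, then a local default.
final class SolrLocation {
    // All three values are illustrative assumptions
    static final String PROPERTY = "gnd.solr.url";
    static final String ENV_VARIABLE = "GND_SOLR_URL";
    static final String DEFAULT_URL = "http://localhost:8983/solr/gnd";

    private SolrLocation() {}

    static String baseUrl() {
        String url = System.getProperty(PROPERTY);
        if (url == null || url.isEmpty()) {
            url = System.getenv(ENV_VARIABLE);
        }
        if (url == null || url.isEmpty()) {
            url = DEFAULT_URL;
        }
        // Strip a trailing slash so that paths like "/select" can be appended safely
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }
}
```

The indexer could then be started with `-Dgnd.solr.url=...` without touching the source.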

JavaDoc

The precompiled Javadoc can be found in the doc directory.

Compatibility

The code uses Java 8 features and needs libraries from the following projects:
