
Using Relationship Extraction

James Baker edited this page Apr 27, 2017 · 3 revisions

As of Baleen v2.2, a new set of annotators adds some support for relationship extraction. This guide takes you through configuring and using the following relationship annotators:

  • relations.SimpleInteraction
  • relations.UbmreConstituent
  • relations.UbmreDependency

For full details of these annotators, and other relationship annotators not covered here, you should refer to the Javadoc. This guide is intended to get you started, but will not cover the intricacies of each annotator.

Training

The annotators covered by this guide require some training before use. Training should be performed on a representative training set of your data, which should include examples of all the relationships you would like to extract.

Stage 1: Extraction of Patterns

The first stage is to identify "patterns" within your training set. A pattern is the text between two annotations (in this case, entities), processed so that it is more meaningful than simply the covered text between them.
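To make the idea concrete, here is a minimal Python sketch of one way a pattern could be derived: take the words between two entity mentions and drop common stopwords. The stopword list, the `Entity` class and the `find_entity` helper (a stand-in for real entity extraction) are all hypothetical. Baleen's `PatternExtractor` performs its own processing, which may differ considerably, so refer to the Javadoc for the real behaviour.

```python
from dataclasses import dataclass

# Hypothetical stopword list, for illustration only
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "at", "and"}

@dataclass(frozen=True)
class Entity:
    text: str
    begin: int  # start character offset
    end: int    # end character offset (exclusive)

def find_entity(sentence: str, text: str) -> Entity:
    """Hypothetical stand-in for entity extraction: locate the first occurrence of text."""
    begin = sentence.index(text)
    return Entity(text, begin, begin + len(text))

def extract_pattern(sentence: str, first: Entity, second: Entity) -> list[str]:
    """Return the lower-cased, stopword-free words between two entities."""
    between = sentence[first.end:second.begin]
    words = [w.strip(".,;:!?").lower() for w in between.split()]
    return [w for w in words if w and w not in STOPWORDS]

sentence = "John Smith travelled to the offices of Acme Corp in London."
john = find_entity(sentence, "John Smith")
acme = find_entity(sentence, "Acme Corp")
print(extract_pattern(sentence, john, acme))  # ['travelled', 'offices']
```

Here the pattern keeps "travelled" and "offices" while discarding "to", "the" and "of", which carry little information about how the two entities are related.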

To do this, we first perform entity extraction on the training data (ideally using the same annotators you intend to use on the actual data). We then run the PatternExtractor annotator and the MongoPatternSaver consumer to save these patterns to a Mongo database.

1_pattern_extraction.yml

collectionreader:
  # Read in your training data here
 
annotators:
  # Perform your usual entity extraction here 
  ...
  
  # Pattern Extraction
  - patterns.PatternExtractor
 
consumers:
  # Save patterns to Mongo
  - MongoPatternSaver

Stage 2: Identification of Interactions

Now that we have extracted patterns from the training set, we need to convert them into "interactions". An interaction is a word that expresses a relationship in a sentence, for instance "saw" in the sentence "John saw the car".

This is done with a Baleen job, which reads the patterns from Mongo and converts them into a CSV file. Check this CSV manually after it has been created and remove any unwanted or spurious interactions.
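The sketch below illustrates the general idea of this step under stated assumptions: each pattern has already been POS-tagged with Penn Treebank tags, verbs (tags starting with "VB") are treated as candidate interactions, and the CSV has two hypothetical columns, `interaction` and `occurrences`. It is not the `interactions.IdentifyInteractions` implementation, and the real CSV layout may differ.

```python
import csv
import io
from collections import Counter

def candidate_interactions(patterns):
    """Count verbs across all patterns (assumes (word, Penn tag) pairs)."""
    counts = Counter()
    for pattern in patterns:
        for word, tag in pattern:
            if tag.startswith("VB"):
                counts[word.lower()] += 1
    return counts

def write_review_csv(counts, out):
    """Write one row per candidate, most frequent first, for manual checking."""
    writer = csv.writer(out)
    writer.writerow(["interaction", "occurrences"])  # hypothetical header
    for word, n in counts.most_common():
        writer.writerow([word, n])

# Hypothetical tagged patterns extracted from a training set
patterns = [
    [("saw", "VBD")],
    [("met", "VBD"), ("with", "IN")],
    [("saw", "VBD"), ("near", "IN")],
]
buffer = io.StringIO()
write_review_csv(candidate_interactions(patterns), buffer)
print(buffer.getvalue())
```

Sorting by frequency puts the most common candidates first, which can make the manual review quicker. A real job would write to a file such as `output/interactions.csv` rather than an in-memory buffer.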

2_interaction_identification.yml

tasks:
- class: interactions.IdentifyInteractions
  filename: output/interactions.csv

Stage 3: Enhancement of Interactions

Following the identification (and manual checking) of interactions, we can optionally run a job to enhance these interactions. This includes complementing the extracted interactions with synonyms. Again, the output CSV file should be manually checked following this stage to remove any unwanted or spurious enhancements.
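As a rough illustration of synonym enhancement, the sketch below expands each interaction with entries from a small hand-written synonym table. `SYNONYMS` is a hypothetical stand-in for whatever lexical resource `interactions.EnhanceInteractions` actually uses, and the de-duplication step is an assumption.

```python
# Hypothetical synonym table standing in for a real lexical resource
SYNONYMS = {
    "saw": ["observed", "spotted"],
    "met": ["encountered"],
}

def enhance(interactions):
    """Return the original interactions plus any synonyms, without duplicates."""
    enhanced = []
    seen = set()
    for word in interactions:
        for candidate in [word, *SYNONYMS.get(word, [])]:
            if candidate not in seen:
                seen.add(candidate)
                enhanced.append(candidate)
    return enhanced

print(enhance(["saw", "met", "visited"]))
# ['saw', 'observed', 'spotted', 'met', 'encountered', 'visited']
```

Because every synonym is added to the output, expansions that only fit a different sense of a word also end up in the CSV, which is why the manual check after this stage matters.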

3_interaction_enhancement.yml

tasks:
- class: interactions.EnhanceInteractions
  input: output/interactions.csv
  output: output/interactions-enhanced.csv

Stage 4: Upload Interactions to Mongo

To complete training, we upload the CSV of enhanced interactions back to Mongo using a Baleen job. If you skipped the optional Stage 3, use the CSV produced by Stage 2 (output/interactions.csv) as the input instead.

4_upload_interactions.yml

tasks:
- class: interactions.UploadInteractionsToMongo
  input: output/interactions-enhanced.csv

Extraction

Once we have completed the training stages, we can perform relationship extraction on our full data set. The same training output could potentially be reused across different data sets, but for optimum performance we recommend training separately for each data set.

Stage 5: Perform Relationship Extraction

Relationship extraction is done with a few specific annotators, which usually come at the end of the pipeline (after entity cleaning and coreference resolution have been performed). The purpose of these annotators is to:

  1. Extract interactions in the document
  2. Clean up extracted interactions
  3. Perform relationship extraction based on the extracted interactions
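The three steps can be sketched in Python as follows. This is a deliberately simple, hypothetical illustration rather than the algorithm used by `SimpleInteraction` or the Ubmre annotators. Step 1 uses case-insensitive whole-word matching in place of the `gazetteer.MongoStemming` annotator (no stemming is done). Step 2 only removes interactions that overlap an entity; the type assignment performed by `AssignTypeToInteraction` is omitted. Step 3 links the nearest entity on either side of each interaction. Entity offsets are hard-coded in place of real entity extraction.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    text: str
    begin: int  # start character offset
    end: int    # end character offset (exclusive)

def find_interactions(sentence, words):
    """Step 1: case-insensitive whole-word matching (stand-in for the gazetteer)."""
    spans = []
    for word in words:
        pattern = r"\b" + re.escape(word) + r"\b"
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            spans.append(Span(m.group(), m.start(), m.end()))
    return sorted(spans, key=lambda s: s.begin)

def remove_interactions_in_entities(interactions, entities):
    """Step 2: drop interactions that overlap any entity mention."""
    return [i for i in interactions
            if not any(i.begin < e.end and e.begin < i.end for e in entities)]

def extract_relations(interactions, entities):
    """Step 3: link the nearest entity before and after each interaction."""
    relations = []
    for i in interactions:
        before = [e for e in entities if e.end <= i.begin]
        after = [e for e in entities if e.begin >= i.end]
        if before and after:
            source = max(before, key=lambda e: e.end)
            target = min(after, key=lambda e: e.begin)
            relations.append((source.text, i.text.lower(), target.text))
    return relations

sentence = "John saw Sarah at Saw Mill Lane."
# Hypothetical entity annotations (Person, Person, Location)
entities = [Span("John", 0, 4), Span("Sarah", 9, 14), Span("Saw Mill Lane", 18, 31)]
found = find_interactions(sentence, ["saw"])
cleaned = remove_interactions_in_entities(found, entities)
print(extract_relations(cleaned, entities))  # [('John', 'saw', 'Sarah')]
```

Without the cleaning step, the "Saw" inside the location name "Saw Mill Lane" would be treated as an interaction, even though it is part of an entity.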

5_process_documents.yml

collectionreader:
  # Read in the data you want to process here
 
annotators:
  # Perform your usual entity extraction, cleaning and coreference here 
  ...
  
  # Interaction Extraction
  - class: gazetteer.MongoStemming
    collection: interactions
    type: Interaction
    
  # Clean Interactions
  - interactions.RemoveInteractionInEntities
  - interactions.AssignTypeToInteraction
  
  # Extract Relationships
  - relations.UbmreDependency  # UbmreDependency is used here, but you could also use SimpleInteraction or UbmreConstituent

consumers:
  # Persist to your data store