James Baker edited this page May 9, 2017 · 4 revisions

The following annotators are included in the Baleen 2.2 release, categorised below by the technique they use to extract entities.

Full documentation of all Baleen components is available in the Baleen Javadoc.

Cleaners

  • AddGenderToPerson
  • AddTitleToPerson
  • Blacklist
  • CleanPunctuation
  • CleanTemporal
  • CollapseLocations
  • CurrencyDetection
  • EntityInitials
  • ExpandLocationToDescription
  • MergeAdjacent
  • MergeAdjacentQuantities
  • MergeNationalityIntoEntity
  • NaiveMergeRelations
  • NormalizeOSGB
  • NormalizeTemporal
  • NormalizeWhitespace
  • ReferentToEntity
  • RelationTypeFilter
  • RemoveLowConfidenceEntities
  • RemoveNestedEntities
  • RemoveNestedLocations
  • RemoveOverlappingEntities
  • SplitBrackets
  • Surname

Coreference

Some of these currently appear as cleaners in the code.

  • CorefBrackets
  • CorefCapitalisationAndApostrophe
  • SieveCoreference

Gazetteer

  • Country
  • File
  • Mongo
  • MongoRegex
  • MongoStemming

Grammatical

  • NPAtCoordinate
  • NPElement
  • NPLocation
  • NPOrganisation
  • NPTitleEntity
  • QuantityNPEntity
  • TOLocationEntity

Interactions and Patterns

  • AssignTypeToInteraction
  • PatternExtractor
  • RemoveInteractionInEntities

Language

  • MaltParser
  • OpenNLP
  • OpenNLPParser
  • WordNetLemmatizer

Miscellaneous

  • AddSourceToMetadata
  • CommonKeywords
  • DocumentTypeByFilename
  • DocumentTypeByLocation
  • DocumentTypeByParameter
  • FullDocument
  • GenericMilitaryPlatform
  • GenericVehicle
  • GenericWeapon
  • MentionedAgain
  • NationalityToLocation
  • OrganisationPersonRole
  • People
  • Pronouns
  • RakeKeywords

Regular Expressions

  • Area
  • BritishArmyUnits
  • Callsign
  • CasRegistryNumber
  • Custom
  • Date
  • DateTime
  • Distance
  • DocumentNumber
  • Dtg
  • Email
  • FlightNumber
  • Frequency
  • Hms
  • IpV4
  • LatLon
  • Mgrs
  • Money
  • Nationality
  • Osgb
  • Postcode
  • RelativeDate
  • SocialMediaUsername
  • TaskForce
  • Telephone
  • Time
  • TimeQuantity
  • USTelephone
  • UnqualifiedDate
  • Url
  • Volume
  • Weight

Relations

  • NPVNP
  • SimpleInteraction
  • UbmreConstituent
  • UbmreDependency

Statistical

  • DocumentLanguage
  • DocumentType
  • OpenNLP

Structural

  • StructuralEntity
  • StructuralRelation
  • TableEntity
  • TableRelation

Templates

  • TemplateAnnotator
  • TemplateFieldDefinitionAnnotator
  • TemplateFieldJoiningAnnotator
  • TemplateFieldToEntityAnnotator
  • TemplateRecordDefinitionAnnotator
  • TemplateValidator