joerg-rade/domox

domox

Domain Model Extractor: automatically extracts a first-iteration domain model from requirements. Using Natural Language Processing (NLP), it tries to identify candidates for:

  • Classes
  • Properties
  • Actions and
  • Associations

Domox allows the user to:

  • Import documents
  • Analyze requirements therein
  • Extract relations
  • Generate a Domain Model (UML)
  • Flag relations as irrelevant (i.e. exclude them from the Domain Model)
  • View the (Plant)UML model
  • Export it (for subsequent processing by other tools, e.g. Apache Causeway) and
  • Trace Domain Model elements back to the requirements they were derived from.
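
The flagging and traceability features can be pictured with a small data structure. This is only a sketch under assumptions: the class and function names (`Requirement`, `ExtractedRelation`, `relations_for_model`, `trace`) are hypothetical and not taken from the domox code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    identifier: str  # hypothetical requirement id, e.g. "R1"
    text: str


@dataclass
class ExtractedRelation:
    subject: str
    verb: str
    obj: str
    requirement: Requirement  # trace link back to the source requirement
    relevant: bool = True     # set to False to exclude from the Domain Model


def relations_for_model(relations):
    """Keep only the relations that were not flagged as irrelevant."""
    return [r for r in relations if r.relevant]


def trace(relations, element):
    """Return the requirements that mention a model element, without duplicates."""
    found = []
    for r in relations:
        if element in (r.subject, r.obj) and r.requirement not in found:
            found.append(r.requirement)
    return found


r1 = Requirement("R1", "The customer places an order.")
r2 = Requirement("R2", "The system prints a page.")
relations = [
    ExtractedRelation("customer", "places", "order", r1),
    ExtractedRelation("system", "prints", "page", r2, relevant=False),
]
print([r.verb for r in relations_for_model(relations)])
print([req.identifier for req in trace(relations, "order")])
```

Keeping the requirement on every relation means a flagged relation stays traceable even though it no longer contributes to the model.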

The design follows the outline in [1]:

[Figure: design outline, following [1]]

NLP processing usually proceeds as follows:

[Figure: typical NLP processing pipeline]

Glossary

  • Corpus - A list of documents
  • NER - Named Entity Recognition
  • POS - Part of Speech (Tagging)
  • Sofa - Subject of Analysis
  • Boilerplate
  • Tokenization
  • Sentence Splitting
  • Parsing
  • Co-reference Resolution (synonyms)
  • Lemmatization
  • Stemming
  • Annotation (reference to text, begin/end)
  • Type System - output
  • AE - Analysis Engine
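
To make some of the glossary terms concrete, here is a minimal sketch of annotations as begin/end offsets into the Sofa, with naive sentence splitting and tokenization. The regular expressions are deliberately simplistic stand-ins for a real NLP library, and all names (`Annotation`, `split_sentences`, `tokenize`) are illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str   # e.g. "Sentence" or "Token"
    begin: int  # offset of the first covered character in the Sofa
    end: int    # offset one past the last covered character

    def covered_text(self, sofa: str) -> str:
        return sofa[self.begin:self.end]


def split_sentences(sofa):
    """Naive sentence splitting: text up to and including '.', '!' or '?'."""
    pattern = re.compile(r"[^.!?\s][^.!?]*[.!?]")
    return [Annotation("Sentence", m.start(), m.end()) for m in pattern.finditer(sofa)]


def tokenize(sofa, sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    text = sentence.covered_text(sofa)
    return [
        Annotation("Token", sentence.begin + m.start(), sentence.begin + m.end())
        for m in re.finditer(r"\w+|[^\w\s]", text)
    ]


sofa = "The system stores orders. Each order has a date."
sentences = split_sentences(sofa)
tokens = tokenize(sofa, sentences[1])
print([t.covered_text(sofa) for t in tokens])
```

Because annotations only store offsets, the Sofa itself is never modified, and several layers (sentences, tokens, POS tags) can refer to the same text.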

References

[1] C. Arora, M. Sabetzadeh, L. Briand, and F. Zimmer (2016). Extracting Domain Models from Natural-Language Requirements: Approach and Industrial Evaluation. https://people.svv.lu/sabetzadeh/pub/MODELS16.pdf

[2] M. Javed and Y. Lin (2020). iMER: Iterative Process of Entity Relationship and Business Process Models Extraction from the Requirements.

[3] P. Coad, E. Lefebvre, and J. De Luca (1999). Java Modeling in Color with UML. Enterprise Components and Processes. ISBN 0-13-011510-X

[4] Generating code via Xtext. https://github.com/echebbi/standalone-xtext-example and https://blogs.itemis.com/en/building-domain-specific-languages-with-xtext-and-xtend

Excerpt from [1]:

Syntactic parsing (SP) is the key enabling NLP technology. It consists of:

  • Phrase structure parsing
    • noun phrases (NP)
    • verb phrases (VP) / verb (VB)
  • Dependency parsing (functional constituents, parse tree)
    • subject
    • object
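
As an illustration of how dependency parsing yields relation candidates, the sketch below pairs the subject and object that share the same governing verb. The dependency triples are written by hand for the example sentence rather than produced by a parser, and the names `Dependency` and `extract_relations` are hypothetical. The labels `nsubj` and `obj` follow Universal Dependencies; `dobj` is accepted as the older label for a direct object.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    relation: str   # dependency label, e.g. "nsubj"
    governor: str   # head word
    dependent: str  # word that depends on the head


def extract_relations(dependencies):
    """Return (subject, verb, object) triples for verbs that have both."""
    subjects = {d.governor: d.dependent for d in dependencies if d.relation == "nsubj"}
    objects = {d.governor: d.dependent for d in dependencies if d.relation in ("obj", "dobj")}
    return [(subjects[verb], verb, objects[verb]) for verb in subjects if verb in objects]


# Hand-written parse of "The customer places an order."
parse = [
    Dependency("det", "customer", "The"),
    Dependency("nsubj", "places", "customer"),
    Dependency("det", "order", "an"),
    Dependency("obj", "places", "order"),
]
print(extract_relations(parse))
```

In this reading, subject and object become class candidates and the verb becomes an association candidate.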

There are four kinds of relations:

  • (regular) Association
  • Aggregation
  • Generalization
  • Attribute

A Domain Model has:

  • 1..n Concepts
  • 1..n Attributes
  • 1..n Associations
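
The structure described in the excerpt could be represented roughly as follows. This is a sketch, not the domox implementation: the class names are assumptions, attributes are modelled as members of a concept rather than as separate relation objects, and the arrows are standard PlantUML class-diagram notation.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationKind(Enum):
    ASSOCIATION = "--"        # Customer -- Order
    AGGREGATION = "o--"       # Whole o-- Part
    GENERALIZATION = "<|--"   # Parent <|-- Child


@dataclass
class Concept:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Relation:
    kind: RelationKind
    source: str
    target: str
    label: str = ""


@dataclass
class DomainModel:
    concepts: list
    relations: list

    def to_plantuml(self) -> str:
        lines = ["@startuml"]
        for concept in self.concepts:
            lines.append(f"class {concept.name} {{")
            lines.extend(f"  {attribute}" for attribute in concept.attributes)
            lines.append("}")
        for relation in self.relations:
            suffix = f" : {relation.label}" if relation.label else ""
            lines.append(f"{relation.source} {relation.kind.value} {relation.target}{suffix}")
        lines.append("@enduml")
        return "\n".join(lines)


model = DomainModel(
    concepts=[Concept("Customer", ["name"]), Concept("Order", ["date"])],
    relations=[Relation(RelationKind.ASSOCIATION, "Customer", "Order", "places")],
)
print(model.to_plantuml())
```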

Various

Tutorials

Ideas

Tools

Search Terms

Running with Docker

Kroki/PlantUML

Kroki is used to generate the diagrams (UML, TypedDependency). You can use either the public instance provided by the Kroki team or an instance you host yourself as a Docker container.
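
For orientation, a diagram can be requested from Kroki with a GET URL whose last path segment is the diagram source, deflate-compressed and base64url-encoded. The sketch below only builds such a URL and does not call the service. The base URL `http://localhost:8000` for a self-hosted instance is an assumption about your setup, and the helper names are hypothetical.

```python
import base64
import zlib


def kroki_url(source, diagram_type="plantuml", output_format="svg",
              base_url="https://kroki.io"):
    """Build a Kroki GET URL: deflate-compress the source, then base64url-encode it."""
    compressed = zlib.compress(source.encode("utf-8"), 9)
    encoded = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"{base_url}/{diagram_type}/{output_format}/{encoded}"


def decode_payload(url):
    """Inverse of the encoding step, handy for checking a generated URL."""
    encoded = url.rsplit("/", 1)[-1]
    return zlib.decompress(base64.urlsafe_b64decode(encoded)).decode("utf-8")


source = "@startuml\nCustomer -- Order : places\n@enduml"
# Assumed address of a self-hosted instance; adjust to your own port mapping.
url = kroki_url(source, base_url="http://localhost:8000")
print(url)
```

Switching between the public and a self-hosted instance then only requires changing `base_url`.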

StanfordCoreNLP

StanfordCoreNLP can be used either as a jar library or as a service hosted in a Docker container, for example with the image at https://hub.docker.com/r/alborzi/stanford-corenlp:

docker pull alborzi/stanford-corenlp
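
Once a CoreNLP server is running, it can be queried over HTTP by POSTing the text and passing the annotators as a JSON `properties` query parameter. The following is a minimal sketch: the address `http://localhost:9000` (the CoreNLP server's default port) is an assumption about how the container is started, and the chosen annotators are just one plausible selection.

```python
import json
import urllib.parse
import urllib.request


def build_request(text, base_url="http://localhost:9000"):
    """Prepare a POST request asking the CoreNLP server for JSON annotations."""
    properties = {
        "annotators": "tokenize,ssplit,pos,lemma,depparse",
        "outputFormat": "json",
    }
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        f"{base_url}/?{query}", data=text.encode("utf-8"), method="POST"
    )


def annotate(text, base_url="http://localhost:9000"):
    """Send the text to the server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, base_url), timeout=30) as response:
        return json.load(response)


request = build_request("The customer places an order.")
print(request.full_url)
# annotate("The customer places an order.") requires a running server.
```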