GitHub - jitinkrishnan/NASA-SE: A Virtual Assistant for NASA's Systems Engineers (AAAI-MAKE '19 '20)

NASA-SE

SEVA: A Virtual Assistant for NASA's Systems Engineers

Publications in this repo:

Part of the Robust Software Engineering-Led Group that won the Digital Transformation Hackathon Award for "Most Potential NASA Impact". September 2020.

Presentations:

AAAI-MAKE 2019 Presentation Slides | Demo Slides | GSFC Short Talk | Poster at the Second AI and Data Science Workshop at JPL.

Citation

@inproceedings{krishnan2019seva,
  title={SEVA: A Systems Engineer's Virtual Assistant},
  author={Krishnan, Jitin and Coronado, Patrick and Reed, Trevor},
  booktitle={AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering},
  year={2019}
}

@inproceedings{krishnan2020ckcr,
  title={Common-Knowledge Concept Recognition for SEVA},
  author={Krishnan, Jitin and Coronado, Patrick and Purohit, Hemant and Rangwala, Huzefa},
  booktitle={AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering},
  year={2020}
}

Note

Datasets used in the project are available in the data folder.

Run pip install -r requirements.txt to install the necessary packages, if needed.

1. Concept Recognition (CR)

We aim to extract common-knowledge concepts for the systems engineering domain. With the help of a domain expert and text processing methods, we construct a dataset annotated at the word-level by carefully defining a BIO labelling scheme to train a NER-like sequence model to recognize systems engineering concepts.
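To make the BIO scheme concrete, here is a minimal sketch of how word-level tags can be grouped back into concept tuples. The function name `bio_to_concepts` and the `B-`/`I-` tag spelling (for example `B-mea`) are assumptions for illustration; the repo's own tag format and decoding code may differ.

```python
from typing import List, Tuple


def bio_to_concepts(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (concept, label) tuples."""
    concepts = []
    current_words, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new concept begins; flush any concept that is still open.
            if current_words:
                concepts.append((" ".join(current_words), current_label))
            current_words, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open concept.
            current_words.append(token)
        else:
            # "O" (or an inconsistent I- tag) closes the open concept.
            if current_words:
                concepts.append((" ".join(current_words), current_label))
            current_words, current_label = [], None
    if current_words:
        concepts.append((" ".join(current_words), current_label))
    return concepts


# Hypothetical tagging of a sentence fragment (labels taken from this README).
tokens = ["Acceptable", "Risk", "is", "the", "risk"]
tags = ["B-mea", "I-mea", "O", "O", "B-mea"]
print(bio_to_concepts(tokens, tags))
# [('Acceptable Risk', 'mea'), ('risk', 'mea')]
```

The `B-` prefix marks the first word of a concept, `I-` marks a continuation, and `O` marks words outside any concept, which is what lets a multi-word concept such as "Acceptable Risk" be recovered as one unit.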

If you just want to extract concepts right away

Input: A sentence. Output: Tuples of concepts and their corresponding BIO labels.

cd NASA-SE
python -i tag_sentence.py
>>> sentence = ("Acceptable Risk is the risk that is understood and agreed to by the program/project, "
...     "governing authority, mission directorate, and other customer(s) such that no further specific "
...     "mitigating action is required.")
>>> sentence2tags_all(sentence)
[('Acceptable Risk', 'mea'), ('mission', 'seterm'), ('risk', 'mea'), ('program', 'opcon'), 
('project', 'seterm'), ('mission directorate', 'seterm'), ('customer', 'grp')]

Training and Evaluating a custom CR model

Download Uncased BERT model to NASA-SE folder.

Rename the bert folder to bert_models. You can change the BERT vocabulary if needed, with a few caveats: the number of words must remain the same, and the vocabulary must still include the BERT special tokens. BERT recommends replacing the unused words with domain words; however, this does not always guarantee better performance. The example shown below updates the vocab.txt file with words from two files: acronyms and definitions. The vocab files we used are in the bert_items folder.

cd NASA-SE
python -i seva_dataset_utils.py
>>> accr_location = "se_data/acronyms.txt"
>>> definition_location = "se_data/definitions.txt"
>>> vocab_location = "bert_models/vocab.txt"
>>> update_vocab(vocab_location, accr_location, definition_location)
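As a rough illustration of the caveats above, the sketch below swaps `[unusedN]` placeholder entries for domain words while keeping the vocabulary the same length and leaving the special tokens untouched. `replace_unused_tokens` is a hypothetical helper, not the repo's `update_vocab`, and it assumes the standard uncased BERT vocab layout with `[unusedN]` entries.

```python
import re
from typing import Iterable, List

# Placeholder entries in the BERT vocab look like "[unused0]", "[unused1]", ...
UNUSED = re.compile(r"^\[unused\d+\]$")


def replace_unused_tokens(vocab: List[str], domain_words: Iterable[str]) -> List[str]:
    """Return a vocab of the same length with unused slots filled by domain words."""
    existing = set(vocab)
    new_words = []
    for word in domain_words:
        word = word.strip().lower()  # uncased model: store lowercase entries
        if word and word not in existing:
            existing.add(word)  # also prevents duplicates within domain_words
            new_words.append(word)
    queue = iter(new_words)
    updated = []
    for entry in vocab:
        if UNUSED.match(entry):
            # Fill the slot if any domain words remain; otherwise keep the placeholder.
            updated.append(next(queue, entry))
        else:
            updated.append(entry)
    return updated


# Tiny stand-in vocabulary; a real vocab.txt has one entry per line.
vocab = ["[PAD]", "[unused0]", "[unused1]", "[CLS]", "the"]
updated = replace_unused_tokens(vocab, ["CCD", "the", "GSFC", "JPL"])
print(updated)
# ['[PAD]', 'ccd', 'gsfc', '[CLS]', 'the']
```

Because only placeholder slots are overwritten, the number of entries and the position of every original token stay the same, so the pretrained embedding matrix still lines up with the vocabulary. Words beyond the number of available slots are simply dropped in this sketch.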

Here is an example using spaCy.

Datasets

Train and Evaluate

It will take a few minutes to generate the model.

cd NASA-SE
python train_evaluate.py

Once the training is finished, we can extract the tags.

cd NASA-SE
python -i tag_sentence.py
>>> sentence = ("Acceptable Risk is the risk that is understood and agreed to by the program/project, "
...     "governing authority, mission directorate, and other customer(s) such that no further specific "
...     "mitigating action is required.")
>>> sentence2tags_all(sentence)
[('Acceptable Risk', 'mea'), ('mission', 'seterm'), ('risk', 'mea'), ('program', 'opcon'), 
('project', 'seterm'), ('mission directorate', 'seterm'), ('customer', 'grp')]

Construct a Knowledge Graph

Here is a Jupyter notebook example of KG construction using acronyms and definitions.
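For readers who want a feel for the data structure before opening the notebook, here is a minimal sketch that stores (subject, relation, object) triples as an adjacency mapping. This is one common way to represent a small knowledge graph and is not necessarily what the notebook does; `build_graph` is a hypothetical name, and the triples are copied from the SEVA-TOIE sample output in this README.

```python
from collections import defaultdict


def build_graph(triples):
    """Map each subject to a list of (relation, object) edges, without duplicates."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        edge = (relation, obj)
        if edge not in graph[subject]:
            graph[subject].append(edge)
    return dict(graph)


triples = [
    ("STI", "has", "CCD detector"),
    ("STI", "is-a", "instrument"),
    ("CCD detector", "has-property", "2500 pixel"),
]
graph = build_graph(triples)
for relation, obj in graph["STI"]:
    print("STI", relation, obj)
# STI has CCD detector
# STI is-a instrument
```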

Verb Phrase Chunking

Makes a simple verb-based connection between two nearby entities.

cd NASA-SE
python -i tag_sentence.py
>>> sentence = ("Acceptable Risk is the risk that is understood and agreed to by the program/project, "
...     "governing authority, mission directorate, and other customer(s) such that no further specific "
...     "mitigating action is required.")
>>> verb_phrase_relations(sentence)
[('Acceptable Risk [mea]', 'is', 'risk [mea]'), ('risk [mea]', 'is understood', 'program [opcon]'),
('risk [mea]', 'agreed to by', 'program [opcon]')]
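The idea of connecting nearby entities through a verb phrase can be sketched as follows: each verb phrase is linked to the closest entity on its left and the closest entity on its right. `link_entities` and the hand-written chunk sequence are assumptions for illustration; the repo's `verb_phrase_relations` may apply different or additional rules, such as a distance limit.

```python
def link_entities(chunks):
    """Connect each verb phrase to its nearest entity on either side.

    chunks is an ordered list of (kind, text) pairs, where kind is
    "ENT" for a recognized concept or "VP" for a verb phrase.
    """
    relations = []
    for i, (kind, text) in enumerate(chunks):
        if kind != "VP":
            continue
        # Nearest entity before the verb phrase, scanning right to left.
        left = next((t for k, t in reversed(chunks[:i]) if k == "ENT"), None)
        # Nearest entity after the verb phrase, scanning left to right.
        right = next((t for k, t in chunks[i + 1:] if k == "ENT"), None)
        if left is not None and right is not None:
            relations.append((left, text, right))
    return relations


# Hand-written chunk sequence for the start of the example sentence.
chunks = [
    ("ENT", "Acceptable Risk [mea]"),
    ("VP", "is"),
    ("ENT", "risk [mea]"),
    ("VP", "is understood"),
    ("VP", "agreed to by"),
    ("ENT", "program [opcon]"),
]
for relation in link_entities(chunks):
    print(relation)
```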

Examples of verb phrase extraction using POS tags.

cd NASA-SE
python -i tag_sentence.py
>>> extract_vp([('is', 'VBZ'), ('the', 'DT')])
([('VP', [('is', 'VBZ')])], ['is'])
>>> extract_vp([('that', 'WDT'), ('is', 'VBZ'), ('understood', 'JJ'), ('and', 'CC'), ('agreed', 'VBD'),
('to', 'TO'), ('by', 'IN'), ('the', 'DT')])
([('VP', [('is', 'VBZ'), ('understood', 'JJ')]), ('VP', [('agreed', 'VBD'), ('to', 'TO'), ('by', 'IN')])],
['is understood', 'agreed to by'])
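A simple rule that reproduces the two outputs above is sketched here: a verb phrase starts at any verb tag (`VB*`) and keeps absorbing following verbs, adjectives, and particles or prepositions until another tag appears. The function name `chunk_verb_phrases` and the set of continuation tags are assumptions chosen to match these examples; the actual grammar behind `extract_vp` may be different.

```python
# Tags allowed to extend a verb phrase once it has started (an assumption).
CONTINUE_TAGS = {"JJ", "TO", "IN", "RP"}


def chunk_verb_phrases(tagged):
    """Chunk (word, POS tag) pairs into verb phrases.

    Returns (chunks, phrases): chunks holds ("VP", [(word, tag), ...])
    tuples and phrases holds the same chunks joined into strings.
    """
    chunks, current = [], []
    for word, tag in tagged:
        if tag.startswith("VB") or (current and tag in CONTINUE_TAGS):
            current.append((word, tag))
        elif current:
            # Any other tag closes the verb phrase that is currently open.
            chunks.append(("VP", current))
            current = []
    if current:
        chunks.append(("VP", current))
    phrases = [" ".join(word for word, _ in words) for _, words in chunks]
    return chunks, phrases


tagged = [("that", "WDT"), ("is", "VBZ"), ("understood", "JJ"), ("and", "CC"),
          ("agreed", "VBD"), ("to", "TO"), ("by", "IN"), ("the", "DT")]
print(chunk_verb_phrases(tagged)[1])
# ['is understood', 'agreed to by']
```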

2. SEVA-TOIE

SEVA-TOIE is a targeted open-domain information extractor for simple systems engineering sentences, based on domain-specific rules constructed over universal dependencies. It extracts fine-grained triples from sentences and can be used for downstream tasks such as knowledge graph construction and question answering.

Jar files to be downloaded:

Place the following files in the NASA-SE/stanford_jars folder:

Stanford Parser: stanford-parser-3.9.2-models.jar, stanford-parser.jar
Stanford POSTagger: english-bidirectional-distsim.tagger, stanford-postagger-3.9.2.jar

Sample Run

cd NASA-SE
python -i seva_toie.py
>>> sentence = "STI is an instrument."
>>> toie(sentence)
[('STI', 'is-a', 'instrument')]
>>> sentence = "STI, an instrument, has a 2500 pixel CCD detector."
>>> toie(sentence)
[('STI', 'has', 'CCD detector'), ('STI', 'is-a', 'instrument'), ('CCD detector', 'has-property', '2500 pixel')]
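To show what a rule over universal dependencies can look like, here is a minimal sketch of a single "is-a" rule: when a word has both an `nsubj` dependent and a `cop` dependent, emit (subject, is-a, word). The dependency list is written by hand for "STI is an instrument." rather than produced by the Stanford Parser, and `extract_is_a` is a hypothetical name; the rules in seva_toie.py are more extensive and may be formulated differently.

```python
def extract_is_a(dependencies):
    """Apply one copula rule to (relation, head, dependent) triples."""
    # Head word -> its nominal subject.
    subjects = {head: dep for rel, head, dep in dependencies if rel == "nsubj"}
    # Head words that carry a copula ("is", "are", ...).
    copula_heads = {head for rel, head, dep in dependencies if rel == "cop"}
    return [(subj, "is-a", head) for head, subj in subjects.items()
            if head in copula_heads]


# Hand-written dependency parse of "STI is an instrument."
dependencies = [
    ("nsubj", "instrument", "STI"),
    ("cop", "instrument", "is"),
    ("det", "instrument", "an"),
]
print(extract_is_a(dependencies))
# [('STI', 'is-a', 'instrument')]
```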

Try with more example/template sentences.

Contact information

For help or issues, please submit a GitHub issue or contact Jitin Krishnan (jkrishn2@gmu.edu).
