`KArgen` - Knowledge Acquisition Generalization with Multi-task Learning

KArgen is the generalization implementation for my Master's Thesis:

Automatic Knowledge Acquisition for the Special Cargo Services Domain with Unsupervised Entity and Relation Extraction

Code structure adopted from: anago

The generalization part provides a model that can be used for entity/relation extraction from special cargo text. The training set was created automatically via KArgo. The model architecture can be seen here:

This repository contains the following folders:

data/kargo: all datasets for NER/EE/RE in CoNLL format, for multi-task modeling as proposed by Bekoulis et al. (2018).
- train: training sets as produced by KArgo
  - not_terms_only: dataset contains all sentences, including sentences without entities (for EE)
  - terms_only: dataset contains only sentences with at least one entity (for EE)
- dev_rel, test_rel: development and test set 1
- online_rel: test set 2 (online documents, based on HTML/PDF excerpts)
kargen: source code folder for KArgen
- crf.py: CRF layer implementation for Keras, based on keras-contrib
- models.py: model structure and wrapper for simplified Hierarchical Multi-task Learning from hmtl
- preprocessing.py: preprocessing pipeline for sequential deep learning model
- trainer.py: training routine for KArgen model, including callbacks.
main.py: example of KArgen training and evaluation routine, including saving/loading models.
infer.ipynb: example of extraction with the trained models, visualization with displaCy
results.ipynb: notebook for visualizing model training/evaluation results, can be seen here
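
To illustrate the dataset layout described above, here is a minimal sketch of reading a CoNLL-style file and deriving a terms_only subset from it. This is not code from the repository: the column layout (token in the first column, BIO tag in the last, blank line between sentences), the `B-TERM`/`I-TERM` tag names, and the helper names `read_conll` and `terms_only` are all assumptions made for illustration. Check the actual files in data/kargo before relying on it.

```python
from typing import Iterable, List, Tuple

# A sentence is a pair of parallel lists: tokens and their tags.
Sentence = Tuple[List[str], List[str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into (tokens, tags) sentences.

    Assumed layout (not confirmed by this README): one token per line,
    whitespace-separated columns, token in the first column, BIO tag in
    the last column, and a blank line between sentences.
    """
    sentences: List[Sentence] = []
    tokens: List[str] = []
    tags: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])
        tags.append(columns[-1])
    # Flush the last sentence when the input has no trailing blank line.
    if tokens:
        sentences.append((tokens, tags))
    return sentences


def terms_only(sentences: List[Sentence]) -> List[Sentence]:
    """Keep only sentences with at least one non-"O" tag (hypothetical helper)."""
    return [s for s in sentences if any(tag != "O" for tag in s[1])]


# Hypothetical sample; the tag names are illustrative only.
SAMPLE = """\
Live B-TERM
animals I-TERM
require O
permits O

See O
details O
"""

sentences = read_conll(SAMPLE.splitlines())
filtered = terms_only(sentences)
print(len(sentences), len(filtered))
```

For a real file, the same function can be called with an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points at one of the dataset files.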

A comparison of Precision/Recall/F-score for models trained with the automatic training set (Auto) and the development set (Manual), on test set 1 (holdout news articles):

and for test set 2 (online documents):
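
For reference, precision, recall, and F-score in comparisons like these can be computed from sets of gold and predicted annotations. The sketch below is a generic illustration, not the evaluation code used in this repository. It assumes exact-match scoring over `(start, end, label)` spans, and the `TERM` label and the example spans are made up.

```python
from typing import Set, Tuple

# An annotation is identified by its token span and label (assumed representation).
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over two sets of spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: one prediction matches exactly, one has a wrong end boundary.
gold = {(0, 2, "TERM"), (5, 6, "TERM"), (9, 11, "TERM")}
predicted = {(0, 2, "TERM"), (5, 7, "TERM")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here one of two predictions is correct and one of three gold spans is found, so precision is 1/2, recall is 1/3, and F1 is their harmonic mean, 0.4.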
