Semantic Coherence

Vakulenko, S.; de Rijke, M.; Cochez, M.; Savenkov, V.; Polleres, A. (2018). Measuring Semantic Coherence of a Conversation. In: International Semantic Web Conference 2018, Monterey, CA, USA https://arxiv.org/pdf/1806.06411.pdf

Requirements

Python 2
unicodecsv
spotlight (pip install pyspotlight)
...

virtualenv myvenv

source myvenv/bin/activate

pip install -r requirements.txt

Run

prepare_dataset.py: create vocabulary and encode development and training data.
adversaries.py: generate adversaries
load_embeddings.py
train_model.py
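
The first step (create a vocabulary, then encode the data) can be sketched as follows. This is a minimal illustration, not the code in prepare_dataset.py: the function names build_vocabulary and encode, the padding id 0 and the fixed max_len are assumptions made for the example.

```python
def build_vocabulary(dialogues):
    """Map each distinct entity to an integer id; id 0 is reserved for padding."""
    vocabulary = {}
    for entities in dialogues:
        for entity in entities:
            if entity not in vocabulary:
                vocabulary[entity] = len(vocabulary) + 1
    return vocabulary


def encode(entities, vocabulary, max_len):
    """Encode one dialogue as a fixed-length list of ids.

    Entities missing from the vocabulary are dropped, long dialogues are
    truncated and short ones are padded with 0.
    """
    ids = [vocabulary[e] for e in entities if e in vocabulary]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


# Toy input (hypothetical): each dialogue is the sequence of DBpedia
# entities mentioned in it.
train = [
    ["http://dbpedia.org/resource/Ubuntu", "http://dbpedia.org/resource/Linux"],
    ["http://dbpedia.org/resource/Linux"],
]
vocab = build_vocabulary(train)
encoded = [encode(d, vocab, max_len=3) for d in train]
```

Development data would be encoded with the vocabulary built from the training data, so that both splits share the same ids.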

Summary: the paths to the input matrices and embeddings are specified in preprocess.py. 1) Generate the embeddings matrix with populate_emb_matrix_from_file in preprocess.py; 2) run train_model.py.

Specify the paths to the input matrices X and y and to the embeddings matrix:

preprocess.py:

X_path = 'ubuntu127932_X.npy'
y_path = 'ubuntu127932_y.npy'

embeddings = {
    'DBpedia_GlobalVectors_9_pageRank': {
        'matrix_path': 'embedding_matrix_DBpedia_GloVe_9PR.npy',
        'dims': 200,
        'all_path': './embeddings/data.dws.informatik.uni-mannheim.de/rdf2vec/models/DBpedia/2016-04/GlobalVectors/9_pageRank/DBpediaVecotrs200_20Shuffle.txt',
    },
    'word2vec': {
        'matrix_path': 'embedding_matrix_word2vec.npy',
        'dims': 300,
        'all_path': './embeddings/GoogleNews-vectors-negative300.bin',
    },
    'GloVe': {
        'matrix_path': 'embedding_matrix_GloVe.npy',
        'dims': 300,
        'all_path': './embeddings/glove.840B.300d.txt',
    },
}

Load embeddings for the entities in the vocabulary:

preprocess.py: populate_emb_matrix_from_file(embeddings['DBpedia_GlobalVectors_9_pageRank'])
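
What populating an embeddings matrix for a vocabulary involves can be sketched as below. This is not the implementation of populate_emb_matrix_from_file: the name build_embedding_matrix, the whitespace-separated text format (a key followed by dims numbers), the ids starting at 1 and the toy keys are all assumptions made for the example.

```python
def build_embedding_matrix(vocabulary, lines, dims):
    """Return a matrix with len(vocabulary) + 1 rows of dims floats each.

    Row i holds the vector of the entity with id i. Rows without a
    pre-trained vector (including row 0, the assumed padding id) stay zero.
    """
    matrix = [[0.0] * dims for _ in range(len(vocabulary) + 1)]
    for line in lines:
        parts = line.rstrip().split(" ")
        key, values = parts[0], parts[1:]
        # Keep only vectors for entities that occur in the vocabulary.
        if key in vocabulary and len(values) == dims:
            matrix[vocabulary[key]] = [float(v) for v in values]
    return matrix


# Toy data (hypothetical keys and values).
vocabulary = {"dbr:Ubuntu": 1, "dbr:Linux": 2}
lines = [
    "dbr:Linux 0.1 0.2 0.3",
    "dbr:Windows 0.4 0.5 0.6",  # not in the vocabulary, skipped
]
matrix = build_embedding_matrix(vocabulary, lines, dims=3)
```

With a real embeddings file, lines would be the open file object, so the file is read once and only the vectors for vocabulary entries are kept in memory. The result could then be stored under the configured matrix_path, for example with numpy.save.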

Train CNN model:

Point 'embeddings_name' to the embeddings configuration in the 'embeddings' dictionary, e.g. 'DBpedia_GlobalVectors_9_pageRank'

train_model.py
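
Selecting a configuration by name could look like the sketch below. The two dictionary entries are copied from preprocess.py above; the helper get_embeddings_config is hypothetical and only illustrates the lookup, it is not part of train_model.py.

```python
embeddings = {
    'word2vec': {'matrix_path': 'embedding_matrix_word2vec.npy', 'dims': 300,
                 'all_path': './embeddings/GoogleNews-vectors-negative300.bin'},
    'GloVe': {'matrix_path': 'embedding_matrix_GloVe.npy', 'dims': 300,
              'all_path': './embeddings/glove.840B.300d.txt'},
}


def get_embeddings_config(embeddings_name, embeddings):
    """Return the configuration for embeddings_name or fail with a readable error."""
    if embeddings_name not in embeddings:
        raise KeyError("Unknown embeddings '%s'; choose one of: %s"
                       % (embeddings_name, ", ".join(sorted(embeddings))))
    return embeddings[embeddings_name]


embeddings_name = 'GloVe'
config = get_embeddings_config(embeddings_name, embeddings)
```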

Dataset

Ubuntu Dialogue Corpus v2.0

Setup

Download Ubuntu Dialogue Corpus v2.0 using the scripts:

git clone https://github.com/rkadlec/ubuntu-ranking-dataset-creator.git
cd ubuntu-ranking-dataset-creator
pip install -r requirements.txt
cd src
./generate.sh -t -s -l

Content

1,852,869 dialogues in TSV format, one dialogue per file. Dialogue line format: [0] timestamp [1] sender [2] recipient [3] utterance [4] named entities for [3] annotated with DBpedia Spotlight.
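
Reading one dialogue in this line format can be sketched as follows. The record type DialogueLine, the function parse_dialogue and the sample lines are made up for illustration. The encoding of the entities field is not specified here, so the sketch keeps it as a raw string.

```python
import csv
from collections import namedtuple

DialogueLine = namedtuple(
    "DialogueLine", ["timestamp", "sender", "recipient", "utterance", "entities"])


def parse_dialogue(lines):
    """Parse tab-separated dialogue lines into DialogueLine records."""
    records = []
    # QUOTE_NONE keeps quotation marks inside utterances as plain text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        # Pad short rows so that lines without annotations still parse.
        row = row + [""] * (5 - len(row))
        records.append(DialogueLine(*row[:5]))
    return records


# Hypothetical sample lines; real files come from the corpus scripts.
sample = [
    "2004-11-23T11:49:00.000Z\talice\t\thow do I install java?\t<entities>",
    "2004-11-23T11:50:00.000Z\tbob\talice\tuse apt-get",
]
dialogue = parse_dialogue(sample)
```

For a file on disk, pass the open file object instead of the sample list.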

Annotation

Entity linking

The dialogues are annotated with DBpedia entities, e.g. http://dbpedia.org/page/Organisation_of_Islamic_Cooperation, using the DBpedia Spotlight API at http://model.dbpedia-spotlight.org/en/annotate

Documentation: http://www.dbpedia-spotlight.org/api

(run with annotate_ubuntu_dialogs() from process_ubuntu_dialogues.py)
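
Extracting the linked entities from a Spotlight JSON response can be sketched as below. This is not the code of annotate_ubuntu_dialogs(): the function extract_entities and its min_similarity parameter are hypothetical, the sample response is hand-written, and the field names ("Resources", "@URI", "@surfaceForm", "@similarityScore") are assumed from the JSON format the service returns and should be checked against the API documentation.

```python
import json


def extract_entities(response_text, min_similarity=0.0):
    """Return (surface form, DBpedia URI) pairs from a Spotlight JSON response."""
    response = json.loads(response_text)
    entities = []
    # "Resources" is assumed to be absent when no entity was found.
    for resource in response.get("Resources", []):
        # Scores are assumed to arrive as strings, hence the float() call.
        if float(resource.get("@similarityScore", 1.0)) >= min_similarity:
            entities.append((resource["@surfaceForm"], resource["@URI"]))
    return entities


# Hand-written sample response, not real service output.
sample_response = json.dumps({
    "@text": "how do I install linux",
    "Resources": [{
        "@URI": "http://dbpedia.org/resource/Linux",
        "@surfaceForm": "linux",
        "@offset": "17",
        "@similarityScore": "0.99",
    }],
})
entities = extract_entities(sample_response)
```

Raising min_similarity keeps only the annotations the service is more certain about, at the cost of fewer entities per utterance.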

At the time of writing, 2016-10 was the latest version of DBpedia: http://downloads.dbpedia.org/2016-10/

Embeddings

Pre-trained RDF2vec and KGloVe embeddings:

data.dws.informatik.uni-mannheim.de/rdf2vec/models

trained on the English version of DBpedia 2016-04 http://downloads.dbpedia.org/2016-04/

Acknowledgment

This work is supported by the project 855407 “Open Data for Local Communities” (CommuniData) of the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) under the program “ICT of the Future.” Svitlana Vakulenko was supported by the EU H2020 programme under the MSCA-RISE agreement 645751 (RISE BPM).
