Semantic reasoning of skills in the domain of human resources

This repository contains the code for a Master's thesis conducted at the University of Koblenz-Landau in collaboration with talentsconnect AG.

The research aim is to compare three methods of embedding skills into a vector space: distributional (text-based), relational (ontology-based), and a hybrid approach using the Attract-Repel model. The paper is available upon request. With this repository you can train Attract-Repel and evaluate it on two datasets: an intrinsic one, which compares the skill embeddings against manual annotations, and an extrinsic one, which measures performance on a similar-jobs task that takes the embeddings as input. The extrinsic data is provided by talentsconnect AG and can be shared upon request.
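A common way to score embeddings against manual similarity annotations is Spearman's rank correlation between the model's cosine similarities and the human scores. The sketch below assumes the annotated sample can be read as (skill_a, skill_b, score) triples and that the embeddings are a dict from skill label to vector; the format, the function names and the toy data are hypothetical, and the repository's own evaluation code may score the annotations differently.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (undefined if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho is the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def intrinsic_score(embeddings: Dict[str, List[float]],
                    annotated: List[Tuple[str, str, float]]) -> float:
    """Correlate model similarities with human scores; skip pairs with unknown skills."""
    model_sims, human_scores = [], []
    for a, b, score in annotated:
        if a in embeddings and b in embeddings:
            model_sims.append(cosine(embeddings[a], embeddings[b]))
            human_scores.append(score)
    return spearman(model_sims, human_scores)

# Toy data, invented for illustration only.
embeddings = {"python": [1.0, 0.0], "java": [0.9, 0.1],
              "cooking": [0.0, 1.0], "baking": [0.3, 0.9]}
annotated = [("python", "java", 0.9), ("cooking", "baking", 0.8),
             ("python", "cooking", 0.1), ("python", "welding", 0.2)]
print(round(intrinsic_score(embeddings, annotated), 3))
```

With this toy data the model ranks the pairs exactly as the annotators do, so the script prints 1.0; the ("python", "welding") pair is skipped because "welding" has no vector. In practice `scipy.stats.spearmanr` computes the same statistic.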

Attract-Repel training

First install the required libraries either with pip:

pip install -r requirements.txt

or with conda:

conda install --file requirements.txt

Attract-Repel is trained on word2vec vectors and uses linguistic constraints derived from the ESCO ontology; these are stored in attract-repel/word-vectors/init_google_we.txt and attract-repel/linguistic_constraints/similar_skills.txt, respectively. The file attract-repel/config/experiment_parameters.cfg contains the hyperparameter values used in the grid search to find the best combination of attract_margin, batch_size and l2. To run Attract-Repel with a single fixed set of hyperparameters, write the same value in the first and second positions of each hyperparameter's entry.
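To illustrate how attract_margin and l2 enter training, here is a minimal, framework-free sketch of the attract term of the Attract-Repel cost (as formulated in the original Attract-Repel paper by Mrkšić et al., 2017) plus the L2 term that keeps vectors close to their initial word2vec values; batch_size corresponds to the number of constraint pairs per mini-batch. This is not the repository's training code: the function names and toy vectors are illustrative, the repel term is omitted because only similar-skill constraints are mentioned here, and vectors are assumed to be unit-normalised so that dot products equal cosine similarities.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def hinge(x: float) -> float:
    return max(0.0, x)

def attract_cost(batch: Sequence[Tuple[str, str]], vecs: Dict[str, Vector],
                 attract_margin: float) -> float:
    """Attract term for one mini-batch of similar-skill pairs.

    For each pair (l, r), the negative example t_l is the word from the other
    pairs in the batch that is most similar to l (likewise t_r for r). The pair
    is penalised unless sim(l, r) beats both negatives by attract_margin.
    Needs at least two pairs per batch so that negatives exist.
    """
    words = [w for pair in batch for w in pair]
    cost = 0.0
    for left, right in batch:
        xl, xr = vecs[left], vecs[right]
        negatives = [vecs[w] for w in words if w not in (left, right)]
        tl = max(negatives, key=lambda t: dot(xl, t))
        tr = max(negatives, key=lambda t: dot(xr, t))
        sim = dot(xl, xr)
        cost += hinge(attract_margin + dot(xl, tl) - sim)
        cost += hinge(attract_margin + dot(xr, tr) - sim)
    return cost

def l2_regularization(current: Dict[str, Vector], initial: Dict[str, Vector],
                      words: Sequence[str], l2: float) -> float:
    """Penalise drift of the batch's vectors away from their initial values."""
    return l2 * sum(
        sum((a - b) ** 2 for a, b in zip(initial[w], current[w])) for w in set(words)
    )

# Toy vectors, invented for illustration only.
vecs = {w: normalize(v) for w, v in {
    "python": [1.0, 0.0], "java": [0.8, 0.6],
    "cooking": [0.0, 1.0], "baking": [0.6, 0.8]}.items()}
batch = [("python", "java"), ("cooking", "baking")]
words = [w for pair in batch for w in pair]
total = attract_cost(batch, vecs, attract_margin=0.6) + l2_regularization(vecs, vecs, words, l2=1e-9)
print(round(total, 2))
```

Here the script prints 2.32: "baking" is closer to "java" (0.96) than "java" is to "python" (0.8), so both pairs are penalised, and the L2 term is zero because no vector has moved yet. A training loop would minimise this cost over mini-batches with a gradient-based optimiser.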

Run the following command to start the training:

python attract-repel/code/attract_repel.py -c config_path -s save_model -e evaluation

Arguments:

config_path : path to the config file; defaults to attract-repel/config/experiment_parameters.cfg
save_model : boolean, whether to store the model file in the results/grid_search folder; defaults to False
evaluation : boolean, whether to evaluate the models during training; defaults to True. To run the evaluation, you need to provide three paths in the config file: gold_standard (path to the extrinsic evaluation set), companyDataset (path to the list of jobs used in the extrinsic evaluation) and skills_annotated_sample (path to the intrinsic evaluation set).
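The arguments above take explicit boolean values (for example `-s True`), which is easy to get wrong with `argparse`: `type=bool` would turn the string "False" into `True`. The sketch below is a hypothetical reconstruction of such a command line and of a pre-flight check for the three evaluation paths; it is not the actual parser in attract_repel.py, and the `[paths]` section name and file paths in the example config are invented. Note that `configparser` lowercases option names by default, so `optionxform = str` is set to keep `companyDataset` intact.

```python
import argparse
import configparser
from typing import List

DEFAULT_CONFIG = "attract-repel/config/experiment_parameters.cfg"
EVAL_KEYS = ("gold_standard", "companyDataset", "skills_annotated_sample")

def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings; bool('False') would be True."""
    v = value.strip().lower()
    if v in ("true", "1", "yes", "y"):
        return True
    if v in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented -c/-s/-e options and defaults."""
    p = argparse.ArgumentParser(description="Attract-Repel grid search (sketch)")
    p.add_argument("-c", dest="config_path", default=DEFAULT_CONFIG)
    p.add_argument("-s", dest="save_model", type=str2bool, default=False)
    p.add_argument("-e", dest="evaluation", type=str2bool, default=True)
    return p

def missing_eval_paths(config_text: str) -> List[str]:
    """Return the evaluation keys that appear in no section of the config."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep 'companyDataset' case-sensitive
    cfg.read_string(config_text)
    present = {key for section in cfg.sections() for key in cfg[section]}
    return [k for k in EVAL_KEYS if k not in present]

# Hypothetical config fragment, for illustration only.
EXAMPLE_CONFIG = """
[paths]
gold_standard = data/gold_standard.csv
skills_annotated_sample = data/skills_annotated_sample.csv
"""

args = build_parser().parse_args(["-s", "True", "-e", "False"])
print(args.config_path, args.save_model, args.evaluation)
print(missing_eval_paths(EXAMPLE_CONFIG))
```

Running the check before training fails fast on a config that would otherwise break only when evaluation starts; in this example it reports that `companyDataset` is missing.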

stannida/skill-embeddings

Folders and files

Latest commit

History

Repository files navigation

Semantic reasoning of skills in the domain of human resources

Attract-Repel training

About

Topics

Resources

License

Stars

Watchers

Forks

Languages