szmer/BERTPolishWSD

Overview

This package tests Polish word sense disambiguation (WSD) with BERT. For now, the focus is on running tests against the small plWordnet3-annotated corpus made for CoDeS. To disambiguate a token, we compare its BERT embedding with the embeddings of tokens that share its lemma and whose sense is already known, either because they appear in the reference corpus or because they come from Wordnet glosses.
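The comparison described above can be pictured as a nearest-neighbour lookup. The sketch below is only an illustration under stated assumptions: it uses cosine similarity (the README does not say which measure the package uses), tiny hand-written vectors instead of real BERT embeddings, and hypothetical sense labels (`zamek_castle`, `zamek_lock`) that are not plWordnet identifiers. The function names are made up for this example and are not part of the package.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def disambiguate(target, references):
    """Return (sense, score) of the reference vector closest to `target`.

    `references` is a list of (sense_label, vector) pairs for one lemma.
    """
    best_sense, best_score = None, -math.inf
    for sense, vector in references:
        score = cosine(target, vector)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense, best_score


# Toy reference embeddings for the lemma "zamek" (hypothetical sense labels).
references = [
    ("zamek_castle", [0.9, 0.1, 0.0]),
    ("zamek_lock", [0.1, 0.9, 0.2]),
]

# Toy embedding of the token we want to disambiguate.
target = [0.8, 0.2, 0.1]

sense, score = disambiguate(target, references)
print(sense)
```

With real data, each vector would be a contextual embedding taken from BERT, and a lemma could have several reference vectors per sense.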

This is a work in progress. It is also intended to eventually supersede the gibber code, offering better code quality, models, etc.

Installation

Requirements

  • Python 3.7 or newer
  • pip
  • virtualenv
  • Docker

Resources needed

Installation process

docker pull djstrong/krnnt:1.0.1
virtualenv .
source bin/activate
pip3 install -r requirements.txt # this may be just pip on some platforms
deactivate

Running

In one terminal window:

docker run -p 9003:9003 -it djstrong/krnnt:1.0.1
# To kill, ctrl+c
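Before switching to the second terminal, it can help to confirm that something is listening on the published port. The following is a minimal sketch, assuming the container is published on `localhost:9003` as in the command above. `port_is_open` is a hypothetical helper, not part of this package, and it only checks that a TCP connection can be opened, not that the tagger answers requests correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9003 is the port published by the `docker run` command above.
    print("Port 9003 reachable:", port_is_open("localhost", 9003))
```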

In another terminal window:

source bin/activate
# Review local_settings.py first, then run this to see the available options:
python3 run.py --help
# (this may be just python instead of python3 on your machine)
# Plain `python3 run.py` trains and tests an embedding dictionary built from Wordnet and the training corpus.
# After you're done:
deactivate

To test:

source bin/activate
python3 test.py
# After you're done:
deactivate
