CoNLL-X is an annotation schema for describing linguistic features across diverse languages.
CoNLL-U is a further development of this annotation schema for the Universal Dependencies formalism.
The annotation files contain three types of lines: comment lines, word lines and blank lines.
Comment lines precede word lines and start with a hash character (`#`).
These lines can be used to provide metadata about the word lines that follow.
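Universal Dependencies treebanks conventionally write this metadata as `# key = value` comments, for example `# sent_id = ...` and `# text = ...`. The following minimal Python sketch assumes that convention. It extracts such pairs and returns `None` for free-form comments. `parse_comment` is a hypothetical helper and is not part of this repository.

```python
from typing import Optional, Tuple


def parse_comment(line: str) -> Optional[Tuple[str, str]]:
    """Parse a CoNLL-U comment line of the (assumed) form '# key = value'.

    Returns a (key, value) pair, or None for a free-form comment.
    Raises ValueError if the line is not a comment line at all.
    """
    if not line.startswith("#"):
        raise ValueError(f"not a comment line: {line!r}")
    body = line[1:].strip()
    # Split on the first '=' only, so values may themselves contain '='.
    key, sep, value = body.partition("=")
    if not sep:
        return None  # free-form comment without '='
    return key.strip(), value.strip()


print(parse_comment("# sent_id = 1"))           # → ('sent_id', '1')
print(parse_comment("# text = Hello, world!"))  # → ('text', 'Hello, world!')
print(parse_comment("# a free-form note"))      # → None
```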
Each word line contains the annotations for a single word or token. Larger linguistic units, such as sentences, are represented by consecutive word lines.
Each word line consists of ten fields separated by tab characters:
Field | Description |
---|---|
ID | Word index, starting at 1 for each sentence (a range such as 1-2 for multiword tokens) |
FORM | The form of a word or punctuation symbol |
LEMMA | Lemma or the base form of a word |
UPOS | Universal part-of-speech tag |
XPOS | Language-specific part-of-speech tag |
FEATS | Morphological features |
HEAD | ID of the syntactic head of the current word, or 0 for the root |
DEPREL | Universal dependency relation to the HEAD |
DEPS | Enhanced dependency relations |
MISC | Any additional annotations |
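The sketch below shows how one word line maps onto the ten fields in the table. It makes several assumptions. Columns are tab-separated. `_` marks an unspecified value and becomes `None`, except in FORM and LEMMA, where an underscore may be a literal token. FEATS follows the Universal Dependencies `Name=Value|Name=Value` layout. `parse_word_line` and `parse_feats` are hypothetical helpers, not this project's own reader.

```python
from typing import Any, Dict

# Column names in the order they appear on a word line.
FIELDS = ("ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")


def parse_feats(value: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    if value == "_":
        return {}
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_word_line(line: str) -> Dict[str, Any]:
    """Split one word line into a dict keyed by the ten column names."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(columns)}")
    token: Dict[str, Any] = {}
    for name, value in zip(FIELDS, columns):
        # '_' means "unspecified", except in FORM/LEMMA where it may be literal.
        if value == "_" and name not in ("FORM", "LEMMA"):
            token[name] = None
        else:
            token[name] = value
    token["FEATS"] = parse_feats(columns[5])
    # HEAD is an ID (0 = root); it stays None for lines without a head.
    if token["HEAD"] is not None:
        token["HEAD"] = int(token["HEAD"])
    return token


token = parse_word_line("1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t2:nsubj\t_")
print(token["FEATS"], token["HEAD"], token["MISC"])  # → {'Number': 'Plur'} 2 None
```

Multiword-token lines (e.g. ID `1-2`) leave most columns as `_`, so they come back with `HEAD` set to `None` and an empty FEATS dict.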
Finally, a blank line after word lines is used to separate sentences.
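The following Python sketch ties the three line types together. It groups lines into sentences at blank lines, keeping each sentence's comments and its word-line columns. It then uses the HEAD column to list each word's dependents, with `0` as the virtual root. `read_sentences`, `dependency_children` and `SAMPLE` are hypothetical names. The sample sentence is invented for illustration and is not taken from ud_ukrainian.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# (comment lines, word lines split into their tab-separated columns)
Sentence = Tuple[List[str], List[List[str]]]


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Group CoNLL-U lines into sentences, using blank lines as separators."""
    comments: List[str] = []
    words: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence (if any).
            if comments or words:
                yield comments, words
            comments, words = [], []
        elif line.startswith("#"):
            comments.append(line)
        else:
            words.append(line.split("\t"))
    # Tolerate a missing blank line after the last sentence.
    if comments or words:
        yield comments, words


def dependency_children(words: List[List[str]]) -> Dict[str, List[str]]:
    """Map each HEAD id to the ids of its dependents ('0' is the virtual root).

    Multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '3.1') are
    skipped because they are not part of the basic dependency tree.
    """
    children: Dict[str, List[str]] = {}
    for cols in words:
        word_id, head = cols[0], cols[6]
        if "-" in word_id or "." in word_id:
            continue
        children.setdefault(head, []).append(word_id)
    return children


SAMPLE = """# sent_id = 1
# text = Dogs bark.
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

"""

for comments, words in read_sentences(SAMPLE.splitlines()):
    print(comments)
    print(dependency_children(words))  # → {'2': ['1', '3'], '0': ['2']}
```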
First, clone the repository and `cd` into it:
```shell
git clone https://github.com/asiryk/natural-language-processing.git
cd natural-language-processing
# Then clone the git submodules (ud_tools and ud_ukrainian)
git submodule update --init --recursive
```
Next, create a virtual environment and activate it.
Note: make sure you have Python version > 3.3.
```shell
python3 -m venv venv
# And activate it
# Mac/Linux
source venv/bin/activate
# Windows (cmd.exe)
venv\Scripts\activate.bat
```
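You can ask the interpreter itself whether activation worked. This small sketch relies on the documented behaviour of Python's `venv`: inside a virtual environment, `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base installation. The script name `check_env.py` and the helper `in_virtualenv` are hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter is running inside a venv.

    In a venv, sys.prefix is the environment directory, whereas
    sys.base_prefix still refers to the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    # Report the interpreter version and whether a venv is active.
    print("Python", sys.version.split()[0])
    print("inside a virtual environment:", in_virtualenv())
```

Saved as `check_env.py` and run with `python check_env.py` after activation, it should report `True` for the second line.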
Then install the dependencies into the newly created virtual environment:
```shell
pip install -r requirements.txt
```
Now you can run the `main.py` file:
```shell
python -m src.main
```
To run the test suites:
```shell
python -m unittest src.test
```
The LaTeX sources are in `./article`, but the compiler should be run from the repository root (`./`). To compile the thesis, use the XeLaTeX compiler and make sure all the fonts it uses are installed (e.g. Fira Code). The thesis was originally compiled using Overleaf.
- CoNLL-U Viewer link