This project contains the code for creating vector representations of texts.
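To make "vector representation" concrete, here is a minimal sketch using plain bag-of-words counts. It is an illustration only and does not reflect this project's code or any of the models planned below; the names build_vocabulary and text_to_vector are hypothetical.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase token to a fixed vector index."""
    tokens = sorted({token for text in texts for token in text.lower().split()})
    return {token: index for index, token in enumerate(tokens)}


def text_to_vector(text, vocabulary):
    """Return token counts ordered by vocabulary index (bag-of-words)."""
    counts = Counter(text.lower().split())
    # The vocabulary dict preserves insertion order, which matches the indices.
    return [counts.get(token, 0) for token in vocabulary]


texts = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(texts)
vectors = [text_to_vector(text, vocabulary) for text in texts]
print(vocabulary)
print(vectors)
```

Each text becomes a list of numbers of the same length, which is the property that lets texts be compared or fed to downstream models.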
Before starting, make sure the following requirements are available:
- conda, for setting up your research environment and Python dependencies.
- git, for versioning your code.
First, create the virtual environment where all the modules will be stored.
Using the virtualenv command, run the following commands:
# install the virtual env command
pip install virtualenv
# create a new virtual environment
virtualenv -p python ./.venv
# activate the environment (UNIX); the script must be sourced, not executed
source ./.venv/bin/activate
# activate the environment (WINDOWS)
.\.venv\Scripts\activate
# deactivate the environment (UNIX & WINDOWS)
deactivate
Alternatively, install conda, a program for creating Python virtual environments, then run the following commands:
# create a new virtual environment
conda create --name text-reps python=3.8 pip
# activate the environment
conda activate text-reps
# deactivate the environment
conda deactivate
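Whichever tool is used, it can be worth confirming that an environment is actually active before installing anything. The following is a minimal sketch, not part of this project: detect_environment is a hypothetical helper, and it assumes a virtualenv version that exposes sys.base_prefix (virtualenv 20 or later) and a conda environment activated with conda activate, which sets the CONDA_DEFAULT_ENV variable.

```python
import os
import sys


def detect_environment(prefix=None, base_prefix=None, environ=None):
    """Return 'virtualenv', 'conda', or None for the given interpreter state.

    All arguments default to the running interpreter's values; they are
    parameters only so the logic can be checked without switching environments.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ

    # Inside a virtualenv, sys.prefix points at the environment directory
    # while sys.base_prefix still points at the base Python installation.
    if prefix != base_prefix:
        return "virtualenv"
    # `conda activate` exports the active environment name (assumption above).
    if environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    return None


print(detect_environment())
```

If this prints None, no environment appears to be active and packages would be installed into the system Python. Note that an active conda base environment is also reported as conda.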
To install the requirements, run:
pip install -e .
TODO
- Add support for various language models
    - Sentence Transformers
    - BERT
    - RoBERTa
    - XLM-RoBERTa
- Add support for various word embedding models
    - word2vec
    - GloVe
    - fastText
- Develop main script
- Write documentation
- Provide examples
This work is developed by the Department of Artificial Intelligence at the Jozef Stefan Institute.
This work is supported by the Slovenian Research Agency and the TODO.