Storage and retrieval of Word Embeddings in various databases

Word Embeddings map lexical items to vectors of real numbers, representing lexical items in a mathematical, comparable way. These vectors can vary in size; their length is usually called the dimensionality of the embeddings. Besides training your own embeddings, you can also use a variety of pretrained models, for instance Google's Word2Vec model or Facebook's FastText model. Various tutorials on these can be found elsewhere; this tutorial focuses on the storage and retrieval of word embeddings in various databases.
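
As a quick illustration of what "comparable" means here, the sketch below compares embeddings with cosine similarity. The vectors are made-up toy values (4 dimensions instead of the 100-300 used by pretrained models) and are not taken from any real model:

import numpy as np

# Toy 4-dimensional embeddings, made up for illustration only.
embeddings = {
    "king":  np.array([0.50, 0.68, -0.59, 0.26], dtype=np.float32),
    "queen": np.array([0.54, 0.60, -0.33, 0.24], dtype=np.float32),
    "apple": np.array([-0.11, 0.21, 0.70, -0.45], dtype=np.float32),
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; values near 1.0 mean "similar".
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # higher
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower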

Incorporating these models into a production application can be a problem, since loading large amounts of embeddings into memory is painful. Google's pretrained Word2Vec embeddings (300 dimensions) are about 3.4 GB in size, Facebook's FastText embeddings (300 dimensions) around 10 GB. Since we usually only need a certain subset of the embeddings, we want a more efficient way to retrieve them than loading the entire set into memory. Thus, we will experiment with storing word embeddings in various databases.
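
A rough back-of-the-envelope calculation shows where these sizes come from (assuming the commonly cited ~3 million word Word2Vec vocabulary and 32-bit floats; the exact numbers are an assumption, not taken from the notebooks):

# ~3 million vocabulary entries x 300 dimensions x 4 bytes (float32)
vocabulary_size = 3_000_000
dimensions = 300
bytes_per_float = 4

size_gb = vocabulary_size * dimensions * bytes_per_float / 1024**3
print(f"~{size_gb:.1f} GB of raw vector data")  # roughly 3.4 GB, before any overhead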

Databases

  • SQLite
  • MySQL
  • MongoDB
  • LevelDB
  • PostgreSQL
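
As a minimal sketch of the general idea, using SQLite from the list above (table and column names are my own and not necessarily those used in the notebooks): serialise each vector to bytes, store it under its word as the key, and fetch only the embeddings you actually need instead of loading the whole model.

import sqlite3
import numpy as np

dimensions = 300
connection = sqlite3.connect("embeddings.sqlite")
connection.execute(
    "CREATE TABLE IF NOT EXISTS embeddings (word TEXT PRIMARY KEY, vector BLOB)"
)

# Write: serialise the numpy vector to raw bytes and store it under the word.
vector = np.random.rand(dimensions).astype(np.float32)
connection.execute(
    "INSERT OR REPLACE INTO embeddings (word, vector) VALUES (?, ?)",
    ("example", vector.tobytes()),
)
connection.commit()

# Read: fetch only the single embedding we need and restore the numpy array.
row = connection.execute(
    "SELECT vector FROM embeddings WHERE word = ?", ("example",)
).fetchone()
restored = np.frombuffer(row[0], dtype=np.float32)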

Results

(Plots: Read Times, Write Times, DB Sizes)

Setup

# Optional venv
python3 -m venv .venv
source .venv/bin/activate

# Install requirements
pip3 install -r requirements.txt

# Start Jupyter Notebook
jupyter notebook

Contributing

Contributions are welcome. Feel free to open a Pull Request or an Issue.

Links and resources
