GitHub - rohetoric/text-vector-visualisation: Website: https://rohetoric.github.io/text-vector-visualisation/

Exploration & Visualisation of FastText Word Vectors Using TensorFlow 1 and 2

Requirements and Dependencies

The following libraries must be installed to run the code:

Serial No	Libraries to Install
1.	FastText
2.	TensorFlow
3.	spaCy
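
A quick way to confirm that the three libraries are importable before opening the notebooks is sketched below. It assumes the usual import names (fasttext, tensorflow, spacy); the helper name missing_libraries is hypothetical and not part of this repository.

```python
import importlib.util

# Display name -> import name. The import names are assumed to be the
# usual ones for these packages.
REQUIRED = {"FastText": "fasttext", "TensorFlow": "tensorflow", "spaCy": "spacy"}


def missing_libraries(required=REQUIRED):
    """Return the display names of libraries that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

Calling missing_libraries() returns an empty list when everything is installed; any names it returns still need to be installed.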

Steps to Execute

1. Download the bbc-text.csv dataset from here. Alternatively, if gcloud is already set up, download it from the terminal with the command: gsutil cp gs://dataset-uploader/bbc/bbc-text.csv [path to notebook directory]
2. Make sure all the libraries listed under Requirements and Dependencies are installed and up to date.
3. To train the model on the complete dataset using FastText, run the notebook fasttextmodeltrain.ipynb in the _notebooks folder. A pre-trained model (2.4 GB) based on the dataset can be downloaded from here.

According to the FastText documentation:

The most important parameters of the model are its dimension and the range of size for the subwords. The dimension (dim) controls the size of the vectors, the larger they are the more information they can capture but requires more data to be learned. Since any value in the 100-300 range is popular, the notebook uses a dimension of 300.
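
As a rough illustration of how dim and the subword range are passed to FastText, here is a minimal sketch. It is not the code from fasttextmodeltrain.ipynb. It assumes that bbc-text.csv has a column named text, that the vectors are trained unsupervised with the skipgram model, and that subwords of 3 to 6 characters are used; the function names and file paths are hypothetical.

```python
import csv


def csv_to_corpus(csv_path, corpus_path, text_column="text"):
    """Write one document per line, the plain-text format FastText trains on.

    The column name "text" is an assumption about bbc-text.csv.
    Returns the number of non-empty documents written.
    """
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(corpus_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Collapse whitespace so each document stays on a single line.
            text = " ".join((row.get(text_column) or "").split())
            if text:
                dst.write(text + "\n")
                written += 1
    return written


def train_vectors(corpus_path, model_path, dim=300):
    """Train word vectors with the given dimension and save the model."""
    import fasttext  # imported here so csv_to_corpus works without it

    model = fasttext.train_unsupervised(
        corpus_path,
        model="skipgram",  # assumption: the notebook may use another model
        dim=dim,           # size of each word vector
        minn=3,            # shortest subword length (illustrative value)
        maxn=6,            # longest subword length (illustrative value)
    )
    model.save_model(model_path)
    return model
```

A typical call would be csv_to_corpus("bbc-text.csv", "corpus.txt") followed by train_vectors("corpus.txt", "model.bin"). The saved model grows with dim, since every word and subword bucket stores a vector of that length.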

Steps 4, 5 and 6 differ between TF1 and TF2. After that, the steps are the same.

To Visualise Embeddings Using TF1 [NOT ADVISED]

4. Create an empty folder called tb1files in the same directory as the notebooks. It will store all the TensorFlow log files once step 5 has been run.
5. Run the notebook tb1vis.ipynb in the _notebooks folder.
6. In the terminal, change to the directory that contains the tb1files folder and run the command: tensorboard --logdir tb1files/

The above command prints startup output whose last line contains a local host URL.

To Visualise Embeddings Using TF2 [ADVISED]

4. Create an empty folder called tb2files in the same directory as the notebooks. It will store all the TensorFlow log files once step 5 has been run.
5. Run the notebook tb2vis.ipynb in the _notebooks folder.
6. In the terminal, change to the directory that contains the tb2files folder and run the command: tensorboard --logdir tb2files/
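
For readers curious about what kind of files end up in tb2files, the sketch below follows the general TF2 pattern for the TensorBoard embedding projector: a metadata file with one word per line, a checkpoint holding the vectors, and a projector config. It is an illustration under assumptions, not the contents of tb2vis.ipynb; the function names and the metadata.tsv and embedding.ckpt file names are hypothetical choices.

```python
import os


def write_metadata(words, log_dir, filename="metadata.tsv"):
    """Write one word per line; row i labels embedding row i."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, filename)
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(word + "\n")
    return path


def export_for_projector(words, vectors, log_dir="tb2files"):
    """Save vectors and labels where the TensorBoard projector can find them.

    `vectors` is a list (or array) of equal-length float vectors, in the
    same order as `words`.
    """
    # Imported here so write_metadata can be used without TensorFlow.
    import tensorflow as tf
    from tensorboard.plugins import projector

    write_metadata(words, log_dir)

    # Store the vectors in a checkpoint that the projector reads.
    weights = tf.Variable(vectors)
    checkpoint = tf.train.Checkpoint(embedding=weights)
    checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

    # Point the projector at the saved tensor and its labels.
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
    embedding.metadata_path = "metadata.tsv"
    projector.visualize_embeddings(log_dir, config)
```

With a trained FastText model, words could come from model.get_words() and each vector from model.get_word_vector(word). Once the files exist, the tensorboard --logdir tb2files/ command from step 6 serves them.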

The above command prints startup output whose last line contains a local host URL.

7. Open the local host URL shown in the last line of the output, for example http://localhost:6008/ [in the TB1 command image].
8. The TensorBoard page shown below will open. From the drop-down that reads Inactive, select Projector, as depicted by the arrow in the image below.

This plots the words in TensorBoard's 3D graph according to their embedding values. To find the nearest neighbours of a word, type it in the search bar, as done for the example ‘plea’ shown below.
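
To make the idea of nearest neighbours concrete, here is a small sketch that ranks words by cosine similarity, one of the distance measures the projector offers. The function names are hypothetical and the two-dimensional vectors in the usage note are made up for illustration; real FastText vectors here would have 300 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbours(word, embeddings, k=5):
    """Return the k words most similar to `word`, best match first.

    `embeddings` maps each word to its vector.
    """
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, with the toy mapping {"plea": [1.0, 0.0], "appeal": [0.9, 0.1], "goal": [0.0, 1.0]}, nearest_neighbours("plea", toy, k=1) ranks "appeal" first because its vector points in almost the same direction as that of "plea".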

That's it, folks!
