Text Semantic Similarity

This repository contains the code used to run the experiments presented in the article "Introduction to Deep Similarity Learning for Sequences".

File Exploration

The most important files are:

EDA.ipynb: Exploratory Data Analysis notebook, used to clean and analyse the dataset. It generates the pickled version of the dataset with pre-computed sentence embeddings.
Training.ipynb: Main training pipeline. It loads the pickled dataset generated by the EDA.ipynb notebook.
contrastiveModel.py: Model definitions. The models are kept in a single file for the moment because they have a lot in common.
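
For readers unfamiliar with the idea behind a file named contrastiveModel.py, here is a minimal sketch of a contrastive loss on a single pair of embeddings. It is an illustration only, not the code from this repository: the function names, the Euclidean distance, and the margin value are all assumptions, and the actual models may use a different formulation.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, is_similar, margin=1.0):
    """Contrastive loss for one pair of embeddings (illustrative, assumed form).

    Similar pairs are penalised by their squared distance.
    Dissimilar pairs are penalised only when they are closer than `margin`.
    """
    d = euclidean_distance(u, v)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy embeddings standing in for the outputs of the two Siamese branches.
close_pair = ([0.0, 0.0], [0.3, 0.4])  # distance 0.5
far_pair = ([0.0, 0.0], [3.0, 4.0])    # distance 5.0

loss_similar_close = contrastive_loss(*close_pair, is_similar=True)
loss_dissimilar_close = contrastive_loss(*close_pair, is_similar=False)
loss_dissimilar_far = contrastive_loss(*far_pair, is_similar=False)
```

In this sketch a dissimilar pair that is already farther apart than the margin contributes no loss, so training effort goes to the pairs that are still confusable.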

Installation

I recommend using the Anaconda distribution to run the code of this project. An Anaconda environment file is provided and can be used to create a new working environment with the following command:

conda env create -f environment.yml

Dataset generation

To generate the dataset, retrieve the source data from Kaggle, import it, and run the commands shown in notebook/EDA.ipynb to save a pickled dataset file (approx. 3 GB).
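
The general pattern of pre-computing embeddings once and pickling the result can be sketched as follows. This is a hedged illustration, not the notebook's code: the `embed` function is a hypothetical stand-in for a real sentence encoder, and the record fields and the file name `dataset.pkl` are assumptions made for the example.

```python
import pickle
import tempfile
from pathlib import Path

def embed(sentence, dim=4):
    """Hypothetical stand-in for a sentence encoder: a deterministic toy vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(sentence.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def build_dataset(pairs):
    """Attach pre-computed embeddings to each (sentence_a, sentence_b, label) triple."""
    return [
        {"a": a, "b": b, "label": label, "emb_a": embed(a), "emb_b": embed(b)}
        for a, b, label in pairs
    ]

def save_dataset(dataset, path):
    """Serialise the dataset to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(dataset, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_dataset(path):
    """Load a dataset previously written by save_dataset."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Made-up sentence pairs, labelled 1 (similar) or 0 (not similar).
pairs = [
    ("How do I learn Python?", "What is the best way to learn Python?", 1),
    ("How do I learn Python?", "Where can I buy a bicycle?", 0),
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.pkl"  # hypothetical file name
    save_dataset(build_dataset(pairs), path)
    restored = load_dataset(path)
```

Storing the embeddings alongside the text means the training step does not need to re-encode every sentence on each run. Only unpickle files you generated yourself or otherwise trust, since loading a pickle can execute arbitrary code.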

Execution

To execute the main code of this project, you can either run:

cd notebook
jupyter notebook

and then run the Training.ipynb notebook.

Alternatively, run the script directly:

python main.py

Results

The training results of my initial TextSimilarityDeepSiameseLSTM class with a logistic regression (LogReg) classifier are the following:

Train Acc: 0.7993654994990785 - Val Acc: 0.7652195423623995 - Test Acc: 0.7669758812615955

dimartinot/Text-Semantic-Similarity
