Text semantic similarity

Computes the semantic similarity of each pair of sentences or paragraphs drawn from two texts (English or Chinese).
Supported models: word2vec, TF-IDF, LDA, and LSI.

Dependencies

Python 3.6.5
pandas, numpy, jieba, nltk, gensim, sklearn (scikit-learn)
re, codecs, and time are also used; they ship with the Python standard library.

Implementation

Labeling semantic similarity by hand is difficult and time-consuming, so unsupervised methods for calculating it are useful.
I chose four models to compute semantic vectors for the texts, and use the cosine similarity between those vectors as the similarity score.
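
To make the vector-then-cosine idea concrete, here is a minimal sketch using only the standard library. It is not the code from this repository: the function names are hypothetical, the tokenizer is a plain whitespace split (the project itself lists jieba and nltk for that), and a small hand-rolled TF-IDF stands in for the four models.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one TF-IDF vector (list of floats) per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, so terms present in every document keep a non-zero weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Convention chosen here for empty texts.
    return dot / (norm_u * norm_v)


def pairwise_similarity(texts1, texts2):
    """Score the i-th text of texts1 against the i-th text of texts2."""
    docs1 = [t.lower().split() for t in texts1]
    docs2 = [t.lower().split() for t in texts2]
    # Fit the vocabulary and IDF weights on both sides together.
    vectors = tfidf_vectors(docs1 + docs2)
    v1, v2 = vectors[:len(docs1)], vectors[len(docs1):]
    return [cosine_similarity(a, b) for a, b in zip(v1, v2)]
```

Identical texts score close to 1, and texts that share no words score 0. A dense model such as word2vec can also give related but differently worded texts a high score, which TF-IDF alone cannot.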

How to use?

Prepare your data: put the sentence/paragraph pairs in two txt files, one text of each pair per file.
Command: excute.py text1_path text2_path res_path -l en/cn
Run excute.py with -h for details on the arguments.
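
For readers who want to see how a command line of this shape can be parsed, the sketch below mirrors the arguments shown above with argparse. It is an assumption about the interface, not the contents of excute.py: the help strings, the `lang` attribute name, the default of `en`, and the example file names are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for: text1_path text2_path res_path -l en/cn."""
    parser = argparse.ArgumentParser(
        description="Pairwise semantic similarity of two text files."
    )
    parser.add_argument("text1_path", help="txt file holding the first text of each pair")
    parser.add_argument("text2_path", help="txt file holding the second text of each pair")
    parser.add_argument("res_path", help="file the similarity results are written to")
    # The default value is an assumption; the real script may require -l.
    parser.add_argument("-l", dest="lang", choices=["en", "cn"], default="en",
                        help="language of the input texts")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration only.
args = build_parser().parse_args(["text1.txt", "text2.txt", "result.csv", "-l", "cn"])
print(args.text1_path, args.text2_path, args.res_path, args.lang)
```

argparse adds the -h option automatically, which matches the help behavior described above.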

How to change model setting?

Four models are implemented for unsupervised semantic similarity: word2vec, TF-IDF, LDA, and LSI.
These settings live in config.py:
You can change the output vector dimension of each model except TF-IDF (its dimension is determined by the corpus vocabulary).
You can combine more than one model to generate the semantic vectors of the texts.
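
The README does not say how vectors from several models are combined, so the sketch below assumes the simplest option, concatenation. The `CONFIG` layout, its keys, the dimensions, and `combine_vectors` are hypothetical and are not taken from config.py.

```python
# Hypothetical settings; the real config.py may use different names and values.
CONFIG = {
    "models": ["word2vec", "lsi"],  # models whose vectors are combined
    "dimensions": {"word2vec": 100, "lda": 50, "lsi": 50},  # no entry for TF-IDF
}


def combine_vectors(per_model_vectors, models):
    """Concatenate the vectors of one text, in the configured model order."""
    combined = []
    for name in models:
        combined.extend(per_model_vectors[name])
    return combined


# Toy 2- and 3-dimensional vectors standing in for real model output.
text_vectors = {"word2vec": [0.1, 0.2], "lsi": [0.3, 0.4, 0.5]}
combined = combine_vectors(text_vectors, CONFIG["models"])
print(len(combined))
```

With concatenation, the combined dimension is the sum of the individual dimensions, and a model with larger vector norms dominates the cosine score unless each part is normalized first.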

Contact

I am a beginner interested in NLP and data mining, and I would be delighted by your encouragement!
Feel free to mail me at 953383269@qq.com with any comments, problems, questions, or suggestions.

Lipairui/Text-semantic-similarity
