Instructions

What is Text Summarization?

Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks).

A module for E-mail Summarization which uses clustering of skip-thought sentence embeddings.

Instructions

The code is written in Python 3.
The module uses code of the Skip-Thoughts paper which can be found in the repo.

How the whole thing process

Very first it's all about email parsing i mean removing
- Signatures (e-mail signature is a block of text that is appended to the end of an e-mail message you send)
- Salutation (e-mail salutation is a greeting, you write at the top of your email)

Given above two steps are really complicated and it requires you a good understanding for Regular Expression(Regex).

After these two initial steps we get the email body. Body is in different languages for using

Google tranlate api convert them to one language"English".
Perform Sent_tokenization
Using Skip-thought-vector record the sentence embedding of each sentence.
With kmeans clustring clustre the embedding
After all that perform the Topic modelling on email data.

Skip-Thought Vectors

This is an migrated implementation of Skip-Thoughts for Python 3 from tensorflow/models.

Skip thoughts model is a powerful sentence encoder, which encodes sentences with Seq2Seq model. The meanings of sentences are kept in the encoded vectors through this method. However, the code mentioned above does not support Python3.

Due to the lack of compatibility with Python3, I made some modifications to the original implementations. And I write up a simple way to encode sentences with combined skip thought vectors.

The original design is described in:
Jamie Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler. Skip-Thought Vectors. In NIPS, 2015.

Code edit by

Code editor: Vinay vikram (@vikramvinay)

Code credit to :

Original code author: Chris Shallue (@cshallue)

How to use?

Get the pre-trained models

To download the pre-trained models on the BookCorpus dataset.

bash get_skip_thoughts_pretrained_models.sh

Encode sentences

To encode sentences into combined skip-thought vectors (unidirectional + bidirectional):

Please go to encode-by-skip-thoughts.ipynb and put you data loader in.

Or follow the instructions in tensorflow/models:

# Encode sentences with unidirectional skip thought vectors

import numpy as np
from skip_thoughts import configuration
from skip_thoughts import encoder_manager

# TODO: Load your dataset here.
data = []

VOCAB_FILE = "skip_thoughts/pretrained/skip_thoughts_uni_2017_02_02/vocab.txt"
EMBEDDING_MATRIX_FILE = "skip_thoughts/pretrained/skip_thoughts_uni_2017_02_02/embeddings.npy"
CHECKPOINT_PATH = "skip_thoughts/pretrained/skip_thoughts_uni_2017_02_02/model.ckpt-501424"

encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(bidirectional_encoder=False),
                   vocabulary_file=VOCAB_FILE,
                   embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                   checkpoint_path=CHECKPOINT_PATH)

encodings = encoder.encode(data)

Notebook Author

Vikram singh

Twitter: https://twitter.com/vikramvinay1
Linkdin: https://www.linkedin.com/in/vikram-singh-ai-venture/
Facebook: https://www.facebook.com/AIVentureO/https://www.quora.com/profile/Vinay-Vikram-8
Quora: https://www.quora.com/profile/Vinay-Vikram-8

Visit AiVenture

                                    -----Feel Free to contact us if you have any question-----

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
Images		Images
skip_thoughts		skip_thoughts
Email_summarization_complete code notebook.ipynb		Email_summarization_complete code notebook.ipynb
LICENSE		LICENSE
README.md		README.md
WORKSPACE		WORKSPACE
encode-by-skip-thoughts.ipynb		encode-by-skip-thoughts.ipynb
encoding_vector		encoding_vector
final_data.csv		final_data.csv
get_skip_thoughts_pretrained_models.sh		get_skip_thoughts_pretrained_models.sh
outfile		outfile
requirements.txt		requirements.txt
sampledata.xlsx		sampledata.xlsx
sentences		sentences

License

AIVenture0/Email_summarization-with-Unsupervised-Learning

Folders and files

Latest commit

History

Repository files navigation

What is Text Summarization?

Instructions

How the whole thing process

Skip-Thought Vectors

Code edit by

Code credit to :

How to use?

Get the pre-trained models

Encode sentences

Notebook Author

About

Topics

Resources

License

Stars

Watchers

Forks

Languages