‘Tweet Corrector’ automatically removes noise and redundant information and corrects misspelled words in tweets. Twitter data was used to train the model.


Tweet-Corrector-using-Encoder-Decoder-Model

  1. INPUT

-> shape of the input vector: (20, 100)
-> each word is represented as a vector of 100 features
-> sentences with more than 20 words are clipped
-> sentences with fewer than 20 words are padded with zero vectors (see the sketch below)
-> target: one-hot vectors of dimension equal to the length of the vocabulary

  2. OUTPUT

-> softmax output (a probability distribution over the vocabulary at each timestep)
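
A minimal sketch of the clipping/padding rule above, assuming NumPy; the helper name `pad_or_clip` is illustrative, not from the repository:

```python
import numpy as np

MAX_LEN, EMB_DIM = 20, 100  # 20 words per tweet, 100 features per word

def pad_or_clip(word_vectors, dim=EMB_DIM):
    """Clip a tweet to MAX_LEN word vectors; pad short tweets with zero vectors."""
    out = np.zeros((MAX_LEN, dim))
    n = min(len(word_vectors), MAX_LEN)
    out[:n] = word_vectors[:n]
    return out
```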

Approach Taken:

  1. First, we load all the tweet data from the file 'consolidate.csv' and store the original and corrected tweets separately in two lists after tokenizing them. The nltk library was used to tokenize each sentence into a list of words.
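
A sketch of this loading step, assuming pandas for the CSV and NLTK's `word_tokenize`; the column names `original` and `corrected` are guesses about the layout of consolidate.csv:

```python
import pandas as pd
import nltk

nltk.download("punkt", quiet=True)  # tokenizer models used by word_tokenize

# The column names below are assumptions about consolidate.csv.
df = pd.read_csv("consolidate.csv")
original = [nltk.word_tokenize(t) for t in df["original"].astype(str)]
corrected = [nltk.word_tokenize(t) for t in df["corrected"].astype(str)]
```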

  2. The original data is then preprocessed. Each word is converted to lowercase to maintain uniformity in the dataset. The corrected tweets are scanned to find all unique words and their occurrence counts, and only the most frequent of these unique words are kept as our bag of words.
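
Continuing the sketch, the lowercasing and bag-of-words selection might look like this; the 5000-word cutoff is an assumed value, not stated in the README:

```python
from collections import Counter

# Lowercase every token for uniformity.
original = [[w.lower() for w in tweet] for tweet in original]
corrected = [[w.lower() for w in tweet] for tweet in corrected]

# Count unique words in the corrected tweets and keep the most frequent
# ones as the bag of words (VOCAB_SIZE = 5000 is an assumed cutoff).
VOCAB_SIZE = 5000
counts = Counter(w for tweet in corrected for w in tweet)
bag_of_words = [w for w, _ in counts.most_common(VOCAB_SIZE)]
```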

  3. We created one-hot vectors for each word in our bag of words, which constitute our 'expected output' data. Each word in the original tweet dataset is converted to its corresponding embedding vector; Gensim's word2vec was used for this purpose.
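
A sketch of the target and input encodings; `vector_size=100` matches the (20, 100) input shape described above (older gensim versions call this parameter `size`):

```python
import numpy as np
from gensim.models import Word2Vec

# One-hot target vector for each word in the bag of words.
word_to_index = {w: i for i, w in enumerate(bag_of_words)}

def one_hot(word):
    vec = np.zeros(len(bag_of_words), dtype=np.float32)
    if word in word_to_index:
        vec[word_to_index[word]] = 1.0
    return vec

# 100-dimensional word2vec embeddings for the input words.
w2v = Word2Vec(sentences=original, vector_size=100, min_count=1)
input_vectors = [[w2v.wv[w] for w in tweet] for tweet in original]
```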

  4. Now that we had all the required data in the proper format, we randomly chose X (input) and y (expected output) vectors from the dataset. This data was split into training (4050 samples) and validation (50 samples) sets.
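
Putting the pieces together, using the `pad_or_clip` and `one_hot` helpers sketched earlier; the exact assembly and selection procedure is an assumption, only the 4050/50 counts come from the text:

```python
import numpy as np

V = len(bag_of_words)  # actual vocabulary size

# X is (N, 20, 100); y is (N, 20, V), one one-hot row per output word.
X = np.stack([pad_or_clip(vecs) for vecs in input_vectors])
y = np.stack([pad_or_clip([one_hot(w) for w in tweet], dim=V) for tweet in corrected])

# Random 4050/50 split, matching the counts given above.
idx = np.random.permutation(len(X))
X_train, y_train = X[idx[:4050]], y[idx[:4050]]
X_val, y_val = X[idx[4050:4100]], y[idx[4050:4100]]
```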

  5. For our network we used an encoder-decoder RNN model built with Keras, trained with a categorical cross-entropy loss function and a softmax output activation.
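
The README specifies only the model family (encoder-decoder RNN in Keras), the loss, and the output activation. One common way to realize that, with assumed layer sizes and a RepeatVector-style decoder:

```python
from keras.models import Sequential
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential([
    Input(shape=(MAX_LEN, EMB_DIM)),
    # Encoder: read the (20, 100) tweet and compress it into a fixed-size state.
    LSTM(256),
    # Decoder: unroll that state back into a 20-step output sequence.
    RepeatVector(MAX_LEN),
    LSTM(256, return_sequences=True),
    # Softmax distribution over the bag of words at each timestep.
    TimeDistributed(Dense(len(bag_of_words), activation="softmax")),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, batch_size=32)
```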
