
Classifying whether a disaster tweet is real or not using LSTM and GloVe word embeddings


naureen20/Real-or-Not-NLP-with-Disaster-Tweets


Real or Not? NLP with Disaster Tweets.

This project classifies whether a tweet is about a real disaster, using an RNN with LSTM and GloVe word embeddings. With a learning rate of 5e-5, the model reached an accuracy of 80% on both the training and validation sets. A tweet about a real disaster is predicted as 1; any other tweet is predicted as 0. The data sets are taken from Kaggle.

The Kaggle notebook used to run the code can be viewed here.

Each sample in the train and test sets contains the text of a tweet, a keyword from that tweet (which may be blank), and the location the tweet was sent from (which may also be blank).

The CSV data sets have the following columns:

id - a unique identifier for each tweet

text - the text of the tweet

location - the location the tweet was sent from (may be blank)

keyword - a particular keyword from the tweet (may be blank)

target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)

The EDA and modeling steps performed on the data sets are:

1. Data processing

1.1 Handling Misspelled data

1.2 Handling Contractions

1.3 Replacing Abbreviations

1.4 Visualizing length of tweets

1.5 Visualizing word count in each tweet

1.6 Collecting all words
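Steps 1.2 and 1.3 can be sketched as a simple dictionary lookup. The mappings below are small hypothetical samples, not the tables used in the notebook, and the function names are assumptions made for illustration.

```python
# Hypothetical lookup tables; a real notebook would use much larger ones.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "i'm": "i am"}
ABBREVIATIONS = {"omg": "oh my god", "ppl": "people", "b4": "before", "w/": "with"}


def expand_tokens(text, mapping):
    """Replace every whitespace-separated token that appears in `mapping`.

    Matching is case-insensitive. Tokens with attached punctuation
    (for example "can't,") are left unchanged in this simple sketch.
    """
    return " ".join(mapping.get(token.lower(), token) for token in text.split())


def normalize(text):
    """Expand contractions first, then abbreviations."""
    text = text.replace("\u2019", "'")  # unify curly apostrophes
    text = expand_tokens(text, CONTRACTIONS)
    return expand_tokens(text, ABBREVIATIONS)
```

For example, `normalize("omg ppl can't evacuate b4 the storm")` yields `"oh my god people cannot evacuate before the storm"`. The tweet length and word count from steps 1.4 and 1.5 are then just `len(text)` and `len(text.split())`.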

2. Visualizing data attributes

2.1 Viewing most common stop words used in tweets

2.2 Viewing punctuation in tweets

2.3 Viewing Common words in tweets

2.4 N-gram analysis
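The common-word and n-gram counts in section 2 can be sketched with the standard library alone. The function name `top_ngrams` and the whitespace tokenization are assumptions made for this illustration; the notebook may tokenize differently.

```python
from collections import Counter


def top_ngrams(texts, n=2, k=3):
    """Return the k most frequent n-grams across all texts.

    Each text is lowercased and split on whitespace; n-grams never
    span two different texts.
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # zip over n shifted copies of the token list to form n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)
```

With `n=1` the same function gives the most common single words, which covers steps 2.1 and 2.3 when applied before and after stop-word removal.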

3. Data cleaning

3.1 Cleaning URLs and HTML tags

3.2 Cleaning punctuation and emojis

3.3 Cleaning stop words

3.4 Using GloVe for word embeddings

3.5 Train-Test split
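The cleaning steps 3.1 to 3.3 might look like the sketch below. The regular expressions, the emoji ranges, and the short stop-word set are assumptions chosen for illustration; a full stop-word list (for example from NLTK) would normally replace the sample set.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_RE = re.compile(r"<[^>]+>")
# Rough emoji coverage: pictographs, misc symbols/dingbats, regional flags.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)
# Tiny sample stop-word set (assumption); swap in a complete list in practice.
STOP_WORDS = {"the", "a", "an", "is", "are", "in", "on", "of", "to", "and", "at"}


def clean_tweet(text):
    """Strip URLs, HTML tags, emojis, punctuation, and stop words."""
    text = URL_RE.sub(" ", text)    # 3.1 URLs
    text = HTML_RE.sub(" ", text)   # 3.1 HTML tags
    text = EMOJI_RE.sub(" ", text)  # 3.2 emojis
    text = text.translate(PUNCT_TABLE)  # 3.2 punctuation
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]  # 3.3
    return " ".join(tokens)
```

URLs are removed before punctuation on purpose: stripping punctuation first would break the `://` and dots that the URL pattern relies on.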

4. Creating Model

4.1 LSTM Model with GloVe Embeddings

4.2 Plotting accuracy and loss curves
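A minimal sketch of how steps 3.4 and 4.1 could fit together in Keras is shown below. Only the learning rate of 5e-5 comes from this README; the layer sizes, the dropout rate, the frozen embedding layer, and the helper names `build_embedding_matrix` and `build_model` are assumptions, so the notebook's actual architecture may differ.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.optimizers import Adam


def build_embedding_matrix(glove_lines, word_index, dim):
    """Map each vocabulary index to its GloVe vector.

    `glove_lines` is an iterable of lines in GloVe text format
    ("word v1 v2 ... vN"). Words missing from GloVe keep a zero row,
    and row 0 is reserved for padding.
    """
    vectors = {}
    for line in glove_lines:
        word, *values = line.split()
        vectors[word] = np.asarray(values, dtype="float32")
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


def build_model(embedding_matrix, lstm_units=64, learning_rate=5e-5):
    """Binary classifier: frozen GloVe embeddings -> LSTM -> sigmoid."""
    vocab_size, dim = embedding_matrix.shape
    model = Sequential([
        Embedding(vocab_size, dim,
                  embeddings_initializer=Constant(embedding_matrix),
                  trainable=False),  # assumption: embeddings are not fine-tuned
        LSTM(lstm_units, dropout=0.2),  # assumed size and dropout
        Dense(1, activation="sigmoid"),  # 1 = real disaster, 0 = not
    ])
    model.compile(optimizer=Adam(learning_rate=learning_rate),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

In use, `glove_lines` would be an open GloVe text file and `word_index` the tokenizer's word-to-index mapping. The `History` object returned by `model.fit` stores the per-epoch loss and accuracy values in its `history` dictionary, which is what the curves in step 4.2 are plotted from.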
