asharifara/tweet-disaster-detection-NLP

Tweet disaster detection using NLP

The text classification model is built in the following steps:

  1. Importing the required libraries
  2. Importing the dataset
  3. Text preprocessing (tweets may contain numbers, special characters, and unwanted spaces, so these are removed from the text)
  4. Converting text to numbers
  5. Splitting the data into training and test sets
  6. Training the text classification model and predicting the class of each tweet
  7. Evaluating the model
  8. Saving the model
  9. Loading the model
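
The steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the repository's own code: the inline dataset is made up, the small Naive Bayes classifier is a stand-in for whatever model the repository actually trains, and names such as `preprocess`, `vectorize`, `train`, `predict`, and `model.pkl` are hypothetical.

```python
# Step 1: import the required libraries (standard library only in this sketch).
import math
import os
import pickle
import random
import re
import tempfile
from collections import Counter

# Step 2: a tiny made-up dataset of (text, label) pairs; 1 = disaster, 0 = not.
DATA = [
    ("Forest fire spreading near the town", 1),
    ("Earthquake damages homes downtown", 1),
    ("Flood waters rising, residents evacuated", 1),
    ("Wildfire smoke forces evacuation", 1),
    ("I love this sunny weather", 0),
    ("This new song is fire", 0),
    ("Great pizza for dinner tonight", 0),
    ("Watching a movie with friends", 0),
]

def preprocess(text):
    # Step 3: replace non-letters with spaces, collapse whitespace, lowercase.
    return re.sub(r"\s+", " ", re.sub(r"[^a-zA-Z]", " ", text)).strip().lower()

def vectorize(text):
    # Step 4: convert text to numbers as bag-of-words counts (token -> frequency).
    return Counter(preprocess(text).split())

def train(samples):
    # Step 6a: fit a multinomial Naive Bayes model (assumed stand-in classifier).
    class_counts = Counter(label for _, label in samples)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in samples:
        word_counts[label].update(vectorize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}

def predict(model, text):
    # Step 6b: pick the class with the highest log-probability (add-one smoothing).
    total = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size
        score = math.log(n / total)
        for word, freq in vectorize(text).items():
            if word in model["vocab"]:  # ignore words never seen in training
                score += freq * math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Step 5: shuffle and split into training (75%) and test (25%) sets.
random.seed(0)
rows = DATA[:]
random.shuffle(rows)
split = int(0.75 * len(rows))
train_rows, test_rows = rows[:split], rows[split:]

model = train(train_rows)

# Step 7: evaluate accuracy on the held-out tweets.
accuracy = sum(predict(model, text) == label for text, label in test_rows) / len(test_rows)
print(f"accuracy on {len(test_rows)} held-out tweets: {accuracy:.2f}")

# Steps 8 and 9: save the model to disk, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(predict(restored, "Forest fire near the town"))
```

With only eight example tweets the accuracy figure is not meaningful; the point is the order of the steps. A real run would read the tweet dataset from a file and would likely use a library vectorizer and classifier instead of the hand-written ones here.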

For reference, here is a summary of the regex syntax used in the text preprocessing:

  • . : Wildcard, matches any single character (except a newline by default)
  • ^ : Indicates start of a string
  • $ : Indicates end of a string
  • [ ]: Matches one of the set of characters within [ ]
    • [a-z]: Matches one of the characters of a,b,c,...,z
    • [^abc]: Matches a character that is not a, b, or c
  • a|b: Matches either a or b, where a and b are patterns
  • \ : Escapes special characters, or introduces a special sequence (\t, \n, \b)
  • \b : Matches word boundary
  • \d : Matches any digit, equivalent to [0-9]
  • \D : Matches any non-digit, equivalent to [^0-9]
  • \s : Matches any whitespace character, equivalent to [ \t\n\r\f\v]
  • \S : Matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]
  • \w : Matches any word character (letter, digit, or underscore), equivalent to [a-zA-Z0-9_]
  • \W : Matches any non-word character, equivalent to [^a-zA-Z0-9_]
  • * : Matches zero or more occurrences
  • + : Matches one or more occurrences
  • ? : Matches zero or one occurrences
  • {n} : Matches exactly n occurrences
  • {n,} : Matches at least n occurrences
  • {,n} : Matches at most n occurrences
  • {m,n} : Matches at least m occurrences and at most n occurrences
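
As an illustration of how a few of these patterns combine during preprocessing, here is a minimal sketch using Python's `re` module. The helper name `clean_tweet` and the sample tweet are hypothetical, and the exact cleaning rules used in this repository may differ.

```python
import re

def clean_tweet(text):
    """Hypothetical helper: normalize a tweet before converting it to numbers."""
    text = re.sub(r"https?://\S+", " ", text)  # ? makes the "s" optional; \S+ is a run of non-whitespace
    text = re.sub(r"[^a-zA-Z\s]", " ", text)   # [^...] replaces anything that is not a letter or whitespace
    text = re.sub(r"\b[a-zA-Z]\b", " ", text)  # \b...\b isolates stray single-letter words
    text = re.sub(r"\s+", " ", text)           # + collapses runs of whitespace into one space
    return text.strip().lower()

print(clean_tweet("Forest fire near La Ronge Sask. Canada!! http://t.co/abc123 #wildfire 2015"))
# → forest fire near la ronge sask canada wildfire
```

The URL rule runs first on purpose: once punctuation has been stripped, a link would be split into fragments that no longer match `https?://\S+`.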

About

This repo was created for disaster detection using tweets.
