The steps to create the text classification model are as follows:
- Importing the required libraries
- Importing the dataset
- Text preprocessing (raw text may contain numbers, special characters, and extra whitespace, which should be removed)
- Converting text to numbers
- Splitting data into training and test sets
- Training the text classification model and predicting sentiment
- Evaluating the model
- Saving the model
- Loading the model
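
The text preprocessing and text-to-numbers steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the pipeline used in this tutorial: the helper names `preprocess` and `bag_of_words` and the sample reviews are hypothetical, and a simple word-count (bag-of-words) encoding is assumed for the conversion step.

```python
import re
from collections import Counter


def preprocess(text):
    """Clean one raw document (hypothetical helper, for illustration only)."""
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # replace digits and special characters with a space
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace into one space
    return text.strip().lower()


def bag_of_words(docs):
    """Convert cleaned documents into word-count vectors over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


# Made-up sample reviews, not taken from any dataset.
raw_reviews = ["Good movie!!!", "Bad movie...   really BAD :("]
cleaned = [preprocess(review) for review in raw_reviews]
vocab, vectors = bag_of_words(cleaned)

print(cleaned)
print(vocab)
print(vectors)
```

In practice a library vectorizer would usually replace `bag_of_words`; the sketch only shows what "converting text to numbers" means at its simplest.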
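
The following regular expression syntax is commonly used in text preprocessing:
- . : Wildcard, matches any single character (except a newline, by default)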
- ^ : Indicates start of a string
- $ : Indicates end of a string
- [ ]: Matches one of the set of characters within [ ]
- [a-z]: Matches one of the characters of a,b,c,...,z
- [^abc]: Matches a character that is not a, b, or c
- a|b: Matches either a or b, where a and b are patterns
- \ : Escapes a special character so that it is matched literally (e.g. \. or \*), and introduces special sequences (\t, \n, \b)
- \b : Matches word boundary
- \d : Matches any digit, equivalent to [0-9]
- \D : Matches any non-digit, equivalent to [^0-9]
- \s : Matches any whitespace character, equivalent to [ \t\n\r\f\v]
- \S : Matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]
- \w : Matches any word character (alphanumeric or underscore), equivalent to [a-zA-Z0-9_]
- \W : Matches any non-word character, equivalent to [^a-zA-Z0-9_]
- * : Matches zero or more occurrences
- + : Matches one or more occurrences
- ? : Matches zero or one occurrence
- {n} : Matches exactly n occurrences
- {n,} : Matches at least n occurrences
- {,n} : Matches at most n occurrences, i.e. from 0 to n (as in Python's re module)
- {m,n} : Matches at least m occurrences and at most n occurrences
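
Several of the patterns above can be tried out with Python's built-in `re` module. The sample string and variable names below are made up for illustration; the calls used (`re.findall`, `re.sub`, `re.search`) are standard.

```python
import re

# Made-up sample text for illustration.
sample = "Order #42 shipped on 2021-03-15 to  Alice_B!"

digits = re.findall(r"\d+", sample)             # + : one or more digits
four_digits = re.findall(r"\d{4}", sample)      # {n} : exactly four digits
capitalized = re.findall(r"\b[A-Z]\w*", sample) # \b, [ ], \w, * combined
single_spaced = re.sub(r"\s+", " ", sample)     # collapse whitespace runs
letters_only = re.sub(r"[^a-zA-Z\s]", "", sample)  # [^...] : drop non-letters

starts_with_order = bool(re.search(r"^Order", sample))  # ^ : start of string
ends_with_bang = bool(re.search(r"!$", sample))         # $ : end of string

spellings = re.findall(r"colou?r", "color colour")  # ? : optional 'u'
animals = re.findall(r"cat|dog", "cat, dog, bird")  # | : either alternative

print(digits)         # ['42', '2021', '03', '15']
print(four_digits)    # ['2021']
print(capitalized)    # ['Order', 'Alice_B']
print(single_spaced)
print(letters_only)
print(starts_with_order, ends_with_bang)
print(spellings)      # ['color', 'colour']
print(animals)        # ['cat', 'dog']
```

Raw strings (`r"..."`) are used so that backslashes such as `\d` and `\b` reach the regex engine unchanged.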