Code explanation:
-
Counter: Program to print the word that appears most often in a text file. ('textfile_1' is part of this program.)
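
A minimal sketch of the word-count idea, not necessarily how Counter itself is written. It assumes words are runs of letters and apostrophes and that case is ignored; `most_common_word` and `sample` are illustrative names. To run it on the real input, pass the contents of 'textfile_1' instead of `sample`.

```python
import re
from collections import Counter


def most_common_word(text):
    """Return (word, count) for the most frequent word in `text`, or None."""
    # Lowercase first so "The" and "the" are counted as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return None
    return Counter(words).most_common(1)[0]


# Inline sample standing in for the contents of a text file.
sample = "the cat saw the dog and the bird"
top_word, top_count = most_common_word(sample)
print(top_word, top_count)  # prints: the 3
```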
-
Senticounter: Program to return the number of positive words in a text file. ('textfile_2' and 'positive-words' are part of this program.)
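
The counting step can be sketched roughly as below. This is an illustration under assumptions: the lexicon is treated as a set of lowercase words, every occurrence is counted (not only distinct words), and `count_positive_words`, `lexicon` and `review` are made-up names standing in for the real files.

```python
import re


def count_positive_words(text, positive_words):
    """Count tokens in `text` that appear in the `positive_words` set."""
    # Keep hyphens and apostrophes so entries such as "well-made" can match.
    tokens = re.findall(r"[a-z'-]+", text.lower())
    return sum(1 for token in tokens if token in positive_words)


# Hypothetical stand-in for the contents of 'positive-words'.
lexicon = {"good", "great", "happy"}
review = "A good film with a great cast, good pacing, and a sad ending."
print(count_positive_words(review, lexicon))  # prints: 3
```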
-
webcounter: Program to return the number of times a word appears in a text file hosted on a webpage.
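
One possible shape for this, using only the standard library. The function names are illustrative, the download step assumes the file is UTF-8 text, and matching is assumed to be whole-word and case-insensitive. The demo line counts in an inline string so that it runs without network access.

```python
import re
from urllib.request import urlopen


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def count_word_at_url(url, word):
    """Download the text at `url` and count occurrences of `word` in it."""
    with urlopen(url) as response:
        # Assumption: the file is UTF-8; undecodable bytes are replaced.
        text = response.read().decode("utf-8", errors="replace")
    return count_word(text, word)


print(count_word("Data beats opinion. Good data beats bad data.", "data"))  # prints: 3
```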
-
Web Scraping: Program to parse the Rotten Tomatoes movie site with Beautiful Soup and extract, for each review, the critic's name, rating, source, date and the length of the review text.
-
Twitter_Scraping: Program to parse Cristiano Ronaldo's Twitter account and obtain tweets, retweets, replies and favorites using Selenium WebDriver.
-
Fourgram File parser: This program performs the functions below. ('input.txt', 'positive-words' and 'negative-words' are part of this program.)
6 a) Reports all word sequences that follow the format 'not' <Positive/Negative word>.
6 b) Takes any dictionary with letters as keys and integers as values and prints the letters with the 3 highest integer values.
6 c) Reports the three words that occur most often in the text, using frequency as the sort key.
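
The three parts above can be sketched as follows. This is a rough illustration, not the program's actual code: part a) is assumed to mean 'not' immediately followed by a word from either lexicon, and all function names and sample data are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def not_sentiment_pairs(text, positive, negative):
    """6 a) Return ('not', word) pairs where word is in either lexicon."""
    tokens = tokenize(text)
    sentiment = positive | negative
    return [(a, b) for a, b in zip(tokens, tokens[1:])
            if a == "not" and b in sentiment]


def top_three_keys(counts):
    """6 b) Return the three keys with the highest integer values."""
    return sorted(counts, key=counts.get, reverse=True)[:3]


def top_three_words(text):
    """6 c) Return the three most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(3)


# Hypothetical stand-ins for 'positive-words', 'negative-words' and 'input.txt'.
positive = {"good", "happy"}
negative = {"bad", "sad"}
text = "This is not good. It is not bad, but it is not good for me."

print(not_sentiment_pairs(text, positive, negative))
print(top_three_keys({"a": 5, "b": 9, "c": 2, "d": 7}))  # prints: ['b', 'd', 'a']
print(top_three_words(text))
```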
-
NB: Classifies textual data with the Naive Bayes algorithm using the sklearn library.
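
A small sketch of text classification with Naive Bayes in sklearn. The bag-of-words features (`CountVectorizer`), the multinomial variant (`MultinomialNB`) and the tiny training set are assumptions made for illustration; the program's own features and data may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set, for illustration only.
train_texts = [
    "great movie, loved it",
    "wonderful acting and a great story",
    "terrible plot, hated it",
    "awful acting and a boring story",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Turn each text into word counts, then fit a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["loved the wonderful story", "boring and terrible"])
print(list(predictions))
```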
-
Classification: Uses the Random Forest algorithm to improve the accuracy of the textual data classification from 85% to 90%.
-
gridSearch: Uses grid search to set the parameters of KNN (K-nearest neighbours), Decision Tree and Logistic Regression classifiers.
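
A hedged sketch of how a grid search over these three classifiers might look with sklearn's `GridSearchCV`. The iris dataset is only a convenient numeric stand-in for the text features used in this project, and the parameter grids are illustrative guesses rather than the values the program searches.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: the real program works on text features instead.
X, y = load_iris(return_X_y=True)

# Each entry pairs an estimator with a small, illustrative parameter grid.
searches = {
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4, 6]}),
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
}

best = {}
for name, (estimator, grid) in searches.items():
    # 5-fold cross-validation scores every parameter combination in the grid.
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X, y)
    best[name] = (search.best_params_, search.best_score_)
    print(name, search.best_params_, round(search.best_score_, 3))
```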
-
run: Uses Latent Dirichlet Allocation for topic modeling.