
spatiban/Data-analytics-using-Python


Data analytics using Python

Code explanation:

  1. Counter: Program to print the word that appears the most times in a text file ('textfile_1' is part of this program).

  2. Senticounter: Program to return the number of positive words in a text file ('textfile_2' and 'positive-words' are part of this program).

  3. webcounter: Program to return the number of times a word appears in a text file hosted on a web page.

  4. Web Scraping: Program that uses Beautiful Soup to parse the Rotten Tomatoes movie site and extract the critic's name, rating, source, date and review text length for each review.

  5. Twitter_Scraping: Program that uses Selenium WebDriver to parse Cristiano Ronaldo's Twitter handle and obtain tweets, retweets, replies and favorites.

  6. Fourgram File parser: This program performs the following tasks ('input.txt', 'positive-words' and 'negative-words' are part of this program):

6 a) Reports every word sequence of the form 'not' <positive/negative word>.

6 b) Takes any dictionary with letters as keys and integers as values and prints the letters with the three highest values.

6 c) Reports the three most frequent words in the text, using frequency as the sort key.

  7. NB: Uses the Naive Bayes algorithm from the sklearn library to classify textual data.

  8. Classification: Uses the RandomForest algorithm to improve the accuracy of the textual data classification from 85% to 90%.

  9. gridSearch: Uses grid search to tune the parameters of KNN (K-nearest neighbours), Decision Tree and Logistic Regression classifiers.

  10. run: Uses Latent Dirichlet Allocation (LDA) for topic modeling.
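The core step of the Counter program can be sketched as below. This is a minimal sketch, not the repository's code: it assumes the file is UTF-8 and that a "word" is any run of letters and apostrophes after lowercasing. Both function names are hypothetical.

```python
import re
from collections import Counter


def most_common_word_in_text(text):
    """Return (word, count) for the most frequent word, or None if there are no words."""
    # Assumed tokenization: lowercase, then keep runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return None
    # most_common(1) returns a one-element list holding the (word, count) pair.
    return Counter(words).most_common(1)[0]


def most_common_word(path):
    """Hypothetical wrapper that reads a UTF-8 text file such as 'textfile_1'."""
    with open(path, encoding="utf-8") as f:
        return most_common_word_in_text(f.read())


if __name__ == "__main__":
    print(most_common_word_in_text("The cat and the hat. The end."))  # ('the', 3)
```

Splitting the counting logic from the file reading keeps the counting testable without touching the disk.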
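The Senticounter idea (look each word of the text up in a positive-word list) might look like the sketch below. The lexicon format is an assumption: one word per line, with blank lines and lines starting with ';' skipped. The sketch counts every occurrence rather than unique words, which is also an assumption about what "number of positive words" means. Function names are hypothetical.

```python
import re


def load_lexicon(path):
    """Load a word list with one word per line.

    Assumption: blank lines and lines starting with ';' are comments; the
    real format of the 'positive-words' file is not documented here.
    """
    with open(path, encoding="utf-8", errors="ignore") as f:
        return {
            line.strip().lower()
            for line in f
            if line.strip() and not line.startswith(";")
        }


def count_positive(text, positive):
    """Count every occurrence (not unique words) of a lexicon word in text."""
    # Letters, apostrophes and hyphens form a word (assumed tokenization).
    words = re.findall(r"[a-z'-]+", text.lower())
    return sum(1 for w in words if w in positive)


if __name__ == "__main__":
    sample = "Good food, great service, bad parking. Good!"
    print(count_positive(sample, {"good", "great"}))  # 3
```

Storing the lexicon as a set makes each lookup constant time on average, which matters for long texts.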
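For webcounter, one plausible approach is to download the text with the standard library's `urllib.request.urlopen` and then count whole-word, case-insensitive matches. This is a sketch under those assumptions; the page encoding (UTF-8 by default here) and both function names are hypothetical, and no real URL from the original is implied.

```python
import re
from urllib.request import urlopen


def count_word(text, word):
    """Case-insensitive count of whole-word occurrences of `word` in `text`."""
    # \b anchors stop 'data' from matching inside 'metadata'.
    pattern = r"\b" + re.escape(word.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def count_word_at_url(url, word, encoding="utf-8"):
    """Fetch a text file over HTTP(S) and count `word` in it (assumed encoding)."""
    with urlopen(url) as resp:
        text = resp.read().decode(encoding, errors="replace")
    return count_word(text, word)


if __name__ == "__main__":
    print(count_word("Data, data everywhere; metadata is not data.", "data"))  # 3
```

`re.escape` keeps words containing regex metacharacters from being misread as patterns.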
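Part 6 a) of the Fourgram File parser looks for 'not' followed by a sentiment word. The sketch below assumes the match only needs the word immediately after 'not', so it scans adjacent word pairs; the original may use a wider window, as the program's name suggests. The function name and tokenization are hypothetical.

```python
import re


def not_sentiment_pairs(text, positive, negative):
    """Return every ('not', word) pair where word is in either lexicon."""
    words = re.findall(r"[a-z'-]+", text.lower())
    hits = []
    # zip(words, words[1:]) yields each adjacent pair of words.
    for first, second in zip(words, words[1:]):
        if first == "not" and (second in positive or second in negative):
            hits.append((first, second))
    return hits


if __name__ == "__main__":
    text = "It was not good, but not terrible and not a problem."
    print(not_sentiment_pairs(text, {"good"}, {"terrible"}))
    # [('not', 'good'), ('not', 'terrible')]
```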
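Part 6 b) (the three keys with the highest values) maps directly onto `heapq.nlargest` with the dictionary's `get` method as the key. A minimal sketch; the function name is hypothetical.

```python
import heapq


def top_three_keys(d):
    """Return up to three keys of `d`, ordered from highest to lowest value."""
    # Iterating a dict yields its keys; key=d.get ranks them by their values.
    return heapq.nlargest(3, d, key=d.get)


if __name__ == "__main__":
    print(top_three_keys({"a": 5, "b": 9, "c": 1, "d": 7}))  # ['b', 'd', 'a']
```

For dictionaries with fewer than three entries, `nlargest` simply returns all keys.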
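Part 6 c) sorts words by frequency. The sketch below counts words and then sorts the (word, count) pairs with the count as the sort key, which is one reading of "using frequency as key". Tokenization and the function name are assumptions.

```python
import re
from collections import Counter


def top_three_words(text):
    """Return the three most frequent (word, count) pairs, highest first."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    # Sort by frequency (the count). Python's sort is stable even with
    # reverse=True, so tied words keep their first-seen order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:3]


if __name__ == "__main__":
    print(top_three_words("b a b c b a d"))  # [('b', 3), ('a', 2), ('c', 1)]
```

`Counter.most_common(3)` would give the same result; the explicit `sorted` call just makes the frequency key visible.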
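The NB program relies on sklearn. To show what a Naive Bayes text classifier computes, the sketch below implements multinomial Naive Bayes by hand with add-alpha (Laplace) smoothing. This is a swapped-in, from-scratch illustration, not the repository's code and not sklearn's API. It assumes documents are already tokenized and ignores words never seen in training. Function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"vocab": vocab, "alpha": alpha, "classes": {}}
    for c, n_c in class_counts.items():
        total = sum(word_counts[c].values())
        model["classes"][c] = {
            "log_prior": math.log(n_c / len(labels)),
            "counts": word_counts[c],
            # Smoothed denominator: total words in class + alpha * |vocab|.
            "denom": total + alpha * len(vocab),
        }
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log posterior for `tokens`."""
    best, best_score = None, -math.inf
    for c, p in model["classes"].items():
        score = p["log_prior"]
        for w in tokens:
            if w in model["vocab"]:  # skip words unseen in training
                score += math.log((p["counts"][w] + model["alpha"]) / p["denom"])
        if score > best_score:
            best, best_score = c, score
    return best


if __name__ == "__main__":
    docs = [["great", "movie"], ["loved", "it"], ["awful", "movie"], ["hated", "it"]]
    model = train_nb(docs, ["pos", "pos", "neg", "neg"])
    print(predict_nb(model, ["great", "loved"]))  # pos
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents.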
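The gridSearch program tunes several classifiers. The sketch below shows only the grid-search idea itself: try every combination in a parameter grid and keep the best-scoring one. It uses a hand-written KNN and leave-one-out accuracy as the score. Both are assumptions for illustration; the original program's scoring scheme and libraries are not described here, and all names are hypothetical.

```python
import itertools
import math
from collections import Counter


def _distance(a, b, metric):
    if metric == "euclidean":
        return math.dist(a, b)
    if metric == "manhattan":
        return sum(abs(p - q) for p, q in zip(a, b))
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(train_X, train_y, x, k, metric):
    """Majority label among the k training points nearest to x."""
    order = sorted(range(len(train_X)), key=lambda i: _distance(train_X[i], x, metric))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def loo_accuracy(X, y, **params):
    """Leave-one-out accuracy of KNN with the given parameters."""
    correct = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        correct += knn_predict(rest_X, rest_y, X[i], **params) == y[i]
    return correct / len(X)


def grid_search(X, y, param_grid):
    """Try every parameter combination; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = loo_accuracy(X, y, **params)
        if score > best_score:  # strict '>' keeps the first of tied combos
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = ["a", "a", "a", "b", "b", "b"]
    grid = {"k": [1, 3], "metric": ["euclidean", "manhattan"]}
    print(grid_search(X, y, grid))  # ({'k': 1, 'metric': 'euclidean'}, 1.0)
```

The same loop works for any model whose parameters can be listed in a grid; only the scoring function changes.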

About

Python programs for web mining, text mining and machine learning for data science.
