
The following ML project involves exploratory data analysis (EDA) of an election dataset, data preparation for modelling, and prediction using ML models. It also includes text analysis of the inaugural corpus from nltk to find the most frequently used words in Presidents' speeches.


Honey28Git/Machine-Learning


Machine-Learning

Problem 1:

You are hired by one of the leading news channels, CNBE, which wants to analyze recent elections. The survey was conducted on 1525 voters with 9 variables. You have to build a model that predicts which party a voter will vote for on the basis of the given information, in order to create an exit poll that will help in predicting the overall win and the seats covered by a particular party.

Dataset for Problem: Election_Data.xlsx

Data Ingestion:

1.1 Read the dataset. Compute the descriptive statistics and check for null values. Write an inference on it.

1.2 Perform univariate and bivariate analysis. Do exploratory data analysis. Check for outliers.
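The ingestion checks in 1.1 and 1.2 could look roughly like the sketch below. It is only an illustration: the tiny DataFrame and its column names (`vote`, `age`) are hypothetical stand-ins, not the contents of Election_Data.xlsx, and the IQR rule is one common way to flag outliers rather than a method the assignment prescribes.

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are illustrative only.
# With the real file, pd.read_excel("Election_Data.xlsx") would replace this frame.
df = pd.DataFrame({
    "vote": ["Party A", "Party B", "Party A", None],
    "age": [43, 36, 35, 150],
})

summary = df.describe(include="all")   # descriptive statistics for every column
null_counts = df.isnull().sum()        # number of missing values per column

# IQR rule: flag values more than 1.5 * IQR outside the quartiles
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

print(summary)
print(null_counts)
print(outliers)
```

On the real survey, the same three objects (`summary`, `null_counts`, `outliers`) would feed the written inference asked for in 1.1.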

Data Preparation:

1.3 Encode the data (having string values) for modelling. Is scaling necessary here or not?

Data Split: Split the data into train and test (70:30).
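A minimal sketch of the encoding and the 70:30 split, assuming pandas and scikit-learn are available. The stand-in columns (`vote`, `gender`, `age`) are hypothetical, and dummy encoding with a stratified split is one reasonable choice, not necessarily the one the assignment expects.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; column names and values are illustrative only.
df = pd.DataFrame({
    "vote": ["Party A", "Party B"] * 10,
    "gender": ["female", "male", "male", "female"] * 5,
    "age": list(range(30, 50)),
})

# Binary target as 0/1 codes; string predictors become dummy columns
y = (df["vote"] == "Party A").astype(int)
X = pd.get_dummies(df.drop(columns="vote"), drop_first=True)

# 70:30 split, stratified so both parts keep the same class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)

print(X_train.shape, X_test.shape)
```

Whether to scale depends on the model: distance-based methods such as KNN are sensitive to feature scale, while tree-based ensembles are not.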

Modeling:

1.4 Apply Logistic Regression and LDA (linear discriminant analysis).

1.5 Apply KNN Model and Naïve Bayes Model. Interpret the results.

1.6 Model Tuning, Bagging (Random Forest should be applied for Bagging), and Boosting.

1.7 Performance Metrics: Check the performance of predictions on the train and test sets using accuracy and the confusion matrix, plot the ROC curve, and get the ROC_AUC score for each model.

Final Model: Compare the models and write an inference on which model is best/optimized.
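The modelling steps 1.4 to 1.7 can be sketched as a loop over scikit-learn estimators. Everything here is an assumption for illustration: the data is synthetic (`make_classification`), the hyperparameters are defaults rather than tuned values, and `GradientBoostingClassifier` is just one possible boosting model since the brief does not name one.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data standing in for the encoded election survey
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}
splits = {"train": (X_train, y_train), "test": (X_test, y_test)}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    for split, (X_part, y_part) in splits.items():
        pred = model.predict(X_part)
        proba = model.predict_proba(X_part)[:, 1]  # probability of class 1
        results[(name, split)] = {
            "accuracy": accuracy_score(y_part, pred),
            "roc_auc": roc_auc_score(y_part, proba),
            "confusion_matrix": confusion_matrix(y_part, pred),
        }

for key, metrics in results.items():
    print(key, round(metrics["accuracy"], 3), round(metrics["roc_auc"], 3))
```

Comparing the train and test rows for each model gives a quick view of overfitting, which is useful when writing the final model comparison.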

Inference:

1.8 Based on these predictions, what are the insights?

Problem 2:

In this project, we are going to work on the inaugural corpus from nltk in Python. We will be looking at the following speeches of the Presidents of the United States of America:

President Franklin D. Roosevelt in 1941

President John F. Kennedy in 1961

President Richard Nixon in 1973

(Hint: use .words(), .raw(), .sents() for extracting counts)

2.1 Find the number of characters, words, and sentences for the mentioned documents.
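One way to collect the three counts is a small helper that calls `.raw()`, `.words()`, and `.sents()` on a corpus reader. The helper name `speech_counts` and the `TinyCorpus` class are hypothetical; `TinyCorpus` only mimics those three methods so the sketch runs without downloading any data.

```python
def speech_counts(corpus, fileid):
    """Return character, word, and sentence counts for one document."""
    return {
        "characters": len(corpus.raw(fileid)),
        "words": len(corpus.words(fileid)),
        "sentences": len(corpus.sents(fileid)),
    }


class TinyCorpus:
    """Hypothetical stand-in exposing the same three methods as an NLTK corpus reader."""

    text = "This is short. So is this."

    def raw(self, fileid):
        return self.text

    def words(self, fileid):
        # Split punctuation into its own token, roughly as NLTK tokenizers do
        return self.text.replace(".", " .").split()

    def sents(self, fileid):
        return [sentence.split() for sentence in self.text.split(". ")]


counts = speech_counts(TinyCorpus(), "demo.txt")
print(counts)
```

With NLTK installed and the corpus downloaded, passing `inaugural` from `nltk.corpus` together with a file id such as `'1941-Roosevelt.txt'` should work the same way. Note that `.words()` counts punctuation marks as tokens.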

2.2 Remove all the stopwords from all three speeches.
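Stopword removal can be sketched as a filter over the token list. The function name `remove_stopwords` and the five-word stopword set are illustrative assumptions; in practice the set would more likely come from `nltk.corpus.stopwords.words('english')`, which needs a separate download. Dropping punctuation-only tokens is an extra choice made here, not something the task requires.

```python
import string


def remove_stopwords(words, stopwords):
    """Lower-case tokens, then drop stopwords and punctuation-only tokens."""
    cleaned = []
    for word in words:
        token = word.lower()
        if token in stopwords:
            continue
        if all(ch in string.punctuation for ch in token):
            continue
        cleaned.append(token)
    return cleaned


# Tiny illustrative inputs, not taken from the speeches
demo_stopwords = {"the", "of", "and", "we", "is"}
demo_words = ["We", "value", "the", "freedom", "of", "the", "nation", "."]

filtered = remove_stopwords(demo_words, demo_stopwords)
print(filtered)
```

Lower-casing before the lookup matters because common stopword lists are lower-case, while the speeches contain capitalized sentence starts.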

2.3 For each president, which word occurs most often in his inaugural address? Mention the top three words (after removing the stopwords).
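Counting the most frequent words is a natural fit for `collections.Counter`. This is a minimal sketch: `top_words` is a hypothetical helper name, and the token list is made up for the demo rather than drawn from any speech.

```python
from collections import Counter


def top_words(words, n=3):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(words).most_common(n)


# Made-up tokens standing in for a speech with stopwords already removed
demo = ["nation", "freedom", "nation", "peace", "freedom", "nation", "world"]

top_three = top_words(demo)
print(top_three)
```

The input should be the stopword-free token list for one speech, so the helper would be called once per president.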

2.4 Plot the word cloud of each of the three speeches (after removing the stopwords).

Code Snippet to extract the three speeches:

    import nltk
    nltk.download('inaugural')
    from nltk.corpus import inaugural
    inaugural.fileids()
    inaugural.raw('1941-Roosevelt.txt')
    inaugural.raw('1961-Kennedy.txt')
    inaugural.raw('1973-Nixon.txt')
