Sentiment Analysis on 2019 Indonesia Election Tweets

Background

This project analyzes the sentiment of tweets related to the 2019 Indonesia Election. Sentiment analysis helps reveal public opinion and attitudes towards political events, which can inform decision-making and public discourse.

Goal

The primary goal of this project is to develop a sentiment analysis model that accurately predicts the sentiment (positive, negative, or neutral) of tweets regarding the 2019 Indonesia Election. The predicted sentiment is then used to gain insight into public perception of political events during the election period.

Text Preprocessing

We applied the following preprocessing steps to prepare the tweet data for sentiment analysis:

String Parsing: Ensuring consistency by converting all tweets to string format.
Split Hashtag: Splitting camel-case hashtags into separate words to improve readability.
Lowercasing: Converting all text to lowercase to standardize the text.
Remove URL: Eliminating URLs from the text data to remove irrelevant information.
Remove HTML Tags: Stripping HTML tags from the text data.
Remove Numeric: Replacing numeric values with their respective textual representation.
Remove String Emoticon: Removing text emoticons (such as ":)") from the text.
Remove Punctuation: Removing punctuation marks from the text while retaining single quotes.
Extract Emoji: Replacing emojis with their textual descriptions.
Remove Special Character: Eliminating special characters from the text data.
Remove 3 Repeating Characters: Reducing repeating characters to a maximum of two consecutive occurrences.
Remove Single Word: Removing standalone single characters from the text.
Stemming: Applying stemming using the Sastrawi library to reduce words to their root forms.
Remove Stopwords: Planned but not applied, because removing stopwords risks losing contextual information.
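
A few of the steps above can be sketched with regular expressions. This is a minimal illustration, not the project's actual code: the function name `preprocess`, the regex patterns, and the sample tweet are assumptions. The emoticon, emoji, numeric, and stemming steps are left out here.

```python
import re

# All patterns below are illustrative assumptions, not the project's own.
HASHTAG_RE = re.compile(r"#(\w+)")
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])")      # boundary inside camelCase
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
PUNCT_RE = re.compile(r"[^\w\s']|_")               # keep single quotes
REPEAT_RE = re.compile(r"(.)\1{2,}")               # 3+ repeats of one character
SINGLE_CHAR_RE = re.compile(r"(?<![\w'])\w(?![\w'])")


def split_hashtag(match):
    """Drop the '#' and insert spaces at camel-case boundaries."""
    return CAMEL_RE.sub(" ", match.group(1))


def preprocess(tweet):
    text = str(tweet)                           # string parsing
    text = HASHTAG_RE.sub(split_hashtag, text)  # split hashtag
    text = text.lower()                         # lowercasing
    text = URL_RE.sub(" ", text)                # remove URL
    text = HTML_TAG_RE.sub(" ", text)           # remove HTML tags
    text = PUNCT_RE.sub(" ", text)              # remove punctuation
    text = REPEAT_RE.sub(r"\1\1", text)         # cap repeats at two
    text = SINGLE_CHAR_RE.sub(" ", text)        # remove single characters
    return " ".join(text.split())               # collapse whitespace


sample = "Ayo #PemiluDamai!!! Baguuuus bangeet <b>x</b> https://t.co/abc123"
print(preprocess(sample))
```

For the stemming step, Sastrawi exposes a stemmer through `StemmerFactory().create_stemmer()`, whose `stem()` method could be applied to the output of `preprocess`.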

Exploratory Data Analysis (EDA)

The EDA examined how the sentiment labels (positive, negative, neutral) are distributed across tweets about each candidate (Jokowi, Prabowo, Maruf, Sandi).
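
A per-candidate sentiment distribution of this kind could be computed as in the sketch below. The rows are made-up placeholders, not values from tweet.csv, and the function name `sentiment_distribution` is an assumption for illustration.

```python
from collections import Counter

SENTIMENTS = ("positive", "negative", "neutral")

# Hypothetical (candidate, sentiment) pairs; the real labels come from tweet.csv.
rows = [
    ("Jokowi", "positive"),
    ("Jokowi", "negative"),
    ("Jokowi", "positive"),
    ("Prabowo", "neutral"),
]


def sentiment_distribution(rows):
    """Return, per candidate, the share of tweets carrying each sentiment."""
    counts = Counter(rows)
    table = {}
    for candidate in sorted({cand for cand, _ in rows}):
        total = sum(counts[(candidate, s)] for s in SENTIMENTS)
        table[candidate] = {
            s: (counts[(candidate, s)] / total if total else 0.0)
            for s in SENTIMENTS
        }
    return table


for candidate, shares in sentiment_distribution(rows).items():
    print(candidate, shares)
```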

Dataset Creation

We created datasets for modeling by vectorizing the preprocessed tweet text using various techniques:

Frequency-Based Vectorization: Count Vectorization and TF-IDF Vectorization
Co-Occurrence Matrix
N-Gram Vectorization
Prediction-Based Vectorization: CBOW and Skip-gram Word2Vec models
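
To make the frequency-based and n-gram techniques concrete, here is a small from-scratch sketch. It uses the textbook weighting tf × log(N / df); the helper names `ngrams` and `tfidf` and the two sample documents are assumptions. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and L2 normalization by default, so their numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [ngrams(doc.split(), n) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in tokenized for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in tokenized:
        tf = Counter(terms)
        vectors.append({
            term: (count / len(terms)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Hypothetical preprocessed tweets.
docs = ["jokowi menang", "prabowo menang"]
for vector in tfidf(docs):
    print(vector)
```

A term that occurs in every document ("menang" here) gets weight zero, which is the property that lets TF-IDF down-weight uninformative words.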

Modeling

We trained two models for sentiment analysis:

LSTM (Long Short-Term Memory) Neural Network
Random Forest Classifier
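
As an illustration of the Random Forest route, the sketch below chains TF-IDF features into a classifier with scikit-learn. The training tweets, labels, and hyperparameters are placeholders chosen for the example, not the project's data or settings, and the LSTM model is not shown.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical preprocessed tweets and labels, for illustration only.
train_texts = [
    "jokowi hebat kerja nyata",
    "prabowo tegas dan berani",
    "kecewa dengan janji kampanye",
    "debat semalam mengecewakan",
    "hari ini pemilu dimulai",
    "jadwal debat diumumkan besok",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

# Assumed settings: default TF-IDF, 100 trees, fixed seed for repeatability.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["kerja nyata jokowi", "jadwal pemilu besok"])
print(list(predictions))
```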

Conclusion

Using TF-IDF vectorization with the Random Forest model, we achieved an accuracy of 66% and an F1 score of 61% in predicting sentiment from tweets related to the 2019 Indonesia Election. These results indicate that the model captures a meaningful part of the public sentiment expressed in the tweets, while leaving room for improvement, and they offer a first view of the opinions and attitudes of Indonesian Twitter users towards political events.
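
For reference, the two reported metrics can be computed as in the sketch below. The README does not say how the F1 score was averaged across the three classes; macro averaging is assumed here, and the label lists are made-up examples rather than project results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because macro F1 weights every class equally, it can fall below accuracy when one class is predicted poorly.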


Repository: Theofilusarifin/Sentiment-Analysis-on-2019-Indonesia-Election
