This project aims to analyze the sentiment of tweets related to the 2019 Indonesia Election. Sentiment analysis plays a crucial role in understanding public opinion and attitudes towards political events, providing valuable insights for decision-making and public discourse.
The primary goal of this project is to develop a model that classifies tweets about the 2019 Indonesia Election as positive, negative, or neutral. By analyzing the sentiment expressed in these tweets, we aim to gain insight into public perception of political events during the election period.
In this project, we performed extensive text preprocessing to prepare the tweet data for sentiment analysis. The preprocessing steps included:
- String Parsing: Ensuring consistency by converting all tweets to string format.
- Split Hashtag: Splitting camel case hashtags to improve readability.
- Lowercasing: Converting all text to lowercase to standardize the text.
- Remove URL: Eliminating URLs from the text data to remove irrelevant information.
- Remove HTML Tags: Stripping HTML tags from the text data.
- Remove Numeric: Replacing numeric values with their respective textual representation.
- Remove String Emoticon: Removing string emoticons from the text.
- Remove Punctuation: Removing punctuation marks from the text while retaining single quotes.
- Extract Emoji: Replacing emojis with their textual descriptions.
- Remove Special Character: Eliminating special characters from the text data.
- Remove 3 Repeating Characters: Reducing repeating characters to a maximum of two consecutive occurrences.
- Remove Single Word: Removing standalone single characters from the text.
- Stemming: Applying stemming using the Sastrawi library to reduce words to their root forms.
- Remove Stopwords: Planned but not applied, because removing stopwords risked discarding contextual information.
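
Several of the steps above can be sketched with Python's standard `re` module. This is a minimal illustration rather than the project's actual code: the regular expressions and the function names `split_hashtag` and `preprocess` are assumptions. It leaves out the numeric, emoticon, and emoji steps, and it does not perform Sastrawi stemming.

```python
import re

# All patterns below are illustrative assumptions, not the project's exact rules.
HASHTAG_RE = re.compile(r"#(\w+)")
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])")        # boundary between "u" and "D" in "PemiluDamai"
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
PUNCT_RE = re.compile(r"[^\w\s']|_")                  # drop punctuation but keep single quotes
REPEAT_RE = re.compile(r"(.)\1{2,}")                  # three or more of the same character
SINGLE_CHAR_RE = re.compile(r"(?<![\w'])\w(?![\w'])")  # a standalone single character


def split_hashtag(match):
    """Drop the '#' and split a camel-case hashtag into separate words."""
    return CAMEL_RE.sub(" ", match.group(1))


def preprocess(tweet):
    text = str(tweet)                           # string parsing
    text = HASHTAG_RE.sub(split_hashtag, text)  # split hashtag (must run before lowercasing)
    text = text.lower()                         # lowercasing
    text = URL_RE.sub(" ", text)                # remove URL
    text = HTML_TAG_RE.sub(" ", text)           # remove HTML tags
    text = PUNCT_RE.sub(" ", text)              # remove punctuation
    text = REPEAT_RE.sub(r"\1\1", text)         # cap repeated characters at two
    text = SINGLE_CHAR_RE.sub(" ", text)        # remove standalone single characters
    return " ".join(text.split())               # collapse leftover whitespace


print(preprocess("Mantaaap!!! #PemiluDamai <b>2019</b> https://t.co/abc x"))
```

Hashtag splitting has to run before lowercasing, since the camel-case boundaries disappear once the text is lowercased.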
Exploratory data analysis (EDA) showed how positive, negative, and neutral sentiment was distributed across tweets about each candidate (Jokowi, Prabowo, Maruf, Sandi).
We created datasets for modeling by vectorizing the preprocessed tweet text using various techniques:
- Frequency-Based Vectorization: Count Vectorization and TF-IDF Vectorization
- Co-Occurrence Matrix
- N-Gram Vectorization
- Prediction-Based Vectorization: CBOW and Skip-gram Word2Vec models
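
To make the non-neural techniques concrete, the following standard-library sketch computes n-grams, count vectors, TF-IDF weights, and co-occurrence counts for a toy corpus. The function names, the whitespace tokenizer, the smoothed IDF formula, and the window size are all assumptions chosen for illustration. In practice a library such as scikit-learn (`CountVectorizer`, `TfidfVectorizer`) would likely be used instead, and the Word2Vec models are not covered here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectors(docs):
    """Build a sorted vocabulary and one term-count vector per document."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, freq in Counter(toks).items():
            row[index[tok]] = freq
        vectors.append(row)
    return vocab, vectors


def tfidf_vectors(docs):
    """Weight raw counts by a smoothed inverse document frequency (no normalization)."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


def cooccurrence(docs, window=2):
    """Count unordered token pairs that appear within `window` positions of each other."""
    pairs = Counter()
    for doc in docs:
        toks = doc.split()
        for i, left in enumerate(toks):
            for right in toks[i + 1:i + 1 + window]:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


docs = ["jokowi menang", "prabowo menang"]
vocab, tfidf = tfidf_vectors(docs)
print(vocab)
print(tfidf[0])
print(ngrams("pemilu damai 2019".split(), 2))
print(cooccurrence(docs, window=1))
```

In this toy corpus, "menang" appears in every document and so receives the lowest IDF weight, while the candidate names, which each appear in only one document, are weighted higher.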
We trained two models for sentiment analysis:
- LSTM (Long Short-Term Memory) Neural Network
- Random Forest Classifier
Using TF-IDF vectorization with the Random Forest model, we achieved an accuracy of 66% and an F1 score of 61% in predicting the sentiment of tweets related to the 2019 Indonesia Election. These results suggest that the model captures a meaningful share of the sentiment signal in the tweets and can offer a useful, if approximate, view of the opinions and attitudes of the Indonesian population towards political events.
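
For reference, the two reported metrics can be computed as in the sketch below. The project does not state how the F1 score was averaged across the three classes, so macro averaging is an assumption here. The labels and predictions are made-up toy values, not project data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "neutral"]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Accuracy counts every tweet equally, whereas macro F1 counts every class equally, so the F1 score can fall below accuracy when the model performs worse on some classes than on others.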