This project develops a machine learning model trained on 1.6 million tweets to classify any new tweet as either positive or negative.

Twitter-Sentiment-Analysis

Welcome to the Twitter Sentiment Analysis (NLP) project repository! 🌟

Overview

This project applies Natural Language Processing (NLP) to interpret the sentiment people express on social media platforms, particularly Twitter. By analyzing tweets, it gauges the emotional tone behind the text and explores how well NLP techniques work for sentiment analysis.

Dataset

The project utilizes the Sentiment140 dataset from Kaggle, containing 1.6 million tweets labeled with sentiment.

Key Steps

1. Data Acquisition: Downloading the Sentiment140 dataset from Kaggle.

2. Data Cleaning:

  • Handling missing values and duplicates.
  • Removing irrelevant information (e.g., URLs, dates, usernames).
  • Converting text to lowercase for consistency.
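
As a rough illustration of the text-cleaning bullets above, the sketch below lowercases a tweet and strips URLs and @usernames with regular expressions. The function name `clean_tweet` and the exact patterns are assumptions made for illustration, not the notebook's actual code. Dropping every non-letter character is likewise just one possible choice.

```python
import re

# Illustrative patterns (assumptions); the notebook may use different rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @usernames, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Collapse the gaps left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_tweet("@user Check THIS out: https://t.co/abc123 #Awesome!!"))
```

Missing values and duplicates are usually handled on the DataFrame itself, for example with pandas' `dropna` and `drop_duplicates`, before a function like this is applied to the text column.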

3. Text Preprocessing:

  • Tokenizing text into words.
  • Removing stopwords (common words that carry little meaning).
  • Applying stemming to reduce words to a common root form (for example, "acting", "acted", and "acts" all become "act").
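
The three preprocessing bullets above might look like the following minimal sketch. It assumes NLTK's `PorterStemmer` (the README does not say which stemmer the notebook uses), splits on whitespace rather than using an NLTK tokenizer, and uses a tiny hand-written stopword set so that it runs without downloading any NLTK corpora. The names `STOPWORDS` and `preprocess` are hypothetical.

```python
from nltk.stem import PorterStemmer

# A tiny illustrative stopword set (assumption). NLTK's English list, available
# from nltk.corpus.stopwords after nltk.download("stopwords"), is much larger.
STOPWORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "this", "to", "was"}

stemmer = PorterStemmer()


def preprocess(text: str) -> list[str]:
    """Tokenize on whitespace, drop stopwords, and stem the remaining words."""
    tokens = text.lower().split()
    return [stemmer.stem(tok) for tok in tokens if tok not in STOPWORDS]


print(preprocess("the acting acted"))
```

Stemming is rule-based, so the stems it produces are not always dictionary words, and related words are not guaranteed to collapse to the same stem.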

4. Feature Extraction: Converting text into numerical vectors using TF-IDF.

5. Model Training: Training a Logistic Regression classifier on the TF-IDF vectors to classify tweets as positive or negative.

6. Model Evaluation: Measuring the model's classification accuracy.

7. Model Deployment: Saving the model for future use and potential deployment in a web application or API.
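
Steps 4 to 7 can be sketched end to end with scikit-learn and pickle. This is a minimal sketch under stated assumptions, not the notebook's code: the eight toy sentences stand in for the preprocessed tweets, and the 75/25 split, the `max_iter` value, and the file name `sentiment_model.pkl` are all illustrative choices.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy data standing in for the preprocessed tweets (1 = positive, 0 = negative).
texts = [
    "love this great day", "happy with the great service",
    "what a wonderful happy morning", "love the wonderful weather",
    "hate this awful day", "sad about the terrible service",
    "what an awful sad morning", "hate the terrible weather",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a quarter of the data for evaluation, keeping both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF vectorization followed by Logistic Regression, fitted as one pipeline
# so the vectorizer is saved together with the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Save the fitted pipeline, then load it back as a deployment step would.
with open("sentiment_model.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(model, f)

with open("sentiment_model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(["love this happy day"]))
```

Pickling the whole pipeline rather than the classifier alone means new tweets can be passed in as raw strings. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.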

How to Use

This section guides you through setting up and running the project:

1. Prerequisites:

  • Python: Ensure you have Python 3.9 installed on your system. You can check by running python --version in your terminal.

  • Libraries: Install the required Python libraries listed in requirements.txt. Open a terminal, navigate to your project directory, and run:

  pip install -r requirements.txt

  • Set Up the Kaggle API: If using the Sentiment140 dataset, configure the Kaggle API and download the dataset. Downloading the dataset requires a Kaggle API token. To get one:

  1. Log in to your Kaggle account.
  2. Go to your account settings page.
  3. Click on Create a new API token.
  4. This will prompt you to download a .json file to your system. Save the file; it is needed to authenticate with the Kaggle API.
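
Once the token is downloaded, the Kaggle client looks for it as `kaggle.json` in the `.kaggle` folder of your home directory by default. The helper below is a small sketch of putting it there. The function name `install_kaggle_token` and the download location in the usage note are assumptions, so adjust them to your system.

```python
import shutil
from pathlib import Path


def install_kaggle_token(downloaded: Path, config_dir: Path = Path.home() / ".kaggle") -> Path:
    """Copy a downloaded kaggle.json into the Kaggle config directory."""
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "kaggle.json"
    shutil.copyfile(downloaded, target)
    # Restrict access to the owner; the Kaggle client warns when the
    # token file is readable by other users.
    target.chmod(0o600)
    return target
```

For example, `install_kaggle_token(Path.home() / "Downloads" / "kaggle.json")` would install a token saved to a Downloads folder. With the token in place, a command along the lines of `kaggle datasets download -d kazanova/sentiment140 -p data --unzip` should fetch the data, assuming that is the Kaggle listing the notebook uses.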

2. Running the Project:

  • Clone this repository:
  git clone https://github.com/shubhambhatia2103/Twitter-Sentiment-Analysis.git
  • Navigate to the project directory:
  cd Twitter-Sentiment-Analysis
  • Launch Jupyter Notebook and open notebook.ipynb. This file contains the core code for data analysis and model building, with step-by-step guidance and explanations for each code section:
  jupyter notebook

3. Additional Notes:

  • You may need to adjust file paths or code blocks depending on your specific setup.
  • The notebook assumes the dataset is located in the data folder within the project directory.
  • Feel free to experiment and modify the code for further exploration and learning.
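
If you need to adapt the data path, loading the dataset with pandas might look like the sketch below. The Sentiment140 CSV ships without a header row, uses a Latin-1 style encoding, and labels negative tweets as 0 and positive tweets as 4. The function name `load_sentiment140` and the remapping of 4 to 1 are illustrative assumptions rather than the notebook's exact code.

```python
import pandas as pd

# Column names for the header-less Sentiment140 CSV.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]


def load_sentiment140(path: str) -> pd.DataFrame:
    """Load the Sentiment140 CSV and map labels to 0 (negative) / 1 (positive)."""
    df = pd.read_csv(path, names=COLUMNS, encoding="latin-1")
    # The raw file marks positive tweets with 4; remap to 1 for a binary target.
    df["target"] = df["target"].replace(4, 1)
    return df[["target", "text"]]
```

A call such as `load_sentiment140("data/training.1600000.processed.noemoticon.csv")` assumes the file keeps its distributed name inside the data folder; change the path if your copy is named or stored differently.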

Tech Stack

Python

Libraries: pandas, NumPy, NLTK, scikit-learn, pickle

Kaggle API

Jupyter Notebook

Contributions

Contributions are welcome! Feel free to fork this repository and create a pull request.

Future Work

  • Explore different NLP techniques and model architectures.
  • Experiment with hyperparameter tuning for model optimization.
  • Investigate sentiment analysis for different topics or domains.
  • Build a web application or API for real-time sentiment analysis.

Authors

Feedback

If you have any feedback, please reach out to me at Shubhambhatia2103@gmail.com.

Acknowledgements

Contact

License

This repository is licensed under the MIT License - see the LICENSE file for details. Feel free to use, modify, and distribute the code as per the terms of the license.
