Hoax Detection Model for Identifying Deceptive Content

Created by Fitria Dwi Wulandari – September 2020

Project Background

The spread of misinformation and fake news is a significant challenge today: it distorts public opinion, skews decision-making, and can erode social cohesion. To help address this, the project develops a hoax detection model that identifies and flags potentially deceptive content across online platforms.

Objectives

The primary objective is to build a model that flags potentially deceptive content, helping users distinguish reliable sources from unreliable ones and limiting the spread of harmful information.

Methodology

`Data Preparation`

Source: The dataset was obtained from the Satria Data for Big Data Challenge, which contains text data such as article titles and content.
Actions: Cleaned the text by removing noise, converting it to a consistent format, and handling irregularities that could hinder analysis. The steps were stop-word removal, removal of punctuation and special characters, tokenization, and stemming.
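
The cleaning steps listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code: the stop-word list, the `naive_stem` helper, and the sample sentence are all made up for the example, and the crude suffix stripper merely stands in for a real stemmer such as one from NLTK.

```python
import re

# Hypothetical stop-word list for illustration; the project's real list is not given.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "that"}

# Naive suffix list standing in for a proper stemming algorithm (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip at most one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, drop punctuation/special characters, tokenize, remove stop words, stem."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Whitespace tokenization.
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


print(preprocess("SHOCKING!!! Doctors confirmed that drinking seawater cures the flu."))
# ['shock', 'doctor', 'confirm', 'drink', 'seawater', 'cur', 'flu']
```

Note how the naive stemmer turns "cures" into "cur"; a real stemmer or lemmatizer suited to the dataset's language would handle such cases better.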

`Machine Learning`

Approach: Building predictive models to classify articles as deceptive (hoax) or reliable based on their text.
Algorithms Tested: Four different algorithms were evaluated to determine the best model for hoax detection.
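
A comparison of this kind might look like the sketch below. Only Logistic Regression is named in this README; the other three classifiers, the TF-IDF features, the train/test split settings, and the toy sentences are assumptions chosen for illustration. With so little data the printed accuracies are meaningless; the point is only the shape of the evaluation loop.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up examples (1 = hoax, 0 = not hoax); NOT the Satria Data set.
texts = [
    "miracle fruit cures all diseases overnight",
    "secret chip hidden inside every banknote",
    "aliens built the city hall last night",
    "drinking bleach makes you immune to flu",
    "city council approves new bus routes",
    "central bank keeps interest rate unchanged",
    "local school opens a new library wing",
    "weather agency forecasts rain this weekend",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Logistic Regression comes from the README; the other three are assumed candidates.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes (assumed)": MultinomialNB(),
    "Linear SVM (assumed)": LinearSVC(),
    "Random Forest (assumed)": RandomForestClassifier(n_estimators=50, random_state=42),
}

scores = {}
for name, clf in candidates.items():
    # Each candidate gets its own TF-IDF + classifier pipeline.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Report candidates from best to worst held-out accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary fitted on the training split only, which avoids leaking test data into the features.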

`Tools`

Programming Language: Python
Libraries: Pandas, NumPy, Scikit-learn, NLTK, Matplotlib, Seaborn

Results

Of the four algorithms evaluated, Logistic Regression performed best, reaching 83% accuracy in predicting the truthfulness of articles. The model can help users distinguish reliable sources from unreliable ones and limit the spread of harmful information that could cause confusion, panic, or societal harm.

Future Work

Implementation: Apply the best model to unseen data (new articles) to support continuous detection of deceptive content.
Model Improvement: Explore more advanced natural language processing techniques and models to improve accuracy.
Extended Analysis: Include additional data sources and variables for a more comprehensive detection system.
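
As a rough illustration of the Implementation item, the sketch below fits a Logistic Regression pipeline and scores new articles. The training sentences, the `flag_articles` helper, the TF-IDF features, and the 0.5 threshold are all hypothetical; the notebook's actual features and settings may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up training data (1 = hoax, 0 = not hoax); NOT the project's dataset.
train_texts = [
    "miracle fruit cures all diseases overnight",
    "secret chip hidden inside every banknote",
    "drinking bleach makes you immune to flu",
    "city council approves new bus routes",
    "central bank keeps interest rate unchanged",
    "weather agency forecasts rain this weekend",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)


def flag_articles(articles, threshold=0.5):
    """Return (article, hoax_probability, flagged) tuples for unseen articles."""
    # Locate the probability column that corresponds to the hoax label (1).
    hoax_col = list(model.classes_).index(1)
    probs = model.predict_proba(articles)[:, hoax_col]
    return [(a, float(p), bool(p >= threshold)) for a, p in zip(articles, probs)]


new_articles = [
    "miracle pill cures flu overnight",
    "council approves new library budget",
]
for article, prob, flagged in flag_articles(new_articles):
    print(f"{prob:.2f}  flagged={flagged}  {article}")
```

Returning the probability alongside the flag lets a downstream reviewer tune the threshold instead of relying on a fixed cut-off.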

Repository Contents

Script: `Hoax Detection.ipynb`, a Jupyter notebook containing the Python code for data preprocessing, cleaning, and model training.
