This project builds a model that classifies an article as hoax or non-hoax. It also reports the percentage of hoax and non-hoax articles.

fitria-dwi/Hoax-Detection

Hoax Detection Model for Identifying Deceptive Content

Created by Fitria Dwi Wulandari – September, 2020

Project Background

In today's digital age, the spread of misinformation and fake news poses a significant challenge, impacting public opinion, decision-making processes, and even social cohesion. Recognizing the importance of combating this issue, the project aims to develop a hoax detection model capable of identifying and flagging potentially deceptive content across various online platforms.

Objectives

The primary objective of this project is to develop a hoax detection model capable of identifying and flagging potentially deceptive content, thereby helping users discern between reliable and unreliable sources and preventing the spread of harmful information.

Methodology

Data Preparation

  • Source: The dataset was obtained from the Satria Data for Big Data Challenge, which contains text data such as article titles and content.
  • Actions: Clean the text by removing noise, converting it to a consistent format, and handling irregularities that would hinder analysis. The steps are stop-word removal, removal of punctuation and special characters, tokenization, and stemming.
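
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it uses only the standard library, the `STOP_WORDS` set and the `naive_stem` helper are hypothetical stand-ins for the NLTK stop-word corpus and stemmer the project would more plausibly use, and English text is assumed because the dataset's language is not stated here.

```python
import re

# Hypothetical, tiny stop-word list for illustration only; a real run would
# more likely load a full list (for example from NLTK) for the dataset's language.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "that"}

# Naive suffix stripping as a placeholder for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation/special characters, tokenize, drop stop words, stem."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep only letters, digits, whitespace
    tokens = text.split()                      # simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


print(preprocess("BREAKING!!! Scientists claimed that the moon is shrinking..."))
```

The order matters slightly: stop words are removed before stemming so that the stop-word list can contain ordinary word forms rather than stems.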

Machine Learning

  • Approach: Building predictive models to classify articles as hoax or non-hoax.
  • Algorithms Tested: Four different algorithms were evaluated to determine the best model for hoax detection.
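
One way such a comparison could be set up with Scikit-learn is sketched below. Apart from Logistic Regression, which the results mention, the candidate algorithms here are assumptions chosen for illustration, since the four algorithms actually tested are not named. The TF-IDF features, the toy texts, and the 2-fold cross-validation are likewise hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up examples (1 = hoax, 0 = non-hoax); NOT the project's dataset.
texts = [
    "miracle cure doctors do not want you to know",
    "city council approves new budget for public schools",
    "secret moon base discovered by anonymous insider",
    "central bank keeps interest rate unchanged this month",
    "drinking this juice makes you immune to every virus",
    "local team wins regional football championship",
    "government hides proof that the earth is hollow",
    "new bridge opens to traffic after two years of work",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Assumed candidate set; only Logistic Regression is confirmed by the results.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, clf in candidates.items():
    # Vectorizer and classifier share one pipeline so each fold fits TF-IDF
    # on its own training split only.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    fold_scores = cross_val_score(pipeline, texts, labels, cv=2, scoring="accuracy")
    scores[name] = float(fold_scores.mean())

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean accuracy = {score:.2f}")
print("best candidate on this toy data:", best)
```

With only eight toy articles the scores are meaningless; the sketch only shows the shape of the comparison loop.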

Tools

  • Programming Language: Python
  • Libraries: Pandas, NumPy, Scikit-learn, NLTK, Matplotlib, Seaborn

Results

Of the four algorithms evaluated, Logistic Regression performed best, reaching an accuracy of 83% in predicting whether an article is a hoax. The model can help users distinguish reliable from unreliable sources and limit the spread of harmful information that could lead to confusion, panic, or societal harm.
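
For reference, accuracy is the fraction of articles whose predicted label matches the true label, and the hoax/non-hoax percentages are simple label shares. The sketch below computes both on made-up labels; the helper names `accuracy` and `class_percentages` are hypothetical, and the numbers are not the project's results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_percentages(labels):
    """Share of each label, as a percentage of all labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * n / total for label, n in counts.items()}


# Made-up labels for illustration only.
y_true = ["hoax", "non-hoax", "hoax", "hoax", "non-hoax", "hoax"]
y_pred = ["hoax", "non-hoax", "non-hoax", "hoax", "non-hoax", "hoax"]

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
print("predicted shares:", class_percentages(y_pred))
```

Accuracy alone can mislead when one class dominates, so the class shares are worth reading alongside it.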

Future Work

  • Implementation: The best model will be used to predict unseen data (new articles), aiding in the continuous detection of deceptive content.
  • Model Improvement: Explore more advanced natural language processing techniques and models to improve accuracy.
  • Extended Analysis: Include additional data sources and variables for a more comprehensive detection system.
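
Applying a trained model to new articles might look like the sketch below. It assumes a Scikit-learn pipeline with TF-IDF features feeding Logistic Regression; the feature extraction method is an assumption, and the training and new texts are invented examples rather than project data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training examples; NOT the project's dataset.
train_texts = [
    "miracle cure doctors do not want you to know",
    "city council approves new budget for public schools",
    "secret moon base discovered by anonymous insider",
    "central bank keeps interest rate unchanged this month",
]
train_labels = ["hoax", "non-hoax", "hoax", "non-hoax"]

# Keeping the vectorizer inside the pipeline ensures new articles are
# transformed with the vocabulary learned during training.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

# Unseen (also invented) articles.
new_articles = [
    "anonymous insider reveals secret miracle cure",
    "council keeps school budget unchanged",
]
predictions = model.predict(new_articles)
probabilities = model.predict_proba(new_articles)  # columns follow model.classes_

for article, label, probs in zip(new_articles, predictions, probabilities):
    print(f"{label:>8}  {dict(zip(model.classes_, probs.round(2)))}  {article}")
```

The class probabilities give a rough confidence signal, which could be used to route borderline articles to manual review.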

Repository Contents

  • Script: Python scripts for data preprocessing, cleaning, and model training.