Determining Fake News

Spongebob: Fake news! Credits: my good friend Casper Guo.

Media has always been a tool of the ones with political power to manipulate the masses. Given the power dynamic between the media press and the readers, it is unfortunately quite easy to circulate fake news in the media, in a high-quality form that makes them indistinguishable from real news to the reader. The presence of fake news diverts the readership from real information that need to be circulated into the society at large, such as fact-based political news for making informed choices, real facts about safe public health practices, etc. As fake news becomes more “real”, it is essential that there be a reliable program that can tell the reader whether a news is real or fake for them to perceive the media accurately, especially today when accurately informing people on safe public health practices and vaccine facts are quintissential.

Installation Instructions

Prereqs

Clone the repository.
pip install -r requirements.txt
python spacy download en_core_web_lg

Training the model

Run python lstm.py or python lstm_t.py (for titles only) using a GPU. (Note: This step is optional as pre-trained models are included)

Running the front-end

Run python app.py or flask run.
Go to http://localhost:5000/

Technical Details

Training Data and Preprocessing

The training data includes over 50k news articles from three different datasets

The details for the preprocessing are in preprocessing.py. The procedure involves first tokenizing the article (or title), padding the length to 1000 words (for body) and 54 words (for title), and using spaCy to generate 300-dimensional word vectors. This process is done in parallel on the CPU to increase performance.

The data from all three datasets are then shuffled and split 70-30 into training and validation sets.

Model Architecture

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
masking (Masking)            (None, None, 300)         0         
_________________________________________________________________
lstm (LSTM)                  (None, None, 256)         570368    
_________________________________________________________________
dropout (Dropout)            (None, None, 256)         0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 256)               525312    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense (Dense)                (None, 2)                 514       
=================================================================
Total params: 1,096,194
Trainable params: 1,096,194
Non-trainable params: 0
_________________________________________________________________

Training and Other Info

How this model is trained is specified in lstm.py and lstm_t.py. Some details of the training process for the pre-trained models are logged by Tensorboard and shown below.

The distribution of the model parameters are shown below.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
img		img
models		models
templates		templates
test_data		test_data
.gitignore		.gitignore
.slugignore		.slugignore
Procfile		Procfile
README.md		README.md
app.py		app.py
human.py		human.py
human_results.py		human_results.py
input_test.py		input_test.py
lstm.py		lstm.py
lstm_t.py		lstm_t.py
model_final.h5		model_final.h5
model_title.h5		model_title.h5
pre.ipynb		pre.ipynb
preprocessing.py		preprocessing.py
requirements.txt		requirements.txt
testing.py		testing.py

jonah-chen/fake-news

Folders and files

Latest commit

History

Repository files navigation

Determining Fake News

Installation Instructions

Prereqs

Training the model

Running the front-end

Technical Details

Training Data and Preprocessing

Model Architecture

Training and Other Info

About

Resources

Stars

Watchers

Forks

Languages