This is our submission for Pravega'21, a hackathon conducted by IISc Bangalore.
Since the COVID-19 outbreak, we have seen a huge volume of information being disseminated. With the current usage of social media platforms, consumers are creating and sharing more information than ever before, some of which is misleading and has no relevance to reality. The goal of this project is the automated classification of a text article as misinformation or disinformation. In this presentation, we’ll describe our approach to the solution.
Most related work focuses on improving prediction quality by adding extra features. These features are not always available; for instance, some articles may not contain images. Using social media information is also problematic, because it is easy to create a new account on these platforms and fool the detection system. That’s why we chose to focus on the article body.
The project runs on an Express (Node.js) server with a multipage frontend rendered using EJS. The pages are styled with vanilla CSS and Bootstrap. The Node.js backend is connected to the machine learning model through the child_process module, which invokes the model via command-line utilities.
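As a rough illustration of the Python side of that command-line bridge, the sketch below reads an article body and prints one JSON line that a Node.js caller could parse from standard output. The function names `respond` and `main`, the JSON fields, the 0.5 threshold, and the `classify` callback are all assumptions made for this example; the actual script in the repository may be organised differently.

```python
import json
from typing import Callable, List, Optional, TextIO


def respond(text: str, classify: Callable[[str], float]) -> str:
    """Return one JSON line describing the prediction for an article body."""
    if not text.strip():
        return json.dumps({"error": "empty article"})
    # Assumption: classify returns the probability that the article is fake.
    score = classify(text)
    label = "fake" if score >= 0.5 else "real"
    return json.dumps({"label": label, "score": round(score, 4)})


def main(argv: List[str], stdin: Optional[TextIO],
         classify: Callable[[str], float]) -> int:
    # Read the article from the first CLI argument, else from standard input.
    text = argv[1] if len(argv) > 1 else stdin.read()
    line = respond(text, classify)
    print(line)
    # A non-zero exit code lets the calling process detect a failure.
    return 1 if "error" in json.loads(line) else 0
```

A real script would finish by calling `main(sys.argv, sys.stdin, model_fn)` with the trained model wrapped as `model_fn`, so that the Node.js process only has to read a single line of JSON from the child's output.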
We used spaCy to segment the text into words, punctuation, and so on. Tokenization follows rules specific to each language. The vocabulary is built based on how often each word occurs in the corpus.
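The vocabulary step can be sketched as follows. To keep the example dependency-free, a simple regular-expression tokenizer stands in for spaCy here; it is not spaCy's tokenizer and is far cruder. The `<pad>` and `<unk>` entries, the `min_freq` cutoff, and the function names are assumptions made for illustration, not details taken from the project.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Matches a run of word characters, or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    # Stand-in for spaCy's rule-based tokenizer: lowercase, split off punctuation.
    return TOKEN_RE.findall(text.lower())


def build_vocab(texts: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent token to an integer index."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved indices (assumed convention)
    # Most frequent tokens get the smallest indices; ties break alphabetically.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    # Tokens missing from the vocabulary map to the <unk> index.
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]
```

Raising `min_freq` drops rare tokens, which keeps the vocabulary small at the cost of mapping more words to `<unk>`.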
GloVe embeddings are used to convert the corpus into vectors. GloVe is trained on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
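To make the "linear substructure" idea concrete, the sketch below parses vectors in the plain-text layout used by pretrained GloVe files (one word per line followed by its space-separated components) and answers an analogy query by vector arithmetic. The three-dimensional vectors in `TOY` are made up for the example and are not real GloVe values; the helper names are likewise hypothetical.

```python
import math
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'word v1 v2 ...' lines into a word -> vector mapping."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(vectors: Dict[str, List[float]], a: str, b: str, c: str) -> str:
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors, invented for illustration only.
TOY = [
    "man 1.0 0.0 0.2",
    "woman 1.0 1.0 0.2",
    "king 1.0 0.0 1.0",
    "queen 1.0 1.0 1.0",
    "apple 0.1 0.2 -0.9",
]
```

With a real embedding file, `load_glove(open(path, encoding="utf-8"))` would build the same mapping, and the rows for the vocabulary words could then be stacked into the embedding matrix fed to the model.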
You need to have the following things installed
Once you are done with that, follow the instructions below:
> git clone https://github.com/sudip-mondal-2002/fake-news-detector.git
> cd fake-news-detector
> yarn install
OR
> npm install
> pip install -r requirements.txt
> yarn run start
OR
> npm start
Now open your browser and go to http://localhost:3000
OR
Check out the GitHub Gists
- Fakenews Dataset from Kaggle
- GloVe Embeddings, Stanford University (2014)
- Recurrent Neural Networks, Lipton et al. (2015)
- Long Short-Term Memory, Greff et al. (2015)