Source-Recommendation-System

Source Recommendation System takes an article from the user as input and returns relevant articles from a dataset of 8.5 million articles. It uses Apache Spark to process a corpus of this size.

Prerequisites

This project uses the rake-nltk library to extract keywords.

pip install rake-nltk
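To give a feel for what keyword extraction produces, here is a minimal standard-library sketch of the RAKE idea (split text into candidate phrases at stopwords and punctuation, then score each phrase by summing degree/frequency over its words). It is an illustration only, not rake-nltk's implementation or this project's code: the names `STOPWORDS`, `candidate_phrases` and `rake_keywords` are hypothetical, and the stopword list is a tiny stand-in for the NLTK list that rake-nltk uses.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption); rake-nltk relies on NLTK's full list.
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "for", "on",
    "with", "that", "by", "are", "was", "it", "this", "as", "at", "from",
}


def candidate_phrases(text):
    """Split text into runs of consecutive non-stopwords."""
    phrases = []
    # Punctuation and newlines always end a candidate phrase.
    for fragment in re.split(r'[.,;:!?()\[\]"\n]+', text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(text, top_n=5):
    """Return the top_n candidate phrases ranked by a RAKE-style score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # summed length of the phrases each word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    def score(phrase):
        return sum(degree[w] / freq[w] for w in phrase)

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(set(phrases), key=lambda p: (-score(p), p))
    return [" ".join(p) for p in ranked[:top_n]]
```

With rake-nltk installed, the comparable calls are `Rake().extract_keywords_from_text(text)` followed by `get_ranked_phrases()`; the library also needs the NLTK stopword data to be downloaded.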

FakeNewsCorpus (27 GB) was used as the dataset of news articles. Apache Spark is used to process it and must be correctly installed and configured. The Spark configuration files can be found in the spark-2.4.4-bin-hadoop2.7 folder. Hadoop was used as the underlying distributed file system; its configuration can be found in the hadoop-conf folder. Both need to be changed to match your own setup.

Source Code

The source code can be found in the /src folder.

The whole dataset was partitioned into smaller files. The code that partitions the dataset is in PartitionFakeNewsCorpus.py.
The code that extracts keywords from the partitioned dataset is in ExtractKeywordsFromFakeCorpus.py.
The main code, which takes an input article and outputs relevant articles, is in FindSimillarDocs.py.
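The last step, matching an input article against stored keywords, can be sketched as follows. This is a hedged illustration rather than the logic of FindSimillarDocs.py: the scoring used there is not described in this README, so the sketch swaps in plain Jaccard overlap between keyword sets, runs in memory instead of on Spark, and uses hypothetical names (`jaccard`, `find_similar`, `corpus_keywords`).

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(query_keywords, corpus_keywords, top_k=3):
    """Return up to top_k (doc_id, score) pairs with the highest keyword overlap.

    corpus_keywords maps a document id to its extracted keywords
    (an assumed layout, chosen only for this example).
    """
    scored = (
        (jaccard(query_keywords, keywords), doc_id)
        for doc_id, keywords in corpus_keywords.items()
    )
    best = heapq.nlargest(top_k, scored)
    # Drop documents that share no keyword with the query.
    return [(doc_id, score) for score, doc_id in best if score > 0]


if __name__ == "__main__":
    corpus = {
        "doc-a": ["spark", "hadoop", "cluster"],
        "doc-b": ["cooking", "recipe"],
        "doc-c": ["spark", "news"],
    }
    for doc_id, score in find_similar(["spark", "hadoop"], corpus):
        print(doc_id, round(score, 3))
```

On a dataset of this size the per-document scoring would be distributed, for example as a map over the partitioned keyword files, with only the top results collected on the driver.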

Algorithm & Implementation Details

This idea was implemented as a course project for the Distributed Systems course at Colorado State University. A detailed description of the algorithm can be found here -

Repository: OvroAbir/Source-Recommendation-System
