The project develops an application that suggests to readers articles similar to those they have already read. It uses embedding algorithms to build a numerical representation of each headline, which makes it possible to compute the similarity between headlines and retrieve the most similar ones.

akhsassoualid/Headline_Recommender

A recommender engine for article headlines using word embeddings.

The project develops an application that suggests to readers articles similar to those they have already read. It applies embedding algorithms to headlines to build a numerical representation of each one, which makes it possible to compute the similarity between headlines and retrieve the most similar ones.
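The "most similar headlines" step can be pictured with a minimal sketch. It assumes cosine similarity as the metric, which this README does not state, and it uses toy vectors in place of real headline embeddings. The names `cosine_similarity` and `most_similar` are hypothetical and are not taken from the repository.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(query_index, vectors, top_k=2):
    """Return the indices of the top_k vectors closest to vectors[query_index], best first."""
    query = vectors[query_index]
    scores = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index  # never recommend the headline the reader is already on
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scores[:top_k]]

# Toy 3-dimensional vectors standing in for real headline embeddings (assumption).
headline_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
print(most_similar(0, headline_vectors))  # [1, 3]
```

Whatever algorithm produces the vectors, the ranking step stays the same: score every other headline against the one just read and keep the best few.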

For simplicity, we restricted the data to headlines from 2018.

Steps of the project

We build the function "general_process", saved in the preprocessing.py file, to prepare the text data. Its output is the processed_data CSV file, which contains the headlines after preprocessing.
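As a rough idea of what headline preprocessing can look like, here is a minimal sketch. It is not the code of "general_process", whose exact steps are not described here. The function name `clean_headline`, the stopword list, and the chosen steps (lowercasing, dropping non-letters, removing stopwords) are all assumptions made for illustration.

```python
import re

# Tiny illustrative stopword list (assumption); a real pipeline would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "is"}

def clean_headline(text):
    """Lowercase, replace non-letter characters with spaces, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

print(clean_headline("The 10 Best Movies of 2018, Ranked!"))  # best movies ranked
```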

Three algorithms are used to build a numerical representation of each headline:

  • NMF and LDA factorization: we create a sparse matrix whose rows represent the headlines and whose columns represent the words of the entire vocabulary.
  • word2vec: a deep learning approach that represents a headline by the average of the word2vec vectors of the words composing it.

These algorithms are used by the function "recommender_engine", developed in the recommender.py file.
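The averaged word2vec representation mentioned above can be sketched as follows. This is a minimal illustration and not the code of "recommender_engine". The helper `average_embedding` and the two-dimensional `toy_vectors` are hypothetical stand-ins for a real pretrained model, and skipping out-of-vocabulary words is an assumption.

```python
def average_embedding(tokens, word_vectors, dim):
    """Average the vectors of the tokens found in word_vectors; zeros if none are found."""
    known = [word_vectors[tok] for tok in tokens if tok in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]

# Toy 2-dimensional word vectors (assumption); real word2vec vectors are much larger.
toy_vectors = {
    "stocks": [1.0, 0.0],
    "rise": [0.0, 1.0],
    "markets": [1.0, 1.0],
}

# "today" is not in the toy vocabulary, so it is skipped.
print(average_embedding(["stocks", "rise", "today"], toy_vectors, dim=2))  # [0.5, 0.5]
```

Averaging gives every headline a vector of the same length regardless of how many words it contains, which is what makes headlines directly comparable.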

To execute the app

Clone the repository from the command line using the link: https://github.com/akhsassoualid/Headline_Recommender.

git clone https://github.com/akhsassoualid/Headline_Recommender.git

Install the necessary requirements:

pip install -r requirements.txt

Run the application saved in the app.py file:

streamlit run app.py

Illustration of the application

A simple illustration of the app: [screenshot]

Deployment on Docker

To build the app image, execute in the command line:

docker build -t app .

To run the container:

docker run -p 8501:8501 app

Special thanks:

  • To the Google team of researchers for the trained Word2Vec model.
  • To the Streamlit team for their open-source Python library for building applications.
  • To vikashrajluhaniwal for his tutorial about recommendation systems.
  • To my friends Rachid and Salih for their help.
