HeadlineClickbaitDetector

"Is that headline Clickbait?" is a Transformer-based news clickbait detector. It is a Fall 2022 course project for COMP599 Natural Language Understanding with Deep Learning at McGill University.

To catch readers' attention, digital and print media use "clickbait" headlines. For monetary gain, they mislead readers by publishing catchy headlines that draw more user engagement and clicks per post. In this project, we try to answer a fundamental question: is a news article's headline alone sufficient to classify it as clickbait, or do we also need more context from the body of the article?

Instructions for use

The code is in the code folder.

The process folder contains the data preparation scripts, which convert the given jsonl files to a dataframe that is then saved as a CSV file for further use.
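As a rough illustration of that jsonl to dataframe to CSV step, here is a minimal sketch using pandas. The field names (`id`, `headline`, `label`) and the helper name `jsonl_to_csv` are hypothetical and chosen only for the example; the actual scripts in the process folder may use different fields and extra cleaning steps.

```python
import io

import pandas as pd


def jsonl_to_csv(jsonl_source, csv_target):
    """Read one JSON object per line into a DataFrame and save it as CSV."""
    df = pd.read_json(jsonl_source, lines=True)
    df.to_csv(csv_target, index=False)
    return df


# Hypothetical records; the real files may use different field names.
sample = io.StringIO(
    '{"id": 1, "headline": "10 tricks doctors hate", "label": 1}\n'
    '{"id": 2, "headline": "Central bank raises interest rates", "label": 0}\n'
)
out = io.StringIO()
df = jsonl_to_csv(sample, out)
csv_text = out.getvalue()
print(csv_text)
```

Both arguments can also be file paths, since `pd.read_json` and `DataFrame.to_csv` accept paths as well as file-like objects.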

The classical_ml_approach_headline notebook applies SVM-based and XGBoost-based approaches to the headline text.
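A headline-only classical baseline of this kind could look roughly like the sketch below. It assumes TF-IDF features and scikit-learn's `LinearSVC`; the feature extraction, the hyperparameters, and the toy headlines are assumptions for illustration, not what the notebook necessarily uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data invented for the example (1 = clickbait, 0 = not clickbait).
headlines = [
    "You will not believe what this dog did",
    "Parliament passes annual budget bill",
    "10 secrets celebrities do not want you to know",
    "Storm causes power outages across the region",
]
labels = [1, 0, 1, 0]

# Assumption: word uni- and bigram TF-IDF features feeding a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(headlines, labels)

preds = model.predict([
    "You will not believe these 10 secrets",
    "Budget bill passes after storm delay",
])
print(preds)
```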

The clickbait_classification_headline_article notebook performs clickbait detection on the headline text and the article text using the pre-trained DeBERTa and ELECTRA models and their tokenizers.

The clickbait_classification_headline notebook performs clickbait detection on the headline text only, using the pre-trained DeBERTa and ELECTRA models and their tokenizers.

The clickbait_classification_headline_thresholding_pr_curve notebook tunes the decision threshold for ELECTRA and DeBERTa and uses the precision-recall (PR) curve to evaluate how well the models classify.
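To make the thresholding idea concrete, the sketch below sweeps candidate thresholds over predicted clickbait probabilities, computes precision and recall at each one, and keeps the threshold with the highest F1. Choosing by F1 is an assumption made for the example; the notebook may use a different criterion. The scores and labels are invented.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting clickbait for score >= threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(preds, labels))
    fp = sum(p and y == 0 for p, y in zip(preds, labels))
    fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_f1_threshold(scores, labels):
    """Return (best_f1, threshold), trying each distinct score as a threshold."""
    best_f1, best_t = 0.0, None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t


# Invented model outputs: probability of clickbait and the true labels.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
f1, threshold = best_f1_threshold(scores, labels)
print(f1, threshold)
```

The (precision, recall) pairs produced by `precision_recall_at` across all thresholds are the points of the PR curve.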

The analyse_generated_headline notebook in the title_generation folder uses Sentence-BERT to get the embeddings of the ground-truth and generated headlines. It then applies similarity metrics and thresholding to evaluate the clickbait vs. non-clickbait classification task.
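The similarity-plus-threshold step can be sketched as follows, with small hand-written vectors standing in for Sentence-BERT embeddings. Two things here are assumptions for illustration: cosine similarity as the metric, and the rule that a low similarity between the original and the generated headline indicates clickbait. The threshold value 0.5 is arbitrary.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_clickbait(truth_emb, generated_emb, threshold=0.5):
    """Assumed rule: flag as clickbait when the original headline is
    dissimilar to the headline generated from the article text."""
    return cosine_similarity(truth_emb, generated_emb) < threshold


# Toy 2-d vectors standing in for real sentence embeddings.
print(is_clickbait([1.0, 0.0], [1.0, 0.1]))  # nearly parallel vectors
print(is_clickbait([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```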

The prepare_title_gen_data notebook in the title_generation folder preprocesses the data before passing it to the T5 model.

The T5 Fine tune and Rouge notebook in the title_generation folder fine-tunes the T5-base model on the current data and uses the ROUGE score to select the best model.

The title_generation_T5_fine_tuned and title_generation_T5_of_the_shelf notebooks in the title_generation folder generate headlines from the article text using the two different models on the train and test sets of clickbait and non-clickbait samples.

The visualize_embeddings notebooks in the title_generation folder are used for data visualization, such as generating t-SNE plots.
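For readers unfamiliar with ROUGE-based model selection, here is a simplified sketch: a hand-rolled ROUGE-1 F1 (unigram overlap on whitespace tokens) and a helper that picks the checkpoint with the highest mean score. The ROUGE variant, the tokenization, and the checkpoint names are assumptions for the example; the notebook may rely on a library implementation and a different variant.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a candidate headline."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_best(checkpoint_outputs, references):
    """Return the checkpoint name whose outputs have the highest mean ROUGE-1 F1."""
    def mean_score(outputs):
        pairs = zip(references, outputs)
        return sum(rouge1_f1(r, c) for r, c in pairs) / len(references)

    return max(checkpoint_outputs, key=lambda name: mean_score(checkpoint_outputs[name]))


# Hypothetical checkpoints and generated headlines.
references = ["the cat sat on the mat"]
outputs = {
    "epoch_1": ["a dog ran away"],
    "epoch_2": ["the cat sat"],
}
best = select_best(outputs, references)
print(best)
```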

charmichokshi/ClickbaitDetector
