Predicting suicidal ideation on Reddit using machine learning and DistilBERT

eugenebaraka/Predict-Suicidal-Ideation-on-Reddit

Using Natural Language Processing to Predict Suicidal Ideation on Reddit

Applying machine learning classification methods to identify suicidal posts in the "SuicideWatch" subreddit.

Motivation and Business/Social Benefit

Suicide is one of the leading causes of death globally, with an estimated 800,000 deaths annually, or roughly one death every 40 seconds. A key aspect of suicide prevention is addressing suicidal thoughts before they turn into actions. With the help of machine learning, detecting suicidal ideation can help public health agencies allocate resources more effectively to people at risk.

Project Contents

Data used in this project can be found here. Below is the information required to reproduce the project:

  • Helper functions saved as utils.py
  • Dataset is saved in _data folder as Suicide_Detection.csv
  • Data cleaning notebook saved as data_cleaning.ipynb (the data produced from this notebook is found in the _data folder as clean_reddit.csv)
  • Data Processing notebook found in processing.ipynb
  • Modelling notebook found in modelling.ipynb
  • Final report saved as report.pdf
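
As a quick sanity check before running the notebooks, the file list above can be verified with a short script. This is only a sketch: it assumes the notebooks, utils.py and report.pdf sit at the repository root and that both CSV files live in the _data folder, as the list suggests. The names `EXPECTED_FILES` and `missing_files` are hypothetical helpers and not part of the project. The raw dataset may need to be downloaded separately if it is not included in the clone.

```python
from pathlib import Path

# Paths taken from the list above. Their location relative to the
# repository root is an assumption, not something the project confirms.
EXPECTED_FILES = [
    "utils.py",
    "_data/Suicide_Detection.csv",
    "_data/clean_reddit.csv",
    "data_cleaning.ipynb",
    "processing.ipynb",
    "modelling.ipynb",
    "report.pdf",
]


def missing_files(repo_root="."):
    """Return the expected paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All listed project files are present.")
```

Run from the repository root, the script prints whichever listed files are absent, which makes it easier to spot a dataset that still needs to be placed in _data.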

Replication

After creating a virtual environment, run the following commands to replicate the project:

git clone https://github.com/eugenebaraka/Predict-Suicidal-Ideation-on-Reddit.git
cd Predict-Suicidal-Ideation-on-Reddit
pip install -r requirements.txt
