Destiny_Pharma_PubMed-scraping_toolkit_v1

Introduction

This project is designed to help Destiny Pharma PLC scrape recently published scientific literature from PubMed and assemble it into monthly literature reviews.

Repo architecture

This repo comprises six scripts that provide two distinct functions.

  1. The X_main.py scripts are web-scraping bots designed to extract relevant PubMed literature, rank it by search category and importance, and send a push email if any high-priority literature is identified. These scripts are automated to run daily using GitHub Actions (a minimal sketch of the kind of query they run follows this list).

  2. The review_X.py scripts assemble all of the literature gathered in the previous month (for a target bot) into a literature review, which is then emailed to the target recipients. These scripts are automated to run monthly using GitHub Actions.
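
For illustration, here is a minimal sketch of the kind of daily PubMed search a bot could run via the public NCBI E-utilities API. The query term, date window, and function name are placeholders, not the repository's actual code:

```python
# Minimal sketch of a daily PubMed search via the NCBI E-utilities API.
# The query term and date window are placeholders, not this repo's real values.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def search_pubmed(query: str, days_back: int = 1) -> list[str]:
    """Return the PubMed IDs published in the last `days_back` days for a query."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",    # filter by publication date
        "reldate": days_back,  # only papers from the last N days
        "retmode": "json",
        "retmax": 100,
    }
    response = requests.get(ESEARCH, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    # Hypothetical rank-1 query; the real queries live in each bot's queries.txt
    ids = search_pubmed('"antimicrobial resistance"[Title/Abstract]')
    print(f"Found {len(ids)} new PubMed IDs")
```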

Usage

Using the web scraping bots (dermal_main.py, m3_main.py, ...)

The bots rely on their respective data folders, which contain doi_db.txt, log.txt, rank.txt, and queries.txt files (a sketch of how these fit together follows the list):

  1. doi_db.txt: Database containing all the DOIs the bot has identified - this prevents duplicates
  2. log.txt: Simple log file recording when the bot was run and a breakdown of the papers found per rank
  3. rank.txt: Files containing strings ready for inclusion in the monthly literature review
  4. queries.txt: PubMed search queries, run in order, one per line

*The bot also uses the image.JPG file in the main repo and attaches it to the push email
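
As an illustration of how these files fit together, here is a minimal sketch. The data folder name, DOI values, and rank formatting are assumptions; the actual bots may handle these files differently:

```python
# Sketch of how a bot might use its data folder; the folder name "dermal_data"
# and the exact file handling are assumptions, not the repository's actual code.
from datetime import date
from pathlib import Path

data_dir = Path("dermal_data")  # hypothetical data folder for one bot

# queries.txt: one PubMed query per line, run in order
queries = [q.strip() for q in (data_dir / "queries.txt").read_text().splitlines() if q.strip()]

# doi_db.txt: every DOI the bot has already seen, used to skip duplicates
doi_db_path = data_dir / "doi_db.txt"
known_dois = set(doi_db_path.read_text().splitlines()) if doi_db_path.exists() else set()

new_dois = ["10.1000/example-doi"]  # placeholder for DOIs returned by today's searches
unseen = [d for d in new_dois if d not in known_dois]

# Append unseen DOIs to the database, and formatted lines to rank.txt for the review
with doi_db_path.open("a") as fh:
    fh.writelines(d + "\n" for d in unseen)
with (data_dir / "rank.txt").open("a") as fh:
    fh.writelines(f"[Rank 1] {d}\n" for d in unseen)

# log.txt: record when the bot ran and how many papers it found
with (data_dir / "log.txt").open("a") as fh:
    fh.write(f"{date.today().isoformat()}: {len(unseen)} new papers\n")
```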

Using the monthly review script (review_nasal.py, ...)

The review script does not have its own respective data folder but requires the _template.docx file(s), start_date.txt, and review_log.txt files (sketched after the list):

  1. template.docx: Formatted .docx file used as the template for the respective literature review that is generated
  2. start_date.txt: Accessed by the script to record the start date of the literature scraping
  3. review_log.txt: Simple log file that documents the searches run and their respective dates

*The review script will automatically update the start_date.txt file and clear the rank.txt files once run
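
A hedged sketch of what the monthly assembly step might look like using the python-docx package. The template name, file paths, and review structure are assumptions, not the repository's actual code:

```python
# Sketch of the monthly review assembly; filenames and structure are assumptions.
from datetime import date
from pathlib import Path
from docx import Document  # provided by the python-docx package

data_dir = Path("dermal_data")      # hypothetical bot data folder
template = "dermal_template.docx"   # hypothetical formatted template

start_date = Path("start_date.txt").read_text().strip()
entries = [line for line in (data_dir / "rank.txt").read_text().splitlines() if line.strip()]

doc = Document(template)  # start from the formatted template
doc.add_heading(f"Literature review: {start_date} to {date.today().isoformat()}", level=1)
for entry in entries:
    doc.add_paragraph(entry, style="List Bullet")  # assumes the template defines this style
doc.save("monthly_review.docx")

# Per the note above: reset the start date and clear rank.txt for the next month
Path("start_date.txt").write_text(date.today().isoformat() + "\n")
(data_dir / "rank.txt").write_text("")

# review_log.txt: record the run and how many entries were included
with open("review_log.txt", "a") as fh:
    fh.write(f"{date.today().isoformat()}: review generated from {len(entries)} entries\n")
```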

Automation

This repo is designed to be automated using GitHub Actions - please see the workflows folder and ensure the Actions bot has permission to commit to the repo
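
For reference, a minimal sketch of the kind of scheduled workflow that could drive a daily bot. The file name, cron schedule, dependency install step, and script name are all assumptions - consult the actual files in the workflows folder:

```yaml
# Hypothetical .github/workflows/daily_scrape.yml - a sketch only; the real
# workflows in this repo may differ in schedule, steps, and secrets used.
name: Daily PubMed scrape
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
  workflow_dispatch:       # allow manual runs
permissions:
  contents: write          # the Actions bot needs permission to commit
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install requests python-docx   # placeholder dependency list
      - run: python dermal_main.py              # placeholder bot script
      - run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add -A
          git commit -m "Daily scrape results" || echo "Nothing to commit"
          git push
```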


Altering the daily scraping bot ('X_main.py')

Although this bot network has been tailored for a specific use, it can be adapted by any user with the following steps:

Please create blank log, database, and rank files, and then create your own queries.txt, image.JPG, and template.docx files.

  1. Within the html_formatting() function, change the search_queries variable to a readable string of the rank 1 queries, and also change the url variable


  2. Alter the bot_email variable, project name, and directory under the main() function


  3. Change the email_password and email_receiver variables in main() (see the illustrative sketch below)

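As a hedged illustration of the kinds of edits these steps describe - every value below is a placeholder, and variable names not listed in the steps above (such as project) are assumptions about how X_main.py is structured:

```python
# Illustrative placeholders only - the exact structure of html_formatting()
# and main() in X_main.py may differ from this sketch.

def html_formatting():
    # Step 1: a human-readable summary of the rank 1 queries, shown in the email body
    search_queries = "dermal drug delivery; topical antimicrobials"  # placeholder text
    # Step 1: link included in the email, e.g. pointing at the PubMed search
    url = "https://pubmed.ncbi.nlm.nih.gov/?term=dermal+drug+delivery"  # placeholder
    ...

def main():
    # Step 2: sender address, project name, and data directory for this bot
    bot_email = "scraper.bot@example.com"       # placeholder sender address
    project = "Dermal programme"                # placeholder project name (variable name assumed)
    directory = "dermal_data"                   # placeholder data folder
    # Step 3: credentials and recipients (in practice, keep the password in a GitHub secret)
    email_password = "CHANGE-ME"                # placeholder; do not hard-code real credentials
    email_receiver = ["recipient@example.com"]  # placeholder recipient list
    ...
```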

Altering the literature review script (review_x.py)

  1. Alter the bot_email variable, project name, and directory under the main() function

  2. Change the email_password and email_receiver variables in main()