Colovas_Data_Accessibility

New repository for the Data Accessibility project begun by Adena Collens. This project, funded by the Office of Research Integrity (ORI), aims to determine the extent of data availability, the number of citations, and the prevalence of "link rot" in papers published in the American Society for Microbiology (ASM) family of journals from 2000 onward.

Where Do I Go in the Repository?

PapersDataCleanup.R shows how the "groundtruth.csv" file was generated by pulling in all relevant metadata from the necessary source files.
TextScraping.R shows the text scraping process, using the R package rvest to collect the text of all papers.
TidyingText.R shows how the scraped HTML text is transformed into the tidy text format used by the R package "tidytext".
Webscraping_thruTidyText.R combines TextScraping.R and TidyingText.R into a single script that can be run on a high-performance computing cluster (see the Slurm directory for more details).
TextStatistcs.R shows how the R package "tidytext" is used to generate summary statistics from tidytext objects.

Other:

See the most recently dated *.Rmd file for any relevant project updates.
The file seq_papers_20230505.RData was downloaded from the GitHub repository Data_Accessibility, begun by Adena Collens in 2022 (https://github.com/SchlossLab/Data_Accessibility/tree/main/data).
This *.RData file contains many data frames of previously generated data. These data will be regenerated later in the project, but in the meantime they are useful for testing and piloting.

Repository contents (SchlossLab/Colovas_Data_Accessibility):

Adena_Stuff/
Data/
Figures/
Notebook/
Slurm/
inst/rmarkdown/templates/pat-meetings/
.gitignore
AEM_doigathering.R
Colovas_Data_Accessibility.Rproj
DOIgathering.R
LinkRot.R
LinkRot_FigureGenerator.R
MLprep.R
PapersDataCleanup.R
README.md
TextScraping.R
TextStatistcs.R
TextStatistics_SumFigs.R
TidyModels.R
TidyingText.R
Webscraping_thruTidyText.R
environment.yml