GitHub - SOM-Research/MLData_readiness: Data supporting the study: On the Readiness of Scientific Data for Fair and transparent ML

This is the data supporting the study: "On the Readiness of Scientific Data for a Fair and Transparent Use in Machine learning"

In this repository you will find:

1 - Full Results: The full results of the extraction process containing 4041 data papers annotated using the scripts in the root of this project

The fullResults.xlsx file contains the complete results of the extraction process, and ResultsSData.csv and ResultsDBrief.csv contain the results for each journal.
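As a quick sanity check before working with the per-journal results, the sketch below counts the rows and lists the column names of the repository's ResultsSData.csv and ResultsDBrief.csv files. It is only an illustration: the helper name `summarize_csv` is hypothetical, the UTF-8 encoding is an assumption, and no column names are assumed because they are not documented here.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (row_count, column_names) for one results CSV file."""
    # Assumption: the files are UTF-8 encoded; "utf-8-sig" also tolerates a BOM.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)  # data rows, header excluded
        return row_count, list(reader.fieldnames or [])


if __name__ == "__main__":
    # File names as they appear in the repository listing.
    for name in ("ResultsSData.csv", "ResultsDBrief.csv"):
        if Path(name).exists():
            rows, columns = summarize_csv(name)
            print(f"{name}: {rows} rows, {len(columns)} columns")
            print("  columns:", ", ".join(columns))
        else:
            print(f"{name}: not found in the current directory")
```

Run it from the root of a local clone so that the relative file names resolve.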

2 - Analysis sheet: The sheet with the charts, counts and analysis done to write the study

The FullStudyAnalysis.xlsx contains the full data, the charts, the topic analysis, and high-level insights of the data

3 - Code: The code used to extract the data, with one script for each journal. It is provided to help replicate the experiment.

The dataPapersScrapping.ipynb notebook contains the code used to filter the data-paper article type in both journals and download the PDFs (when possible). If you want to reproduce the experiment, start with this notebook.

Once you have the PDFs from both journals, SDataExtractor.py and DBriefExtractor.py contain the code to perform the extraction for each journal. Note that you will need an OpenAI API key and a running GROBID service to execute the scripts.
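Before launching an extractor, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the repository's scripts: the function name `check_prerequisites`, the `OPENAI_API_KEY` and `GROBID_URL` environment variable names, the default address `http://localhost:8070`, and the idea of a single folder of `.pdf` files are all assumptions. Check the extractor scripts for how they actually read the key and the GROBID address. The `/api/isalive` endpoint is GROBID's standard health check.

```python
import os
import urllib.request
from pathlib import Path

# Assumption: GROBID runs locally on its default port unless GROBID_URL is set.
GROBID_URL = os.environ.get("GROBID_URL", "http://localhost:8070")


def check_prerequisites(pdf_dir, grobid_url=GROBID_URL, timeout=5):
    """Return a list of problems found; an empty list means ready to run."""
    problems = []

    # Assumption: the API key is supplied through this environment variable.
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # The extractors need the PDFs downloaded in the previous step.
    if not any(Path(pdf_dir).glob("*.pdf")):
        problems.append(f"no PDF files found in {pdf_dir}")

    # Ask the GROBID health-check endpoint whether the service is up.
    try:
        with urllib.request.urlopen(f"{grobid_url}/api/isalive", timeout=timeout) as resp:
            if resp.status != 200:
                problems.append(f"GROBID answered with HTTP status {resp.status}")
    except OSError as exc:  # covers URLError, refused connections, timeouts
        problems.append(f"GROBID not reachable at {grobid_url}: {exc}")

    return problems


if __name__ == "__main__":
    issues = check_prerequisites("pdfs")  # "pdfs" is a placeholder folder name
    print("Ready to run the extractors." if not issues else "\n".join(issues))
```

Returning a list of problems instead of raising on the first one lets you fix every missing prerequisite in one pass.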

Folders and files:

DBrief
DataPreProcess
SData
DBriefExtractor.py
FullStudyAnalysis.xlsx
MIT-LICENSE
README.md
ResultsDBrief.csv
ResultsSData.csv
SDataExtractor.py
dataPapersScrapping.ipynb
fullResults.xlsx
requirements.txt
