Dataset

Folder containing the necessary code to create a dataset for analysis from the PubMed Central Open Access collection.

config folder: contains the config files, the ground truth, the lists of BMC and PLoS journals, and the Science-Metrix journal classification.
das_classifier folder: contains code and instructions to reproduce the DAS classification step.
dev_set folder: contains a uniform sample of 1000 articles from the PMC OA collection, created with the sample_dev_set.py script, which can be used for quick development runs.
exports folder: contains the exports produced by the scripts.
logs folder: empty, for log files.
Scripts: a set of scripts to create the dataset; see the instructions below. You might need to adjust some parameters at the beginning of each script before using it.
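
The uniform sampling behind the dev set can be illustrated with a short sketch. This is not the code of sample_dev_set.py: the function names, the default seed, and the assumption that articles are stored as .nxml or .xml files below a root directory are all hypothetical and should be adapted to your local copy of the collection.

```python
import random
from pathlib import Path


def list_article_files(root):
    """Collect article files below root.

    Assumption: articles are stored as .nxml or .xml files; adjust the
    suffixes to match the layout of your download.
    """
    return [p for p in Path(root).rglob("*") if p.suffix in {".nxml", ".xml"}]


def sample_articles(paths, k=1000, seed=42):
    """Return a uniform random sample of k paths, without replacement.

    Hypothetical helper, not taken from the repository. Sorting first makes
    the result depend only on the seed, not on directory traversal order.
    """
    paths = sorted(paths)
    if k >= len(paths):
        return paths  # fewer than k articles available: keep them all
    rng = random.Random(seed)
    return rng.sample(paths, k)
```

Calling sample_articles(list_article_files("path/to/pmc_oa"), k=1000) would then give a reproducible development sample; the path is a placeholder.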

Instructions

Download the PubMed Central Open Access (PMC OA) collection, e.g. via their FTP service: https://www.ncbi.nlm.nih.gov/pmc/tools/ftp. Optionally, sample it using the sample_dev_set.py script (or use the development dataset of 1000 articles provided in the dev_set folder).
Set up a MongoDB instance and update the config file accordingly.
Run the parser_main.py script, which will create a first collection of articles in MongoDB.
Run the calculate_stats.py script, which will calculate citation counts for articles and authors and create the corresponding collections in MongoDB.
Run the get_export.py script, which will create a first export of the dataset in the exports folder.
Run the get_das_unique.py script, which will extract the unique DAS for classification.
Follow the instructions in the README of the das_classifier folder.
Run the get_export_merged.py script to create the final dataset for analysis.
Optionally, run the evaluation_plos.py and get_authors_top.py scripts for evaluation.
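
The order of the steps above can be captured in a small runner, given here as a minimal sketch. It assumes that every script is started without command-line arguments (parameters are edited at the top of each script) and from the folder that contains the scripts; the names STAGE_BEFORE_CLASSIFIER, STAGE_AFTER_CLASSIFIER, build_commands, and run_stage are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Scripts to run before the manual DAS classification step, in order.
STAGE_BEFORE_CLASSIFIER = [
    "parser_main.py",
    "calculate_stats.py",
    "get_export.py",
    "get_das_unique.py",
]

# Script to run once the DAS classification has been completed.
STAGE_AFTER_CLASSIFIER = ["get_export_merged.py"]


def build_commands(scripts, python=sys.executable):
    """Turn script names into argument lists for subprocess.run."""
    return [[python, script] for script in scripts]


def run_stage(scripts, dry_run=True):
    """Run the scripts one after another, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = build_commands(scripts)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run on top of a failed one.
            subprocess.run(cmd, check=True)
    return commands
```

The split into two stages reflects that the DAS classification in between is a separate step with its own instructions, so run_stage(STAGE_AFTER_CLASSIFIER, dry_run=False) should only be called once that step is done.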

Requirements

See requirements.
