dsquintana/synthpop-primer

Synthetic datasets: A non-technical primer for the biobehavioural sciences to promote reproducibility and hypothesis-generation

Synthetic data generation is an emerging method, originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil, as no record in the synthetic dataset represents a real individual. This repository contains the R script accompanying my primer manuscript, which enables scholars to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.
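As a rough illustration of the workflow described above, the following is a minimal sketch and not the code from "R_script.R". It assumes the synthpop package is installed and uses SD2011, an example survey dataset bundled with synthpop, as a stand-in for a confidential dataset. The variable selection, the seed, and the regression model are arbitrary choices made for illustration only.

```r
# Minimal sketch; assumes the synthpop package is installed.
library(synthpop)

# SD2011 ships with synthpop and stands in for a confidential dataset here.
# The four variables are an arbitrary selection for illustration.
observed <- SD2011[, c("sex", "age", "edu", "income")]

# Create one synthetic dataset with the default settings.
# A fixed seed (arbitrary value) makes the synthesis reproducible.
synthetic <- syn(observed, seed = 2020)

# The synthetic records are stored in the $syn element.
head(synthetic$syn)

# General utility: compare the distribution of each variable
# in the synthetic and observed data.
compare(synthetic, observed)

# Specific utility: fit the same model to the synthetic data and
# compare its estimates with those from the observed data.
# The model itself is hypothetical and chosen only as an example.
fit_syn <- lm.synds(income ~ age + sex, data = synthetic)
compare(fit_syn, observed)
```

With the default settings, the synthetic dataset has the same number of rows and the same variables as the observed data, so it can be passed to the same analysis code as the original.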

Run the analysis in your web browser

To launch an RStudio server instance and run my analysis scripts online, click here or on the "Launch Binder" badge below.

Launch RStudio Binder

Once the RStudio server instance has loaded, run the commands in the "R_script.R" file.

Due to resource constraints of the RStudio server instance, the scripts that create Supplementary Figures 1-3 described in the primer manuscript could not be included. These scripts can be found on the manuscript's Open Science Framework page.

Run this analysis locally

To run the analysis locally in RStudio, download this repository as a zipped file. The R version and package versions are noted in the sessionInfo.txt file.
