This repository is created and maintained by the AI for Multiple Long Term Conditions Research Support Facility (AIM-RSF). To find out more about this facility, take a look at the getting started repository.
We welcome contributions from anyone, however small or large. If you choose to contribute to this synthetic data repository, please do this in line with our code of conduct. If you want to contribute but you're not sure where to start, see our general guide to contributing.
This repository is about synthetic data and its applications in healthcare and biomedical research. There are three main sections below. The first section is an introduction to synthetic data. The second and third sections provide tools and resources for generating your own synthetic data or accessing existing synthetic datasets, depending on your use case. Access to datasets can be a big barrier in healthcare research workflows; using synthetic versions of health datasets to test ideas and code can speed up research.
- Read an introduction to synthetic data
  - What is synthetic data?
  - When can't we use real data?
  - Applications of synthetic data in the context of health datasets
  - Trade-offs and challenges with synthetic data
- Generate your own synthetic datasets
  - There are many existing tools, methodologies and resources that allow you to generate your own synthetic datasets.
- Access existing synthetic datasets
  - A list of existing synthetic datasets (particularly relevant to the AIM research programme).
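To give a feel for what "generating your own synthetic dataset" can mean in the simplest case, here is a minimal sketch in Python. It is not one of the tools linked from this repository: the function names (`fit_marginals`, `sample_synthetic`) and the toy records are invented purely for illustration. It summarises each column of a small table on its own and then samples new rows from those summaries.

```python
import random
import statistics

# Toy "real" records: made-up values for illustration only, not from any real dataset.
toy_rows = [
    {"age": 54, "sex": "F", "condition": "diabetes"},
    {"age": 67, "sex": "M", "condition": "hypertension"},
    {"age": 71, "sex": "F", "condition": "hypertension"},
    {"age": 49, "sex": "M", "condition": "asthma"},
]


def fit_marginals(rows):
    """Summarise each column independently (hypothetical helper).

    Numeric columns are reduced to a mean and standard deviation;
    other columns keep their observed values so that sampling
    reproduces the observed category frequencies.
    """
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            summary[column] = ("numeric", statistics.mean(values), statistics.stdev(values))
        else:
            summary[column] = ("categorical", values)
    return summary


def sample_synthetic(summary, n, seed=0):
    """Draw n synthetic rows, sampling every column independently."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    synthetic = []
    for _ in range(n):
        row = {}
        for column, spec in summary.items():
            if spec[0] == "numeric":
                _, mean, stdev = spec
                row[column] = round(rng.gauss(mean, stdev), 1)
            else:
                row[column] = rng.choice(spec[1])
        synthetic.append(row)
    return synthetic


synthetic_rows = sample_synthetic(fit_marginals(toy_rows), n=5)
for row in synthetic_rows:
    print(row)
```

Because each column is sampled independently, this sketch throws away any relationship between columns (for example, between age and condition), and a Gaussian draw can produce implausible values. That loss of fidelity is one of the trade-offs the introduction discusses. Dedicated tools model those relationships and offer privacy safeguards that this example does not attempt.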
See 2-synthetic-data-generation.md for links to software.
- Bates, A. G., Spakulová, I., Dove, I., & Mealor, A. (2019). ONS methodology working paper series number 16—Synthetic data pilot.
- Calcraft, P., Thomas, I., Maglicic, M., & Sutherland, A. (2021). Accelerating public policy research with synthetic data.
- Giuffrè, M., & Shung, D. L. (2023). Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digital Medicine, 6(1), 186.
- Gonzales, A., Guruswamy, G., & Smith, S. R. (2023). Synthetic data in health care: a narrative review. PLOS Digital Health, 2(1), e0000082.
- Jordon, J., Szpruch, L., Houssiau, F., Bottarelli, M., Cherubin, G., Maple, C., ... & Weller, A. (2022). Synthetic Data--what, why and how?
- Myles, P., Ordish, J., & Branson, R. (2021). Synthetic data and the innovation, assessment, and regulation of AI medical devices.
This project follows the all-contributors specification, using the emoji key. Contributions of any kind welcome!
- Rachael Stickland 🚧 🖋 🤔 📖
- Eirini Zormpa 🤔 👀
- Mahwish M 🤔 👀
- Luis Santos 🤔
The information in this repository is licensed under a Creative Commons Attribution 4.0 International License.
For specific information on licenses for illustrations, see this file.