The purpose of this example code (see: https://github.com/roland-zauner/predict-age-ethno) is to illustrate how ethnicity can be predicted from genotype data (R script: "predict-ethno.R"), specifically from ancestry informative single nucleotide polymorphisms (AISNPs) identified by Kidd and Seldin (see Pakstis et al 2019: https://www.nature.com/articles/s41598-019-55175-x), using ethnicity-specific allele frequencies derived from the 1000 Genomes Project (see: https://www.internationalgenome.org/). In addition, an R script ("predict-age.R") is provided as an example of how to infer biological and phenotypic age (interpreted as lifespan) from DNA methylation data generated by Illumina's EPIC array assays, using algorithms developed by Horvath and colleagues (see Levine et al 2018: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940111/).
If you have git (see: https://github.com/git-guides/install-git) and docker (see: https://docs.docker.com/get-docker/) installed on your local machine, you can perform the following steps:
- Copy all files from the git repo to your local machine and change into the new folder
git clone https://github.com/roland-zauner/predict-age-ethno
cd predict-age-ethno
- Build docker image containing R and respective libraries defined in the Dockerfile
sudo docker build -t predict:latest .
The initial build takes approximately 10 hours on a small Google Cloud e2-micro virtual machine (0.25 vCPU, 1 GB of memory, 10 GB of disk space).
Based on the Kidd and Seldin AISNP panel and ethnicity-specific allele frequencies from the 1000 Genomes Project.
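Because the build is slow, it can be worth confirming that both prerequisites are on the PATH before starting. The following is a minimal sketch; `check_tool` is a hypothetical helper name and is not part of the repository.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not part of the repository.
# It reports whether a command is available on the PATH and
# returns non-zero if it is not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# The two prerequisites named in this README.
for tool in git docker; do
    check_tool "$tool" || echo "Install $tool before continuing." >&2
done
```

The loop only prints a hint for each missing tool; it does not attempt any installation.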
The following folder structure is assumed:
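The `docker run` commands in this README mount a host folder `input` and a host folder `output` into the container. A minimal sketch for creating those two folders is shown here, assuming nothing else is required on the host side; `make_layout` is a hypothetical helper name and is not part of the repository.

```shell
#!/bin/sh
# make_layout is a hypothetical helper, not part of the repository.
# It creates the host-side "input" and "output" folders that the
# docker run commands in this README mount into the container,
# then prints the two paths.
make_layout() {
    base_dir=$1
    mkdir -p "$base_dir/input" "$base_dir/output"
    printf '%s\n' "$base_dir/input" "$base_dir/output"
}
```

For example, `make_layout /home/pro-env` would match the host paths used in the `docker run` commands; writing under `/home` may require elevated permissions.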
- Provide genotype/SNP data in folder "input"; an anonymized, randomized example file is provided
- Run R script predict-ethno.R in the docker container with two arguments: the path to the SNP data (.txt) and the path to the 1000 Genomes allele frequency data file (aisnp.1kg.RData)
sudo docker run --rm -v /home/pro-env/input:/home/input -v /home/pro-env/output:/home/output predict:latest Rscript /home/scripts/predict-ethno.R /home/input/SNP-test-dataset.txt /home/scripts/aisnp.1kg.RData
- Inspect prediction result in folder "output" (plain txt file)
Based on Horvath "SkinBlood" and Horvath/Levine "PhenoAge" clocks:
- Provide Illumina EPIC array raw data files (ending with _Red.idat and _Grn.idat) in folder "input"
- Run R script predict-age.R in the docker container with one argument: the path to the input folder
sudo docker run --rm -v /home/pro-env/input:/home/input -v /home/pro-env/output:/home/output predict:latest Rscript /home/scripts/predict-age.R /home/input/
- Inspect prediction result in folder "output" (plain txt file)
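Since the raw data comes as `_Red.idat` and `_Grn.idat` files, a quick check of the input folder before running the container can catch incomplete uploads. This is a minimal sketch, assuming every sample needs both files with the same name prefix; `check_idat_pairs` is a hypothetical helper name and is not part of the repository.

```shell
#!/bin/sh
# check_idat_pairs is a hypothetical helper, not part of the repository.
# It prints every *_Red.idat or *_Grn.idat file in a folder whose
# counterpart for the other channel is absent, and returns non-zero
# if any such unpaired file is found.
check_idat_pairs() {
    dir=$1
    status=0
    for f in "$dir"/*_Red.idat "$dir"/*_Grn.idat; do
        [ -e "$f" ] || continue   # glob did not match any file
        case $f in
            *_Red.idat) mate="${f%_Red.idat}_Grn.idat" ;;
            *)          mate="${f%_Grn.idat}_Red.idat" ;;
        esac
        if [ ! -e "$mate" ]; then
            echo "unpaired: $f"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_idat_pairs /home/pro-env/input` would check the host folder that the `docker run` command above mounts as `/home/input`.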
Software refers to the R scripts provided here (github/roland-zauner/predict-age-ethno), including any underlying software such as R, docker, R packages and the algorithms implemented in those packages (specifically those from Horvath and colleagues or the respective package authors), as well as data derived from resources such as the 1000 Genomes Project.
Provider refers to the owner of this github account (github/roland-zauner/predict-age-ethno).
User refers to anyone using software provided on this github account.
The software is provided 'as is,' without warranty of any kind. In no event shall the provider (github account roland-zauner) be liable for any damages whatsoever, including but not limited to, direct, indirect, special, incidental or consequential damages, loss of business profits or special damages, even if the provider has been advised of the possibility of such damages. The provider shall have no liability for any claims, damages or fees arising from the use of the software in a manner that violates any intellectual property rights or any applicable licenses. The entire risk as to the results and performance of the software is assumed by the user. Without limiting the foregoing, the provider shall have no liability for any damage or injury to person or property arising out of the use or inability to use the software, or any unauthorized access to or alteration of the user's transmissions or data.
This is free software, and you are welcome to redistribute it
under the conditions of the GNU General Public License (see: )
which also applies to underlying software packages used by this script, e.g. the statistical software R.
The GNU General Public License does not permit incorporating your program
into proprietary programs.
- git & github (see: https://github.com/git-guides/install-git)
- docker (see: https://docs.docker.com/get-docker/)
- posit R docker base image (see: https://github.com/rstudio/r-docker and https://www.r-project.org/)
- R/cran package "magrittr" (see: https://CRAN.R-project.org/package=magrittr)
- R/cran package "dplyr" (see: https://CRAN.R-project.org/package=dplyr)
- R/cran package/library "readr" (see: https://CRAN.R-project.org/package=readr)
- R/Bioconductor package/library "minfi" (see: https://doi.org/10.1093/bioinformatics/btu049, https://doi.org/10.1093/nar/gkt090, https://doi.org/10.1093/bioinformatics/btw691)
- R/Bioconductor package/library "wateRmelon" (see: https://doi.org/10.1186/1471-2164-14-293)
- ethnicity-specific allele frequencies of SNPs derived from 1000 Genomes Project phase 3 data (e.g. ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp)
- AISNP panel from Kidd and Seldin (see: https://www.nature.com/articles/s41598-019-55175-x)
- DNAm age prediction algorithms developed by Horvath and Levine (see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940111/ and https://clockfoundation.org/about-the-clock-foundation/), as implemented in the wateRmelon R package (see: https://doi.org/10.1186/1471-2164-14-293)