Supplementary Material (scripts) to reproduce all the analysis of the manuscript:
"Estimation of Molecular Pairwise Relatedness in Autopolyploid Crops"
by Rodrigo R Amadeu, Leticia de Castro Lara, Patricio R Munoz, Antonio Augusto Franco Garcia
Journal: G3: Genes, Genomes, Genetics, Early online October 13, 2020
https://doi.org/10.1534/g3.120.401669
All the scripts were evaluated under Linux Ubuntu 20.04 64 bits. Before run it, verify if PedigreeSim
and PolyRelatedness
software are fully working. For PedigreeSim
you need to have Java Runtime Environment installed. PedigreeSim
details at https://www.wur.nl/en/show/Software-PedigreeSim.htm. PolyRelatedness
details at https://github.com/huangkang1987/polyrelatedness. In addition, you need to have R
and AGHmatrix
package installed. Details in https://cran.r-project.org/ and https://cran.r-project.org/package=AGHmatrix respectivally.
This is a comprehensive set of scripts designed to be able to run in a personal computer. In practice for a higher number of replicates you would need to split the for loops
into parallel computation. The details on this task depends upon your computational cluster specifications.
Any question, please open an Issue
here in this page or send me an e-mail (rramadeu at ufl dot edu).
It uses mainly R and bash scripts. 0_runAll.sh
is a wrap-up bash script to run all the steps. For a comprehensive reproduction, the scripts are in order from 1 to 7. It just loops for for three replicates (the original work has 100 replicates). You can extend the loops throughout the script for your aimed number. The whole pipeline considering three replicates took approximately 25 hours to run using a Intel® Core™ i7-8650U CPU @ 1.90GHz × 8 with 16 GB RAM. It follows a short description of the content and steps.
PedigreeSimInput
folder with the necessary parameters for PedigreeSim software. The files starting with 0 are common for all the ploidies:
0.chrom
: general genomic information (number of chromosomes, length, centromere position)0.map
: genetic map with marker names and position0.ped
: the pedigree used in this studyX.gen
: we are not using this information during the analysis, you can ignore it, but it is necessary for PedigreeSim to run.X
is the ploidyX.par
: parameters for the simulationX_natural.par
: parameters for the simulation considering natural pairing
PedigreeSim
folder with the PedigreeSim software V2.0
PolyRelatedness_1.8
folder with the PolyRelatedness software V1.8
StatResults_Example.csv
output example with the summary statistics computed for each methodology for each population and replicate.
.sh
are bash scripts and .R
are R scripts
1_SimulateGenotypes.sh
: loops the PedigreeSim software through all the parameters files N times (here, N=3)2_funderalleles2biallelic.R
: reads PedigreeSim output and sample biallelic markers (0 or 1) to all the loci. It saves the data as a matrix in Rdata format.3_funderalleles2multiallelic.R
: reads PedigreeSim output and sample biallelic and multiallelic markers to all loci. It saves the data as a .txt following PolyRelatedness V1.8 format4_ObservedRelatedness.R
: computes the observed relationship matrix on the original data following delta computation (Equation 2 from the paper)5_PolyRelatednes.sh
: computes the relationship matrix considering the multiallelic methods (LO, RI, WE, MM, ML) for all the files from 3 using PolyRelatedness V1.86_AGHmatrix.R
: computes the relationship matrix considering the biallelic methods (VR, PD, FA) for all the files from 2 using AGHmatrix V2.07_ProcessingResults.R
: computes the metrics between observed and estimated relationships and save them in a table format.