Cristianetaniguti/Reads2Map

Reads2Map



Reads2Map is a collection of WDL workflows designed to facilitate the construction of linkage maps from sequencing reads. You can find details of each workflow release on the Reads2Map releases page.

The main workflows are EmpiricalReads2Map.wdl and SimulatedReads2Map.wdl. EmpiricalReads2Map.wdl is composed of EmpiricalSNPCalling.wdl, which performs SNP calling, and EmpiricalMaps.wdl, which performs genotype calling and map building on empirical reads. SimulatedReads2Map.wdl uses the RADinitio software to simulate Illumina reads for RADseq, exome, or WGS data, and then performs SNP calling, genotype calling, and genetic map building.

The SNP calling step in Reads2Map currently includes the popular tools: GATK, Freebayes, TASSEL, and STACKs. For genotype/dosage calling, the workflow utilizes tools like updog, polyRAD, and SuperMASSA. Lastly, Reads2Map leverages OneMap, GUSMap, and MAPpoly for linkage map construction.

For diploid data, you can visualize the results using Reads2MapApp, an R package and Shiny app. This package supports the visualization of linkage maps built using OneMap and GUSMap.

The Reads2Map workflows perform SNP and genotype/dosage calling for your complete data set. However, they build the linkage map for only a single chromosome (a reference genome is required) for each combination of software and parameters. The resulting maps will probably still require improvements, but their characteristics will suggest which combination of SNP and genotype calling software and parameters you should use for your data. Once you have selected a pipeline, you can load the respective VCF file in R and build the complete linkage map using OneMap or MAPpoly. Use the OneMap or MAPpoly tutorials for guidance on building and improving the linkage map for the complete dataset.
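To make "each combination of software" concrete, the short Python sketch below enumerates pairs of SNP callers and genotype callers from the tools named above. It is only an illustration: the function name `pipeline_grid` is hypothetical, and the workflows may not run every pair or may add parameter variations that are not shown here.

```python
from itertools import product

# Tool names taken from the text above; the pairing logic is an assumption
# made for illustration, not the workflows' actual configuration.
SNP_CALLERS = ["GATK", "Freebayes", "TASSEL", "STACKs"]
GENOTYPE_CALLERS = ["updog", "polyRAD", "SuperMASSA"]


def pipeline_grid(snp_callers, genotype_callers):
    """Return one label per (SNP caller, genotype caller) pair."""
    return [f"{snp}+{geno}" for snp, geno in product(snp_callers, genotype_callers)]


if __name__ == "__main__":
    # Each label corresponds to one candidate single-chromosome map to compare.
    for label in pipeline_grid(SNP_CALLERS, GENOTYPE_CALLERS):
        print(label)
```

Comparing the single-chromosome maps produced for each label is what guides the choice of pipeline for the full dataset.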

How to use

Multiple systems are available to run WDL workflows such as Cromwell, miniWDL, and dxWDL. See further information in the openwdl documentation.
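As a rough sketch of what launching a workflow looks like with two of these engines, the Python helper below assembles the command line for Cromwell (`run` with `--inputs` and `--imports`) or miniWDL (`run` with `-i`). The helper name `build_run_command` and the file names `inputs.json`, `dependencies.zip`, and `cromwell.jar` are placeholders chosen for illustration; adapt them to the files you actually have.

```python
import subprocess


def build_run_command(engine, wdl, inputs_json, imports_zip=None,
                      cromwell_jar="cromwell.jar"):
    """Return the argument list for launching a WDL workflow (nothing is executed)."""
    if engine == "cromwell":
        cmd = ["java", "-jar", str(cromwell_jar), "run", str(wdl),
               "--inputs", str(inputs_json)]
        if imports_zip is not None:
            # Cromwell accepts a ZIP of imported sub-workflows via --imports.
            cmd += ["--imports", str(imports_zip)]
        return cmd
    if engine == "miniwdl":
        # Assumption: the dependencies ZIP has been unpacked so that the
        # imports resolve relative to the main WDL file.
        return ["miniwdl", "run", str(wdl), "-i", str(inputs_json)]
    raise ValueError(f"unsupported engine: {engine}")


if __name__ == "__main__":
    cmd = build_run_command("cromwell", "EmpiricalReads2Map.wdl",
                            "inputs.json", "dependencies.zip")
    print(" ".join(cmd))
    # Uncomment to actually launch the workflow:
    # subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it makes it easy to inspect the command before committing to a long-running job.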

In addition, we suggest two wrappers: pumbaa and Caper. Here is a tutorial on how to set up these tools and an example of running EmpiricalReads2Map:

To run a pipeline, first navigate to the Reads2Map releases page, search for the pipeline tag you wish to run, and download the pipeline’s assets (the WDL workflow, the JSON, and the ZIP with accompanying dependencies).
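Before launching a run, it can help to confirm that all three assets were downloaded and that the JSON parses. The following Python sketch does this for a local directory; the function name `check_release_assets` is hypothetical, and the check relies only on file extensions rather than on specific asset names.

```python
import json
from pathlib import Path


def check_release_assets(directory):
    """Find the WDL, JSON, and ZIP assets in a directory and validate the JSON."""
    directory = Path(directory)
    found = {
        "wdl": sorted(directory.glob("*.wdl")),
        "json": sorted(directory.glob("*.json")),
        "zip": sorted(directory.glob("*.zip")),
    }
    missing = [kind for kind, paths in found.items() if not paths]
    if missing:
        raise FileNotFoundError(f"missing asset type(s): {', '.join(missing)}")
    for path in found["json"]:
        with path.open() as handle:
            json.load(handle)  # raises json.JSONDecodeError if malformed
    return found


if __name__ == "__main__":
    # Assumes the release assets were saved to the current directory.
    assets = check_release_assets(".")
    for kind, paths in assets.items():
        print(kind, [p.name for p in paths])
```

The downloaded JSON is a template, so the input values in it still need to be edited to point at your own data before the workflow is run.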

Check the description of the inputs for the pipelines:

Check how to evaluate the workflow results in the Reads2MapApp Shiny app (so far only available for diploid datasets):

Check more information and examples of usage in:

Taniguti, C. H.; Taniguti, L. M.; Amadeu, R. R.; Lau, J.; de Siqueira Gesteira, G.; Oliveira, T. de P.; Ferreira, G. C.; Pereira, G. da S.; Byrne, D.; Mollinari, M.; Riera-Lizarazu, O.; Garcia, A. A. F. Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps. GigaScience, 12, giad092. https://doi.org/10.1093/gigascience/giad092

Third-party software and images

R packages

How to cite

Taniguti, C. H.; Taniguti, L. M.; Amadeu, R. R.; Lau, J.; de Siqueira Gesteira, G.; Oliveira, T. de P.; Ferreira, G. C.; Pereira, G. da S.; Byrne, D.; Mollinari, M.; Riera-Lizarazu, O.; Garcia, A. A. F. Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps. GigaScience, 12, giad092. https://doi.org/10.1093/gigascience/giad092

Funding

This work was partially supported by the National Council for Scientific and Technological Development (CNPq - 313269/2021-1); by USDA, National Institute of Food and Agriculture (NIFA), Specialty Crop Research Initiative (SCRI) project “Tools for Genomics Assisted Breeding in Polyploids: Development of a Community Resource” (Award No. 2020-51181-32156); and by the Bill and Melinda Gates Foundation (OPP1213329) project SweetGAINS.