brandonlind/testdata_validation

Testdata validation

Code used to validate allele frequency estimates from our poolSeq data by comparing them with estimates from individual sequence data (indSeq) for the same individuals, and to validate megaSNPs as sites where paralogs are likely misaligning and causing false-positive SNPs in our data.


Usage

If you use or are inspired by code from this repo, please cite the related manuscripts and data:

Data

Lind et al. (in press) Haploid, diploid, and pooled exome capture recapitulate features of biology and paralogy in two non-model tree species. Accepted to Molecular Ecology Resources. Available on bioRxiv https://doi.org/10.1101/2020.10.07.329961


Repository structure

Below are the descriptions of notebooks in this repo. Notebooks can be viewed in the repository but are best viewed at https://nbviewer.jupyter.org (hyperlinks below). Notebooks 002 and 003 contain figures found in the main and supplemental texts.

Full repository

001_testdata_explore.ipynb

Explore the data and isolate the set of SNPs shared by both baseline-filtered datasets (indSeq and poolSeq) for both Doug-fir and Jack pine.

002_testdata_compare_AFs.ipynb

This notebook takes the SNPs shared by the indSeq and poolSeq methods for Doug-fir and Jack pine (from 001_testdata_explore.ipynb) and investigates filtering methods that improve agreement between indSeq and poolSeq allele frequency estimates.

003_testdata_validate_megaSNPs.ipynb

Validate sites called heterozygous in haploid tissue as sites potentially within a region subject to paralog misalignment.

004_misc_suppmat.ipynb

Calculate miscellaneous numbers reported in tables and in the Supplemental Material.

005_transfer_to_SRA.ipynb

Code to create SRA metadata and BioSample metadata, and to upload fastq files to the NCBI Sequence Read Archive FTP server with Python.


pythonimports in notebooks can be found here: https://github.com/brandonlind/pythonimports
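
For orientation, the kinds of checks described for notebooks 001 to 003 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the notebooks: the locus IDs, allele frequencies, genotype strings, and the function names `shared_loci`, `pearson_r`, and `het_in_haploid` are all hypothetical, and the notebooks themselves rely on the pythonimports repository linked above.

```python
from math import sqrt

# Hypothetical toy data (not from the study): locus ID -> allele frequency.
indseq_af = {"ctg1-100": 0.10, "ctg1-250": 0.50, "ctg2-40": 0.90, "ctg3-7": 0.30}
poolseq_af = {"ctg1-100": 0.12, "ctg1-250": 0.47, "ctg2-40": 0.88, "ctg4-9": 0.60}

# Hypothetical VCF-style genotype calls from haploid samples, keyed by locus.
haploid_calls = {
    "ctg1-100": ["0/0", "1/1", "./."],
    "ctg2-40": ["0/1", "0/1", "1/1"],
}


def shared_loci(a, b):
    """Return the sorted locus IDs present in both datasets."""
    return sorted(set(a) & set(b))


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


def het_in_haploid(calls_by_locus):
    """Return loci with at least one heterozygous call in haploid samples."""
    flagged = []
    for locus, calls in calls_by_locus.items():
        for call in calls:
            alleles = call.replace("|", "/").split("/")
            if "." in alleles:
                continue  # skip missing genotypes
            if len(set(alleles)) > 1:
                flagged.append(locus)
                break
    return sorted(flagged)


loci = shared_loci(indseq_af, poolseq_af)
r = pearson_r([indseq_af[l] for l in loci], [poolseq_af[l] for l in loci])
suspect = het_in_haploid(haploid_calls)
```

Here `loci` holds the SNPs present in both datasets, `r` summarizes agreement between the two sets of allele frequency estimates, and `suspect` lists loci where haploid tissue was called heterozygous, which the text above treats as a signal of possible paralog misalignment.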