swyder/Reanalysis_plant_imprinting: Method to Analyze Genomic Imprinting Studies Using Generalized Linear Models

Scripts and data accompanying Wyder et al. (2017), "Consistent Reanalysis of Genome-wide Imprinting Studies in Plants Using Generalized Linear Models Increases Concordance across Datasets", bioRxiv, https://doi.org/10.1101/180745

An edgeR-based analysis method to identify imprinted genes that:

- requires RNA-seq data from reciprocal F1 crosses
- takes into account biological replicates and count overdispersion
- outperforms commonly used count statistics (such as Fisher's exact test or the chi-square test)
- works for any tissue (including triploid endosperm)
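To illustrate why tissue ploidy matters, here is a minimal sketch (not the repository's edgeR method) that compares an observed maternal:paternal read ratio with the ratio expected from genome dosage, assuming triploid endosperm carries two maternal copies and one paternal copy. The function names and the 0.5 pseudocount are illustrative choices.

```python
from math import log2


def expected_maternal_fraction(maternal_copies: int, paternal_copies: int) -> float:
    """Maternal share of reads expected if every allele copy is expressed equally."""
    return maternal_copies / (maternal_copies + paternal_copies)


def maternal_bias(maternal_reads: int, paternal_reads: int,
                  maternal_copies: int = 1, paternal_copies: int = 1) -> float:
    """log2 of the observed maternal:paternal ratio relative to the dosage expectation.

    0 means no parent-of-origin bias, > 0 a maternal bias, < 0 a paternal bias.
    A pseudocount of 0.5 avoids division by zero for genes with zero counts.
    """
    observed = (maternal_reads + 0.5) / (paternal_reads + 0.5)
    expected = maternal_copies / paternal_copies
    return log2(observed / expected)
```

For example, equal maternal and paternal read counts in endosperm (maternal_copies=2, paternal_copies=1) give -1, a twofold paternal excess relative to the 2:1 expectation, whereas the same counts in a diploid tissue give 0.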

Data folders

- RAW_COUNTS: maternal/paternal raw counts of the examined datasets
- Informative_SNP_Positions: helper files for Classify_Alleles.py

Classify_Alleles.py

This script classifies mRNA-seq reads by parental strain. It takes a sorted BAM file and a file of positions of interest (SNPs at which the two parental lines are homozygous for different alleles).
Requires pysam (tested with pysam v0.8.4). Installing pysam is easiest with bioconda, especially on Mac OS X (see https://github.com/pysam-developers/pysam).
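The core classification step can be sketched in plain Python. This is an illustration under stated assumptions, not the logic of Classify_Alleles.py: it assumes 0-based reference positions on a single chromosome, labels a read that carries the reference allele at one SNP and the non-reference allele at another as "conflict", and ignores bases that match neither parental allele. In practice, the sequence and the aligned pairs would come from pysam's AlignedSegment.query_sequence and AlignedSegment.get_aligned_pairs(matches_only=True); the SnpTable layout and classify_read are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical SNP table for one chromosome:
# 0-based reference position -> (reference base, non-reference base).
SnpTable = Dict[int, Tuple[str, str]]


def classify_read(query_sequence: str,
                  aligned_pairs: List[Tuple[int, int]],
                  snps: SnpTable) -> Optional[str]:
    """Return 'ref', 'alt' or 'conflict' for a read, or None if no SNP allele is seen.

    aligned_pairs holds (query_pos, ref_pos) tuples for aligned bases, the form
    returned by pysam's AlignedSegment.get_aligned_pairs(matches_only=True).
    """
    calls = set()
    for query_pos, ref_pos in aligned_pairs:
        if ref_pos not in snps:
            continue
        ref_base, alt_base = snps[ref_pos]
        base = query_sequence[query_pos].upper()
        if base == ref_base:
            calls.add("ref")
        elif base == alt_base:
            calls.add("alt")
        # A base matching neither parental allele (e.g. a sequencing error) is ignored.
    if not calls:
        return None
    if len(calls) == 1:
        return calls.pop()
    return "conflict"
```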

Example:

python Classify_Alleles.py example_sorted.bam example_Pos_Of_Interest > Counts_Alleles_SRRxxxxxxx

Input: example_Pos_Of_Interest

A tab-delimited text file with six columns describing the positions of interest, each annotated with the overlapping GeneID. For reproducibility, sort this file by chromosome and position (sort -k1,1 -k2,2n FILE).

Chr1 37387 37388 G T AT1G01060
...

Columns:
1. chromosome
2. start
3. end
4. reference allele
5. non-reference allele
6. GeneID overlapping this position
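A minimal sketch for loading this file in Python, assuming BED-style coordinates (0-based start, end = start + 1), which is what the example row Chr1 37387 37388 suggests. SnpPosition and read_positions are hypothetical helpers, not part of Classify_Alleles.py.

```python
import csv
from typing import Dict, Iterable, List, NamedTuple


class SnpPosition(NamedTuple):
    chrom: str
    start: int    # assumed 0-based (BED-style) start
    end: int      # end coordinate; start + 1 for a single SNP
    ref: str      # reference allele
    alt: str      # non-reference allele
    gene_id: str  # overlapping GeneID


def read_positions(lines: Iterable[str]) -> Dict[str, List[SnpPosition]]:
    """Parse tab-delimited six-column position lines, grouped by chromosome."""
    by_chrom: Dict[str, List[SnpPosition]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        chrom, start, end, ref, alt, gene_id = row[:6]
        snp = SnpPosition(chrom, int(start), int(end), ref.upper(), alt.upper(), gene_id)
        by_chrom.setdefault(chrom, []).append(snp)
    return by_chrom
```

An open file handle works as the lines argument, since csv.reader accepts any iterable of strings.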

Positions can be annotated with GeneIDs using bedtools intersect. Files ready for analysis are available in the folder Informative_SNP_Positions (Arabidopsis Ler-Col and maize B73-Mo17).
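bedtools intersect remains the practical tool for annotation; the overlap rule it applies to BED intervals can be illustrated in a few lines of Python. This linear scan is a hypothetical helper for small examples only. A SNP overlapping two genes returns both IDs; how such cases should be resolved is left open.

```python
from typing import List, Tuple

# Hypothetical gene model for one chromosome: (start, end, gene_id),
# using 0-based, half-open coordinates as in BED files.
Gene = Tuple[int, int, str]


def overlapping_genes(snp_start: int, snp_end: int, genes: List[Gene]) -> List[str]:
    """Return IDs of genes whose interval overlaps the SNP interval [snp_start, snp_end).

    Two half-open intervals overlap when each one starts before the other ends.
    """
    return [gene_id for start, end, gene_id in genes
            if start < snp_end and snp_start < end]
```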

Output:

A text file with three columns:

Gene MaternalReads PaternalReads
AT1G06190 34 55
...

1. GeneID
2. reference read counts
3. non-reference read counts
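The second and third columns hold reference and non-reference counts, so turning them into maternal and paternal counts requires knowing the direction of the cross. A hedged sketch, assuming whitespace-separated columns and a header row that starts with "Gene"; read_allele_counts and to_maternal_paternal are hypothetical helpers.

```python
from typing import Dict, Iterable, Tuple

Counts = Dict[str, Tuple[int, int]]


def read_allele_counts(lines: Iterable[str]) -> Counts:
    """Parse a Counts_Alleles_* table into {gene_id: (ref_count, nonref_count)}."""
    counts: Counts = {}
    for line in lines:
        fields = line.split()  # assumes whitespace-separated columns
        if not fields or fields[0] == "Gene":
            continue  # skip blank lines and the header row
        gene_id, ref_count, nonref_count = fields[0], int(fields[1]), int(fields[2])
        counts[gene_id] = (ref_count, nonref_count)
    return counts


def to_maternal_paternal(counts: Counts, reference_is_maternal: bool) -> Counts:
    """Reorder (ref, nonref) pairs as (maternal, paternal) for one cross direction."""
    if reference_is_maternal:
        return dict(counts)
    return {gene: (nonref, ref) for gene, (ref, nonref) in counts.items()}
```

Assuming the reference allele is Col (the Arabidopsis reference genome is Col-0) and crosses are written mother first, reference_is_maternal would be True for a ColxLer library and False for LerxCol.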

NOTE: PCR duplicates are not excluded; they can be removed beforehand with samtools rmdup.

run_edgeR_LerCol_Pignatta.R

This script runs an edgeR-based GLM analysis to identify genes that are significantly imprinted. It requires RNA-seq samples from reciprocal F1 crosses: at least one sample per cross direction, preferably two to three.
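One way to see why reciprocal crosses are needed: treat each allele count as its own observation and give parent of origin and strain separate columns in the design matrix. The sketch below builds such a 0/1 layout; it is not necessarily the model coded in run_edgeR_LerCol_Pignatta.R, the column names are hypothetical, and the fitting itself (in edgeR, functions such as glmFit) is not reproduced here.

```python
from typing import List, Tuple


def allele_design(samples: List[Tuple[str, str, str]], ref_strain: str
                  ) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build row labels, column names and a 0/1 design matrix for allele counts.

    samples: (sample_id, maternal_strain, paternal_strain) for each F1 library.
    Each library yields two rows: its maternal and its paternal allele count.
    Columns: intercept, one dummy per additional library (library blocking),
    'maternal' (parent-of-origin effect) and 'ref_strain' (strain effect).
    """
    sample_ids = [sample_id for sample_id, _, _ in samples]
    others = sample_ids[1:]  # the first library serves as the baseline level
    columns = ["intercept"] + [f"sample_{sid}" for sid in others] + ["maternal", "ref_strain"]
    labels: List[str] = []
    matrix: List[List[int]] = []
    for sample_id, mother, father in samples:
        dummies = [int(sample_id == other) for other in others]
        for origin, strain in (("maternal", mother), ("paternal", father)):
            labels.append(f"{sample_id}_{origin}")
            matrix.append([1] + dummies + [int(origin == "maternal"), int(strain == ref_strain)])
    return labels, columns, matrix
```

With samples from only one cross direction, the "maternal" and "ref_strain" columns would be identical or exact complements, so parent-of-origin and strain effects could not be separated; adding the reciprocal cross breaks that confounding.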

The script assumes that the allelic count tables are in the working directory, with file names "Counts_Alleles_SRRxxxxxxx", where xxxxxxx is the SRR sample ID.
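A hedged Python sketch of this file discovery, using only the standard library. One deliberate difference: the R script uses every file whose name starts with Counts_Alleles_, whereas this hypothetical find_count_files helper also requires the rest of the name to be "SRR" followed by digits, so stray files are skipped rather than analyzed.

```python
import re
from pathlib import Path
from typing import Dict

# File names of the form Counts_Alleles_SRR<digits>; the SRR ID is captured.
COUNT_FILE = re.compile(r"^Counts_Alleles_(SRR\d+)$")


def find_count_files(directory: str = ".") -> Dict[str, Path]:
    """Map SRR sample IDs to their Counts_Alleles_SRR... files in a directory."""
    found: Dict[str, Path] = {}
    for path in sorted(Path(directory).glob("Counts_Alleles_*")):
        match = COUNT_FILE.match(path.name)
        if match:
            found[match.group(1)] = path
    return found
```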

The script is set up as an example for the ColxLer and LerxCol samples from Pignatta et al. (2014). It is not fully generalized; some settings are hard-coded.

Example:

Rscript run_edgeR_LerCol_Pignatta.R

NOTE: Make sure the working directory contains only count files from the reciprocal crosses of two strains (e.g. ColxLer and LerxCol). The script uses all files whose names start with 'Counts_Alleles_'.
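The NOTE above can be turned into an explicit check before running the analysis. A minimal sketch, assuming a hypothetical sample sheet that maps each SRR ID to its (maternal, paternal) strain, with crosses written mother first as in ColxLer.

```python
from typing import Dict, Tuple


def check_reciprocal(crosses: Dict[str, Tuple[str, str]]) -> Tuple[str, ...]:
    """Check that samples cover both directions of a cross between two strains.

    crosses maps a sample ID to (maternal_strain, paternal_strain).
    Returns the two strain names, sorted; raises ValueError otherwise.
    """
    directions = set(crosses.values())
    if any(mother == father for mother, father in directions):
        raise ValueError("found a non-F1 sample (same maternal and paternal strain)")
    strains = {strain for pair in directions for strain in pair}
    if len(strains) != 2:
        raise ValueError(f"expected exactly 2 strains, found {sorted(strains)}")
    # With two strains and no self-crosses, two directions means both reciprocals.
    if len(directions) != 2:
        raise ValueError("need samples from both reciprocal cross directions")
    return tuple(sorted(strains))
```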

Authors

Stefan Wyder

Repository files

Informative_SNP_Positions
RAW_COUNTS
Classify_Alleles.py
LICENSE
README.md
run_edgeR_LerCol_Pignatta.R
run_edgeR_LerCol_Pignatta.Rmd
run_edgeR_LerCol_Pignatta.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Informative_SNP_Positions

Informative_SNP_Positions

RAW_COUNTS

RAW_COUNTS

Classify_Alleles.py

Classify_Alleles.py

LICENSE

LICENSE

README.md

README.md

run_edgeR_LerCol_Pignatta.R

run_edgeR_LerCol_Pignatta.R

run_edgeR_LerCol_Pignatta.Rmd

run_edgeR_LerCol_Pignatta.Rmd

run_edgeR_LerCol_Pignatta.pdf

run_edgeR_LerCol_Pignatta.pdf

Repository files navigation

Data folders

Classify_Alleles.py

Example:

Input: example_Pos_Of_Interest

Output:

run_edgeR_LerCol_Pignatta.R

Example:

Authors

About

Releases

Packages

Languages

License

swyder/Reanalysis_plant_imprinting

Folders and files

Latest commit

History

Repository files navigation

Data folders

Classify_Alleles.py

Example:

Input: example_Pos_Of_Interest

Output:

run_edgeR_LerCol_Pignatta.R

Example:

Authors

About

Resources

License

Stars

Watchers

Forks

Languages