GitHub

========================================================

CNVrd2: A package for measuring gene copy number, identifying SNPs tagging copy number variants, and detecting copy number polymorphic genomic regions

Install and use

Download the file

CNVrd2_1.9.1.tar.gz

Install the package

R CMD INSTALL CNVrd2_1.9.1.tar.gz

Please see the file CNVrd2.pdf

Window users can use the link of the Bioconductor Project:

http://www.bioconductor.org/packages/devel/bioc/html/CNVrd2.html

####To detect CNVRs, please use the function SRBreak in:

https://github.com/hoangtn/SRBreak/blob/master/TestSRBreak.ipynb

Notes: using the 1000 Genomes data

Please read information below or see the file using1000Genome

Please go to QuestionsAndAnswers to take a quick look at asked questions about the package.

USING THE 1000 GENOMES DATA

This note describes some simple steps for using the data from the 1000 Genomes Project http://www.1000genomes.org/.

Bam files

Download an index file

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/alignment_indices/20130502.low_coverage.alignment.index

Obtain a list of bam files (the first column)

cat 20130502.low_coverage.alignment.index |awk '{print $1}'|grep '\.mapped' > listbam.txt

There are 2535 bam files on the page ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/. Therefore, we can make a list of these bam files and their full links.

awk '{print "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/"$0}' < listbam.txt > listbamAndFullLinks.txt

We can choose a population (or multiple populations) to find tagSNPs. For example, here we choose the Mexican Ancestry in Los Angeles (MXL) population and find tagSNPs for FCGR3B gene.

cat listbamAndFullLinks.txt|grep "MXL" > listMXL.txt

The gene is at chr1:161592986-161601753 http://www.ncbi.nlm.nih.gov/gene/2215, so we will use samtools (Li et al. 2009 ) http://samtools.sourceforge.net/ to download a 1Mb region around the gene: chr1:161100000-162100000.

while read line
do
tempName=$(echo $line|awk -F"/" '{print $NF}')
samtools view -hb $line 1:161100000-162100000 > $tempName
done < listMXL.txt

After downloading, we can use samtools to keep only reads mapped:

for file in $(ls *bam)
do

#####Index bam file
samtools index $file

#####Keep only read mapped
sammtols view -F 4 $file -b > temp.bam

mv temp.bam $file
done

A good website to understand SAM/BAM flags is http://broadinstitute.github.io/picard/explain-flags.html

VCF files

We can use samtools to obtain a vcf file (Danecek et al. 2011 ) http://vcftools.sourceforge.net/ for all MXL samples above or we can use the SNP data of the 1000 Genomes Project.

Here, we use the data from the 1000 Genomes Project.

All VCF files can be obtained from this page http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/

We choose a region, for example chr1:161400000-161700000 flanking the FCGR3 gene to identify tagSNPs.

Download and index the vcf file by using "tabix" and "bgzip" command lines in the tabix tool (Li, 2011 ) http://samtools.sourceforge.net/tabix.shtml:

tabix -h http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.chr1.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz 1:161400000-161700000  > chr1.161400000.161700000.vcf

Index the file:

bgzip chr1.161400000.161700000.vcf -c > chr1.161400000.161700000.vcf.gz

tabix -p vcf chr1.161400000.161700000.vcf.gz

References

P. Danecek, A. Auton, G. Abecasis, C.A. Albers, E. Banks, M.A. DePristo, R.E. Handsaker, G. Lunter, G.T. Marth, S.T. Sherry, others, (2011) The variant call format and VCFtools. Bioinformatics 27 (15) 2156-2158
H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, others, (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25 (16) 2078-2079
Heng Li, (2011) Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics 27 (5) 718-719

Name		Name	Last commit message	Last commit date
Latest commit History 81 Commits
CNVrd2		CNVrd2
CNVrd2.pdf		CNVrd2.pdf
CNVrd2_1.9.1.tar.gz		CNVrd2_1.9.1.tar.gz
QuestionsAndAnswers.md		QuestionsAndAnswers.md
README.html		README.html
README.md		README.md
using1000Genome.html		using1000Genome.html
using1000Genome.md		using1000Genome.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CNVrd2

CNVrd2

CNVrd2.pdf

CNVrd2.pdf

CNVrd2_1.9.1.tar.gz

CNVrd2_1.9.1.tar.gz

QuestionsAndAnswers.md

QuestionsAndAnswers.md

README.html

README.html

README.md

README.md

using1000Genome.html

using1000Genome.html

using1000Genome.md

using1000Genome.md

Repository files navigation

CNVrd2: A package for measuring gene copy number, identifying SNPs tagging copy number variants, and detecting copy number polymorphic genomic regions

Install and use

Notes: using the 1000 Genomes data

USING THE 1000 GENOMES DATA

Bam files

VCF files

References

About

Releases

Packages

Contributors 3

Languages

hoangtn/CNVrd2

Folders and files

Latest commit

History

Repository files navigation

CNVrd2: A package for measuring gene copy number, identifying SNPs tagging copy number variants, and detecting copy number polymorphic genomic regions

Install and use

Notes: using the 1000 Genomes data

USING THE 1000 GENOMES DATA

Bam files

VCF files

References

About

Resources

Stars

Watchers

Forks

Languages