
Commit

Merge pull request #2 from d2389758/patch-3
Update README.md
danangcrysnanto committed Dec 19, 2019
2 parents 53948e5 + c7d8eb3 commit 2e9f4c8
Showing 1 changed file with 4 additions and 13 deletions.
17 changes: 4 additions & 13 deletions part3_consensusgenome/README.md
@@ -1,6 +1,6 @@
## Part 3 Comparison between consensus and genome graphs

-In addition of inclusion genetic diversity in the graphs, ones could also mitigate mapping bias by adjusting reference genome to the targeted population, or so called as consensus genome approach. In this part, we replaced bases in the ARS-UCD 1.2 bovine reference genome with the most frequent allele in the the population. We consider two types of consensus:
+In addition to including genetic diversity in the graphs, one could also mitigate mapping bias by adapting the reference genome to the target population, the so-called consensus genome approach. In this part, we replaced bases in the ARS-UCD 1.2 bovine reference genome with the most frequent allele in the population. We consider two types of consensus:

#### 1. Major-BSW

@@ -10,8 +10,6 @@ We modified reference bases with major allele where frequency calculated based o

We replaced reference bases with the major allele, where allele frequencies were calculated from the combined 288 animals of four cattle populations (BSW, OBV, HOL, and FV).



![Consensus genome experiment](fig/methodpart3.png)


@@ -44,9 +42,9 @@ ___

#### 1. Creating consensus genome based on major allele

-We calculated two consensus, `major-BSW` and `major-pan` where AF calculated based on Brown Swiss and combined population respectively. We provided the variants in `../data/part3/vcf_consensus` (with frequency file and the vcf files).
+We built two consensus genomes, `major-BSW` and `major-pan`, where allele frequencies were calculated from the Brown Swiss population and the combined population, respectively. We provide the variants in `../data/part3/vcf_consensus` (the frequency file and the VCF files).
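
As a rough illustration of how a major-allele variant set could be derived from a frequency table, the sketch below keeps only biallelic sites whose alternate allele frequency exceeds 0.5, since only those sites would change the reference. The record layout, the 0.5 cutoff, and the name `select_major_alt` are assumptions made for illustration; they are not the format of the files in `../data/part3/vcf_consensus` or the scripts used in this repository.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """One biallelic site (hypothetical layout, for illustration only)."""
    chrom: str
    pos: int        # 1-based position on the reference
    ref: str        # reference allele
    alt: str        # alternate allele
    alt_af: float   # alternate allele frequency in the target population


def select_major_alt(sites, cutoff=0.5):
    """Return the sites where the alternate allele is the major allele."""
    return [s for s in sites if s.alt_af > cutoff]


sites = [
    Site("1", 100, "A", "G", 0.82),    # alt is major: kept
    Site("1", 250, "C", "T", 0.10),    # ref is major: dropped
    Site("1", 400, "G", "GTT", 0.51),  # insertion, alt is major: kept
]
major = select_major_alt(sites)
print([(s.pos, s.alt) for s in major])
```

Running the same function on frequencies from one breed or from the combined populations would correspond to the `major-BSW` and `major-pan` variant sets, respectively.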

-We modified the original reference with major variants defined in the vcf file with `vcf2diploid` tools. *Vcf2diploid* is a tool to generate parental and maternal haplotypes by replacing reference with variants from phased vcf. For our purpose, we inputted a single sample vcf with all genotypes were homozygous alternate (thus all alleles will be replaced with corresponding variants and outputted the same two fasta haplotypes). Since replacing reference allele with insertion and deletions causing genomics coordinate shift, we applied the accompanying *chain file* produced by *vcf2diploid* to convert the coordinates of the simulated reads from the original to the modified reference using local UCSC liftOver tools.
+We modified the original reference with the major variants defined in the VCF file using the `vcf2diploid` tool. *Vcf2diploid* generates paternal and maternal haplotypes by replacing reference bases with variants from a phased VCF. For our purpose, we supplied a single-sample VCF in which all genotypes are homozygous alternate (thus every reference allele is replaced with the corresponding variant, and the two output FASTA haplotypes are identical). Since replacing reference alleles with insertions and deletions shifts genomic coordinates, we applied the accompanying *chain file* produced by *vcf2diploid* to convert the coordinates of the simulated reads from the original to the modified reference using a local copy of the UCSC liftOver tool.
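
To make the coordinate shift concrete, here is a minimal sketch that applies homozygous-alternate variants to a toy sequence and records the unchanged blocks, which play the role of a heavily simplified chain file. It is not how `vcf2diploid` or liftOver are implemented; `apply_variants`, `lift`, and the block layout are hypothetical names and structures chosen for this example.

```python
def apply_variants(ref_seq, variants):
    """Apply non-overlapping (pos, ref, alt) variants; pos is 1-based.

    Returns the modified sequence and a list of (ref_start, new_start, length)
    blocks (0-based) covering the stretches copied unchanged.
    """
    out, blocks = [], []
    ref_cursor = 0  # next unread 0-based index in ref_seq
    new_cursor = 0  # next 0-based index in the modified sequence
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < ref_cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if ref_seq[start:start + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        length = start - ref_cursor
        if length > 0:  # copy the unchanged stretch before the variant
            blocks.append((ref_cursor, new_cursor, length))
            out.append(ref_seq[ref_cursor:start])
            new_cursor += length
        out.append(alt)  # write the alternate allele
        new_cursor += len(alt)
        ref_cursor = start + len(ref)
    tail = len(ref_seq) - ref_cursor
    if tail > 0:
        blocks.append((ref_cursor, new_cursor, tail))
        out.append(ref_seq[ref_cursor:])
    return "".join(out), blocks


def lift(pos0, blocks):
    """Map a 0-based reference position to the modified sequence.

    Simplification: positions inside a replaced allele return None.
    """
    for ref_start, new_start, length in blocks:
        if ref_start <= pos0 < ref_start + length:
            return new_start + (pos0 - ref_start)
    return None


ref = "ACGTACGTAC"
variants = [(3, "G", "T"), (6, "C", "CTT"), (8, "TA", "T")]  # SNP, insertion, deletion
new_seq, blocks = apply_variants(ref, variants)
print(new_seq, lift(6, blocks), lift(9, blocks))
```

In this toy case the insertion shifts every downstream base by two positions and the deletion shifts it back by one, which is why read coordinates simulated on the original reference need to be converted before they can be compared against alignments to the consensus genome.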



@@ -62,17 +60,10 @@ Where consensus type is either `major-BSW` or `major-pan`.

The scripts generate the modified consensus genomes as `25_anims_major-BSW.fa` and `25_anims_major-pan.fa`. Additionally, mapping statistics are written to `compare.gz` files, which are required for the subsequent data analyses.



#### 3. Data Analysis

The analysis presented in the paper can be followed interactively through the Jupyter notebook in [`analysis/part3_consensusgenome.ipynb`](analysis/part3_consensusgenome.ipynb) or via `binder` [![Binder](http://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/danangcrysnanto/bovine-graphs-mapping/master?filepath=part3_consensusgenome/analysis/part3_consensusgenome.ipynb).








