
Releases: ChaissonLab/danbing-tk

danbing-tk v1.3.2

14 Jun 18:33
ab954f0

Major changes:

  • Automated bias correction using danbing-tk-pred

Resources:

  • ikmer.meta required by danbing-tk-pred
  • ikmer.meta.txt human-readable version of ikmer.meta, with its format documented in the Wiki
  • Example trkmers.meta.txt required by danbing-tk-pred

Next release (v1.3.3):

  • Automated dosage computation for motifs and TR loci

danbing-tk v1.3.1 (manuscript)

04 Mar 00:07
d286cd6

This version is associated with the manuscript: "The motif composition of variable-number tandem repeats impacts gene expression"

Major changes:

  • Updated the preferred usage of danbing-tk to turn on the k-mer filter: -kf 4 1
  • Reduced the *.tr.kmers output size by saving only counts; an index file is used to reconstruct locus names and k-mer names

Resources in Assets:

  • tr.good.bed: VNTR set for building RPGG

Additional resource on Zenodo:

  • VNTR statistics and annotations on 35 HGSVC assemblies
  • RPGG built from the annotations
  • GTEx gene-level eVNTR discoveries
  • GTEx gene-level eMotif discoveries
  • GTEx fine-mapping results using susieR
  • Bias matrices for HGSVC, HPRC, GTEx, and Geuvadis samples used in bias correction
  • GTEx bias-corrected kmer dosage table
  • Geuvadis bias-corrected kmer dosage table

Additional analysis scripts for bias correction, eQTL mapping, and fine-mapping are available in this repo.

danbing-tk v1.3

25 Jun 18:42
689c2f7

Improvements:

  • Significantly improved the runtime and memory usage of danbing-tk
    • benchmark settings
      • 31x HG00731 SRS sample from 1000 Genomes Project
      • two-consortium RPGG, 81045 loci
      • 16 cores, Xeon 2665, AVX
      • samtools fasta -@2 -n $bam | danbing-tk -a -kf 4 1 -gc 80 -k 21 -qs pan -fa /dev/stdin -o $out -p 16 -cth 45 | gzip >$aln
    • The sample was genotyped in ~43 min using 31.4 Gb of memory
    • 24x speedup and 37% reduction in memory usage
    • Output file size: 1.3 Gb
  • danbing-tk now takes binary graph/index as input
    • ktools serialize was added to convert *.kmers files to *.graph.umap, *.kmerDBi.umap and *.kmerDBi.vv
  • bam2pe is now merged with danbing-tk
    • use -fa option for non-interleaved fasta e.g. samtools fasta -@2 -n $bam
    • use -fai for interleaved fasta
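
As a rough reading of the benchmark numbers above, the quoted 24x speedup and 37% memory reduction imply what the pre-v1.3 run would have cost on the same sample. The snippet is back-of-envelope arithmetic on the reported figures only (variable names are illustrative), not a measured baseline.

```python
# Back-of-envelope: implied pre-v1.3 cost of the benchmark run, derived
# only from the figures quoted in these release notes (not measured).
runtime_min = 43        # ~43 min genotyping time with v1.3
mem_gb = 31.4           # memory reported for v1.3 (unit as written: Gb)
speedup = 24            # reported 24x speedup
mem_reduction = 0.37    # reported 37% reduction in memory usage

# Old runtime = new runtime * speedup; old memory = new memory / (1 - reduction)
prev_runtime_h = runtime_min * speedup / 60
prev_mem_gb = mem_gb / (1 - mem_reduction)

print(f"implied previous runtime: ~{prev_runtime_h:.1f} h")
print(f"implied previous memory: ~{prev_mem_gb:.1f} Gb")
```

This prints roughly 17.2 h and 49.8 Gb, i.e. the earlier release would have needed most of a day for a single 31x sample.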

Resources

  • New RPGG and VNTR coordinates on 35 HGSVC genomes are available at Zenodo

danbing-tk v1.2

14 Jun 21:16

Improvements:

  • Improved indel handling in graph threading.
  • Improved the memory scalability of multiple-boundary-alignment.

Resources:

  • New RPGG and VNTR coordinates on 35 HGSVC genomes are available at Zenodo

manuscript-1

13 May 17:19

Latest version of the code and resources associated with the manuscript "Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs". Released to create a DOI with Zenodo.

danbing-tk v1.1

26 Jan 02:40

Improvements:

  • Faster danbing-tk align: 2.6x speedup on HG00096 when genotyping 32,138 loci
  • More flexible use of danbing-tk build: an RPGG can now be generated without SRS data by skipping graph pruning
  • More informative aln-r2: fixed r2 being reported as zero when assembly k-mer counts show no variation, by adding a dummy point at (0,0)
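
To illustrate why the dummy point helps: if aln-r2 is taken to be the squared Pearson correlation between assembly k-mer counts and read-derived k-mer counts at a locus (an assumption; danbing-tk's exact definition may differ), a locus whose assembly counts are all equal has zero variance, so the correlation is undefined and ends up reported as zero. Appending a (0,0) point restores variance. The functions below (pearson_r2, r2_with_dummy) are a hypothetical sketch of that idea, not danbing-tk's code.

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation; returns 0.0 when either side has zero variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined; reported as zero
    return sxy * sxy / (sxx * syy)


def r2_with_dummy(asm_counts, read_counts):
    """Hypothetical aln-r2 variant: append a dummy (0, 0) point before computing r2."""
    return pearson_r2(list(asm_counts) + [0], list(read_counts) + [0])


asm = [2, 2, 2]       # assembly k-mer counts with no variation
reads = [20, 22, 18]  # read-derived k-mer counts
print(pearson_r2(asm, reads), r2_with_dummy(asm, reads))
```

Here pearson_r2 returns 0.0 for the constant assembly counts, while r2_with_dummy returns 900/924 ≈ 0.974, reflecting that the read counts track the (uniformly non-zero) assembly counts.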

danbing-tk v1.0

12 Jan 04:33

Improvements:

  • Improved length estimation accuracy using multi-boundary expansion, due to more accurate orthology mapping of VNTRs across haplotypes.
  • More stringent QC on VNTR size, number of supporting haplotypes, consistency of liftover coordinates, etc.
  • Slightly expanded the VNTR set from 29,111 to 32,138 loci.
  • Added a more user-friendly length estimation script.
  • Added an option for alignment output using -a with danbing-tk align
  • DOI created using Zenodo

Additional resources:

  • Repeat-pangenome graph encoded as pan.tr.kmers, pan.ntr.kmers and pan.graph.kmers in RPGG.tar.gz
  • 84,411 raw VNTR coordinates tr.84411.bed
  • 32,138 raw VNTR coordinates (high-confidence genotypable set) tr.good.bed
  • 397 non-VNTR regions ctrl.bed
  • Locus-specific biases of VNTR and non-VNTR regions LSB.tsv
  • Summary of eGene discoveries Alltissue.egenes.tsv
  • Comprehensive VNTR statistics vntr.statistics.tsv vntr.statistics.README
  • 13 PacBio CLR assemblies (26 haplotypes) *.h?.fasta.gz
  • 32,138 boundary-expanded VNTR coordinates in the 26 haplotypes pan.tr.mbe.no_CCS.bed and pan.tr.mbe.no_CCS.README
  • 73,582 boundary-expanded VNTR coordinates pan.tr.73582.mbe.no_CCS.bed

Example analyses:

  • QC of multi-boundary expansion 202011.MultiBoundaryExpansion.QC.ipynb
  • Measuring length prediction accuracy 202012.Acc.pan.ipynb
  • Contrasting the most informative kmer between populations 202012.mikmer.ipynb
  • eQTL mapping 202012.eQTL.32138.ipynb
  • Sample QC on locus-specific bias LSB_analysis.ipynb
  • Heritability analysis of SNP vs. SNP+VNTR models 202011.sg.joint.ipynb
  • Miscellaneous analyses in the original manuscript 202012.revision.supp.ipynb

v0.0

08 Aug 00:26

Version 0 of genotypable VNTRs, RPGG and precomputed LSB are out! These files should be the same as the ones used for the analysis in the original paper.