An R package to identify and classify duplicated genes from whole-genome protein sequence data

almeidasilvaf/doubletrouble

doubletrouble

The main goal of doubletrouble is to identify duplicated genes from whole-genome protein sequences and classify them by mode of duplication. Duplicates can be classified using four classification schemes, each adding more detail than the previous one. The schemes and the duplication modes they distinguish are:

| Scheme   | Duplication modes            |
|----------|------------------------------|
| binary   | SD, SSD                      |
| standard | SD, TD, PD, DD               |
| extended | SD, TD, PD, TRD, DD          |
| full     | SD, TD, PD, rTRD, dTRD, DD   |

Legend: SD, segmental duplication. SSD, small-scale duplication. TD, tandem duplication. PD, proximal duplication. TRD, transposon-derived duplication. rTRD, retrotransposon-derived duplication. dTRD, DNA transposon-derived duplication. DD, dispersed duplication.
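
To make the stepwise relationship between schemes concrete, here is a minimal base R sketch. The `schemes` list is copied from the table above; `collapse_modes()` is a hypothetical helper written for illustration and is not part of doubletrouble. It assumes that finer modes nest inside coarser ones (rTRD and dTRD inside TRD, TRD inside DD, and every non-SD mode inside SSD), which is an interpretation of the table rather than documented package behavior.

```r
# Duplication modes per scheme, copied from the table above
schemes <- list(
    binary   = c("SD", "SSD"),
    standard = c("SD", "TD", "PD", "DD"),
    extended = c("SD", "TD", "PD", "TRD", "DD"),
    full     = c("SD", "TD", "PD", "rTRD", "dTRD", "DD")
)

# Hypothetical helper (NOT a doubletrouble function): collapse labels from
# the 'full' scheme into a coarser scheme, assuming the modes are nested
collapse_modes <- function(modes,
                           scheme = c("extended", "standard", "binary")) {
    scheme <- match.arg(scheme)
    stopifnot(all(modes %in% schemes$full))

    out <- modes
    # full -> extended: both transposon-derived subtypes become TRD
    out[out %in% c("rTRD", "dTRD")] <- "TRD"
    # extended -> standard: assumed here that TRD falls back to DD
    if (scheme %in% c("standard", "binary")) out[out == "TRD"] <- "DD"
    # standard -> binary: everything that is not SD is small-scale (SSD)
    if (scheme == "binary") out[out != "SD"] <- "SSD"
    out
}

full_labels <- c("SD", "TD", "rTRD", "dTRD", "DD", "PD")
collapse_modes(full_labels, "extended")
collapse_modes(full_labels, "binary")
```

In practice, the package classifies pairs directly under the chosen scheme; the sketch only illustrates how the label sets relate to each other.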

Besides classifying gene pairs, users can also classify genes, so that each gene is assigned to a unique mode of duplication.

Users can also calculate substitution rates (i.e., $K_a$, $K_s$, and their ratio $\frac{K_a}{K_s}$) for duplicate pairs, find peaks in $K_s$ distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on $K_s$ peaks.
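
As a rough illustration of the idea behind peak-based age groups, the base R sketch below assigns each pair to the mixture component under which its $K_s$ value has the highest weighted density. All values (`ks`, `ka`, and the `peaks` parameters) are made up for the example, and the assignment rule is one simple possibility; it does not use doubletrouble functions, and the package's own approach may differ.

```r
# Hypothetical Ka and Ks values for five duplicate pairs (illustrative only)
ks <- c(0.05, 0.12, 0.70, 0.85, 0.10)
ka <- c(0.01, 0.03, 0.21, 0.17, 0.12)

# Ka/Ks ratio per pair; values below 1 are commonly read as a sign of
# purifying selection, values above 1 as a sign of positive selection
kaks <- ka / ks

# Hypothetical parameters of a two-component GMM fitted to a Ks distribution
peaks <- data.frame(
    mean   = c(0.1, 0.8),
    sd     = c(0.05, 0.15),
    weight = c(0.6, 0.4)
)

# Weighted Gaussian density of each Ks value under each component
# (rows: gene pairs, columns: peaks)
dens <- sapply(seq_len(nrow(peaks)), function(i) {
    peaks$weight[i] * dnorm(ks, mean = peaks$mean[i], sd = peaks$sd[i])
})

# Assign each pair to the peak with the highest weighted density
age_group <- max.col(dens, ties.method = "first")

data.frame(ks = ks, kaks = round(kaks, 2), age_group = age_group)
```

With these made-up values, pairs with $K_s$ near 0.1 fall into the first (younger) group and pairs with $K_s$ near 0.8 into the second (older) group.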

Installation instructions

Get the latest stable R release from CRAN. Then install doubletrouble from Bioconductor using the following code:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("doubletrouble")

To install the development version from GitHub, use:

BiocManager::install("almeidasilvaf/doubletrouble")

Citation

Below is the citation output from using citation('doubletrouble') in R. Please run this yourself to check for any updates on how to cite doubletrouble.

print(citation('doubletrouble'), bibtex = TRUE)
#> To cite package 'doubletrouble' in publications use:
#> 
#>   Almeida-Silva F, Van de Peer Y (2022). _doubletrouble: Identification
#>   and classification of duplicated genes_. R package version 1.3.0,
#>   <https://github.com/almeidasilvaf/doubletrouble>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {doubletrouble: Identification and classification of duplicated genes},
#>     author = {Fabrício Almeida-Silva and Yves {Van de Peer}},
#>     year = {2022},
#>     note = {R package version 1.3.0},
#>     url = {https://github.com/almeidasilvaf/doubletrouble},
#>   }

Please note that doubletrouble was only made possible thanks to many other R and bioinformatics software authors, who are cited in the vignettes and/or the paper(s) describing this package.

Code of Conduct

Please note that the doubletrouble project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Development tools

For more details, check the dev directory.

This package was developed using biocthis.
