admixr—interactive R interface for ADMIXTOOLS

What is admixr?

The admixr package provides a convenient R interface to ADMIXTOOLS, a widely used software package for calculating admixture statistics and testing population admixture hypotheses.

A typical ADMIXTOOLS workflow often involves a combination of sed/awk/shell scripting and manual editing to create different configuration files. These are then passed as command-line arguments to one of the ADMIXTOOLS commands and control how a particular analysis is run. The output of such a computation is usually redirected to another file, which the user must parse to extract the values of interest, often using command-line utilities again or by manual copy-pasting. Only then can the values be analysed in R, Excel or another program.

This workflow can be a little cumbersome, especially if one wants to explore many hypotheses involving different combinations of populations or data filtering strategies. Most importantly, it makes it difficult to follow the rules of best practice for reproducible science, especially given the need for manual intervention on the command-line or custom shell scripting to orchestrate more complex pipelines.

admixr makes it possible to perform all stages of an ADMIXTOOLS analysis entirely from R. It provides a set of convenient functions that completely remove the need for "low-level" configuration of individual ADMIXTOOLS programs, allowing users to focus on the analysis itself.

How to cite

admixr is now published as an Application Note in the journal Bioinformatics. If you use it in your work, please cite the paper! You will be in the excellent company of papers that have used it to do amazing research. 🙂

Installation instructions

Browser-based RStudio session

You can try out admixr without installation directly in your browser! Simply click on Binder and after a short moment you will get a Binder RStudio cloud session running in your web browser. However, please note that Binder's computational resources are extremely limited, so you might run into issues if you try to run resource-intensive computations.

Latest stable version

The package is available on CRAN. You can install it simply by running

install.packages("admixr")

from your R session. This is the recommended procedure for most users.

Development version

To install the development version from GitHub (which might be slightly ahead of the stable CRAN release in terms of new features and bugfixes), you need the R package devtools. You can run:

install.packages("devtools")
devtools::install_github("bodkan/admixr")

Installing ADMIXTOOLS

In order to use the admixr package, you need a working installation of ADMIXTOOLS. You can find installation instructions here.

Furthermore, you also need to make sure that R can find ADMIXTOOLS binaries on the $PATH. You can achieve this by specifying PATH=<path to the location of ADMIXTOOLS programs> in the .Renviron file in your home directory. If R cannot find ADMIXTOOLS utilities, you will get a warning upon loading library(admixr) in your R session.
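As a quick sanity check of the PATH setup described above, the following sketch uses base R's Sys.which() to see whether a few ADMIXTOOLS programs are visible to R. The list of program names is only an illustrative selection, and the directory in the commented-out Sys.setenv() line is a placeholder, not a real location.

```r
# A few ADMIXTOOLS programs to look for (an illustrative selection only)
programs <- c("qpDstat", "qpF4ratio", "qp3Pop", "qpAdm")

# Sys.which() returns the full path of each program, or "" if R cannot find it
found <- Sys.which(programs)
print(found)

# Report anything that is not visible on the PATH
not_found <- names(found)[found == ""]
if (length(not_found) > 0) {
  message("Not found on PATH: ", paste(not_found, collapse = ", "))
}

# Session-only alternative to editing .Renviron (placeholder directory!):
# Sys.setenv(PATH = paste("/path/to/admixtools/bin", Sys.getenv("PATH"),
#                         sep = .Platform$path.sep))
```

If every entry comes back with a non-empty path, loading library(admixr) should not produce the warning mentioned above.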

Example analysis

This is all the code that you need to perform ADMIXTOOLS analyses using this package! No shell scripting, no copy-pasting and no manual editing of text files. The only thing you need is a working ADMIXTOOLS installation and a path to EIGENSTRAT data (a trio of ind/snp/geno files sharing a common prefix). In the example below, download_data() fetches a small testing dataset, and its return value is passed straight to eigenstrat().

library(admixr)

# download a small testing dataset to a temporary directory and process it for use in R
snp_data <- eigenstrat(download_data())

result <- d(
  W = c("French", "Sardinian"), X = "Yoruba", Y = "Vindija", Z = "Chimp",
  data = snp_data
)

result

Note that a single call to the d function generates all required intermediate config and population files, runs ADMIXTOOLS, parses its log output and returns the result as a data.frame object with the D statistics results. It does all of this behind the scenes, without the user having to deal with low-level technical details.
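Because the result is an ordinary data.frame, it can be post-processed with plain R. The sketch below defines a small helper that keeps only rows with a large absolute Z-score. The helper significant_d() is hypothetical and not part of admixr, and it assumes the result has a numeric column named Zscore; check names(result) on your own output before relying on it.

```r
# Hypothetical helper (NOT part of admixr): keep rows whose |Z| exceeds a cutoff.
# Assumes the data frame has a numeric column called `Zscore`.
significant_d <- function(res, cutoff = 3) {
  stopifnot(is.data.frame(res), "Zscore" %in% names(res))
  res[abs(res$Zscore) > cutoff, , drop = FALSE]
}

# Usage, continuing from the example above (requires a working ADMIXTOOLS):
# significant_d(result)
```

The cutoff of 3 is only a default chosen for illustration; pick a threshold appropriate for your analysis.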

Is admixr related to ADMIXTOOLS 2?

Recently, a new R package called ADMIXTOOLS 2 appeared on the horizon, offering a re-implementation of several features of the original ADMIXTOOLS suite of command-line programs.

The admixr project is not related to that initiative at all. It is not a precursor to it, nor is it, the way I see it, superseded by it. I have never used ADMIXTOOLS 2 myself, but from the looks of it, it seems to offer some very interesting features for fitting complex admixture graphs (something I'm not personally interested in, which is why early efforts to implement this in admixr were eventually abandoned).

The bottom line is this: as long as the original ADMIXTOOLS continues to be developed and maintained, admixr remains relevant and useful and will continue to be supported. ADMIXTOOLS might have a smaller set of features than ADMIXTOOLS 2, but the features it provides are extremely stable. ADMIXTOOLS is one of the most battle-tested pieces of software in population genetics. If you're happy with the set of features it provides and with admixr itself, there is no compelling reason to move away from either of them.

More information

To see many more examples of admixr in action, please check out the tutorial vignette.

If you want to stay updated on new admixr development, follow me on Twitter.