GitHub - bramstone/Selecting-associations-in-microbial-datasets: Implementation of Lallich et al.'s 2006 algorithm to reduce type I error between correlations in microbial datasets

Mining Significant Associations in Microbial Community Datasets

This function implements the algorithm of Lallich et al. (2006) to reduce type I error (false discoveries) by selecting a subset of correlations whose values the researcher deems suitable or interesting (1). The function writes its output as a data frame suitable for graphing with the igraph package. Work is under way to incorporate Reiner et al.'s 2003 method for controlling false discovery (2), as well as the Benjamini and Yekutieli 2006 methodology (3), into the function.
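As a rough illustration of the kind of output described above, the following sketch turns a Spearman correlation matrix into an edge-list data frame. The data, the column names (`from`, `to`, `weight`), and the 0.3 cutoff are assumptions made for this example; they are not taken from bs_fdr.R, whose actual output columns may differ.

```r
# Hypothetical example data: 10 samples (rows) by 6 taxa (columns)
set.seed(1)
abund <- matrix(rpois(60, lambda = 5), nrow = 10,
                dimnames = list(NULL, paste0("taxon", 1:6)))
cors <- cor(abund, method = "spearman")

# Keep each unordered pair once (upper triangle, diagonal excluded)
idx <- which(upper.tri(cors), arr.ind = TRUE)
all_pairs <- data.frame(
  from = rownames(cors)[idx[, "row"]],
  to = colnames(cors)[idx[, "col"]],
  weight = cors[idx],
  stringsAsFactors = FALSE
)

# Retain only associations at or above an illustrative cutoff
edges <- all_pairs[which(abs(all_pairs$weight) >= 0.3), ]

# If igraph is installed, an edge list like this can be passed to it directly
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)
}
```

The first two columns name the connected features and any further columns (here `weight`) are carried along as edge attributes.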

Building a network from biological or ecological interactions must contend with the fact that the features of interest (species or genes) are inherently interdependent, so more traditional methods of significance correction are not sufficient. Microbial datasets present further difficulties: a large number of features leads to many possible pairwise correlations, and infrequent occurrence across samples or records leads to features with non-normal abundance distributions. Comparing the observed data against bootstrapped null values is one way to mine the dataset for features of interest.
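A minimal sketch of the idea of comparing an observed association against resampled null values is given below. It resamples two features independently, which breaks any link between them; this resampling scheme, the variable names, and the number of replicates are assumptions for illustration and may not match how bs_fdr.R constructs its null values.

```r
set.seed(42)
n <- 30
x <- rpois(n, lambda = 2)       # sparse, non-normal counts (made-up data)
y <- x + rpois(n, lambda = 1)   # a second feature associated with x

observed <- cor(x, y, method = "spearman")

# Resample each feature independently, with replacement, so that any
# association in the resampled pair arises by chance alone
n_boot <- 500
null_cors <- replicate(n_boot, {
  cor(sample(x, replace = TRUE), sample(y, replace = TRUE),
      method = "spearman")
})

# Share of null values at least as extreme as the observed correlation
p_empirical <- mean(abs(null_cors) >= abs(observed), na.rm = TRUE)
```

Because the null values come from the data themselves, no normality assumption is needed for the abundance distributions.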

To accommodate large datasets, this function breaks correlation matrices into 10,000-row blocks, each of which is assessed against the initial threshold of significance. This is done mainly so that R does not try to perform computations on prohibitively large objects in memory. Because microbial associations are frequently non-linear, Spearman rank-based correlations are recommended for microbial datasets. Other, more ecologically relevant association measures may be calculated using the vegan package, though they may not perform as well as Spearman or Pearson measures (4).
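The block-wise strategy can be sketched as follows. This is an illustration under stated assumptions, not the code in bs_fdr.R: the block size is shrunk to 10 so the example runs on toy data, and `threshold`, `block_size`, and the output column names are hypothetical.

```r
set.seed(7)
abund <- matrix(rpois(20 * 25, lambda = 3), nrow = 20,
                dimnames = list(NULL, paste0("taxon", 1:25)))

block_size <- 10   # tiny for the example; the README describes 10,000-row blocks
threshold <- 0.5   # illustrative cutoff on |correlation|

# Rank each feature once: Pearson correlation of ranks is Spearman correlation
ranks <- apply(abund, 2, rank)
p <- ncol(ranks)
starts <- seq(1, p, by = block_size)

kept <- lapply(starts, function(s) {
  rows <- s:min(s + block_size - 1, p)
  # Only this block of the full correlation matrix is held in memory
  block <- cor(ranks[, rows, drop = FALSE], ranks)
  hits <- which(abs(block) >= threshold, arr.ind = TRUE)
  # Keep each unordered pair once and drop self-correlations
  hits <- hits[rows[hits[, "row"]] < hits[, "col"], , drop = FALSE]
  data.frame(from = colnames(ranks)[rows[hits[, "row"]]],
             to = colnames(ranks)[hits[, "col"]],
             weight = block[hits],
             stringsAsFactors = FALSE)
})
edges <- do.call(rbind, kept)
```

Only the pairs that pass the cutoff are retained from each block, so the full feature-by-feature matrix never has to exist at once.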

Besides the choice of association measure, users must decide how many false positive associations they are willing to accept in their data, as well as the risk (a probability between 0 and 1) that the data contain more than the specified number of false positives. The default values are one false positive, with a risk of 0.05 (five percent) that more than one false positive will be in the dataset.
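To make the roles of these two settings concrete, the sketch below derives a cutoff from a matrix of null statistics so that, across replicates, more than the allowed number of null values exceed the cutoff in at most the stated share of replicates. It is written in the spirit of the description above and is not claimed to reproduce the exact steps of Lallich et al. or of bs_fdr.R; the names `max_false_pos` and `risk` and the simulated null values are assumptions.

```r
set.seed(123)
n_boot <- 1000    # number of null replicates (illustrative)
n_tests <- 200    # number of pairwise associations tested (illustrative)

# Stand-in null statistics: one row per replicate, one column per test
null_stats <- matrix(abs(rnorm(n_boot * n_tests)), nrow = n_boot)

max_false_pos <- 1   # accepted number of false positives
risk <- 0.05         # accepted probability of exceeding that number

# In each replicate, the (max_false_pos + 1)-th largest null statistic is the
# value a cutoff must reach to allow at most max_false_pos exceedances
kth_largest <- apply(null_stats, 1, function(s) {
  sort(s, decreasing = TRUE)[max_false_pos + 1]
})

# Cutoff that this value exceeds in at most a `risk` share of replicates
threshold <- unname(quantile(kth_largest, probs = 1 - risk, type = 1))

# Check on the null replicates: how often are there too many exceedances?
exceed <- rowSums(null_stats > threshold)
estimated_risk <- mean(exceed > max_false_pos)
```

Raising `max_false_pos` or `risk` lowers the cutoff and admits more associations; lowering either makes the selection stricter.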

Files in this repository: LICENSE, README.md, Vignette.pdf, bs_fdr.R, example_abundance_data.csv, example_sampling_data.csv
