Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export p-values with heircachy amd taxanomic names? #330

Open
catherineel opened this issue Jan 20, 2022 · 2 comments
Open

Export p-values with heircachy amd taxanomic names? #330

catherineel opened this issue Jan 20, 2022 · 2 comments

Comments

@catherineel
Copy link

Hi there!

I was wondering, is there a way to export the stats generated from the heat trees?
Based on your tutorial we can get the p-values for the heat tree using:

obj$data$diff_table$wilcox_p_value <- p.adjust(obj$data$diff_table$wilcox_p_value,
                                               method = "fdr")

obj$data$diff_table <- compare_groups(obj, dataset = "tax_abund",
                                      cols = hmp_samples$sample_id, # What columns of sample data to use
                                      groups = hmp_samples$body_site) # What category each sample is assigned to
print(obj$data$diff_table)

When I tried:

write.xlsx(obj$data$diff_table, "obj$data$diff_table_padj.xlsx")

I get the taxon_id as 3 letter codes and I can't tell which taxa they belong to and their names.
Examples:
aab
aac
aad
aae
aaf
aag
aah
aai
aaj

Is there a way to generate and export an excel file that has the taxonomic name with its p-values?

@tboonf
Copy link

tboonf commented Apr 6, 2023

Has anyone figured out how to do this yet?

@zachary-foster
Copy link
Contributor

The taxon names can be accessed using the taxon_names() function. In functions that expect column names, like get_data_frame, functions like taxon_names() can be referenced and their output treated like a variable. For example:

library(metacoder)
library(dplyr)

x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Convert counts to proportions
x$data$otu_table <- calc_obs_props(x, data = "tax_data", cols = hmp_samples$sample_id)
#> Calculating proportions from counts for 50 columns for 1000 observations.

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "otu_table", cols = hmp_samples$sample_id)
#> Summing per-taxon counts from 50 columns for 174 taxa

# Calculate difference between groups
x$data$diff_table <- compare_groups(x, data = "tax_table",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$body_site)

# Make table of per-taxon data
per_tax_data <- get_data_frame(x, name = c("taxon_ids", "taxon_names", "taxon_ranks", "classifications"))

# Combine with per-comparison data
comp_data <- left_join(x$data$diff_table, per_tax_data, by = c("taxon_id" = "taxon_ids"))

print(comp_data)
#> # A tibble: 1,740 × 10
#>    taxon_id treatme…¹ treat…² log2_…³ media…⁴ mean_d…⁵ wilcox_…⁶ taxon…⁷ taxon…⁸
#>    <chr>    <chr>     <chr>     <dbl>   <dbl>    <dbl>     <dbl> <chr>   <chr>  
#>  1 ab       Nose      Saliva     0     0       0       NaN       Root    r      
#>  2 ac       Nose      Saliva    -2.52 -0.170  -1.27e-1   6.84e-3 Proteo… p      
#>  3 ad       Nose      Saliva    -5.42 -0.267  -2.62e-1   1.08e-5 Bacter… p      
#>  4 ae       Nose      Saliva     5.12  0.597   5.81e-1   1.08e-5 Actino… p      
#>  5 af       Nose      Saliva    -1.01 -0.225  -1.47e-1   5.24e-2 Firmic… p      
#>  6 ag       Nose      Saliva  -Inf    -0.0249 -4.48e-2   1.49e-4 Fusoba… p      
#>  7 ah       Nose      Saliva     0     0      -1.02e-4   7.79e-2 Teneri… p      
#>  8 ai       Nose      Saliva     0     0       0       NaN       Spiroc… p      
#>  9 aj       Nose      Saliva    -2.65 -0.0954 -7.73e-2   5.20e-3 Gammap… c      
#> 10 ak       Nose      Saliva    -6.41 -0.0192 -1.96e-2   1.26e-3 Flavob… c      
#> # … with 1,730 more rows, 1 more variable: classifications <chr>, and
#> #   abbreviated variable names ¹​treatment_1, ²​treatment_2, ³​log2_median_ratio,
#> #   ⁴​median_diff, ⁵​mean_diff, ⁶​wilcox_p_value, ⁷​taxon_names, ⁸​taxon_ranks

Created on 2023-04-06 with reprex v2.0.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants