Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The taxon list table for the differential heat tree #317

Open
biosciences opened this issue Aug 23, 2021 · 6 comments
Open

The taxon list table for the differential heat tree #317

biosciences opened this issue Aug 23, 2021 · 6 comments

Comments

@biosciences
Copy link

Regarding the differential heat tree, how I can print the taxon list table for the differential heat tree after Wilcoxon rank Sum test? The taxon list should include the more abundant taxons for both comparative groups.

@zachary-foster
Copy link
Contributor

That information is in the table returned by compare_groups or calc_diff_abund_deseq2. You can subset it to just the significant results and add other information to it as needed. For example:

library(metacoder)
library(dplyr)

# Parse data for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Convert counts to proportions
x$data$otu_table <- calc_obs_props(x, data = "tax_data", cols = hmp_samples$sample_id)
#> Calculating proportions from counts for 50 columns for 1000 observations.

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "otu_table", cols = hmp_samples$sample_id)
#> Summing per-taxon counts from 50 columns for 174 taxa

# Calculate difference between groups
x$data$diff_table <- compare_groups(x, data = "tax_table",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$body_site)

# Subset the table for p-value
signif_taxon_diff <- filter(x$data$diff_table, p.adjust(wilcox_p_value) < 0.05)

# Add taxon name
signif_taxon_diff$taxon_name <- taxon_names(x)[signif_taxon_diff$taxon_id]
signif_taxon_diff
#> # A tibble: 45 x 8
#>    taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff
#>    <chr>    <chr>       <chr>                   <dbl>       <dbl>     <dbl>
#>  1 ad       Nose        Saliva                  -5.42      -0.267    -0.262
#>  2 ae       Nose        Saliva                   5.12       0.597     0.581
#>  3 al       Nose        Saliva                  -5.68      -0.248    -0.243
#>  4 am       Nose        Saliva                   5.12       0.597     0.581
#>  5 ap       Nose        Saliva                  -5.82      -0.173    -0.196
#>  6 ay       Nose        Saliva                  -5.68      -0.248    -0.243
#>  7 az       Nose        Saliva                   5.13       0.597     0.583
#>  8 bc       Nose        Saliva                  -5.82      -0.173    -0.196
#>  9 bf       Nose        Saliva                  -3.26      -0.154    -0.210
#> 10 bx       Nose        Saliva                  -6.86      -0.170    -0.186
#> # … with 35 more rows, and 2 more variables: wilcox_p_value <dbl>,
#> #   taxon_name <chr>

Created on 2021-08-23 by the reprex package (v2.0.0)

@biosciences
Copy link
Author

Thank you very much for your help! Your script is very useful.

@zachary-foster
Copy link
Contributor

No problem, glad it helped!

@DanielSoutoV
Copy link

Hola Zachary,

I am trying to do something similar here - I am actually just trying to see which taxa overlap in both treatments so I figured I'd only use the median_diff =0 instead of wilcox_p_value column (? or is this complete nonsense?!). I tried running this code in my own data, but I had some errors.

I thought of trying out the one here, but I still get an error that the wilcox_p_value column (?) is not found:

signif_taxon_diff <- filter(x$data$diff_table, p.adjust(wilcox_p_value) < 0.05)
Error in p.adjust(wilcox_p_value) : object 'wilcox_p_value' not found

I tried changing the code thinking its a matter of specifying which column, but I still get an error (although a different one).

signif_taxon_diff <- filter(x$data$diff_table, p.adjust(x$data$diff_table$wilcox_p_value) < 0.05)
Error in filter(x$data$diff_table, p.adjust(x$data$diff_table$wilcox_p_value) < :
missing values in 'filter'

When I try this with my own data, using the code below, I do get some 'signif_taxon_diff' except its now a time series table populated with NA!

mediandiff<- filter(obj$data$diff_table, obj$data$diff_table$median_diff == 0)
mediandiff
Time Series:
Start = 1
End = 283
Frequency = 1
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
1 NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA
4 NA NA NA NA NA NA NA
5 NA NA NA NA NA NA NA
6 NA NA NA NA NA NA NA
7 NA NA NA NA NA NA NA
8 NA NA NA NA NA NA NA
9 NA NA NA NA NA NA NA
10 NA NA NA NA NA NA NA
.......
136 NA NA NA NA NA NA NA
137 NA NA NA NA NA NA NA
138 NA NA NA NA NA NA NA
139 NA NA NA NA NA NA NA
140 NA NA NA NA NA NA NA
141 NA NA NA NA NA NA NA
142 36521 268 268 NaN 3.5 -1.6754 58.81118
[ reached getOption("max.print") -- omitted 141 rows ]

Sorry if this is all a bit convoluted for what I attempt to do - maybe there is even an easier way to just show which taxa are present in both treatments, but any help will be very much appreciated!

@zachary-foster
Copy link
Contributor

Hola @DanielSoutoV,

There is probably an easier way to do that, if I understand your goal correctly. Is something like this what you want?

library(metacoder)
#> This is metacoder verison 0.3.5 (stable)
hmp_samples <- dplyr::ungroup(hmp_samples) # Needed until next package update

# Parse data from abundance matrix
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Calculate the taxon abundance for groups of columns (e.g. treatments)
x$data$taxon_abund <- calc_taxon_abund(x, "tax_data", cols = hmp_samples$sample_id, groups = hmp_samples$body_site)
#> Summing per-taxon counts from 50 columns in 5 groups for 174 taxa

# Check which taxa overlap
taxon_names(x)[x$data$taxon_abund$Nose > 0 & x$data$taxon_abund$Saliva > 0]
#>                      ab                      ac                      ad 
#>                  "Root"        "Proteobacteria"         "Bacteroidetes" 
#>                      ae                      af                      ag 
#>        "Actinobacteria"            "Firmicutes"          "Fusobacteria" 
#>                      aj                      ak                      al 
#>   "Gammaproteobacteria"         "Flavobacteria"           "Bacteroidia" 
#>                      am                      an                      ao 
#>        "Actinobacteria"    "Betaproteobacteria"               "Bacilli" 
#>                      ap                      aq                      as 
#>            "Clostridia"          "Fusobacteria" "Epsilonproteobacteria" 
#>                      aw                      ax                      ay 
#>        "Pasteurellales"      "Flavobacteriales"         "Bacteroidales" 
#>                      az                      ba                      bc 
#>       "Actinomycetales"          "Neisseriales"         "Clostridiales" 
#>                      bd                      be                      bf 
#>       "Burkholderiales"       "Fusobacteriales"       "Lactobacillales" 
#>                      bg                      bi                      bk 
#>            "Gemellales"     "Enterobacteriales"      "Coriobacteriales" 
#>                      bl                      bq                      br 
#>     "Campylobacterales"       "Pasteurellaceae"     "Flavobacteriaceae" 
#>                      bs                      bt                      bu 
#>    "Porphyromonadaceae"  "Propionibacteriaceae"         "Neisseriaceae" 
#>                      bw                      bx                      bz 
#>        "Bacteroidaceae"       "Veillonellaceae"      "Burkholderiaceae" 
#>                      ca                      cb                      cd 
#>    "Corynebacteriaceae"      "Actinomycetaceae"      "Fusobacteriaceae" 
#>                      ce                      cf                      cg 
#>      "Streptococcaceae"       "Lachnospiraceae"        "Prevotellaceae" 
#>                      ch                      ck                      cm 
#>           "Gemellaceae"    "Enterobacteriaceae"     "Coriobacteriaceae" 
#>                      cn                      cp                      cr 
#>       "Ruminococcaceae"     "Carnobacteriaceae" "Peptostreptococcaceae" 
#>                      cx                      dm                      dn 
#>        "Micrococcaceae"           "Haemophilus"        "Capnocytophaga" 
#>                      do                      dp                      dq 
#>         "Porphyromonas"     "Propionibacterium"             "Neisseria" 
#>                      ds                      dt                      dv 
#>           "Bacteroides"           "Selenomonas"             "Lautropia" 
#>                      dw                      dx                      dz 
#>       "Corynebacterium"           "Actinomyces"           "Veillonella" 
#>                      ea                      eb                      ed 
#>          "Leptotrichia"         "Streptococcus"            "Prevotella" 
#>                      ee                      ef                      eg 
#>               "Gemella"          "Oribacterium"        "Shuttleworthia" 
#>                      em                      es                      et 
#>         "Fusobacterium"          "Ruminococcus"        "Granulicatella" 
#>                      ey                      ez                      fb 
#>             "Dialister"    "Peptostreptococcus"       "Parabacteroides" 
#>                      fg                      fi                      fk 
#>             "Roseburia"           "Mitsuokella"             "Catonella" 
#>                      fm                      gi                      gl 
#>                "Rothia"            "Tannerella"           "Providencia"

Created on 2022-02-02 by the reprex package (v2.0.1)

@DanielSoutoV
Copy link

Excellent Zachary thanks a lot.
Yes, definitely much easier this way.
Best,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants