Relative abundance in heat tree #357

Open
SimonMorvan opened this issue Oct 25, 2023 · 4 comments

@SimonMorvan

Hello Metacoder devs,

It must be pretty straightforward, but I'm having a hard time plotting the relative abundance of each taxon across the whole dataset, instead of the OTU count, as node_size in the heat tree. I guess I have to replace n_obs with something else, but I couldn't find the right parameter.

Have a good day,

Simon

library(phyloseq)
library(metacoder)

data(GlobalPatterns)
# Subsetting the dataset to keep only 2 sample types
GP_sub <- subset_samples(GlobalPatterns, (SampleType=='Ocean' | SampleType=='Soil'))
GP_sub <- prune_taxa(taxa_sums(GP_sub)>0, GP_sub)

# Agglomerating the dataset to Class 
GP_sub_class_glom <- tax_glom(GP_sub, taxrank = "Class", NArm = FALSE)

meta_obj <- parse_phyloseq(GP_sub_class_glom) 

meta_obj$data$otu_relab <- calc_obs_props(meta_obj, "otu_table")  

meta_obj$data$tax_relab <- calc_taxon_abund(meta_obj, "otu_relab") 

meta_obj$data$diff_table <- compare_groups(meta_obj, data = "tax_relab",
                                           cols = meta_obj$data$sample_data$sample_id,
                                           groups = meta_obj$data$sample_data$SampleType)

heat_tree(meta_obj,
          node_size = n_obs , 
          node_label = taxon_names,
          node_color = log2_median_ratio, # A column from `obj$data$diff_table`
          node_color_range = diverging_palette(), # The built-in palette for diverging data                 
          node_color_axis_label = "Log2 ratio median proportions",
          repel_labels = TRUE,
          layout = "davidson-harel", # The primary layout algorithm
          initial_layout = "reingold-tilford") # The layout algorithm that initializes node locations
@zachary-foster
Contributor

Hello,

You can put the name of any column from any table in your input object in place of n_obs. This is why log2_median_ratio works. To use total taxon abundance, you need a column with that value in a per-taxon table. To make one, you can use calc_taxon_abund with a grouping variable that assigns all samples to a single group, so that you get back only one column containing the total. That column name can then be used like so:

library(metacoder)
#> This is metacoder verison 0.3.5 (stable)

# Get example data
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Getting a total for all columns
x$data$tax_abund_total <- calc_taxon_abund(x, "tax_data", cols = hmp_samples$sample_id,
                                           groups = rep("total_count", nrow(hmp_samples)))
#> Summing per-taxon counts from 50 columns in 1 groups for 174 taxa


# Plot total count
heat_tree(x, node_label = taxon_names, node_size = total_count, node_color = total_count)

Created on 2023-10-25 with reprex v2.0.2
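
The original question asked about relative rather than absolute abundance. A minimal sketch along the same lines, assuming it is acceptable to sum per-sample proportions over all samples: the counts could first be converted to proportions with calc_obs_props and then summed per taxon. The table names `otu_props` and `tax_prop_total` and the column name `total_prop` are made up for this illustration.

```r
library(metacoder)

# Same bundled example data as above
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")

# Convert per-sample counts to per-sample proportions
# (`otu_props` is a hypothetical table name)
x$data$otu_props <- calc_obs_props(x, "tax_data", cols = hmp_samples$sample_id)

# Sum the proportions per taxon over all samples into a single column
# (`tax_prop_total` and `total_prop` are hypothetical names)
x$data$tax_prop_total <- calc_taxon_abund(x, "otu_props",
                                          cols = hmp_samples$sample_id,
                                          groups = rep("total_prop", nrow(hmp_samples)))

# Use the summed proportions for node size and color
heat_tree(x, node_label = taxon_names, node_size = total_prop, node_color = total_prop)
```

Because node sizes are scaled relative to one another, summing the proportions should give the same relative sizes as averaging them over the samples.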

@SimonMorvan
Author

Hello Zachary,

Thanks for your quick answer!
I've tried to adapt the code to my parsed phyloseq object, but I still can't get it to work.
When I run the calc_taxon_abund() function, the result is a tibble with 183 rows, but the compare_groups() function then returns a diff_table with 147 rows. When I try to plot the heat tree, this results in an error message saying that 36 of 183 taxa have NAs for the "node_color" option.

Simon

library(phyloseq)
library(metacoder)

data(GlobalPatterns)
# Subsetting the dataset to keep only 2 sample types
GP_sub <- subset_samples(GlobalPatterns, (SampleType=='Ocean' | SampleType=='Soil'))
GP_sub <- prune_taxa(taxa_sums(GP_sub)>0, GP_sub)

# Agglomerating the dataset to Class 
GP_sub_class_glom <- tax_glom(GP_sub, taxrank = "Class", NArm = FALSE)

meta_obj <- parse_phyloseq(GP_sub_class_glom) 

meta_obj$data$tax_ab <- calc_taxon_abund(meta_obj, "otu_table", cols = meta_obj$data$sample_data$sample_id,
                         groups = rep("total_count", nrow(meta_obj$data$sample_data)))

meta_obj$data$diff_table <- compare_groups(meta_obj, data = "otu_table",
                                           cols = meta_obj$data$sample_data$sample_id,
                                           groups = meta_obj$data$sample_data$SampleType)

heat_tree(meta_obj,
          node_size = total_count , 
          node_label = taxon_names,
          node_color = log2_median_ratio, # A column from `obj$data$diff_table`
          node_color_range = diverging_palette(), # The built-in palette for diverging data                 
          node_color_axis_label = "Log2 ratio median proportions",
          repel_labels = TRUE,
          layout = "davidson-harel", # The primary layout algorithm
          initial_layout = "reingold-tilford") # The layout algorithm that initializes node locations

@zachary-foster
Contributor

zachary-foster commented Nov 2, 2023

It's hard to say for sure without being able to run the code with your data myself, but it looks like you are using the OTU table in compare_groups instead of the taxon abundance table. This would compare abundance among the groups per OTU rather than per taxon, which causes attempts to plot data for each taxon to fail. Try this instead:

meta_obj$data$diff_table <- compare_groups(meta_obj, data = "tax_ab",
                                           cols = meta_obj$data$sample_data$sample_id,
                                           groups = meta_obj$data$sample_data$SampleType)

@SimonMorvan
Author

Hi Zachary, sorry for the late answer.
The piece of code you sent did not work:

meta_obj$data$tax_ab <- calc_taxon_abund(meta_obj, "otu_table", 
                                         cols = meta_obj$data$sample_data$sample_id,
                                         groups = rep("total_count", nrow(meta_obj$data$sample_data)))
#Summing per-taxon counts from 6 columns in 1 groups for 183 taxa

meta_obj$data$diff_table <- compare_groups(meta_obj, data = "tax_ab",
                                           cols = meta_obj$data$sample_data$sample_id,
                                           groups = meta_obj$data$sample_data$SampleType)
#Error : The following 6 column(s) are not in "tax_ab":
#CL3, CC1, SV1, NP2, NP3, NP5

You should be able to run the code I provided as GlobalPatterns is a dataset from the phyloseq package.
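
The error message is consistent with `tax_ab` containing only the single `total_count` column: compare_groups looks for the per-sample columns (CL3, CC1, ...) in the table named by `data`, and those were summed away when all samples were put in one group. One possible way around this is to keep two per-taxon tables, one with a column per sample for compare_groups and one with the overall total for node_size. The sketch below uses metacoder's bundled hmp_otus data rather than GlobalPatterns. The table names `tax_abund` and `tax_total`, the grouping by `hmp_samples$sex`, and the `node_color_interval` value are choices made for this illustration and have not been checked against the GlobalPatterns object.

```r
library(metacoder)

# Bundled example data (stand-in for the parsed phyloseq object)
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")

# Per-taxon abundance with one column per sample, for compare_groups
# (`tax_abund` is a hypothetical table name)
x$data$tax_abund <- calc_taxon_abund(x, "tax_data", cols = hmp_samples$sample_id)

# Per-taxon total over all samples, for node_size
# (`tax_total` is a hypothetical table name)
x$data$tax_total <- calc_taxon_abund(x, "tax_data", cols = hmp_samples$sample_id,
                                     groups = rep("total_count", nrow(hmp_samples)))

# Compare groups using the per-sample table, not the total
x$data$diff_table <- compare_groups(x, data = "tax_abund",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$sex)

# node_size comes from `tax_total`, node_color from `diff_table`
heat_tree(x,
          node_size = total_count,
          node_label = taxon_names,
          node_color = log2_median_ratio,
          node_color_interval = c(-3, 3), # Assumed symmetric interval so zero maps to the middle color
          node_color_range = diverging_palette(),
          node_color_axis_label = "Log2 ratio median counts")
```

With the phyloseq object from this thread, the same pattern would presumably use "otu_table" in place of "tax_data", `meta_obj$data$sample_data$sample_id` for `cols`, and `meta_obj$data$sample_data$SampleType` for `groups`.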
