Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phyloseq heat tree doesn't see taxon_id or supertaxon_id when trying filter_taxa? #359

Open
hank00000 opened this issue Nov 4, 2023 · 9 comments

Comments

@hank00000
Copy link

hank00000 commented Nov 4, 2023

Apologies if this has been discussed before or is an easy fix!

I've just been introduced to metacoder and I want to be able to use a phyloseq object as an entry. I can get the code to work pretty well, but as mentioned in the documentation, a pretty large and hard to read heat tree is produced. One suggestion is to filter_taxa by a taxon rank to cut it down a bit, but my object (sourced from standard physique object) won't work to filter by supertaxon. It throws the following error:

Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.
3: In heat_tree.default(taxon_id = character(0), supertaxon_id = character(0), :
'taxon_id' and 'supertaxon_id' are empty. Returning NULL.

Data set 3 is my sample data so it makes sense that there is no taxon id. Data set 4 is the physeq tree that comes with the physique object.

How can I tell the filter_taxa function of the heat map to look in Data set 1 (out_table) or Data set 2 (taxa_data) for this? I know it's because I'm inputting a physeq, but it's much easier for me to do so in my workflow.

As is, I only have it as one line in the heat_tree code as so:

obj %>% + filter_taxa(taxon_ranks == "p") %>% + heat_tree(node_label = taxon_names, + node_size = n_obs, + node_color = n_obs, + initial_layout = "re", layout = "da")

Thank you!!

@hank00000 hank00000 changed the title Phyloseq obj makes heat tree ok, but doesn't see taxon_id or supertaxon_id? Phyloseq heat tree doesn't see taxon_id or supertaxon_id when trying filter_taxa? Nov 4, 2023
@zachary-foster
Copy link
Contributor

These two warngings are to be expected for the reasons you say:

Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs. 
2: The data set "4" is named, but not named by taxon ids. 

Just a result of trying to force all the data in the phyloseq object in the taxmap object.

I think the issue might be that ranks are not named "p", by maybe something like "Phylum" in your data set? What does taxon_ranks(obj) return? If you used the wrong rank format and caused filter to remove all taxa as a result, you can get that error:

3: In heat_tree.default(taxon_id = character(0), supertaxon_id = character(0),  :
  'taxon_id' and 'supertaxon_id' are empty. Returning NULL.

I really should add a better error message for this case.

Take a look at obj %>% filter_taxa(taxon_ranks == "p") and see if any taxa are left.

@hank00000
Copy link
Author

Yes, there are "no taxa" according to the code you suggested, but these rank names as I used qiime2 to assign taxonomy (but then input the files into R to make graphs with):

d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Comamonadaceae;g__Rhodoferax

However, I'm able to get heat trees made using the code as long as I don't ask it to filter taxa. So that's what's confusing me, I expected it to either not work or fully work, not half-work, haha. I'll attach a pic of what I was able to get from this dataset.
341f_primerset.pdf

@zachary-foster
Copy link
Contributor

I am guessing that there is no taxon rank information in the taxmap object. If you don't see rank names in taxon_ranks(obj) then that information is not available to be filtered by. Try parsing the phyloseq object with this code:

parse_phyloseq(my_phyloseq_obj, class_regex = "(^[a-z]+)__(.+$)", class_key = c("taxon_rank", "taxon_name"))

This, or something like it, should make the rank information available and then the filter should work.

@hank00000
Copy link
Author

Thanks for your help Zachary

When I do that, I get this error:

Error in parse_tax_data(tax_data = tax_data, datasets = datasets, class_cols = tax_cols, :
"named_by_rank = TRUE" does not make sense when "taxon_rank" is in "class_key".

@zachary-foster
Copy link
Contributor

What does taxon_ranks(obj) return with your original code?

@hank00000
Copy link
Author

hank00000 commented Nov 6, 2023

head(taxon_ranks(obj))
aab aac aae aaf aag aah
"Kingdom" "Kingdom" "Phylum" "Phylum" "Phylum" "Phylum"
It goes all the way to genus which matches my data set.

When I run it with the standard settings for parse_phyloseq

obj <- parse_phyloseq(phyR, class_regex = "(.*)", class_key = "taxon_name")

@zachary-foster
Copy link
Contributor

Ok, so it should be parsing the rank information correctly. What does this return?

obj %>% filter_taxa(taxon_ranks == "Phylum")

@hank00000
Copy link
Author

hank00000 commented Nov 6, 2023

You got it!! I was filtering based on 'p' because that's what the original data had, but if I filter by "Phylum" it works. Thank you so much for your help.

PS: The same errors are thrown about Dataset 3/Dataset 4, but the plot does filter by Phylum.
Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.

obj <- parse_phyloseq(phyR, class_regex = "(.*)", class_key = "taxon_name")
set.seed(2) # Each number will produce a slightly different result for some layouts
obj %>%
filter_taxa(taxon_ranks == "Phylum") %>% # subset to the class rank
heat_tree(node_label = taxon_names,
node_size = n_obs,
node_color = n_obs,
initial_layout = "re", layout = "da")
#output_file = "plot_example.pdf")

@zachary-foster
Copy link
Contributor

No problem!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants