Getting incorrectly formated results with zero_low_counts #356

Open
dhadsell opened this issue Oct 24, 2023 · 4 comments

Comments

@dhadsell

Hello,
I am trying to run Metacoder on some 16S data from water samples that was obtained through RTL genomics. I import two text files. The first contains the OTU table, with 30 samples x 6,672 OTUs. The second is the metadata file, with sample IDs in the first column and trt groups in the second. I have attached samples of both files along with the script that I used.

When I first use the "parse_tax_data" command, the output appears correctly formatted. But when I run the object through the "zero_low_counts" command, the resulting tax_data loses the otu_id and lineage columns, and the remaining sample columns are converted to a different column type.

On the full data set, obj$data$tax_data[ , water_samples$sample_id] == 0 generates a large matrix, whereas in your example analysis with human data the object is a "logi" object. If I take only the first 100 lines of the OTU data, a logi object is generated, but the error below still happens.

When I try to run the rowSums command on obj$data$tax_data[ , water_samples$sample_id], I get the following error: "Error in rowSums(obj$data$tax_data[, water_samples$sample_id]) : 'x' must be numeric". I have tried numerous ways to correct this issue, but to no avail. Can you tell me what the problem might be?
Any help, please?
Hadsell_script_for_running_Metacoder_w_RTL_genomics_16s.txt
Caddo_metadata_DH_102423.txt
sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt

@zachary-foster
Contributor

Hello,

I ran your code with your example data and did not get that error. Does it only happen with the full dataset?

library(metacoder)

water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header = TRUE, sep = "\t", check.names = FALSE)
water_samples <- read.delim("Caddo_metadata_DH_102423.txt", header = TRUE, sep = "\t")

obj <- parse_tax_data(water_otu,
                      class_cols = "lineage", # the column that contains taxonomic information
                      class_sep = ";", # The character used to separate taxa in the classification
                      class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is
                      class_key = c(tax_rank = "info", # A key describing each regex capture group
                                    tax_name = "taxon_name"))

obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)

test <- obj$data$tax_data[ , water_samples$sample_id] == 0
no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0

@dhadsell
Author

dhadsell commented Oct 25, 2023 via email

@dhadsell
Author

I noticed that the data frames in your example are tibbles. I also just tried converting my two data frames to tibbles, but that does not seem to be the answer.

Here is the correct metadata file:

Caddo_meta_102423.txt

@zachary-foster
Contributor

Thanks for the new file! I can reproduce the error now.

The issue is that your sample IDs are being read in as integers instead of strings, since they are all numeric. This is one reason IDs that start with a non-numeric character are generally better. At this line:

no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0

water_samples$sample_id is being treated as column indexes (positions) instead of column names, so before that line you need to add water_samples$sample_id <- as.character(water_samples$sample_id)
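
To make the difference concrete, here is a minimal base R sketch of the same behaviour. It does not use metacoder or the attached files: the toy table, its values, and the IDs `1` and `2` are made up for illustration, assuming a table laid out like tax_data with two text columns followed by sample columns.

```r
# Toy table shaped like obj$data$tax_data: two text columns, then sample
# columns whose names look like numbers (all names and values are made up).
tax_data <- data.frame(
  otu_id  = c("otu_1", "otu_2", "otu_3"),
  lineage = c("k__Bacteria", "k__Bacteria", "k__Archaea"),
  `1`     = c(0, 4, 0),
  `2`     = c(0, 7, 2),
  check.names = FALSE
)

ids_int <- c(1L, 2L)              # all-numeric IDs as read.delim() returns them
ids_chr <- as.character(ids_int)  # the same IDs after as.character()

# Integer IDs select by position, so this picks the two text columns.
cols_by_position <- colnames(tax_data[, ids_int])

# Character IDs select by name, so this picks the sample columns.
cols_by_name <- colnames(tax_data[, ids_chr])

# rowSums() on the text columns raises the error from the original report;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(rowSums(tax_data[, ids_int]),
                error = function(e) conditionMessage(e))

# With character IDs the sum runs over the sample counts as intended.
no_reads <- rowSums(tax_data[, ids_chr]) == 0
```

In this sketch `cols_by_position` holds the otu_id and lineage columns, which is why rowSums complains that 'x' must be numeric, while `no_reads` flags only the first row.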

Here is the updated script that circumvents this error:

library(metacoder)
#> This is metacoder version 0.3.6 (stable)

water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header = TRUE, sep = "\t", check.names = FALSE)
#> Warning in file(file, "rt"): cannot open file
#> 'sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
water_samples <- read.delim("Caddo_meta_102423.txt", header = TRUE, sep = "\t")
#> Warning in file(file, "rt"): cannot open file 'Caddo_meta_102423.txt': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
water_samples$sample_id <- as.character(water_samples$sample_id)
#> Error in eval(expr, envir, enclos): object 'water_samples' not found

obj <- parse_tax_data(water_otu,
                      class_cols = "lineage", # the column that contains taxonomic information
                      class_sep = ";", # The character used to separate taxa in the classification
                      class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is
                      class_key = c(tax_rank = "info", # A key describing each regex capture group
                                    tax_name = "taxon_name"))
#> Error in eval(expr, envir, enclos): object 'water_otu' not found

obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)
#> Error in eval(expr, envir, enclos): object 'obj' not found

test <- obj$data$tax_data[ , water_samples$sample_id] == 0
#> Error in eval(expr, envir, enclos): object 'obj' not found
no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0
#> Error in eval(expr, envir, enclos): object 'obj' not found
no_reads
#> Error in eval(expr, envir, enclos): object 'no_reads' not found

Created on 2023-11-02 with reprex v2.0.2
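
As an alternative to converting after the fact, the ID column could probably be read as text from the start by passing `colClasses` to read.delim. This is only a sketch under assumptions: the inline metadata, the `trt` column name, and the ID values are invented, and `text =` stands in for the real file so the example runs on its own.

```r
# Made-up two-column metadata standing in for the real file
# (the column name "trt" and the IDs are assumptions for illustration).
metadata_text <- "sample_id\ttrt\n101\tcontrol\n102\ttreated\n"

# Default behaviour: an all-numeric ID column is guessed to be integer.
default_read <- read.delim(text = metadata_text, header = TRUE, sep = "\t")
default_class <- class(default_read$sample_id)

# A named colClasses vector forces only sample_id to character;
# the other columns are still guessed as usual.
typed_read <- read.delim(text = metadata_text, header = TRUE, sep = "\t",
                         colClasses = c(sample_id = "character"))
typed_class <- class(typed_read$sample_id)

# Optional guard before subsetting: the IDs must be text, and every ID
# should match a column name in the OTU table (toy column names here).
otu_columns <- c("otu_id", "lineage", "101", "102")
ids_ok <- is.character(typed_read$sample_id) &&
  all(typed_read$sample_id %in% otu_columns)
```

With the real data, the same `colClasses` argument would go on the read.delim call for the metadata file, and the guard would compare against colnames(obj$data$tax_data).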
