Getting incorrectly formated results with zero_low_counts #356

dhadsell · 2023-10-24T21:19:21Z

Hello,
I am trying to run Metacoder on some 16s data from water samples that was obtained through RTL genomics. I import 2 text files. The first contains the otu with 30 samples x 6,672 otu. the second is the meta data file with sample ids in the first column and trt groups in the second. I have attached samples of both files along with the script that I used. It seems that when I first use the "parse_tax_data" comand the output appears correctly formated. But when I attempt to run the object through the "zero_low_counts" command the resulting tax_data loses the otu_id and lineage columns and the remaining sample columns are converted from to . It turns out that the obj$data$tax_data[ , water_samples$sample_id] == 0 on the full data set it generates a large matrix whereas in your example analysis with human data the object is a "logi" object. If I take only the first 100 lines of the otu data then a logi object is generated but the below error still happens
when I try to run the rowSums comand on obj$data$tax_data[ , water_samples$sample_id]) == 0 i get the following error "Error in rowSums(obj$data$tax_data[, water_samples$sample_id]) :
'x' must be numeric". I have tried numerous ways to correct this issue but to no avail. Can you tell me what the problem might be?
any help please?
Hadsell_script_for_running_Metacoder_w_RTL_genomics_16s.txt
Caddo_metadata_DH_102423.txt
sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt

zachary-foster · 2023-10-25T19:59:15Z

Hello,

I ran your code with your example data and did not get that error. Does it only happen with the full dataset?

library(metacoder)

water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header =T, sep ='\t', check.names = FALSE)
water_samples <- read.delim("Caddo_metadata_DH_102423.txt", header = T, sep = '\t')

obj <- parse_tax_data(water_otu,
                      class_cols = "lineage", # the column that contains taxonomic information
                      class_sep = ";", # The character used to separate taxa in the classification
                      class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is
                      class_key = c(tax_rank = "info", # A key describing each regex capture group
                                    tax_name = "taxon_name"))

obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)

test <- obj$data$tax_data[ , water_samples$sample_id] == 0
no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0

dhadsell · 2023-10-25T21:25:34Z

Hello Zachary, So it turns out that in the I attached the wrong version of the meta-data file. The correct file is now attached This file is a dataframe with two variables and 30 observations The first variable is “sample_id” and the second is “group” The original data file had 10 variables and did not contain a “sample_id” column With the correct metadata file I get the error regardless of whether I use the small sample otu dataset or the full otu dataset. From: Zachary Foster ***@***.***> Sent: Wednesday, October 25, 2023 2:59 PM To: grunwaldlab/metacoder ***@***.***> Cc: Hadsell, Darryl L. ***@***.***>; Author ***@***.***> Subject: Re: [grunwaldlab/metacoder] Getting incorrectly formated results with zero_low_counts (Issue #356) ***CAUTION:*** This email is not from a BCM Source. Only click links or open attachments you know are safe.

…

________________________________ Hello, I ran your code with your example data and did not get that error. Does it only happen with the full dataset? library(metacoder) water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header =T, sep ='\t', check.names = FALSE) water_samples <- read.delim("Caddo_metadata_DH_102423.txt", header = T, sep = '\t') obj <- parse_tax_data(water_otu, class_cols = "lineage", # the column that contains taxonomic information class_sep = ";", # The character used to separate taxa in the classification class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is class_key = c(tax_rank = "info", # A key describing each regex capture group tax_name = "taxon_name")) obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5) test <- obj$data$tax_data[ , water_samples$sample_id] == 0 no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0 — Reply to this email directly, view it on GitHub<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_grunwaldlab_metacoder_issues_356-23issuecomment-2D1779963587&d=DwMFaQ&c=ZQs-KZ8oxEw0p81sqgiaRA&r=XTCyyeX5o3ZkNVzfALlNF9OFiwkE3rMWvWV85Km9iXs&m=ncNEDOJRZpCTjz6UowYa5AxcxtUOY7O1qUwc753ep1-jHHD5aRp35TCqm0JidMI8&s=N9Z3-ULOFlzAdXMufkV4vmq6etXMdflVsHB1-lbxqD8&e=>, or unsubscribe<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AK2FDPQ37D52YZVXXTRHFDTYBFVR3AVCNFSM6AAAAAA6OKQODSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZZHE3DGNJYG4&d=DwMFaQ&c=ZQs-KZ8oxEw0p81sqgiaRA&r=XTCyyeX5o3ZkNVzfALlNF9OFiwkE3rMWvWV85Km9iXs&m=ncNEDOJRZpCTjz6UowYa5AxcxtUOY7O1qUwc753ep1-jHHD5aRp35TCqm0JidMI8&s=c0Qs2dCZzvFkHPqSrB8nu432s2xYzbVgrb3UkXN6RhY&e=>. You are receiving this because you authored the thread.Message ID: ***@***.******@***.***>> sample_id group 1 Deep_Open 10 Deep_Matted 11 Shallow_M 12 Deep_Matted 13 Shallow_M 14 Deep_Open 15 Deep_Mattted 16 Deep_Mattted 17 Shallow_M 18 Shallow_Open 19 Shallow_Matted 2 Deep_Matted 20 Deep_Matted 21 Deep_Open 22 Deep_Matted 23 Deep_Open 24 Deep_Open 25 Deep_Matted 26 Deep_Open 27 Shallow_Matted 28 Deep_Open 29 Deep_Matted 3 Deep_Open 30 Deep_Open 4 Shallow_Matted 5 Deep_Matted 6 Shallow_Open 7 Shallow_Open 8 Deep_Open 9 Shallow_Light

dhadsell · 2023-10-27T17:56:15Z

I noticed that the dataframes in your example are tibbles. I also just tried converting my two dataframes to tibbles but that does not seem to be the answer.

here is the correct metadata file

Caddo_meta_102423.txt

zachary-foster · 2023-11-02T18:41:17Z

Thanks for the new file! I can reproduce the error now.

The issue is that your sample IDs are being read in as integers instead of strings since they are all numeric. This is one reason ID that start with a non-number character are generally better. At this line:

no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0

water_samples$sample_id is being treated as column indexes instead of column names, so before that happens you need to add water_samples$sample_id <- as.character(water_samples$sample_id)

Here is the updated script that circumvents this error:

library(metacoder)
#> This is metacoder version 0.3.6 (stable)

water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header =T, sep ='\t', check.names = FALSE)
#> Warning in file(file, "rt"): cannot open file
#> 'sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
water_samples <- read.delim("Caddo_meta_102423.txt", header = T, sep = '\t')
#> Warning in file(file, "rt"): cannot open file 'Caddo_meta_102423.txt': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
water_samples$sample_id <- as.character(water_samples$sample_id)
#> Error in eval(expr, envir, enclos): object 'water_samples' not found

obj <- parse_tax_data(water_otu,
                      class_cols = "lineage", # the column that contains taxonomic information
                      class_sep = ";", # The character used to separate taxa in the classification
                      class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is
                      class_key = c(tax_rank = "info", # A key describing each regex capture group
                                    tax_name = "taxon_name"))
#> Error in eval(expr, envir, enclos): object 'water_otu' not found

obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)
#> Error in eval(expr, envir, enclos): object 'obj' not found

test <- obj$data$tax_data[ , water_samples$sample_id] == 0
#> Error in eval(expr, envir, enclos): object 'obj' not found
no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0
#> Error in eval(expr, envir, enclos): object 'obj' not found
no_reads
#> Error in eval(expr, envir, enclos): object 'no_reads' not found

^{Created on 2023-11-02 with reprex v2.0.2}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting incorrectly formated results with zero_low_counts #356

Getting incorrectly formated results with zero_low_counts #356

dhadsell commented Oct 24, 2023

zachary-foster commented Oct 25, 2023

dhadsell commented Oct 25, 2023 via email

dhadsell commented Oct 27, 2023

zachary-foster commented Nov 2, 2023

Getting incorrectly formated results with zero_low_counts #356

Getting incorrectly formated results with zero_low_counts #356

Comments

dhadsell commented Oct 24, 2023

zachary-foster commented Oct 25, 2023

dhadsell commented Oct 25, 2023 via email

dhadsell commented Oct 27, 2023

zachary-foster commented Nov 2, 2023