Possible Incorrect rownames in TableS6.A.R #3

weshorton · 2018-09-24T23:13:42Z

I'm trying to recreate this table and I think there is an error when intersecting the input matrix with the reference database.

Here is the section in question (lines 87-109):

# Read in matrix and Sample annotation
mat <- read.delim("data/CountData.BMS038.txt")
SampleTableCorrected <- read.csv("data/SampleTableCorrected.9.19.16.csv", row.names=1)

# Only samples with response
SampleTableCorrected <- SampleTableCorrected[!(is.na(SampleTableCorrected$Response)), ]

# Find the overlapping samples
inter <- intersect(colnames(mat),rownames(SampleTableCorrected))
SampleTableCorrected <- SampleTableCorrected[inter,]
mat <- mat[,match(rownames(SampleTableCorrected),colnames(mat))]

# Create the DESeq2 object
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ebg <- exonsBy(txdb, by="gene")
intersection <- intersect(rownames(mat),ebg@partitioning@NAMES)
ebg2 <- ebg[ebg@partitioning@NAMES %in% intersection]

# sort by ID
ebg2 <- ebg2[order(names(ebg2))]

# Sort by gene model order
mat <- mat[match(names(ebg2), rownames(mat)),]

mat is initially read in as a data.frame with 22,333 rows (genes) and 120 columns (samples).

mat gets intersected with the samples in SampleTableCorrected to remove 15 samples, resulting in 22,333 rows and 103 columns (both the "ID" column and the "HUGO" column are also removed here).

ebg is created from txdb, which contains 23,459 elements.

The entrez IDs of ebg (ebg@partitioning@NAMES) are intersected with the rownames of mat, but the rownames of mat are just 1:nrow(mat):

intersection1 <- intersect(rownames(mat), ebg@partitioning@NAMES)
intersection2 <- intersect(as.character(1:nrow(mat)), ebg@partitioning@NAMES)
length(which(intersection1 != intersection2))
0

This intersection results in a mat with 7,866 rows and 103 columns. If the rownames of mat are instead changed to the original entrez IDs, we get a larger intersection:

# Read in matrix and Sample annotation
mat <- read.delim("data/CountData.BMS038.txt")
ids <- mat$ID
SampleTableCorrected <- read.csv("data/SampleTableCorrected.9.19.16.csv", row.names=1)

# Only samples with response
SampleTableCorrected <- SampleTableCorrected[!(is.na(SampleTableCorrected$Response)), ]

# Find the overlapping samples
inter <- intersect(colnames(mat),rownames(SampleTableCorrected))
SampleTableCorrected <- SampleTableCorrected[inter,]
mat <- mat[,match(rownames(SampleTableCorrected),colnames(mat))]
rownames(mat) <- ids

# Create the DESeq2 object
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ebg <- exonsBy(txdb, by="gene")
intersection <- intersect(rownames(mat),ebg@partitioning@NAMES)
ebg2 <- ebg[ebg@partitioning@NAMES %in% intersection]

# sort by ID
ebg2 <- ebg2[order(names(ebg2))]

# Sort by gene model order
mat <- mat[match(names(ebg2), rownames(mat)),]

This results in a matrix with 22,333 genes and 103 samples. Although neither method above created the same results as were found in TableS6.A downloaded from the manuscript page.

In order to get an output table, I had to change: Response.ihw.PRCR <- ResFiltered(results(dds.pre,name="ResponsePRCR",filterFun = ihw)) to Response.ihw.PRCR <- ResFiltered(results(dds.pre,name="Response_PRCR_vs_PD",filterFun = ihw)), which may be the result of a different version of DESeq2 (I'm using 1.20.0).

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible Incorrect rownames in TableS6.A.R #3

Possible Incorrect rownames in TableS6.A.R #3

weshorton commented Sep 24, 2018

Possible Incorrect rownames in TableS6.A.R #3

Possible Incorrect rownames in TableS6.A.R #3

Comments

weshorton commented Sep 24, 2018