Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible Incorrect rownames in TableS6.A.R #3

Open
weshorton opened this issue Sep 24, 2018 · 0 comments
Open

Possible Incorrect rownames in TableS6.A.R #3

weshorton opened this issue Sep 24, 2018 · 0 comments

Comments

@weshorton
Copy link

I'm trying to recreate this table and I think there is an error when intersecting the input matrix with the reference database.

Here is the section in question (lines 87-109):

# Read in matrix and Sample annotation
mat <- read.delim("data/CountData.BMS038.txt")
SampleTableCorrected <- read.csv("data/SampleTableCorrected.9.19.16.csv", row.names=1)

# Only samples with response
SampleTableCorrected <- SampleTableCorrected[!(is.na(SampleTableCorrected$Response)), ]

# Find the overlapping samples
inter <- intersect(colnames(mat),rownames(SampleTableCorrected))
SampleTableCorrected <- SampleTableCorrected[inter,]
mat <- mat[,match(rownames(SampleTableCorrected),colnames(mat))]

# Create the DESeq2 object
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ebg <- exonsBy(txdb, by="gene")
intersection <- intersect(rownames(mat),ebg@partitioning@NAMES)
ebg2 <- ebg[ebg@partitioning@NAMES %in% intersection]

# sort by ID
ebg2 <- ebg2[order(names(ebg2))]

# Sort by gene model order
mat <- mat[match(names(ebg2), rownames(mat)),]

mat is initially read in as a data.frame with 22,333 rows (genes) and 120 columns (samples).

mat gets intersected with the samples in SampleTableCorrected to remove 15 samples, resulting in 22,333 rows and 103 columns (both the "ID" column and the "HUGO" column are also removed here).

ebg is created from txdb, which contains 23,459 elements.

The entrez IDs of ebg (ebg@partitioning@NAMES) are intersected with the rownames of mat, but the rownames of mat are just 1:nrow(mat):

intersection1 <- intersect(rownames(mat), ebg@partitioning@NAMES)
intersection2 <- intersect(as.character(1:nrow(mat)), ebg@partitioning@NAMES)
length(which(intersection1 != intersection2))
0

This intersection results in a mat with 7,866 rows and 103 columns. If the rownames of mat are instead changed to the original entrez IDs, we get a larger intersection:

# Read in matrix and Sample annotation
mat <- read.delim("data/CountData.BMS038.txt")
ids <- mat$ID
SampleTableCorrected <- read.csv("data/SampleTableCorrected.9.19.16.csv", row.names=1)

# Only samples with response
SampleTableCorrected <- SampleTableCorrected[!(is.na(SampleTableCorrected$Response)), ]

# Find the overlapping samples
inter <- intersect(colnames(mat),rownames(SampleTableCorrected))
SampleTableCorrected <- SampleTableCorrected[inter,]
mat <- mat[,match(rownames(SampleTableCorrected),colnames(mat))]
rownames(mat) <- ids

# Create the DESeq2 object
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ebg <- exonsBy(txdb, by="gene")
intersection <- intersect(rownames(mat),ebg@partitioning@NAMES)
ebg2 <- ebg[ebg@partitioning@NAMES %in% intersection]

# sort by ID
ebg2 <- ebg2[order(names(ebg2))]

# Sort by gene model order
mat <- mat[match(names(ebg2), rownames(mat)),]

This results in a matrix with 22,333 genes and 103 samples. Although neither method above created the same results as were found in TableS6.A downloaded from the manuscript page.

In order to get an output table, I had to change: Response.ihw.PRCR <- ResFiltered(results(dds.pre,name="ResponsePRCR",filterFun = ihw)) to Response.ihw.PRCR <- ResFiltered(results(dds.pre,name="Response_PRCR_vs_PD",filterFun = ihw)), which may be the result of a different version of DESeq2 (I'm using 1.20.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant