Simplified methylation data #34

Open
pcheng84 opened this issue Feb 24, 2020 · 5 comments
@pcheng84

Hi Levi and Marcel,

I was wondering if it would be possible to implement a simplified methylation data set. As you know, the raw methylation data is quite unwieldy, so I condensed it to a matrix in which each row holds the median beta value for one CpG island. I took the annotation from the IlluminaHumanMethylation450kanno.ilmn12.hg19 package.

Here is the code I used to make the simplified methylation values:

library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
library(curatedTCGAData)
library(data.table)
lusc <- curatedTCGAData("LUSC", "Methylation_methyl450", FALSE)

#Get Illumina methylation island data
illumina <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
illumina <- data.table("REF" = rownames(illumina),
                     "Chr" = illumina@listData$chr,
                     "Loc" = illumina@listData$pos,
                     "Island" = illumina@listData$Relation_to_Island,
                     "Island_Name" = illumina@listData$Islands_Name,
                     "Gene" = illumina@listData$UCSC_RefGene_Name,
                     "UCSC_RefGene_Group" = illumina@listData$UCSC_RefGene_Group
  )
setkey(illumina, "Island")
illumina <- illumina["Island"]

#Extract methylation data for all genes, then
#limit to loci with Island annotation from Illumina
meth_assay_num <- grep("Methylation", names(mae))
meth <- mae[[meth_assay_num]]
meth <- meth[illumina$REF,]

#Merge methylation data with annotation data
merged <- as.data.table(assay(meth))
merged[, "REF" := illumina$REF]
merged <- merge(merged, illumina, by = "REF")

#Calculate the median methylation value per island
methylisland <- merged[, lapply(.SD, function(x) median(x, na.rm = TRUE)),
                         .SDcols = !c("REF", "Chr", "Loc", "Island", "Gene", "UCSC_RefGene_Group"),
                         by = Island_Name]
methylisland <- na.omit(methylisland)

methylisland2 <- as.matrix(methylisland[, setdiff(colnames(methylisland), "Island_Name"), with = FALSE])
rownames(methylisland2) <- methylisland$Island_Name
#Append simplified methylation matrix to original MAE object
mae2 <- c(mae, SimpleMethyl = methylisland2, mapFrom = meth_assay_num)
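
For anyone who wants to try the aggregation step without downloading the full LUSC data, here is a minimal sketch on a toy table. The probe IDs, island names, sample names and values are all made up for illustration; only the `data.table` grouping pattern mirrors the code above.

```r
library(data.table)

# Toy probe-level table: four probes, two islands, two samples (made-up values)
probes <- data.table(
  REF = c("cg01", "cg02", "cg03", "cg04"),
  Island_Name = c("chr1:100-200", "chr1:100-200", "chr2:300-400", "chr2:300-400"),
  sampleA = c(0.2, 0.4, 0.9, NA),
  sampleB = c(0.1, 0.3, 0.8, 0.6)
)
sample_cols <- c("sampleA", "sampleB")

# Median per island over every sample column; NA values are ignored
island_median <- probes[, lapply(.SD, function(x) median(x, na.rm = TRUE)),
                        .SDcols = sample_cols,
                        by = Island_Name]

# Convert to a numeric matrix with island names as row names
island_mat <- as.matrix(island_median[, sample_cols, with = FALSE])
rownames(island_mat) <- island_median$Island_Name
```

Each row of `island_mat` then corresponds to one island and each column to one sample, which is the shape passed to `c()` on the MultiAssayExperiment above.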

Cheers,
Phil

@lwaldron
Member

Thanks for the code, @pcheng84! Yes, it would be convenient to have a simplified methylation version pre-computed. @LiNk-NY, it looks like the c("REF", "Chr", "Loc", "Island", "Gene", "UCSC_RefGene_Group") columns (and possibly others) would be convenient to have as rowData and rowRanges in a RangedSummarizedExperiment.
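
A rough sketch of what the RangedSummarizedExperiment idea could look like for a per-island matrix. It assumes the island names follow the "chr:start-end" pattern so they can be coerced to a GRanges; the matrix values, sample names and the `n_probes` column are made up for illustration and are not part of the code in this thread.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges

# Toy stand-in for the per-island median matrix (made-up values)
island_mat <- matrix(
  c(0.10, 0.80, 0.45, 0.20, 0.75, 0.50),
  nrow = 3,
  dimnames = list(
    c("chr1:28735-29810", "chr1:135124-135563", "chr2:45000-46000"),
    c("sampleA", "sampleB")
  )
)

# Assumption: island names are "chr:start-end" strings, so GRanges() can parse them
island_ranges <- GRanges(rownames(island_mat))
names(island_ranges) <- rownames(island_mat)

# Hypothetical per-island annotation stored alongside the ranges
mcols(island_ranges)$n_probes <- c(5L, 3L, 8L)

# Supplying rowRanges yields a RangedSummarizedExperiment
rse <- SummarizedExperiment(
  assays = list(median_beta = island_mat),
  rowRanges = island_ranges
)
```

With this layout the coordinates are available through `rowRanges(rse)` and any extra annotation columns through `rowData(rse)`.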

@pcheng84
Author

pcheng84 commented Mar 3, 2020

Those are the columns I use for the annotation of the islands. I think there are more columns in the listData tables of the IlluminaHumanMethylation450kanno.ilmn12.hg19 annotation object you could add to rowData.

I made a small mistake in my code:

lusc <- curatedTCGAData("LUSC", "Methylation_methyl450", FALSE)

should be

mae <- curatedTCGAData("LUSC", "Methylation_methyl450", FALSE)

@LiNk-NY
Contributor

LiNk-NY commented Mar 10, 2020

Hi Phil, @pcheng84
Thank you for pointing this out and providing code to work with.
I think this would be a good addition as an add-on type of package that can optionally replace methylation datasets from curatedTCGAData.
Currently, it would take me quite some time to re-run the pipeline and integrate the datasets.
It would be easier to have a separate ExperimentHub package. Let me know what you think.
Thanks!

Best,
Marcel

@pcheng84
Author

Hi Marcel, @LiNk-NY

An ExperimentHub package would work nicely. How should we proceed with this?

Cheers,
Phil

@LiNk-NY
Contributor

LiNk-NY commented Mar 12, 2020

Hi Phil, @pcheng84
I was thinking we could test-drive functionality that Kayla @Kayla-Morrell
has been working on to make package creation and resource upload easier.

There is a branch called constructHubFunctions in AnnotationHubData
that lets you run hub_create_package(). I will look into it as well in the coming days.

https://github.com/Bioconductor/AnnotationHubData/tree/constructHubFunctions

Best,
Marcel
