converting Mutations components to maftools-friendly form? #18

Open
vjcitn opened this issue Jun 21, 2018 · 4 comments

vjcitn commented Jun 21, 2018

I am finding it challenging to convert the RaggedExperiment to a more MAF-like
tabular form. Am I missing something? Maybe we should add a component with
MAF content, perhaps as a dense GRanges, named "MAF"? I think this would
be used more readily, and we already have code that converts MAF to RaggedExperiment,
which could be provided as a tool.
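
To make concrete what a "MAF-like tabular form" could look like, here is a minimal base R sketch. It does not use the RaggedExperiment API: the `calls` list is a hypothetical stand-in for the per-sample ranges, the helper name `to_maf_like` and every value are invented for illustration, and only the column names follow the usual MAF conventions.

```r
# Hypothetical per-sample mutation calls standing in for the ragged,
# per-sample ranges of a RaggedExperiment. All values are invented.
calls <- list(
  S1 = data.frame(
    Hugo_Symbol = c("GENE1", "GENE2"),
    Chromosome = c("1", "2"),
    Start_Position = c(100L, 250L),
    End_Position = c(100L, 251L),
    Variant_Classification = c("Missense_Mutation", "Frame_Shift_Del"),
    stringsAsFactors = FALSE
  ),
  S2 = data.frame(
    Hugo_Symbol = "GENE1",
    Chromosome = "1",
    Start_Position = 140L,
    End_Position = 140L,
    Variant_Classification = "Nonsense_Mutation",
    stringsAsFactors = FALSE
  )
)

# Stack the per-sample tables into one long table, recording the sample
# of origin in a Tumor_Sample_Barcode column, as MAF files do.
to_maf_like <- function(per_sample) {
  rows <- Map(function(df, id) {
    df$Tumor_Sample_Barcode <- rep(id, nrow(df))
    df
  }, per_sample, names(per_sample))
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

maf <- to_maf_like(calls)
```

The result has one row per mutation call, which is the shape that MAF-oriented tools generally expect as input.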

@lwaldron
Member

A couple of thoughts:

  1. A MAF-like form wouldn't be compatible with MultiAssayExperiment, so a coercion method to DataFrame would probably belong in the RaggedExperiment package.
  2. It is a pain to convert these TCGA RaggedExperiments to matrices, or to RangedSummarizedExperiments with one row per gene, as in the RNA-seq datasets. @vjcitn and @LiNk-NY, would you try out the helper function in this gist and see whether you find it useful? Currently it just converts the RaggedExperiments to genes x samples matrices, but I could easily have it convert to RangedSummarizedExperiment.

https://gist.github.com/lwaldron/47fb0c0bece56f58b762192c24117231
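
For readers who want the idea without opening the gist, the following is a rough base R sketch of a genes x samples conversion. It is not the gist's code: it assumes the mutation calls are already in a long table with MAF-style `Hugo_Symbol` and `Tumor_Sample_Barcode` columns, and the data and the `all_samples` vector are made up for illustration.

```r
# Hypothetical long-format mutation calls (invented values).
maf <- data.frame(
  Hugo_Symbol = c("GENE1", "GENE2", "GENE1", "GENE1"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2"),
  stringsAsFactors = FALSE
)

# Samples with no calls at all should still get a column.
all_samples <- c("S1", "S2", "S3")

# Start from an all-zero genes x samples matrix.
genes <- sort(unique(maf$Hugo_Symbol))
mutated <- matrix(
  0L,
  nrow = length(genes),
  ncol = length(all_samples),
  dimnames = list(genes, all_samples)
)

# Mark each gene/sample pair that has at least one call.
hit <- unique(maf[, c("Hugo_Symbol", "Tumor_Sample_Barcode")])
mutated[cbind(hit$Hugo_Symbol, hit$Tumor_Sample_Barcode)] <- 1L
```

Collapsing repeated calls in the same gene and sample to a single 1 is one design choice among several; counting calls or keeping the variant classification would need a different summary step.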

@lwaldron
Member

The gist now converts the RaggedExperiments to RangedSummarizedExperiments, instead of matrices.

@lwaldron
Member

Back to my point 1: this coercion method could be useful for GRangesList as well as for RaggedExperiment, so it's not just a RaggedExperiment question.

@lwaldron
Member

@vjcitn and @LiNk-NY, take a look at the conveniencefuns branch I just pushed. It's far from perfect, but it does the following:

> accmae <- curatedTCGAData("ACC", c("CNASNP", "Mutation", "miRNASeqGene", "GISTICT"), dry.run = FALSE)
> accmae
A MultiAssayExperiment object of 4 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 4: 
 [1] ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 180 columns 
 [2] ACC_GISTIC_ThresholdedByGene-20160128: SummarizedExperiment with 24776 rows and 90 columns 
 [3] ACC_miRNASeqGene-20160128: SummarizedExperiment with 1046 rows and 80 columns 
 [4] ACC_Mutation-20160128: RaggedExperiment with 20166 rows and 90 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices
> simplemae <- simplifyTCGA(accmae)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
> simplemae
A MultiAssayExperiment object of 6 listed
 experiments with user-defined names and respective classes. 
 Containing an ExperimentList class object of length 6: 
 [1] ACC_Mutation-20160128_simplified: RangedSummarizedExperiment with 22945 rows and 90 columns 
 [2] ACC_CNASNP-20160128_simplified: RangedSummarizedExperiment with 22945 rows and 180 columns 
 [3] ACC_miRNASeqGene-20160128_ranged: RangedSummarizedExperiment with 1002 rows and 80 columns 
 [4] ACC_miRNASeqGene-20160128_unranged: SummarizedExperiment with 44 rows and 80 columns 
 [5] ACC_GISTIC_ThresholdedByGene-20160128_ranged: RangedSummarizedExperiment with 19601 rows and 90 columns 
 [6] ACC_GISTIC_ThresholdedByGene-20160128_unranged: SummarizedExperiment with 5175 rows and 90 columns 
Features: 
 experiments() - obtain the ExperimentList instance 
 colData() - the primary/phenotype DataFrame 
 sampleMap() - the sample availability DataFrame 
 `$`, `[`, `[[` - extract colData columns, subset, or experiment 
 *Format() - convert into a long or wide DataFrame 
 assays() - convert ExperimentList to a SimpleList of matrices
> rownames(simplemae)
CharacterList of length 6
[["ACC_Mutation-20160128_simplified"]] A1BG NAT2 ADA CDH2 AKT3 ... KCNE2 DGCR2 CASP8AP2 SCO2
[["ACC_CNASNP-20160128_simplified"]] A1BG NAT2 ADA CDH2 AKT3 ... KCNE2 DGCR2 CASP8AP2 SCO2
[["ACC_miRNASeqGene-20160128_ranged"]] hsa-let-7a-1 hsa-let-7a-2 ... hsa-mir-99a hsa-mir-99b
[["ACC_miRNASeqGene-20160128_unranged"]] hsa-mir-103-1 hsa-mir-103-1-as ... hsa-mir-941-4
[["ACC_GISTIC_ThresholdedByGene-20160128_ranged"]] ACAP3 ACTRT2 AGRN ... SNORA56 TMLHE VBP1
[["ACC_GISTIC_ThresholdedByGene-20160128_unranged"]] C1orf170 ... WASIR1|ENSG00000185203.7
> 
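
In the output above, each `_ranged` / `_unranged` pair adds up to the rows of the original assay (1002 + 44 = 1046 for miRNASeqGene, 19601 + 5175 = 24776 for GISTIC), which suggests features are split by whether genomic coordinates were found for them. A minimal base R sketch of that kind of split follows. It is an illustration of the idea, not the code on the branch, and the `expr` matrix, the `coords` lookup table, and the feature names are all hypothetical.

```r
# Hypothetical features x samples matrix (invented values).
expr <- matrix(
  1:6,
  nrow = 3,
  dimnames = list(c("featA", "featB", "featC"), c("S1", "S2"))
)

# Hypothetical coordinate lookup; featB has no known range.
coords <- data.frame(
  feature = c("featA", "featC"),
  chrom = c("1", "2"),
  start = c(100L, 500L),
  end = c(200L, 650L),
  stringsAsFactors = FALSE
)

# Features with coordinates go to the ranged part, the rest to unranged.
has_range <- rownames(expr) %in% coords$feature
ranged <- expr[has_range, , drop = FALSE]
unranged <- expr[!has_range, , drop = FALSE]
```

Keeping the unmatched features in a separate assay, rather than dropping them, means no rows are lost in the conversion.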
