Additional Labels for Immgen Data #68

Open
robertamezquita opened this issue Dec 2, 2019 · 9 comments
@robertamezquita

Proposing an additional layer of annotations for the T cell population. Splitting on CD4 vs CD8 would be first and foremost (looks like it's already mostly doable by grepping on T.4 or T.8 from the fine labels). Adding the different subsets would be next; again, most of this can be derived from the names already.
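As a rough illustration of the grepping idea, here is a minimal base R sketch. It runs on a handful of fine labels copied from this thread rather than on the full reference, and the `lineage` vector and the "Other T" fallback are names made up for this example, not part of SingleR.

```r
# A few fine labels copied from the ImmGen reference (not the full set).
fine <- c(
    "T cells (T.4NVE)",
    "T cells (T.4MEM)",
    "T cells (T.8NVE)",
    "T cells (T.8EFF.OT1.D8.VSVOVA)",
    "T cells (T.Tregs)"
)

# Hypothetical coarse lineage label, derived purely from the naming pattern.
# fixed = TRUE treats "(T.4" as a literal string rather than a regex.
lineage <- rep("Other T", length(fine))
lineage[grepl("(T.4", fine, fixed = TRUE)] <- "CD4"
lineage[grepl("(T.8", fine, fixed = TRUE)] <- "CD8"

data.frame(label.fine = fine, lineage = lineage)
```

Matching on the opening parenthesis keeps strings such as "OT1.D8" from being picked up by accident. Labels that do not follow the T.4/T.8 convention (Tregs here) fall through to the fallback and would still need manual curation.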

In case this is useful, leaving this here for consideration. Obviously way too many different ways to cut this data to really have it all in a single object and make everybody happy.

To add this to the current ImmGen dataset, some code is below. Please excuse the tidyverse coding.

library(SingleR)
library(tidyverse)

immgen <- ImmGenData()

manual <- tribble(
    ~label.manual, ~label.fine,
    "CD4 Naive", c("T cells (T.4NVE)", "T cells (T.4NVE44-49D-11A-)", "T cells (T.4Nve)"),
    "CD4 Effector", "T cells (T.4EFF49D+11A+.D8.LCMV)",
    "CD4 Memory", c("T cells (T.4MEM)", "T cells (T.4MEM44H62L)",
                    "T cells (T.4MEM49D+11A+.D30.LCMV)", "T cells (T.4Mem)"),
    "CD8 Naive", c("T cells (T.8NVE)", "T cells (T.8NVE.OT1)", "T cells (T.8Nve)"),
    "CD8 Effector", c("T cells (T.8EFF.OT1.D10.LISOVA)", "T cells (T.8EFF.OT1.D10LIS)",
                      "T cells (T.8EFF.OT1.D8.LISOVA)", "T cells (T.8EFF.OT1.D8.VSVOVA)",
                      "T cells (T.8EFF.OT1.D8LISO)"),
    "CD8 Memory", c("T cells (T.8MEM)", "T cells (T.8MEM.OT1.D100.LISOVA)",
                    "T cells (T.8MEM.OT1.D106.VSVOVA)", "T cells (T.8MEM.OT1.D45.LISOVA)",
                    "T cells (T.8Mem)"),
    "Treg", "T cells (T.Tregs)"
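) %>%
    unnest(label.fine)  # expand the list column to one row per fine label

## Named lookup vector: fine label -> manual label
manual.vec <- manual$label.manual
names(manual.vec) <- manual$label.fine

## Filter based on manual new annotation
immgen.tc <- immgen[, immgen$label.fine %in% manual$label.fine]

## Append new label
immgen.tc$label.manual <- unname(manual.vec[immgen.tc$label.fine])
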
@anoronh4

anoronh4 commented Feb 9, 2020

I actually think better curation of the entire ImmGen data set may be helpful. For example, the following exist:

Epithelial cells | Epithelial cells (Ep.5wk.MEC.Sca1+)
Epithelial cells | Epithelial cells (Ep.5wk.MEChi)
Epithelial cells | Epithelial cells (Ep.5wk.MEClo)
Epithelial cells | Epithelial cells (Ep.8wk.CEC.Sca1+)
Epithelial cells | Epithelial cells (Ep.8wk.CEChi)
Epithelial cells | Epithelial cells (Ep.8wk.MEChi)
Epithelial cells | Epithelial cells (Ep.8wk.MEClo)

in the fine label category of ImmGenData. I don't think most people using this package have much use for time points (8wk vs 5wk), and information such as "Ep.8wk.MEChi" is too obscure for me to figure out what it is and relate it to my own dataset. An intermediate data layer or better curation of one level would be extremely helpful.

That being said, this issue is most apparent to me for ImmGenData. MonacoImmuneData, for example, has much more helpful fine categories (but is not mouse, so it doesn't help me).

@LTLA
Owner

LTLA commented Feb 10, 2020

@anoronh4 Funny you say that, because - thanks to the efforts of @j-andrews7 - the latest version of SingleR has Cell Ontology mappings returned for all labels in ImmGenData(). This can be used to adjust the labels to any desired resolution by traversing the ontology tree - in principle, at least. Perhaps @vjcitn may have some comments/code on how one might do so in practice via ontoProc.

@vjcitn

vjcitn commented Feb 10, 2020

@anoronh4 -- the colData()$label.ont has Cell Ontology mappings. You can check slack discussion around https://community-bioc.slack.com/archives/CE8AB163W/p1580737521140000 to see some relevant concepts. I don't see uptake of the subset_descendants and common_classes methods discussed there so have not pursued it further; I need to update the ontoProc vignette to deal with the label.ont fields but AFAIK there is no commitment to use that name or define methods to retrieve ontology tags for samples.

@LTLA
Owner

LTLA commented Feb 11, 2020

@vjcitn:

  • I will add the common_classes example to the SingleR vignette.
  • I don't recall us discussing subset_descendants?
  • I wonder what would be an easy interface for users to tune the desired granularity of these terms.

@LTLA
Owner

LTLA commented Feb 12, 2020

Right. Having poked around, I think onto_plot2 may be close to what we need to close this issue.

To restate the problem: the user has a bunch of terms near the tips of the ontology DAG. They want to scale back the granularity of these terms to something broader. I propose the following workflow:

  1. User uses onto_plot2() to visualize the relationships between the available terms. This does, however, require some pruning of the current visualization; there are far too many terms and the plot is very crowded (try using it on the ImmGenData terms). I would like an option to limit the graph to the observed terms, the MRCA of those terms and the MRCA of the MRCAs.
  2. User chooses some MRCAs that represent their desired granularity.
  3. User supplies these MRCAs to another function that remaps each descendant term in label.ont to its MRCA. No remapping is done if the MRCA is not listed, in which case the existing labels are assumed to be satisfactory. Some care is required to handle cases where a term is a descendant of two MRCAs - I guess the whole concept of a MRCA doesn't really work here.
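
The remapping in step 3 could look roughly like the base R sketch below. Everything in it is an assumption made for illustration: the toy `parents` list stands in for the real Cell Ontology, the term names are invented rather than real ontology identifiers, and `ancestors()` and `roll_back()` are hypothetical helpers, not functions from ontoProc or SingleR.

```r
# Toy ontology: each term maps to its direct parent(s). All term names here
# are made up for illustration; they are not real Cell Ontology identifiers.
parents <- list(
    "cell"       = character(0),
    "T cell"     = "cell",
    "B cell"     = "cell",
    "CD4 T"      = "T cell",
    "CD8 T"      = "T cell",
    "CD4 naive"  = "CD4 T",
    "CD4 memory" = "CD4 T",
    "CD8 naive"  = "CD8 T",
    "B naive"    = "B cell"
)

# All ancestors of a term (excluding itself), found by walking up the DAG.
ancestors <- function(term, parents) {
    seen <- character(0)
    queue <- parents[[term]]
    while (length(queue) > 0) {
        current <- queue[1]
        queue <- queue[-1]
        if (!current %in% seen) {
            seen <- c(seen, current)
            queue <- c(queue, parents[[current]])
        }
    }
    seen
}

# Roll each term back to the chosen ancestor, if exactly one applies.
# Terms with no chosen ancestor are left alone; ambiguous terms become NA.
roll_back <- function(terms, chosen, parents) {
    vapply(terms, function(term) {
        hits <- intersect(chosen, c(term, ancestors(term, parents)))
        if (length(hits) == 0) {
            term
        } else if (length(hits) == 1) {
            hits
        } else {
            NA_character_
        }
    }, character(1), USE.NAMES = FALSE)
}

observed <- c("CD4 naive", "CD4 memory", "CD8 naive", "B naive")
rolled <- roll_back(observed, chosen = c("CD4 T", "CD8 T"), parents = parents)
rolled
```

Returning NA for a term that sits under more than one chosen ancestor is only one way to handle the ambiguity raised in step 3. Picking the nearest ancestor or raising an error would be equally defensible.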

@vjcitn

vjcitn commented Feb 12, 2020

I see -- by the way, I didn't know that MRCA = most recent common ancestor. These are the lines from onto_plot2 that will help to carry this out:

    pl = ontologyPlot::onto_plot(ont, terms2use, ...)
    gnel = make_graphNEL_from_ontology_plot(pl) # defined in ontoProc

Once we have the gnel (terms2use here should be inclusive) we can make subgraphs as you wish. If interactive visualization is important we might need to move beyond Rgraphviz but I am not clear on the most appropriate option.

@LTLA
Owner

LTLA commented Feb 13, 2020

Having tried this, I don't think it's reasonable to expect people to poke through the plot:

library(ontoProc)
library(ontologyPlot)
library(SingleR)

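cl <- getCellOnto()   # Cell Ontology, via ontoProc
imm <- ImmGenData()   # reference with Cell Ontology terms in label.ont
pl <- ontologyPlot::onto_plot(cl, imm$label.ont)   # plot all mapped terms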

The graph is too large, the words are too small and you can't easily copy and paste the terms. I think the plot would be all right to look at for an overview but not as the frontline tool for the details.

After some more thought, one possible option is to have a function that takes a set of terms and then simply prints out a data.frame of all internal nodes that are MRCAs (with some plain-English annotation in the other columns, plus some statistics about how many children are present). The user can then easily examine the internal nodes that provide a biological resolution they are happy with; after this choice is made, it is then straightforward to have a function to roll back terms to their parents.
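
A minimal base R sketch of such a summary table follows, under several assumptions: the ontology is a toy single-parent tree with invented term names, `ancestors()` is a hypothetical helper, and "groups at least two observed terms" is used as a loose stand-in for a strict MRCA test. The plain-English annotation columns are omitted.

```r
# Toy tree: names are child terms, values are their single parent.
# These names are illustrative only, not real Cell Ontology identifiers.
parent <- c(
    "T cell" = "cell", "B cell" = "cell",
    "CD4 T" = "T cell", "CD8 T" = "T cell",
    "CD4 naive" = "CD4 T", "CD4 memory" = "CD4 T",
    "CD8 naive" = "CD8 T", "B naive" = "B cell"
)

# Ancestors of one term, nearest first, found by following parent links.
ancestors <- function(term) {
    out <- character(0)
    while (term %in% names(parent)) {
        term <- parent[[term]]
        out <- c(out, term)
    }
    out
}

observed <- c("CD4 naive", "CD4 memory", "CD8 naive", "B naive")

# Count how many observed terms sit beneath each internal node.
counts <- table(unlist(lapply(observed, ancestors)))
summary.df <- data.frame(
    term = names(counts),
    n.observed = as.integer(counts),
    stringsAsFactors = FALSE
)

# Keep nodes that group at least two observed terms, broadest first.
summary.df <- summary.df[summary.df$n.observed >= 2, ]
summary.df <- summary.df[order(-summary.df$n.observed, summary.df$term), ]
rownames(summary.df) <- NULL
summary.df
```

A user could scan the `term` column, pick the rows that match the resolution they want, and pass those terms to whatever function rolls labels back to their ancestors.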

@namit-k

namit-k commented Mar 13, 2020

Along the lines of the comments here, working with ImmGen has been troublesome because of naming conventions like "Ep.8wk.MEChi", which add unnecessary detail (the 8wk time point) to the base "Epithelial cells" annotation. I have code that cleans up all of the ImmGen labels into meaningful, easy-to-comprehend annotations. I am happy to share the code or create a pull request, if there is interest.

@marencc

marencc commented Nov 30, 2023

@namit-k Hi! I am using the ImmGen labels. Could you please provide the code to clean up the labels and extract a meaningful annotation? Many thanks!
