convert Seurat to SFE and back #3

Open
alikhuseynov opened this issue Jul 25, 2023 · 15 comments
Labels: advanced; documentation (Improvements or additions to documentation); enhancement (New feature or request)

@alikhuseynov
Collaborator

Hi there )
As we discussed briefly in voyager #2, it would be good to add support for converting from Seurat to SFE and back. Is that something you are currently working on? If so, and help is needed, I could jump in.
Otherwise, I could try to implement the conversion myself, but I could only open a PR around September. Let me know if that works.
Thanks
A.

@lambdamoses
Collaborator

I'm not currently doing it. Yes, help is needed. Thank you so much for the PR!

@alikhuseynov
Collaborator Author

> I'm not currently doing it. Yes, help is needed. Thank you so much for the PR!

sounds good! happy to contribute.
btw, is the reader for Xenium data (similar to readVizgen) also going to be in the next release?

@lambdamoses
Collaborator

Yes, readXenium is really easy to implement.

@alikhuseynov
Collaborator Author

alikhuseynov commented Aug 21, 2023

Hey Lambda,
As far as I remember the transcript/molecule coords were not stored in the object.
Are you storing the molecule coordinates (ie detected_transcripts.csv) somewhere in SFE or is it something in-progress?
thanks

@lambdamoses
Collaborator

You can store it in rowData, but I'm not sure what to do with the huge size of these files at present. As a result, readVizgen doesn't read it by default.

@alikhuseynov
Collaborator Author

ok, for Seurat I made it optional whether to include molecules or not.
I just want to be consistent when users convert a Seurat object (with molecules) to SFE.
I'm making a few changes in readVizgen() to handle the most recent output from Vizgen as well as older data (e.g., when there was an extra subdirectory for Cellpose). Molecules can optionally be added as well. I will try to open a PR for that soon.

@lambdamoses
Collaborator

Thank you so much! I need to look into on disk representations of geometry to better deal with the transcript spots. Now there's SpatialData in development, mainly raster, and I'm considering sedona for on disk vector geometry operations. I don't know whether raster or vector is more efficient on disk. Anyway, for now the large files read into memory can be dealt with by running the code on a server. BTW, I added you to the Acknowledgement section of the Voyager paper.

@alikhuseynov
Collaborator Author

> Thank you so much! I need to look into on disk representations of geometry to better deal with the transcript spots. Now there's SpatialData in development, mainly raster, and I'm considering sedona for on disk vector geometry operations. I don't know whether raster or vector is more efficient on disk. Anyway, for now the large files read into memory can be dealt with by running the code on a server. BTW, I added you to the Acknowledgement section of the Voyager paper.

Always happy to contribute, Many thanks for the Acknowledgement! :-)
We are in contact with the SpatialData developers (they are in the building next to ours).
For visualization of molecules on the whole tissue coords, one could downsample them randomly and plot only that small fraction.
I'm not very familiar with on disk storage, but probably vector would be more efficient and simpler to store?
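
The random downsampling idea above can be sketched in base R. This is only an illustration on simulated data: the `mols` table is a toy stand-in for the molecule coordinates, and `downsample_mols()` and its `frac` argument are hypothetical names, not part of SFE or Voyager.

```r
# Toy stand-in for the molecule table (simulated data); the column names
# follow the molecule table shown elsewhere in this thread.
set.seed(1)
mols <- data.frame(
  x = runif(10000, 0, 1000),
  y = runif(10000, 0, 1000),
  gene = sample(c("Npas1", "Myh11", "Ace2"), 10000, replace = TRUE)
)

# Hypothetical helper: keep a random fraction of the rows for plotting.
downsample_mols <- function(df, frac = 0.05) {
  stopifnot(frac > 0, frac <= 1)
  n_keep <- max(1L, floor(nrow(df) * frac))
  df[sample.int(nrow(df), n_keep), , drop = FALSE]
}

mols_small <- downsample_mols(mols, frac = 0.05)

# Plot only the sampled molecules, coloured by gene.
plot(mols_small$x, mols_small$y, pch = ".",
     col = factor(mols_small$gene), xlab = "x", ylab = "y")
```

Sampling uniformly over rows keeps the relative density of each gene roughly intact, which is usually what matters for a whole-tissue overview.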

@alikhuseynov
Collaborator Author

> You can store it in rowData, but I'm not sure what to do with the huge size of these files at present. As a result, readVizgen doesn't read it by default.

The optimized readVizgen() version is almost ready.
Storing molecule coords (aka detected_transcripts.csv) is going to be optional. Some processing is needed to store them in rowData.
Here is what the filtered (i.e., matched to the cell IDs of the count matrix) molecule coords look like:

mols %>% str
spc_tbl_ [2,109,623 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ x      : num [1:2109623] 531 439 439 439 532 ...
 $ y      : num [1:2109623] 5342 5391 5393 5393 5342 ...
 $ gene   : chr [1:2109623] "Npas1" "Myh11" "Myh11" "Myh11" ...
 $ cell_id: chr [1:2109623] "1771792403489100047" "1771792403489100025" "1771792403489100025" "1771792403489100025" ...

# df size
mols %>% 
    object.size() %>% 
    print(units = "Mb")
64.6 Mb

I'm splitting by gene and organizing it into an sf data.frame object as MULTIPOINT, which looks like this:

mols.list %>% str
List of 1
 $ :Classes ‘sf’ and 'data.frame':	140 obs. of  2 variables:
  ..$ gene    : chr [1:140] "4732456N10Rik" "Ace2" "Adora2a" "Aldh1l1" ...
  ..$ geometry:sfc_MULTIPOINT of length 140; first list element:  'XY' num [1:303, 1:2] 128 128 128 128 128 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:303] "3277" "3402" "3564" "3671" ...
  .. .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "sf_column")= chr "geometry"
  ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
  .. ..- attr(*, "names")= chr "gene"
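
For readers who want to reproduce a structure like the one above, here is a minimal sketch of one way to build a per-gene MULTIPOINT table with sf. It is not necessarily the code used for the output above; the `mols` values are made-up toy data and the variable names are assumptions.

```r
library(sf)

# Toy molecule table (made-up values) with the same columns as above.
mols <- data.frame(
  x = c(531, 439, 439, 532, 128),
  y = c(5342, 5391, 5393, 5342, 77),
  gene = c("Npas1", "Myh11", "Myh11", "Npas1", "Ace2")
)

# Split the coordinate columns by gene; each element holds one gene's molecules.
coords_by_gene <- split(mols[, c("x", "y")], mols$gene)

# Build one MULTIPOINT geometry per gene from a two-column numeric matrix.
geoms <- lapply(coords_by_gene, function(d) st_multipoint(as.matrix(d)))

# Assemble an sf data.frame with one row per gene.
mols_sf <- st_sf(
  gene = names(geoms),
  geometry = st_sfc(unname(geoms))
)
```

One row per gene means the table has the same number of rows as the feature dimension of the count matrix, which is what makes it a natural fit for a row-wise slot.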

The question is:

  • to store them in rowData
rowData(sfe) %>% str
Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  ..@ rownames       : chr [1:140] "4732456N10Rik" "Ace2" "Adora2a" "Aldh1l1" ...
  ..@ nrows          : int 140
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()
  ..@ listData       :List of 2
  .. ..$ gene    : chr [1:140] "4732456N10Rik" "Ace2" "Adora2a" "Aldh1l1" ...
  .. ..$ geometry:sfc_MULTIPOINT of length 140; first list element:  'XY' num [1:303, 1:2] 128 128 128 128 128 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:303] "3277" "3402" "3564" "3671" ...
  .. .. .. ..$ : chr [1:2] "x" "y"
  .. ..- attr(*, "sf_column")= chr "geometry"
  .. ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
  .. .. ..- attr(*, "names")= chr "gene"
  • or would rowGeometries be better?
rowGeometries(sfe, withDimnames = FALSE) <- mols.list
Warning message in .clean_internal_names(names(value), N = length(value), msg = "names(value)"):
“'names(value)' is NULL, replacing with 'unnamed'”
dimGeometryNames(sfe, MARGIN = 1) <- "molecules"

rowGeometries(sfe) %>% str
Formal class 'SimpleList' [package "S4Vectors"] with 4 slots
  ..@ listData       :List of 1
  .. ..$ molecules:Classes ‘sf’ and 'data.frame':	140 obs. of  2 variables:
  .. .. ..$ gene    : chr [1:140] "4732456N10Rik" "Ace2" "Adora2a" "Aldh1l1" ...
  .. .. ..$ geometry:sfc_MULTIPOINT of length 140; first list element:  'XY' num [1:303, 1:2] 128 128 128 128 128 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:303] "3277" "3402" "3564" "3671" ...
  .. .. .. .. ..$ : chr [1:2] "x" "y"
  .. .. ..- attr(*, "sf_column")= chr "geometry"
  .. .. ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
  .. .. .. ..- attr(*, "names")= chr "gene"
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()

# check if gene names correspond
identical(sfe %>% rownames(), 
          sfe %>% rowGeometry() %>% rownames)
sample_id is not applicable to rowGeometries.
TRUE

Let me know.
Thanks!

@lambdamoses
Collaborator

Thank you so much! Yes, I actually meant rowGeometries.

@alikhuseynov
Collaborator Author

> Thank you so much! Yes, I actually meant rowGeometries.

👍 great!

@alikhuseynov
Collaborator Author

alikhuseynov commented Sep 26, 2023

The converter function as_SeuratSFE() is almost complete.

  • need to add support when SFE has multiple sample_id, or Seurat has multiple FOVs.
  • cbind issue for large dense matrices. This happens when the Seurat object has > 1 Assay, typically SCT. That assay is stored as an altExp of class SFE. Basically cbind works well, except when an altExp is present: then combining two objects never finishes. It works only if I first combine the altExps separately, then combine the SFE objects without altExp, and finally add the combined altExp to the combined SFE object. Are there any alternative fast ways to combine SFE objects with altExp, maybe similar to merge?
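
The three-step workaround described above can be sketched with plain SingleCellExperiment objects, which SFE extends. This is an illustration on tiny toy matrices only; `make_sce()` is a hypothetical helper, and whether the same steps avoid the slowdown for real SFE objects with large dense altExps is not verified here.

```r
library(SingleCellExperiment)

# Hypothetical helper: a toy object with a main assay and an "SCT" altExp.
make_sce <- function(prefix) {
  m <- matrix(1:6, nrow = 3,
              dimnames = list(paste0("g", 1:3), paste0(prefix, 1:2)))
  sce <- SingleCellExperiment(assays = list(counts = m))
  altExp(sce, "SCT") <- SingleCellExperiment(assays = list(counts = m[1:2, ]))
  sce
}

a <- make_sce("a")
b <- make_sce("b")

# Step 1: combine the altExps on their own.
alt <- cbind(altExp(a, "SCT"), altExp(b, "SCT"))

# Step 2: combine the main objects after dropping their altExps.
main <- cbind(removeAltExps(a), removeAltExps(b))

# Step 3: attach the combined altExp to the combined object.
altExp(main, "SCT") <- alt
```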

@lambdamoses
Collaborator

Can you put your function in setAs so one can use as(seu, "SpatialFeatureExperiment"), or write an S4 method of toSpatialFeatureExperiment for Seurat, to be consistent with the coercion methods for SPE and SCE, as in the coerce.R file? Or is there a reason why this can't be done?

Don't worry about altExp for now. Maybe for now different assays can become different SFE objects when the number of features are different, or add the different feature metadata manually to rowData. There's MultiAssayExperiment for that.
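
As a rough illustration of the two options mentioned above (a setAs coercion versus an S4 method on a generic), here is a sketch with toy classes. `ToySeurat`, `ToySFE`, and `toToySFE` are hypothetical stand-ins, not the real Seurat or SpatialFeatureExperiment classes or the package's actual coercion code.

```r
library(methods)

# Toy stand-ins for the two object types; each just holds a count matrix.
setClass("ToySeurat", representation(counts = "matrix"))
setClass("ToySFE", representation(counts = "matrix"))

# Option 1: register coercions so that as(x, "ToySFE") works, and back.
setAs("ToySeurat", "ToySFE",
      function(from) new("ToySFE", counts = from@counts))
setAs("ToySFE", "ToySeurat",
      function(from) new("ToySeurat", counts = from@counts))

# Option 2: an S4 generic with a method for the source class.
setGeneric("toToySFE", function(x, ...) standardGeneric("toToySFE"))
setMethod("toToySFE", "ToySeurat", function(x, ...) as(x, "ToySFE"))

seu <- new("ToySeurat", counts = matrix(1:4, nrow = 2))
sfe <- as(seu, "ToySFE")      # coercion via setAs
sfe2 <- toToySFE(seu)         # same result via the S4 method
back <- as(sfe, "ToySeurat")  # round trip
```

A generic with `...` leaves room for extra arguments, which `as()` does not accept, so the two can coexist with the method doing the real work.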

@alikhuseynov
Collaborator Author

> Can you put your function in setAs so one can use as(seu, "SpatialFeatureExperiment"), or write an S4 method of toSpatialFeatureExperiment for Seurat, to be consistent with the coercion methods for SPE and SCE, as in the coerce.R file? Or is there a reason why this can't be done?

Right now, it is just a single function that can convert from Seurat (v4) to SFE and back to Seurat (e.g., when arg to_Seurat = TRUE). I wanted to add support for objects with multiple tissue sections (FOVs or sample_id).

I can definitely try with setAs, that would be cool. I'll open a PR when it is ready.
So, it could be like as(seu, "SpatialFeatureExperiment") to convert from Seurat to SFE, and as(sfe, "Seurat") from SFE to Seurat?

> Don't worry about altExp for now. Maybe for now different assays can become different SFE objects when the number of features are different, or add the different feature metadata manually to rowData. There's MultiAssayExperiment for that.

Yeah, no problem, I managed to cbind in the end; the sample_id needs to be set beforehand (also for altExp). Currently, any other Seurat assays (if present) are added as altExp (i.e., an SFE object with centroids only), and DefaultAssay(seu) is stored in mainExpName. Other, more efficient ways may be possible; I will check MultiAssayExperiment.
Thanks

@alikhuseynov alikhuseynov added documentation Improvements or additions to documentation enhancement New feature or request advanced labels Mar 7, 2024
@alikhuseynov
Collaborator Author

alikhuseynov commented Mar 21, 2024

A small update: toSpatialFeatureExperiment will support Seurat to SFE coercion. A vignette is in preparation, and a first PR for this will come soon. With that, users will be able to convert Seurat Visium and image-based ST (for now Vizgen and Xenium) objects.

TODO (from highest to lowest priority):

  • add all colGeometries and rowGeometries specific to each altExp?
  • modify Voyager::plotSpatialFeature to get colGeometries (& rowGeometries) from altExp of sfe object
  • add support for Seurat CosMx to SFE conversion
  • consider adding the image when converting Seurat -> SFE. However, Seurat has image info only for Visium. Alternatively, for imST techs, a path to an image can be provided when coercing to SFE.
  • backwards coercion SFE -> Seurat with setAs then as(sfe, "Seurat") or new S4 method toSeurat.
