Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronize flowCore_PnRmax handling between cytolib and flowCore FCS parsers #42

Merged
merged 6 commits into from
Sep 22, 2020

Conversation

jacobpwagner
Copy link
Member

@jacobpwagner jacobpwagner commented Sep 17, 2020

See RGLab/flowWorkspace#348 for discussion of the different flowCore and cytolib handling of flowCore_PnRmax values in the FCS keywords. This commit is intended to rectify that, but because it involves a low-level change in the parser I thought it might be good to allow review/veto before merging it in. After this commit, all flowCore, flowWorkspace, and CytoML tests (including internal) still pass, but the handling of instrument range is now more consistent between flowFrame and cytoframe:

> library(flowCore)
> library(flowWorkspace)
As part of improvements to flowWorkspace, some behavior of
GatingSet objects has changed. For details, please read the section
titled "The cytoframe and cytoset classes" in the package vignette:

  vignette("flowWorkspace-Introduction", "flowWorkspace")
> library(CytoExploreRData)
> 
> gs <- load_gs(system.file("extdata/Activation-GatingSet", package = "CytoExploreRData"),
+               backend_readonly = FALSE)
> 
> cf <- gh_pop_get_data(gs[[1]])
> 
> range(cf, "data")
       FSC-A  FSC-H    FSC-W     SSC-A  SSC-H     SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A
min   3840.9   5008  48962.7    812.43    992  48231.07        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891
max 262143.0 258475 213015.5 262143.00 257528 240257.84         4.4974089  4.4947143      4.1895137  4.4887066  4.344767
    Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min         -1.011764         -1.249842 -0.7027928        -0.1020277         -1.193219 -0.748190   54.7
max          4.499980          4.499548  4.4908552         4.4999361          4.408533  4.378348 6312.3
> range(cf, "instrument")
     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A      PE-A PE-Texas Red-A   7-AAD-A  PE-Cy7-A Alexa Fluor 405-A
min      0      0      0      0      0      0         0.3440274 0.5447587      0.7766721 0.7676063 0.7410693         0.2184961
max 262143 262143 262143 262143 262143 262143         4.5000000 4.5000000      4.5000000 4.5000000 4.5000000         4.5000000
    Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min        -0.1468131  0.3265916         0.3060549         0.5694999 0.5042382      0
max         4.5000000  4.5000000         4.5000000         4.5000000 4.5000000 262143
> parameters(cf)$maxRange
 [1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0      4.5      4.5      4.5      4.5      4.5      4.5      4.5
[14]      4.5      4.5      4.5      4.5 262143.0
> 
> fr <- cytoframe_to_flowFrame(cf)
> range(fr, "data")
       FSC-A  FSC-H    FSC-W     SSC-A  SSC-H     SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A
min   3840.9   5008  48962.7    812.43    992  48231.07        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891
max 262143.0 258475 213015.5 262143.00 257528 240257.84         4.4974089  4.4947143      4.1895137  4.4887066  4.344767
    Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min         -1.011764         -1.249842 -0.7027928        -0.1020277         -1.193219 -0.748190   54.7
max          4.499980          4.499548  4.4908552         4.4999361          4.408533  4.378348 6312.3
> range(fr, "instrument")
     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A      PE-A PE-Texas Red-A   7-AAD-A  PE-Cy7-A Alexa Fluor 405-A
min      0      0      0      0      0      0         0.3440274 0.5447587      0.7766721 0.7676063 0.7410693         0.2184961
max 262143 262143 262143 262143 262143 262143         4.5000000 4.5000000      4.5000000 4.5000000 4.5000000         4.5000000
    Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min        -0.1468131  0.3265916         0.3060549         0.5694999 0.5042382      0
max         4.5000000  4.5000000         4.5000000         4.5000000 4.5000000 262143
> parameters(fr)$maxRange
 [1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0      4.5      4.5      4.5      4.5      4.5      4.5      4.5
[14]      4.5      4.5      4.5      4.5 262143.0
> 
> cf_back <- flowFrame_to_cytoframe(fr)
> range(cf_back, "data")
       FSC-A  FSC-H    FSC-W     SSC-A  SSC-H     SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A
min   3840.9   5008  48962.7    812.43    992  48231.07        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891
max 262143.0 258475 213015.5 262143.00 257528 240257.84         4.4974089  4.4947143      4.1895137  4.4887066  4.344767
    Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min         -1.011764         -1.249842 -0.7027928        -0.1020277         -1.193219 -0.748190   54.7
max          4.499980          4.499548  4.4908552         4.4999361          4.408533  4.378348 6312.3
> range(cf_back, "instrument")
     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A
min      0      0      0      0      0      0        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891
max 262143 262143 262143 262143 262143 262143         4.5000000  4.5000000      4.5000000  4.5000000  4.500000
    Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min         -1.011764         -1.249842 -0.7027928        -0.1020277         -1.193219  -0.74819      0
max          4.500000          4.500000  4.5000000         4.5000000          4.500000   4.50000 262143
> parameters(cf_back)$maxRange
 [1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0      4.5      4.5      4.5      4.5      4.5      4.5      4.5
[14]      4.5      4.5      4.5      4.5 262143.0
> 
> tmp <- tempfile()
> write.FCS(fr, tmp)
[1] "/tmp/RtmpCBmJwt/file69d062238d6c"
> fr_back <- read.FCS(tmp)
> range(fr_back, "data")
       FSC-A  FSC-H    FSC-W     SSC-A  SSC-H     SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A
min   3840.9   5008  48962.7    812.43    992  48231.07        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891
max 262143.0 258475 213015.5 262143.00 257528 240257.84         4.4974089  4.4947143      4.1895137  4.4887066  4.344767
    Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min         -1.011764         -1.249842 -0.7027928        -0.1020277         -1.193219 -0.748190   54.7
max          4.499980          4.499548  4.4908552         4.4999361          4.408533  4.378348 6312.3
> range(fr_back, "instrument")
     FSC-A  FSC-H  FSC-W  SSC-A  SSC-H  SSC-W Alexa Fluor 488-A       PE-A PE-Texas Red-A    7-AAD-A  PE-Cy7-A
min      0      0      0      0      0      0        -0.9284211 -0.2008877     -0.4126023 -0.2468764 -1.286891
max 262143 262143 262143 262143 262143 262143         4.5000000  4.5000000      4.5000000  4.5000000  4.500000
    Alexa Fluor 405-A Alexa Fluor 430-A Qdot 605-A Alexa Fluor 647-A Alexa Fluor 700-A APC-Cy7-A   Time
min         -1.011764         -1.249842 -0.7027928        -0.1020277         -1.193219  -0.74819      0
max          4.500000          4.500000  4.5000000         4.5000000          4.500000   4.50000 262143
> parameters(fr_back)$maxRange
 [1] 262143.0 262143.0 262143.0 262143.0 262143.0 262143.0      4.5      4.5      4.5      4.5      4.5      4.5      4.5
[14]      4.5      4.5      4.5      4.5 262143.0

@jacobpwagner jacobpwagner changed the title Range sync Synchronize flowCore_PnRmax handling between cytolib and flowCore FCS parsers Sep 17, 2020
@jacobpwagner
Copy link
Member Author

jacobpwagner commented Sep 17, 2020

Notes from offline discussion on the usage of the "transformation" and "PnRmax" keywords:

It seems like the ideal logic was this:

  • Not changing PnR during transformation, so as to preserve original instrument range
  • For transformations performed by the FCS parser, the "transformation" keyword gets "applied"
  • For transformations performed by transform methods later, the "transformation" keyword gets "custom"
  • In either case, if a transformation is applied "PnRmax" gets added to keep track of the transformed range max.

However, that does not seem to be consistently applied. In particular, there are a couple of issues:

  • Transform called on a GatingSet may not be adding the "transformation" keyword or "PnRmax" to the individual cytoframes. I need to investigate/test.
  • write.FCS is not adding "transformation" or "PnRmax" if they are not already present.
  • ^That's a problem because flowFrame_to_cytoframe goes through write.FCS/load_cytoframe_from_fcs and so can lose the max range. That's also currently a problem for the cytoset->cytoframe workaround discussed here

@jacobpwagner
Copy link
Member Author

It looks like the main issue here might just be that transform does not add the keywords "transformation" = "custom" nor "flowCore_$PnRmin/max" when called on a GatingSet.

When called on a cytoset or cytoframe, it is added as the call goes through flowCore::transform based on inheritance from flowFrame:

flowWorkspace::transform(cytoset) -> flowCore::transform(flowFrame) -> flowCore::updateTransformationKeywords

For GatingSet that is not done at the R level, or the Rcpp level here or here, or at the cytolib level.

The quick solution would probably be to just add the assignment of the keyword at the R level for GatingSet. However, given that cytolib should be able to work as a standalone library and as these keywords are used in I/O at the cytolib level, it might be more robust in the long run to add them in the cytolib level transformation methods.

Along the way, it would probably be a good idea to make a dedicated flowWorkspace::transform(cytoframe) method for the sake of consistency. Currently, if a cytoframe is transformed via flowWorkspace::transform(GatingSet), it will ultimately use the cytolib level transform::transforming logic, as expected. However, if a cytoframe is transformed using flowWorkspace::transform(cytoframe), it will ultimately be transformed entirely at the R level in flowCore followed by the transformed data being re-assigned by exprs<-. On top of potentially resulting in discrepancies if the flowCore and cytolib transformation logic ever diverges, this is also probably less efficient.

@jacobpwagner
Copy link
Member Author

@mikejiang, @gfinak , based on the points in the comment above, to resolve the inconsistences and improve efficiency, I propose:

  • Moving the "transformation" = "custom" assignment logic down to the cytolib level transformation methods
  • Adding a dedicated transform method in flowWorkspace for cytoframe to make it converge on the cytolib transformation methods.

This would be instead of the original changes in this PR.

@mikejiang
Copy link
Member

Dedicated transform(cytoframe) method sounds a good idea, but make sure it also supports the arbitrary R transformer, similar to what's implemented in transform(gs) method

@jacobpwagner
Copy link
Member Author

42952b5 and RGLab/flowWorkspace@0ea4fe5 change behavior so:

  1. Transformation of cytoframes will now be performed at the cytolib level for all supported transformers, otherwise at the R level.
  2. In all cases (cytoframe/cytoset, GatingHierarchy/GatingSet) transformation using transform will set the keyword transformation=="custom" so that flowCore_$PnRmax is honored upon write out followed by read in.

@DillonHammill
Copy link

@jacobpwagner, does this mean that data transformations will get a speed boost too?

@jacobpwagner
Copy link
Member Author

@DillonHammill , possibly, with a few caveats:

  1. These changes aren't finalized yet
  2. This is only switching transformation from R->C++ for cytoframes that are not part of a GatingSet. GatingSet transformations were already using the cytolib-level transformation logic when possible. However, I think that might be a common use case for you, so you may see a speedup.
  3. Just like for GatingSet, it will only use the cytolib transformation logic if you are passing a transformerList and all of the transformations are supported at the cytolib level. (e.g. logtGml2_trans, logicle_trans, flowjo_biexp_trans, asinhtGml2_trans, logicleGml2_trans)

@DillonHammill
Copy link

Great thanks @jacobpwagner!

@jacobpwagner
Copy link
Member Author

@DillonHammill , it would seem there is a noticeable speedup. Not all that surprising I suppose, but good confirmation:

library(flowCore)
library(flowWorkspace)
library(microbenchmark)

cf <- load_cytoframe_from_fcs(list.files(system.file("extdata", package = "flowWorkspaceData"), pattern = "CytoTrol", full.names = TRUE)[1])


translist <- list(logtGml2_trans(), logicle_trans(), flowjo_biexp_trans(), asinhtGml2_trans(), logicleGml2_trans())
# Simple functions in transformList will use flowFrame transform method in flowCore
translist1 <- transformList(colnames(cf)[5:9], sapply(translist, function(x) x$transform))
# Fully-supported transformerList will use cytolib
translist2 <- transformerList(colnames(cf)[5:9], translist)


microbenchmark(flowCore = transform(cf1, translist1),
               cytolib = transform(cf1, translist2),
               times = 10,
               setup =cf1<-realize_view(cf))

The timing output

Unit: milliseconds
     expr       min        lq      mean    median        uq       max neval
 flowCore 119.48810 120.07523 177.73162 128.88807 149.07506 467.24919    10
  cytolib  69.79249  70.55988  74.01159  74.79106  75.64105  81.81152    10

@DillonHammill
Copy link

That's excellent @jacobpwagner! The transformations are by far the most computationally taxing steps for me, so any speed improvements will not go unnoticed!

@jacobpwagner jacobpwagner merged commit 174c162 into master Sep 22, 2020
@jacobpwagner jacobpwagner deleted the range_sync branch September 22, 2020 00:43
mikejiang pushed a commit to RGLab/flowWorkspace that referenced this pull request Oct 9, 2020
mikejiang pushed a commit to RGLab/flowWorkspace that referenced this pull request Oct 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants