
gs_pop_get_count_fast() freq extract not matching FlowJo Pop Freq Table Export #369

Open
miosisoniii opened this issue Feb 4, 2022 · 1 comment


@miosisoniii

I am trying to replicate the population frequency statistics calculated by FlowJo, using the population proportions that flowWorkspace extracts from the XML. However, simply converting the proportions to percentages and rounding to the same number of significant figures does not reproduce the values in the FlowJo table export.

To export the population frequencies manually from FlowJo, I open the Table Editor and select Create Table. The default table that gets exported (presumably) contains the population frequency as a percentage, rounded to 3 significant figures (my FlowJo Decimal Precision is set to 2 and Significant Figures is set to 3). The table is below (the first column contains sensitive sample IDs).
[image: FlowJo population frequency table export]

Using flowWorkspace/openCyto, I export the frequency table with statistic = "freq" and format = "wide", then transpose it to match the layout of the table exported from FlowJo:

library(openCyto)
library(flowWorkspace)
library(CytoML)
#> Warning: package 'CytoML' was built under R version 3.6.3
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)
library(reprex)

# not run for this example
# open the workspace and convert it to a GatingSet
# wsp <- CytoML::open_flowjo_xml(path)
# gs <- wsp %>% flowjo_to_gatingset(name = "All Samples")
# freq_gs <- flowWorkspace::gs_pop_get_count_fast(gs, statistic = "freq", format = "wide")
# transpose so samples are rows (like the FlowJo export) and drop the root population
# (named freq_tbl rather than t to avoid masking base::t())
# freq_tbl <- freq_gs %>% t() %>% as.data.frame() %>% dplyr::select(-root)
# move the FCS file names into a column, matching the FlowJo export
# freq_tbl <- freq_tbl %>% tibble::rownames_to_column("V1")
# save the table to CSV
# write.csv(freq_tbl, "~/projects/opencyto/cyto_ex.csv", row.names = FALSE)

Created on 2022-02-04 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 3.6.2 (2019-12-12)
#>  os       Windows 10 x64 (build 18363)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/New_York
#>  date     2022-02-04
#>  pandoc   2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  ! package       * version  date (UTC) lib source
#>  1 assertthat      0.2.1    2019-03-21 [1] CRAN (R 3.6.3)
#>  1 backports       1.2.1    2020-12-09 [1] CRAN (R 3.6.3)
#>  1 base64enc       0.1-3    2015-07-28 [1] CRAN (R 3.6.0)
#>  1 Biobase         2.46.0   2019-10-29 [1] Bioconductor
#>  1 BiocGenerics    0.32.0   2019-10-29 [1] Bioconductor
#>  1 bitops          1.0-7    2021-04-24 [1] CRAN (R 3.6.3)
#>  1 cli             3.1.0    2021-10-27 [1] CRAN (R 3.6.2)
#>  1 clue            0.3-59   2021-04-16 [1] CRAN (R 3.6.3)
#>  1 cluster         2.1.0    2019-06-19 [2] CRAN (R 3.6.2)
#>  1 colorspace      2.0-1    2021-05-04 [1] CRAN (R 3.6.3)
#>  1 corpcor         1.6.10   2021-09-16 [1] CRAN (R 3.6.2)
#>  1 crayon          1.4.2    2021-10-29 [1] CRAN (R 3.6.2)
#>  1 CytoML        * 1.12.1   2020-03-26 [1] Bioconductor
#>  1 data.table      1.14.0   2021-02-21 [1] CRAN (R 3.6.3)
#>  1 DBI             1.1.2    2021-12-20 [1] CRAN (R 3.6.2)
#>  1 DEoptimR        1.0-10   2022-01-03 [1] CRAN (R 3.6.2)
#>  1 deSolve         1.28     2020-03-08 [1] CRAN (R 3.6.3)
#>  1 digest          0.6.27   2020-10-24 [1] CRAN (R 3.6.3)
#>  1 dplyr         * 1.0.7    2021-06-18 [1] CRAN (R 3.6.2)
#>  1 ellipse         0.4.2    2020-05-27 [1] CRAN (R 3.6.3)
#>  1 ellipsis        0.3.2    2021-04-29 [1] CRAN (R 3.6.3)
#>  1 evaluate        0.14     2019-05-28 [1] CRAN (R 3.6.3)
#>  1 fansi           0.4.2    2021-01-15 [1] CRAN (R 3.6.3)
#>  1 fastmap         1.1.0    2021-01-25 [1] CRAN (R 3.6.3)
#>  1 fda             5.5.1    2021-11-17 [1] CRAN (R 3.6.2)
#>  1 fds             1.8      2018-10-31 [1] CRAN (R 3.6.3)
#>  1 flowClust       3.24.0   2019-10-29 [1] Bioconductor
#>  1 flowCore        1.52.1   2019-12-04 [1] Bioconductor
#>  1 flowStats       3.44.0   2019-10-29 [1] Bioconductor
#>  1 flowViz         1.50.0   2019-10-29 [1] Bioconductor
#>  1 flowWorkspace * 3.34.1   2020-01-02 [1] Bioconductor
#>  1 fs              1.5.0    2020-07-31 [1] CRAN (R 3.6.3)
#>  1 generics        0.1.1    2021-10-25 [1] CRAN (R 3.6.2)
#>  1 ggcyto          1.14.1   2020-03-07 [1] Bioconductor
#>  1 ggplot2         3.3.5    2021-06-25 [1] CRAN (R 3.6.2)
#>  1 glue            1.4.2    2020-08-27 [1] CRAN (R 3.6.3)
#>  1 graph           1.64.0   2019-10-29 [1] Bioconductor
#>  1 gridExtra       2.3      2017-09-09 [1] CRAN (R 3.6.3)
#>  1 gtable          0.3.0    2019-03-25 [1] CRAN (R 3.6.3)
#>  1 gtools          3.8.2    2020-03-31 [1] CRAN (R 3.6.3)
#>  1 hdrcde          3.4      2021-01-18 [1] CRAN (R 3.6.3)
#>  1 hexbin          1.28.2   2021-01-08 [1] CRAN (R 3.6.3)
#>  1 highr           0.9      2021-04-16 [1] CRAN (R 3.6.3)
#>  1 htmltools       0.5.2    2021-08-25 [1] CRAN (R 3.6.2)
#>  1 IDPmisc         1.1.20   2020-01-21 [1] CRAN (R 3.6.3)
#>  1 jpeg            0.1-8.1  2019-10-24 [1] CRAN (R 3.6.1)
#>  1 jsonlite        1.7.2    2020-12-09 [1] CRAN (R 3.6.3)
#>  1 KernSmooth      2.23-16  2019-10-15 [2] CRAN (R 3.6.2)
#>  1 knitr           1.37     2021-12-16 [1] CRAN (R 3.6.2)
#>  1 ks              1.12.0   2021-02-07 [1] CRAN (R 3.6.3)
#>  1 lattice         0.20-38  2018-11-04 [2] CRAN (R 3.6.2)
#>  1 latticeExtra    0.6-29   2019-12-19 [1] CRAN (R 3.6.3)
#>  1 lifecycle       1.0.1    2021-09-24 [1] CRAN (R 3.6.2)
#>  1 magrittr        2.0.1    2020-11-17 [1] CRAN (R 3.6.3)
#>  1 MASS            7.3-51.4 2019-03-31 [2] CRAN (R 3.6.2)
#>  1 Matrix          1.2-18   2019-11-27 [2] CRAN (R 3.6.2)
#>  1 matrixStats     0.58.0   2021-01-29 [1] CRAN (R 3.6.3)
#>  1 mclust          5.4.7    2020-11-20 [1] CRAN (R 3.6.3)
#>  1 mnormt          2.0.2    2020-09-01 [1] CRAN (R 3.6.3)
#>  1 munsell         0.5.0    2018-06-12 [1] CRAN (R 3.6.3)
#>  1 mvtnorm         1.1-3    2021-10-08 [1] CRAN (R 3.6.2)
#>  1 ncdfFlow        2.32.0   2019-10-29 [1] Bioconductor
#>  1 openCyto      * 1.24.0   2019-10-29 [1] Bioconductor
#>  1 pcaPP           1.9-74   2021-04-23 [1] CRAN (R 3.6.3)
#>  1 pillar          1.6.4    2021-10-18 [1] CRAN (R 3.6.2)
#>  1 pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 3.6.3)
#>  1 plyr            1.8.6    2020-03-03 [1] CRAN (R 3.6.3)
#>  1 png             0.1-7    2013-12-03 [1] CRAN (R 3.6.0)
#>  1 purrr           0.3.4    2020-04-17 [1] CRAN (R 3.6.3)
#>  1 R.cache         0.15.0   2021-04-30 [1] CRAN (R 3.6.3)
#>  1 R.methodsS3     1.8.1    2020-08-26 [1] CRAN (R 3.6.3)
#>  1 R.oo            1.24.0   2020-08-26 [1] CRAN (R 3.6.3)
#>  1 R.utils         2.11.0   2021-09-26 [1] CRAN (R 3.6.2)
#>  1 R6              2.5.1    2021-08-19 [1] CRAN (R 3.6.2)
#>  1 rainbow         3.6      2019-01-29 [1] CRAN (R 3.6.3)
#>  1 RBGL            1.62.1   2019-10-30 [1] Bioconductor
#>  1 RColorBrewer    1.1-2    2014-12-07 [1] CRAN (R 3.6.0)
#>  1 Rcpp            1.0.7    2021-07-07 [1] CRAN (R 3.6.2)
#>  2 RcppParallel    5.1.4    2021-05-04 [1] CRAN (R 3.6.3)
#>  1 RCurl           1.98-1.3 2021-03-16 [1] CRAN (R 3.6.3)
#>  1 reprex        * 2.0.1    2021-08-05 [1] CRAN (R 3.6.2)
#>  1 Rgraphviz       2.30.0   2019-10-29 [1] Bioconductor
#>  1 rlang           0.4.11   2021-04-30 [1] CRAN (R 3.6.3)
#>  1 rmarkdown       2.11     2021-09-14 [1] CRAN (R 3.6.2)
#>  1 robustbase      0.93-6   2020-03-23 [1] CRAN (R 3.6.3)
#>  1 rrcov           1.5-5    2020-08-03 [1] CRAN (R 3.6.3)
#>  1 rstudioapi      0.13     2020-11-12 [1] CRAN (R 3.6.3)
#>  1 scales          1.1.1    2020-05-11 [1] CRAN (R 3.6.3)
#>  1 sessioninfo     1.2.2    2021-12-06 [1] CRAN (R 3.6.2)
#>  1 stringi         1.6.1    2021-05-10 [1] CRAN (R 3.6.3)
#>  1 stringr         1.4.0    2019-02-10 [1] CRAN (R 3.6.3)
#>  1 styler          1.6.2    2021-09-23 [1] CRAN (R 3.6.2)
#>  1 tibble        * 3.1.6    2021-11-07 [1] CRAN (R 3.6.2)
#>  1 tidyselect      1.1.1    2021-04-30 [1] CRAN (R 3.6.3)
#>  1 tmvnsim         1.0-2    2016-12-15 [1] CRAN (R 3.6.0)
#>  1 utf8            1.2.1    2021-03-12 [1] CRAN (R 3.6.3)
#>  1 vctrs           0.3.8    2021-04-29 [1] CRAN (R 3.6.3)
#>  1 withr           2.4.3    2021-11-30 [1] CRAN (R 3.6.2)
#>  1 xfun            0.29     2021-12-14 [1] CRAN (R 3.6.2)
#>  1 XML             3.99-0.3 2020-01-20 [1] CRAN (R 3.6.3)
#>  1 yaml            2.2.1    2020-02-01 [1] CRAN (R 3.6.3)
#>  1 zlibbioc        1.32.0   2019-10-29 [1] Bioconductor
#> 
#>  [1] C:/Users/Artemio.Sison/Documents/R/win-library/3.6
#>  [2] C:/Program Files/R/R-3.6.2/library
#> 
#>  D -- DLL MD5 mismatch, broken installation.
#> 
#> ------------------------------------------------------------------------------

Created on 2022-02-04 by the reprex package (v2.0.1)
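The reshaping in the commented-out code above can also be done in base R, with the conversion to percent folded in. This is a minimal sketch, assuming the wide result is a numeric matrix with populations as rows (including root) and samples as columns, as the transpose above implies. The helper name freq_wide_to_pct_table() and the toy numbers are hypothetical. stringsAsFactors = FALSE is set explicitly because R 3.6 still defaults it to TRUE.

```r
# Hypothetical helper (not part of flowWorkspace): turn a population-by-sample
# matrix of proportions into a sample-by-population table of percentages.
freq_wide_to_pct_table <- function(freq_wide, id_col = "V1") {
  stopifnot(is.matrix(freq_wide), !is.null(rownames(freq_wide)))
  keep <- rownames(freq_wide) != "root"            # drop the root population
  pct <- t(freq_wide[keep, , drop = FALSE]) * 100  # samples become rows, in percent
  out <- data.frame(rownames(pct), pct, row.names = NULL,
                    check.names = FALSE, stringsAsFactors = FALSE)
  names(out)[1] <- id_col                          # column of sample (FCS file) names
  out
}

# Toy input with made-up numbers: 3 populations x 2 samples
freq_wide <- matrix(c(1, 0.5, 0.01,
                      1, 0.25, 0.02), nrow = 3,
                    dimnames = list(c("root", "/CD4", "/CD8"),
                                    c("s1.fcs", "s2.fcs")))
res <- freq_wide_to_pct_table(freq_wide)
print(res)
```

Keeping the percentages unrounded here leaves the rounding decision to a separate, explicit step.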

I then convert the proportions from flowWorkspace/openCyto (which are very small decimals) to percentages. Rounding with my FlowJo significant-figures preference (3), as freq.signif() does, matches the FlowJo values above 1% (CD8.param4) but not most of the values below 1%. The freq.close() function matches all of the values below 1%, but then CD8.param4 is off by up to 0.04 percentage points:

Example files (trimmed to the columns with mismatched values, most of which are fractions of a percent):
manual_ex.csv
cyto_ex.csv

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
man_ex <- read.csv("~/projects/opencyto/manual_ex.csv")
cyto_ex <- read.csv("~/projects/opencyto/cyto_ex.csv")

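# percent, rounded to 3 significant figures (my FlowJo setting)
freq.signif <- function(x){signif(x*100, digits = 3)}
# percent, rounded to 2 and then 3 significant figures
freq.close <- function(x){signif(signif(x*100, digits = 2), digits = 3)}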

# manual table export from FlowJo that I want to match
man_ex
#>     ID  STIM  TIME CD4.param1 CD4.param2 CD4.param3 CD8.param3 CD8.param4
#> 1 sub1 stim1 time1      0.350      0.460       0.28       0.21       1.46
#> 2 sub1 stim2 time1      0.380      0.450       0.36       0.24       1.54
#> 3 sub2 stim1 time1      0.160      0.260       0.45       0.17       2.23
#> 4 sub2 stim2 time1      0.077      0.073       0.66       0.34       2.37
# raw export from flowWorkspace (proportions, not percentages)
cyto_ex
#>     ID  STIM  TIME   CD4.param1   CD4.param2  CD4.param3  CD8.param3 CD8.param4
#> 1 sub1 stim1 time1 0.0034525865 0.0045836062 0.002797786 0.002065787 0.01461942
#> 2 sub1 stim2 time1 0.0038029193 0.0044740227 0.003635143 0.002427531 0.01542196
#> 3 sub2 stim1 time1 0.0015723645 0.0025729601 0.004502680 0.001694915 0.02227603
#> 4 sub2 stim2 time1 0.0007735448 0.0007251982 0.006647650 0.003414634 0.02365854

# attempt to replicate using the FlowJo significant-figures setting (3)
dplyr::mutate_if(cyto_ex, is.numeric, freq.signif)
#>     ID  STIM  TIME CD4.param1 CD4.param2 CD4.param3 CD8.param3 CD8.param4
#> 1 sub1 stim1 time1     0.3450     0.4580      0.280      0.207       1.46
#> 2 sub1 stim2 time1     0.3800     0.4470      0.364      0.243       1.54
#> 3 sub2 stim1 time1     0.1570     0.2570      0.450      0.169       2.23
#> 4 sub2 stim2 time1     0.0774     0.0725      0.665      0.341       2.37
# closer: every column matches except CD8.param4 (values above 1%), which is rounded to 2 significant figures instead of 3
dplyr::mutate_if(cyto_ex, is.numeric, freq.close)
#>     ID  STIM  TIME CD4.param1 CD4.param2 CD4.param3 CD8.param3 CD8.param4
#> 1 sub1 stim1 time1      0.350      0.460       0.28       0.21        1.5
#> 2 sub1 stim2 time1      0.380      0.450       0.36       0.24        1.5
#> 3 sub2 stim1 time1      0.160      0.260       0.45       0.17        2.2
#> 4 sub2 stim2 time1      0.077      0.073       0.66       0.34        2.4

Created on 2022-02-04 by the reprex package (v2.0.1)
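Run on three of the values from cyto_ex, the two helpers show why neither can match FlowJo's 0.077, 1.46 and 2.37 at the same time. The outer signif(..., digits = 3) in freq.close() cannot restore digits that the inner call already dropped, so freq.close() is just rounding to 2 significant figures, while freq.signif() always keeps 3. This is a small self-contained check using values copied from the output above:

```r
# The two helpers from the reprex above
freq.signif <- function(x) signif(x * 100, digits = 3)
freq.close  <- function(x) signif(signif(x * 100, digits = 2), digits = 3)

# cyto_ex values: CD4.param1 row 4, CD8.param4 rows 1 and 4
x <- c(0.0007735448, 0.01461942, 0.02365854)

# freq.close() gives the same result as rounding to 2 significant figures
print(all.equal(freq.close(x), signif(x * 100, digits = 2)))

print(freq.signif(x))  # values 0.0774, 1.46, 2.37 (FlowJo shows 0.077, 1.46, 2.37)
print(freq.close(x))   # values 0.077, 1.5, 2.4
```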

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 3.6.2 (2019-12-12)
#>  os       Windows 10 x64 (build 18363)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/New_York
#>  date     2022-02-04
#>  pandoc   2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.3)
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 3.6.3)
#>  cli           3.1.0   2021-10-27 [1] CRAN (R 3.6.2)
#>  crayon        1.4.2   2021-10-29 [1] CRAN (R 3.6.2)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 3.6.2)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 3.6.3)
#>  dplyr       * 1.0.7   2021-06-18 [1] CRAN (R 3.6.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 3.6.3)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.3)
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 3.6.3)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 3.6.3)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 3.6.3)
#>  generics      0.1.1   2021-10-25 [1] CRAN (R 3.6.2)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 3.6.3)
#>  highr         0.9     2021-04-16 [1] CRAN (R 3.6.3)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 3.6.2)
#>  knitr         1.37    2021-12-16 [1] CRAN (R 3.6.2)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 3.6.2)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 3.6.3)
#>  pillar        1.6.4   2021-10-18 [1] CRAN (R 3.6.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.3)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 3.6.3)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 3.6.3)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 3.6.3)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 3.6.3)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 3.6.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 3.6.2)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 3.6.2)
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 3.6.3)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 3.6.2)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 3.6.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 3.6.2)
#>  stringi       1.6.1   2021-05-10 [1] CRAN (R 3.6.3)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.3)
#>  styler        1.6.2   2021-09-23 [1] CRAN (R 3.6.2)
#>  tibble        3.1.6   2021-11-07 [1] CRAN (R 3.6.2)
#>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 3.6.3)
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 3.6.3)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 3.6.3)
#>  withr         2.4.3   2021-11-30 [1] CRAN (R 3.6.2)
#>  xfun          0.29    2021-12-14 [1] CRAN (R 3.6.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.3)
#> 
#>  [1] C:/Users/Artemio.Sison/Documents/R/win-library/3.6
#>  [2] C:/Program Files/R/R-3.6.2/library
#> 
#> ------------------------------------------------------------------------------
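The two tables in the reprex contain enough information to test candidate rounding rules directly. The sketch below is a hypothesis, not FlowJo's documented behavior: it rounds each percentage to 2 decimal places but never shows fewer than 2 significant figures, and with those two parameters it reproduces all 20 values in man_ex from cyto_ex. The function name flowjo_like_round() and both parameter values are assumptions read off this one table; they do not obviously correspond to the "Significant Figures = 3" preference, and 20 values cannot rule out other rules. If the rule holds, these particular mismatches would come from display rounding rather than from the counts, although comparing counts remains the direct check.

```r
# Hypothetical rounding rule, inferred ONLY from the values in this issue
# (not from FlowJo documentation): round to `decimals` decimal places, but
# never show fewer than `min_signif` significant figures.
flowjo_like_round <- function(pct, decimals = 2, min_signif = 2) {
  # decimal places needed to keep `min_signif` significant figures
  sig_dec <- min_signif - 1 - floor(log10(abs(pct)))
  sig_dec[!is.finite(sig_dec)] <- decimals  # zeros (and NA) fall back to `decimals`
  round(pct, pmax(decimals, sig_dec))       # round() recycles a vector of digits
}

# Proportions copied from the cyto_ex output above
cyto_ex <- data.frame(
  CD4.param1 = c(0.0034525865, 0.0038029193, 0.0015723645, 0.0007735448),
  CD4.param2 = c(0.0045836062, 0.0044740227, 0.0025729601, 0.0007251982),
  CD4.param3 = c(0.002797786, 0.003635143, 0.004502680, 0.006647650),
  CD8.param3 = c(0.002065787, 0.002427531, 0.001694915, 0.003414634),
  CD8.param4 = c(0.01461942, 0.01542196, 0.02227603, 0.02365854)
)
# Percentages copied from the man_ex (FlowJo export) output above
man_ex <- data.frame(
  CD4.param1 = c(0.350, 0.380, 0.160, 0.077),
  CD4.param2 = c(0.460, 0.450, 0.260, 0.073),
  CD4.param3 = c(0.28, 0.36, 0.45, 0.66),
  CD8.param3 = c(0.21, 0.24, 0.17, 0.34),
  CD8.param4 = c(1.46, 1.54, 2.23, 2.37)
)

rounded <- as.data.frame(lapply(cyto_ex, function(p) flowjo_like_round(p * 100)))
print(all.equal(rounded, man_ex))
```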

After reaching out to FlowJo directly:
[image: reply from FlowJo support]

Is this a matter of including (or not including) the trailing digits of these very small decimals, or could it be an artifact of how the counts are parsed from the .wsp/XML file?

Unfortunately, my organization will not be updating to R 4.0 for a while, so I apologize for not having the cleanest reprex. I would like to believe that the R version is not why I cannot reproduce FlowJo's values exactly. You should be able to recreate this easily: export a frequency table from FlowJo, export the frequency table from flowWorkspace, convert its proportions to percentages, and compare the two.

FlowJo Version: 10.5.3
FlowJo Engine: v4.00770
OS: Windows 10
Java Version: 1.8.0_161-b12
Build Number 10.5.3

@miosisoniii miosisoniii changed the title gs_pop_get_stat_fast() freq extract not matching FlowJo Pop Freq Table Export gs_pop_get_count_fast() freq extract not matching FlowJo Pop Freq Table Export Feb 8, 2022
@mikejiang
Member

In my opinion, this amount of difference is expected: flowWorkspace does its own gating, independently of FlowJo, using the gates parsed from the XML. You can verify the differences in cell counts with:

gh_pop_compare_stats(gs[[1]])

If you want exactly the same stats as FlowJo, ask this API for the FlowJo-reported stats instead:

gs_pop_get_count_fast(gs, statistic = "freq", format = "wide",xml = TRUE)               
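To see which cells actually differ across all samples at once, the FlowJo-reported frequencies (xml = TRUE) and the recomputed ones (the same call without xml = TRUE) can be compared cell by cell. This is a minimal sketch, assuming both calls return population-by-sample matrices with the same row and column names. The helper compare_freq() and its tol default of 0.005 percentage points are hypothetical choices for illustration, not part of flowWorkspace; gh_pop_compare_stats() remains the per-sample check for counts.

```r
# Hypothetical helper: list the cells where two population-by-sample frequency
# matrices differ by more than `tol` percentage points.
compare_freq <- function(freq_a, freq_b, tol = 0.005) {
  stopifnot(setequal(rownames(freq_a), rownames(freq_b)),
            setequal(colnames(freq_a), colnames(freq_b)))
  freq_b <- freq_b[rownames(freq_a), colnames(freq_a), drop = FALSE]  # align order
  diff_pct <- (freq_a - freq_b) * 100  # difference in percentage points
  idx <- which(abs(diff_pct) > tol, arr.ind = TRUE)
  data.frame(population = rownames(freq_a)[idx[, "row"]],
             sample     = colnames(freq_a)[idx[, "col"]],
             diff_pct   = diff_pct[idx],
             stringsAsFactors = FALSE)
}

# Usage on a real GatingSet `gs` (not run here):
# compare_freq(
#   gs_pop_get_count_fast(gs, statistic = "freq", format = "wide", xml = TRUE),
#   gs_pop_get_count_fast(gs, statistic = "freq", format = "wide")
# )

# Toy example with made-up numbers: one cell differs by 0.01 percentage points
a <- matrix(c(0.5, 0.01, 0.25, 0.02), nrow = 2,
            dimnames = list(c("/CD4", "/CD8"), c("s1", "s2")))
b <- a
b["/CD8", "s2"] <- 0.0201
print(compare_freq(a, b))
```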
