Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when reading FCS file #235

Open
schraivo opened this issue Aug 25, 2022 · 9 comments
Open

Error when reading FCS file #235

schraivo opened this issue Aug 25, 2022 · 9 comments

Comments

@schraivo
Copy link

schraivo commented Aug 25, 2022

Describe the bug
When trying to read an FCS file, read.FCS throws an error:

Error in if (any(idx)) { : missing value where TRUE/FALSE needed

I can not detect anything wrong with the file. FlowJo opens the file without issues. Already checked this issue, which looks similar, but couldn't reproduce the solution: #123

To Reproduce
Download FCS file: https://oc.embl.de/index.php/s/F0rTI92aOkzofWc

library(flowCore)
read.FCS("pathtodirectory/sync_late_pS10_sampleA_100k.fcs")

Expected behavior
There should be no error.

sessionInfo():

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /g/easybuild/x86_64/CentOS/7/haswell/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocManager_1.30.17 BiocGenerics_0.42.0 forcats_0.5.1       stringr_1.4.0       dplyr_1.0.8         purrr_0.3.4         readr_2.1.2         tidyr_1.2.0        
 [9] tibble_3.1.6        ggplot2_3.3.5       tidyverse_1.3.1     flowCore_2.8.0     

loaded via a namespace (and not attached):
 [1] matrixStats_0.62.0 fs_1.5.2           usethis_2.1.5      lubridate_1.8.0    devtools_2.4.3     httr_1.4.2         rprojroot_2.0.3    tools_4.2.0       
 [9] backports_1.4.1    utf8_1.2.2         R6_2.5.1           DBI_1.1.2          colorspace_2.0-3   withr_2.5.0        tidyselect_1.1.2   prettyunits_1.1.1 
[17] processx_3.5.3     curl_4.3.2         compiler_4.2.0     cli_3.3.0          rvest_1.0.2        Biobase_2.56.0     xml2_1.3.3         desc_1.4.1        
[25] scales_1.2.0       callr_3.7.0        digest_0.6.29      rmarkdown_2.14     pkgconfig_2.0.3    htmltools_0.5.2    sessioninfo_1.2.2  dbplyr_2.1.1      
[33] fastmap_1.1.0      rlang_1.0.2        readxl_1.4.0       rstudioapi_0.13    generics_0.1.2     jsonlite_1.8.0     magrittr_2.0.3     RProtoBufLib_2.8.0
[41] Rcpp_1.0.8.3       munsell_0.5.0      S4Vectors_0.34.0   fansi_1.0.3        lifecycle_1.0.1    stringi_1.7.6      yaml_2.3.5         brio_1.1.3        
[49] pkgbuild_1.3.1     grid_4.2.0         crayon_1.5.1       haven_2.5.0        hms_1.1.1          knitr_1.38         ps_1.7.0           pillar_1.7.0      
[57] stats4_4.2.0       pkgload_1.2.4      reprex_2.0.1       glue_1.6.2         evaluate_0.15      remotes_2.4.2      RcppParallel_5.1.5 modelr_0.1.8      
[65] vctrs_0.4.1        tzdb_0.3.0         testthat_3.1.3     cellranger_1.1.0   gtable_0.3.0       assertthat_0.2.1   cachem_1.0.6       xfun_0.30         
[73] broom_0.8.0        cytolib_2.8.0      memoise_2.0.1      ellipsis_0.3.2
@SamGG
Copy link
Contributor

SamGG commented Aug 25, 2022

flowCore is carrying out checks that FJ does not. The error seems to be located where read.FCS is checking whether the measurements are in their range or not (global min and channel max). This test seems to occur when there is no transformation set (but still unsure).
As a rule, I always set the following when using read.FCS transformation = FALSE, emptyValue = TRUE, min.limit = NULL, truncate_max_range = FALSE. I think this could help while waiting for a better answer from Mike or Greg.

flowCore/R/IO.R

Lines 1230 to 1258 in ba3b6ff

if(is.na(x["transformation"]))
{
if(truncate_max_range){
for(i in seq_len(ncol(dat)))
{
idx <- dat[,i]>range[i]
if(any(idx))
{
warning("Some data values of '", cn[i], "' channel exceed its $PnR value ", range[i], " and will be truncated!"
, "\n To avoid truncation, either fix $PnR before generating FCS or set 'truncate_max_range = FALSE'")
dat[idx,i] <- range[i]
}
}
}
if(!is.null(min.limit))
{
idx <- dat<min.limit
if(any(idx))
{
warning("Some data value are below 'min.limit' ", min.limit, " and will be truncated"
, "\n To avoid truncation, set 'min.limit = NULL'")
dat[idx] <- min.limit
}
}
}

@schraivo
Copy link
Author

Hi Sam, the error is gone with truncate_max_range = FALSE. Thanks! Would be curious to know which values were out of range, in order to understand the error. Let's see what Mike or Greg think.

@gfinak
Copy link
Member

gfinak commented Aug 25, 2022

Your Time channel has a reported range of 9.223372e+18 but data in your time channel go up to 3.378476e+38 and some values are NA.
We tend to keep things pretty strict about reading valid FCS files where the data make sense and leave it up to the user to knowingly shoot themselves in the foot. That said the messaging here could be much better, and this should be a warning not an error. The bug arises because we aren't treating the unexpected NA values properly when trying to truncate with truncate_max_range=TRUE. We'll get a fix in.

@gfinak
Copy link
Member

gfinak commented Aug 25, 2022

Noting here until I can check out the repo:

Error is in this secton:
any(idx) returns NA when idx contains NA.
should likely be:
any(na.omit(idx))
same for the section with min.limit

flowCore/R/IO.R

Lines 1234 to 1258 in 14781e7

{
idx <- dat[,i]>range[i]
if(any(idx))
{
warning("Some data values of '", cn[i], "' channel exceed its $PnR value ", range[i], " and will be truncated!"
, "\n To avoid truncation, either fix $PnR before generating FCS or set 'truncate_max_range = FALSE'")
dat[idx,i] <- range[i]
}
}
}
if(!is.null(min.limit))
{
idx <- dat<min.limit
if(any(idx))
{
warning("Some data value are below 'min.limit' ", min.limit, " and will be truncated"
, "\n To avoid truncation, set 'min.limit = NULL'")
dat[idx] <- min.limit
}
}
}

@SamGG
Copy link
Contributor

SamGG commented Aug 25, 2022

For curiosity and if you have time, how a NA results from a FCS file?

@gfinak
Copy link
Member

gfinak commented Aug 25, 2022

@SamGG a great question:

length(which(is.na(dat[,i])))
[1] 528

for that time channel.
The data already contains NA after
the readBin call
parameters passed to that are:

readBin(con = con, what = "numeric", n = 17453527, size = 4, signed = TRUE, endian = "little")

And it's not just that it's trying to read more than it should... the length of the read data is 17453527 and the max index that's NA is 15418653.
Probably another instrument maker writing bad files.
If this happens with the time channel it's a wonder anybody can trust such data.

@schraivo
Copy link
Author

It's an FCS file from an instrument prototype. Happy to continue discussing offline, as this will include confidential information. Thanks everyone for your support and quick responses.

@SamGG
Copy link
Contributor

SamGG commented Aug 25, 2022

@gfinak bingo, you identified a new comer!
@schraivo NA are precisely NaN; devel should take care when converting 64 bits integers; please close the issue and remove your link if you feel so.

@gfinak
Copy link
Member

gfinak commented Aug 25, 2022

I still think we should test for NA and error more informatively. Will reopen until I push a fix.

@gfinak gfinak reopened this Aug 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants