regarding multiple opens of the same h5 file #15

Open
mikejiang opened this issue Sep 27, 2019 · 3 comments

mikejiang (Member) commented Sep 27, 2019

In ncdfFlow, we simply close the h5 file handle immediately after each read/write operation, so we never ran into any issues. cytolib, however, now keeps the h5 file handle open during the life cycle of the H5CytoFrame object to maintain the h5 cache and speed up subsequent IO.

This works fine for multiple opens within the same process (e.g. the same R session), even when the file is opened with the write flag:

library(rhdf5)
file <- "/tmp/RtmpArlTHw/test.h5"
f1 <- H5Fopen(file, "H5F_ACC_RDWR")
f2 <- H5Fopen(file, "H5F_ACC_RDWR")
> f1
HDF5 FILE 
        name /
    filename 

      name       otype   dclass          dim
0 data     H5I_DATASET FLOAT    3002125 x 14
1 keywords H5I_DATASET COMPOUND 219         
2 params   H5I_DATASET COMPOUND 14          
3 pdata    H5I_DATASET COMPOUND 1

However, when a separate process (e.g. another R session or a command-line tool) tries to open the same file, it fails with either the H5F_ACC_RDONLY or the H5F_ACC_RDWR flag, because the file is locked by another h5 library instance:

> f <- H5Fopen(file, "H5F_ACC_RDONLY")
Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessibilty. Unable to open file.

If the initial open was H5F_ACC_RDONLY, opening from both processes seems to succeed, so I guess the inter-process lock is only applied when the file is opened with write permission.
Although such a locking mechanism makes sense, allowing multiple opens within the same process is somewhat misleading. It could be that the h5 library is inherently designed for single-process applications, so no concurrent IO is expected within the same process.
That assumption cannot hold in the multi-process scenario, which is why h5 prohibits it.

Given this statement from the H5 API specs:

It is generally recommended that applications avoid multiple opens of the same file.

and since, in our use cases, we can't guarantee that the data (i.e. a GatingSet) is always opened read-only first, the best we can do is follow ncdfFlow's practice and not keep the H5File handle open between operations.
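A minimal sketch of that open-per-operation practice, using the rhdf5 calls from the example above plus H5Dopen, H5Dread and H5Dclose. The helper name read_h5_dataset is hypothetical (it is not part of cytolib, ncdfFlow or rhdf5), and the dataset path (e.g. "data" from the listing above) is an assumption. It relies on on.exit(after = FALSE), which needs R >= 3.5.0.

```r
library(rhdf5)

# Hypothetical helper, not a cytolib/ncdfFlow API: open the file, read one
# dataset, and release every handle before returning, so no H5 file handle
# (and hence no file lock) outlives the call.
read_h5_dataset <- function(file, dataset, flags = "H5F_ACC_RDONLY") {
  fid <- H5Fopen(file, flags = flags)
  on.exit(H5Fclose(fid), add = TRUE)                 # runs last
  did <- H5Dopen(fid, dataset)
  on.exit(H5Dclose(did), add = TRUE, after = FALSE)  # runs first
  H5Dread(did)
}

# usage, e.g. with the file from the example above:
# x <- read_h5_dataset(file, "data")
```

This gives up the h5 cache between calls in exchange for holding the lock only while a read is in progress. It narrows, but does not remove, the window in which another process can hit the lock.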

mikejiang changed the title from "regarding multiple read access to the same h5 file" to "regarding multiple opens of the same h5 file" Sep 27, 2019
mikejiang pushed a commit that referenced this issue Nov 8, 2019
mikejiang (Member, Author) commented:

We also need to prevent concurrent reads of the same h5 file during concurrent load_gs calls:

> data("GvHD")
> gs <- GatingSet(GvHD[1:4])
> tmp <- tempfile()
> save_gs(gs, tmp)
Done
To reload it, use 'load_gs' function

> f <- function(i,path){
+   gs <- load_gs(path)
+   nrow(gh_pop_get_data(gs[[i]]))
+ }
> mclapply(1:4, f, path = tmp)
     error #000: in H5Fopen(): line 509
        major: File accessibilty
        minor: Unable to open file
     error #001: in H5F_open(): line 1567
        major: File accessibilty
        minor: Unable to open file
     error #002: in H5FD_lock(): line 1640
        major: Virtual File Layer
        minor: Can't update object
     error #003: in H5FD_sec2_lock(): line 959
        major: File accessibilty
        minor: Bad file ID accessed

Basically, we need to delay loading all the metadata from the h5 file until it is requested.
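The deferred-loading idea can be sketched in plain R as a closure that runs its reader only on first access and caches the result. This is an illustration of the pattern, not cytolib's actual implementation; make_lazy_meta and the reader function are hypothetical, and the reader merely stands in for an h5 read of a metadata dataset such as "keywords".

```r
# Hypothetical sketch: wrap an expensive metadata reader so it runs only the
# first time the metadata is requested; later calls return the cached value.
make_lazy_meta <- function(reader) {
  cache <- NULL
  loaded <- FALSE
  function() {
    if (!loaded) {
      cache <<- reader()   # the only point where the (h5) source is touched
      loaded <<- TRUE
    }
    cache
  }
}

# usage: constructing the accessor performs no read
n_reads <- 0
get_keywords <- make_lazy_meta(function() {
  n_reads <<- n_reads + 1  # stands in for reading "keywords" from the h5 file
  c(TOT = "3420")
})
kw <- get_keywords()       # first access triggers the single read
kw <- get_keywords()       # served from the cache
```

With this pattern, loading an object does not touch the file at all, so concurrent loads only contend for the h5 file when a worker actually requests the data.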

mikejiang pushed a commit that referenced this issue Dec 6, 2019
mikejiang pushed a commit to RGLab/flowWorkspace that referenced this issue Dec 6, 2019
mikejiang (Member, Author) commented:

With 50da439, each worker loads only its own sample (sel = i), and the concurrent calls succeed:

> mclapply(1:4,  function(i){
+      gs <- load_gs(tmp, sel = i)
+       nrow(gh_pop_get_data(gs[[1]]))
+     })
[[1]]
[1] 3420

[[2]]
[1] 3405

[[3]]
[1] 3435

[[4]]
[1] 8550

gfinak (Member) commented Dec 6, 2019

beautiful..
