DIPjavaio to handle files with multiple images #156

Open
crisluengo opened this issue May 2, 2024 · 7 comments
Labels
component:DIPjavaio About DIPjavaio or any of its interfaces enhancement

Comments

@crisluengo
Member

In Bio-Formats, reader.getSeriesCount() returns the number of images in the file, and reader.setSeries(i) configures the reader to read image number i.

We could add a parameter to dip::ImageReadJavaIO():

FileInformation ImageReadJavaIO(
      Image& out,
      String const& filename,
      String const& interface = bioformatsInterface,
      dip::uint imageNumber = 0
);

to indicate which image to read, and with a default value of 0. It being at the end is ugly but won't break code that currently uses this function.
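
A minimal sketch of why a trailing defaulted parameter keeps existing call sites compiling. The `Image` and `FileInformation` structs below are stand-ins rather than the DIPlib types, the value of `bioformatsInterface` is a placeholder, and the function body only records which image was requested instead of reading anything.

```cpp
#include <cstddef>
#include <string>

// Stand-ins for dip::Image and dip::FileInformation (hypothetical, illustration only).
struct Image { std::size_t sourceIndex; };
struct FileInformation { std::string name; std::size_t imageNumber; };

// Placeholder value; only its role as a default argument matters here.
std::string const bioformatsInterface = "placeholder-interface-name";

FileInformation ImageReadJavaIO(
      Image& out,
      std::string const& filename,
      std::string const& interface = bioformatsInterface,
      std::size_t imageNumber = 0   // the proposed new trailing parameter
) {
   (void)interface;                 // unused in this mock
   out.sourceIndex = imageNumber;   // a real reader would select the series here
   FileInformation info;
   info.name = filename;
   info.imageNumber = imageNumber;
   return info;
}
```

An existing call such as `ImageReadJavaIO(img, "cells.cif")` still compiles and reads image 0. The cost is that a caller who wants image 7 must now spell out the interface argument as well: `ImageReadJavaIO(img, "cells.cif", bioformatsInterface, 7)`.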

dip::ImageReadTIFF() actually has a dip::Range parameter for the image number, and will concatenate all the images read. I'm not sure this is useful in the generic case of ImageReadJavaIO, which can deal with so many different file types. I'd rather write a new function that populates a std::vector of images. Right now I'm dealing with a CIF file that contains 50k tiny images, several GB all together, and I don't think it's a good idea to try to read that in one go.

On the other hand, some of these multi-image file formats don't have an index that points to each image, so the reader has to pass through all the preceding images to reach the one you want (see for example TIFF). Calling a reader function separately for each image is then terrible. For these cases, you really want to initialize the reader once and return an iterator over images. But then the API starts to become quite complex...
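
To make the cost argument concrete, here is a small sketch with a mock index-less file, where reaching image k means stepping over images 0 to k-1. `SequentialFile` and both cost functions are hypothetical names made up for this illustration; nothing here touches a real file format.

```cpp
#include <cstddef>

// Hypothetical index-less container: access is forward-only from the current position.
class SequentialFile {
public:
   SequentialFile() : position_(0), steps_(0) {}
   void Rewind() { position_ = 0; }
   std::size_t Steps() const { return steps_; }
   // Steps forward to image `index` and "reads" it, counting every image touched.
   void Read(std::size_t index) {
      if (index < position_) { Rewind(); }
      steps_ += index - position_ + 1;   // images skipped, plus the one read
      position_ = index + 1;
   }
private:
   std::size_t position_;
   std::size_t steps_;
};

// One reader call per image, each starting again from the beginning of the file.
std::size_t CostOfSeparateCalls(std::size_t n) {
   SequentialFile file;
   for (std::size_t i = 0; i < n; ++i) {
      file.Rewind();
      file.Read(i);
   }
   return file.Steps();
}

// A single pass that hands out the images in order, as an iterator would.
std::size_t CostOfSinglePass(std::size_t n) {
   SequentialFile file;
   for (std::size_t i = 0; i < n; ++i) {
      file.Read(i);
   }
   return file.Steps();
}
```

In this model, separate calls touch n(n+1)/2 images in total while a single pass touches n, which is why per-image calls scale badly for formats without an index.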

Oh, OK, setSeries() should be fast. If so, opening the file could be slow? It's probably still more efficient to read many images in one go than calling dip::ImageReadJavaIO() for each image in a large file.

@crisluengo crisluengo added enhancement component:DIPjavaio About DIPjavaio or any of its interfaces labels May 2, 2024
@wcaarls
Member

wcaarls commented May 2, 2024

I assume calling dip::ImageReadJavaIO() 50k times has significant overhead, but you can test it. The easiest, API-wise, is to do the same as TIFF: add an extra dimension to the returned image and force all image series read to have the same dimensionality. If this is undesired, returning a std::vector is an option. We can call it dip::ImageReadJavaIOArray(). Both variants would accept a dip::Range parameter.
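
A rough sketch of what the vector-returning variant could look like. Everything below is an assumption made for illustration: `Image` and `Range` are simplified stand-ins (the real dip::Range also supports negative indices), the signature of `ImageReadJavaIOArray` is a guess, and the "file" is a hard-coded list of image sizes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Image { std::size_t width; std::size_t height; };   // stand-in for dip::Image

// Simplified stand-in for dip::Range: inclusive bounds, positive step only.
struct Range { std::size_t start; std::size_t stop; std::size_t step; };

// Fake file contents: every series has its own size, so they cannot be concatenated.
std::vector<Image> const fakeSeries = { {80, 76}, {64, 71}, {90, 88}, {72, 72}, {85, 60} };

std::vector<Image> ImageReadJavaIOArray(std::string const& filename, Range range) {
   (void)filename;   // the mock ignores the file name
   if (range.step == 0 || range.start > range.stop || range.stop >= fakeSeries.size()) {
      throw std::invalid_argument("Range does not fit the number of images in the file");
   }
   std::vector<Image> out;
   for (std::size_t i = range.start; i <= range.stop; i += range.step) {
      out.push_back(fakeSeries[i]);   // a real implementation would read series i here
   }
   return out;
}
```

Returning a vector places no constraint on the sizes of the individual images, which the extra-dimension approach would.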

Can you expand on your preference for returning a vector? Do you expect to read images with different dimensionalities/properties in other formats, but not in TIFF?

@crisluengo
Member Author

TIFF can also have images of different sizes. But I've never seen a TIFF file with 50k images in it. Usually they're related, they're either slices of a 3D image, or they're scaled versions of the same image (a pyramid), etc. We can handle these cases well with the current TIFF reading code in DIPlib.

This CIF file has 50k images of a single cell each. Each image has 6 channels and is tiny (~80 pixels square), but they're all cut to a different size depending on the size of the cell. bfconvert spent almost two hours this morning extracting images and writing a TIFF file for each. That's obviously not doable.

BTW, I didn't post this issue to have you do the work, I just wanted to record my thoughts and hopefully get some good suggestions.

@crisluengo
Member Author

crisluengo commented May 3, 2024

@wcaarls Take a look at what I did so far: 28d2fd34eaa509949eff744d1a3a45633a144a1b

@wcaarls
Member

wcaarls commented May 3, 2024

If the images are different sizes, you would indeed need to return a vector somehow. The code looks good! How much overhead is there in reading 50k images like that?

If the overhead is too high, another option (although perhaps not the best one) is to introduce state to the interface, where you first open the image, then read however many images you want, and then close it.

@crisluengo
Member Author

This is indeed quite slow. Opening this file takes a second or so every time.

I think reading in a series of images as a vector, specified through a Range argument, will be the simplest solution. I really like the idea of a stateful reader (Bio-Formats itself is designed that way too) but that would probably be a much bigger effort to get right.

I'm thinking there are two options:

  1. To BioFormatsInterface add functions Open, SetSeries, ReadSeries, and Close. The current Read would call Open, ReadSeries and Close in succession. The new C++ function to read multiple images could call these functions individually, and put the images into a std::vector.
  2. To BioFormatsInterface add a function ReadMany, which reads images into some Java array object. The C++ interface would convert this Java array of images into a std::vector of images.

The issue with option 2 is that we'll be limited by the Java memory. In option 1, Java only reads one image at a time, so it won't be overwhelmed.

I have no idea which option is easier to implement... And I don't know if option 1 is most of the way towards the stateful reader?
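
The call sequence of option 1 can be sketched as follows. This is a C++ mock only: the real BioFormatsInterface is on the Java side, and `MockBioFormatsInterface`, `ReadMany`, and the counter are hypothetical names used to show that the file is opened once for the whole batch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Image { std::size_t series; };   // stand-in for dip::Image

class MockBioFormatsInterface {
public:
   MockBioFormatsInterface() : open_(false), series_(0), openCount_(0) {}
   void Open(std::string const& filename) {
      filename_ = filename;
      open_ = true;
      series_ = 0;
      ++openCount_;   // the expensive step we want to do only once
   }
   void SetSeries(std::size_t i) { RequireOpen(); series_ = i; }
   Image ReadSeries() { RequireOpen(); Image img; img.series = series_; return img; }
   void Close() { open_ = false; }
   // The existing single-image entry point, expressed through the new functions.
   Image Read(std::string const& filename) {
      Open(filename);
      Image img = ReadSeries();
      Close();
      return img;
   }
   std::size_t OpenCount() const { return openCount_; }
private:
   void RequireOpen() const {
      if (!open_) { throw std::logic_error("file is not open"); }
   }
   std::string filename_;
   bool open_;
   std::size_t series_;
   std::size_t openCount_;
};

// New C++-side function: one Open, one ReadSeries per image, one Close.
std::vector<Image> ReadMany(MockBioFormatsInterface& reader, std::string const& filename,
                            std::size_t first, std::size_t last) {
   std::vector<Image> out;
   reader.Open(filename);
   for (std::size_t i = first; i <= last; ++i) {
      reader.SetSeries(i);
      out.push_back(reader.ReadSeries());   // only one image is in flight at a time
   }
   reader.Close();
   return out;
}
```

The sketch skips error handling; a real version would need to make sure Close is still called if a read fails partway through.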

@wcaarls
Member

wcaarls commented May 7, 2024

I prefer option 1, which indeed is most of the way there to a stateful reader. The difference (and perhaps advantage) is that the user does not see the statefulness. To implement it, the reader object would become a member of the BioFormatsInterface class, such that it can be reused.

@crisluengo
Member Author

Another advantage would be that we can easily implement a dip::ImageReadJavaIOInfo().
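
A sketch of how such an info function could fall out of separate open and close steps: open, query metadata, close, without reading pixel data. The signature, the `SeriesInfo` struct, and `MockReader` are all hypothetical; the sizes are made up to resemble the small six-channel images discussed above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-image metadata record.
struct SeriesInfo { std::size_t width; std::size_t height; std::size_t channels; };

// Mock reader that only knows metadata; it has no function to read pixels at all.
class MockReader {
public:
   MockReader() : open_(false) {
      SeriesInfo a = {80, 76, 6};
      SeriesInfo b = {64, 71, 6};
      SeriesInfo c = {90, 88, 6};
      meta_.push_back(a);
      meta_.push_back(b);
      meta_.push_back(c);
   }
   void Open(std::string const& filename) { (void)filename; open_ = true; }
   void Close() { open_ = false; }
   bool IsOpen() const { return open_; }
   std::size_t SeriesCount() const { return meta_.size(); }
   SeriesInfo Info(std::size_t i) const { return meta_.at(i); }
private:
   bool open_;
   std::vector<SeriesInfo> meta_;
};

// Collects metadata for every image in the file with a single open/close pair.
std::vector<SeriesInfo> ImageReadJavaIOInfo(MockReader& reader, std::string const& filename) {
   reader.Open(filename);
   std::vector<SeriesInfo> info;
   for (std::size_t i = 0; i < reader.SeriesCount(); ++i) {
      info.push_back(reader.Info(i));
   }
   reader.Close();
   return info;
}
```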
