
Training dataset #8

Open
nmhkahn opened this issue Oct 2, 2020 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments


nmhkahn commented Oct 2, 2020

Hi, thanks for sharing a great dataset!

In the paper, it says "The training set consists of 240 LR and HR image sets, and the test set consists of 120 sets of images".
But in the data/normalized directory, I can only find 120 images, which might be the test set only.
I downloaded the raw dataset as well, but I think the dataset preparation code and the raw data do not match (the preparation code assumes PNG images, while the raw data consists of .npy files).

I also saw that there was a training dataset in this repo in an early commit (c674e02), but I am not sure it is safe to use those old training images.
Please let me know 1) whether the training dataset from c674e02 is correct, and 2) if not, where I can get the training dataset.

@majedelhelou
Member

Hello!

We originally had our own complex normalization strategy, tailored for SIM, but we received many requests for raw data. Accordingly, we made the raw data public and used it with simpler normalizations. All of the paper's results are on these raw data, which we used to train all our models. You can ignore the old data from earlier commits.

For Widefield: we applied z-score normalization. The z-score statistics are computed across all captures (360 FOVs x 400 captures): mean = 154.535390853, standard deviation = 66.02846351802853.
For SIM: the SIM image is normalized with a scaling and shift operation.
These two normalizations are presented in Section 3 of our Supplementary Material.
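The Widefield z-score normalization above can be sketched as follows (a minimal sketch using the constants quoted in this reply; the function name is illustrative, not from the repo):

```python
import numpy as np

# Dataset-wide Widefield statistics quoted in this thread
WF_MEAN = 154.535390853
WF_STD = 66.02846351802853

def normalize_widefield(img):
    """Z-score normalize a Widefield (LR) capture using the dataset-wide stats."""
    return (np.asarray(img, dtype=np.float64) - WF_MEAN) / WF_STD
```

A capture whose pixels equal the dataset mean maps to zero, and a pixel one standard deviation above the mean maps to 1.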

Regarding the preparation code: we have now modified the notebook w2s/code/generate_h5f.ipynb to read .npy instead of .png; it is just a file-reading change.
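The change amounts to swapping the image reader for `np.load`, since the raw captures are stored as NumPy arrays rather than PNGs. A minimal sketch (the demo file name and array shape here are arbitrary, not taken from the dataset):

```python
import os
import tempfile

import numpy as np

def load_capture(path):
    """Load a raw capture: the raw data are .npy arrays, not PNG images,
    so a PNG read (e.g. with imageio) becomes np.load(path)."""
    return np.load(path)

# Round-trip demo with a synthetic capture
tmp = os.path.join(tempfile.mkdtemp(), "demo.npy")
np.save(tmp, np.zeros((16, 16), dtype=np.float32))
arr = load_capture(tmp)
```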

Thank you for your positive feedback. Please only use the raw data, and we hope it will be useful for your work.

@majedelhelou majedelhelou added the documentation Improvements or additions to documentation label Oct 2, 2020

nmhkahn commented Oct 3, 2020

Thanks for clarifying this!

@nmhkahn nmhkahn closed this as completed Oct 3, 2020

nmhkahn commented Oct 5, 2020

Hello @majedelhelou sorry for reopening the issue.

For the LR (Widefield) images, it looks straightforward, since it is just the z-score standardization you described.
But for the HR (SIM) images, I have a couple of questions:

  1. Should alpha and beta be calculated image by image?
    For example, do I have to calculate alpha and beta for sim/001_1.npy using only avg1/001_1.npy when training on avg1->sim, but avg2/001_1.npy when training on avg2->sim?
  2. In Eq. 4 of the supplementary material, the HR image is downsampled, but which kernel was used in the paper?

Thanks again 👍

@nmhkahn nmhkahn reopened this Oct 5, 2020
@majedelhelou
Member

Hello @nmhkahn

Yes, that is correct for the LR Widefield images; as we note in the readme, the mean and standard deviation for the z-score were computed across all 400 samples times 360 FOVs.

  1. You should compute a and b (Eq. 4 of the Supplementary Material) per FOV, using the corresponding normalized LR image. So you can use the normalized avg400 to compute a and b for normalizing the corresponding SIM image. Using a noisy normalized LR (like avg1) would not change the result much; this is just a linear re-scaling that is practical for the deep networks.
  2. You can use standard bicubic downsampling for this.
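One way to sketch this per-FOV fit, assuming Eq. 4 finds a scale a and shift b so that the bicubic-downsampled, rescaled SIM image matches the normalized LR in a least-squares sense (the exact formulation is in the Supplementary Material and may differ; Pillow's bicubic resize is just one possible kernel implementation):

```python
import numpy as np
from PIL import Image

def bicubic_downsample(img, out_shape):
    """Bicubic downsampling via Pillow (one possible implementation)."""
    pil = Image.fromarray(np.asarray(img, dtype=np.float32), mode="F")
    return np.asarray(pil.resize((out_shape[1], out_shape[0]), Image.BICUBIC))

def fit_scale_shift(sim, lr_norm):
    """Least-squares a, b such that downsample(a*sim + b) ~ lr_norm.
    Downsampling is linear, so this equals fitting a*downsample(sim) + b."""
    sim_ds = bicubic_downsample(sim, lr_norm.shape)
    a, b = np.polyfit(sim_ds.ravel().astype(np.float64),
                      np.asarray(lr_norm, dtype=np.float64).ravel(), 1)
    return a, b
```

As a sanity check, fitting against an LR image that was itself produced by downsampling a*sim + b recovers a and b almost exactly, since bicubic resampling is linear in the pixel values.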

These two points [the effect of using a noisy vs. noise-free LR Widefield image for normalizing the HR SIM, and the choice of downsampling kernel] should not matter much, and you can also use your own strategy for normalizing the HR SIM. The goal is to give the networks a reasonably uniform intensity distribution, and small variations in how it is obtained should not have a significant impact.

Hope this helps!
