Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure masks round-trip from Spectrum1D (or other nddata-derived objects) #23

Open
eteq opened this issue Aug 13, 2020 · 3 comments
Open
Labels
bug Something isn't working enhancement New feature or request question Further information is requested

Comments

@eteq
Copy link
Collaborator

eteq commented Aug 13, 2020

This issue is an evolution of #19 (see in particular #19 (review)), and #22 is basically the test that any PR solving this issue should start from (test-driven development FTW!)

However, there's a complexity here. As @nmearl pointed out to me in a Slack conversation:

Round tripping masks was never the intention in the earlier development of the translator because we wanted the mask of the resulting spectrum object to be defined by the subsets, so we never had to ask the question of what glue is supposed to do with the mask part of an incoming spectrum object (and thus about what it means when it comes out the other end), so this should definitely be a separate issue.

Hence this issue!

My stance is that it's absolutely critical for some science workflows that glue be aware of the masks that are provided in both Spectrum1D and other image-oriented NDData objects. Masks are used in many parts of astronomy to indicate critical information like "this pixel is hot, so it's only useful if the flux in it is above a certain threshold" in addition to the more prosaic "this pixel is bad". That's why the NDData masks are meant to be "false/0 is good, and everything else is some kind of bad".

This has an impedance mismatch with the glue's concept of subsets, though, which are really strictly boolean-style masks. I don't know that this is a serious conflict in that we can probably get away with having glue think of nddata-style masks as boolean in the intended sense of "0 is not-masked, everything else is masked", and just leave it at that for now. But if the user passes in a mask and then wants to get that object back out again using the translators, it's critical that the get back the mask they put in.

It seemed to me like the solution should be to have the glue data object at import to add a mask attribute, which is then just used to populate the mask on output like for uncertainty (a la #19). But then the question is what to do with existing subsets. There could be some straightforward mapping like "if a subset exists, it overwrites the mask" or possibly "the new mask is the union of the subset mask and the existing mask"?

cc @astrofrog @rosteen

@eteq eteq added enhancement New feature or request bug Something isn't working question Further information is requested labels Aug 13, 2020
@maartenbreddels
Copy link

I am not sure if completely follow the discussion, I went over it a few times. But let me share my view of masks.

I think if ndarrays's with masks are added to glue, they should stay as masked arrays. This means that if you request the data again, it's again masked as it was. Glue should then decide when a subset exists, how to combine the subset mask and the data mask.

I'm not sure actually how glue handles masked data, to be honest.

@astrofrog
Copy link
Member

@maartenbreddels the difficulty is that ‘masks’ in nddata are not Boolean but rather could also include integer bit masks or even strings... I’m not sure if there is a well defined way to know what is valid vs not valid in an nddata object and I think it is left up to the user facing application to decide this? (@eteq let me know if this isn’t the case and there is a universal way to specify invalid data)

@dhomeier
Copy link
Contributor

I am coming back to this since in the context of spacetelescope/jdaviz#2012 and astropy/specutils#1009 – is this request still of interest, an if so , should round-trip include writing to FITS and reading back?

The reference case of #22 is relatively easily solved by having

if subset_state is None:
mask = None

check at that point for the presence of data['mask'] and use that. to_data is already parsing the mask from Spectrum1D if not None, so it is available in data.components in that case.
This still leaves the question how to deal with the different concepts of a mask in glue Data – to actually use it there, one may have to invert what the parser does above, using ~spectrum.mask.astype(bool).
And then what happens to the spectrum mask if data has also a mask defined from a subset?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants