Implement long-term fix for NIRSpec MOS source ID handling #7287

Open
stscijgbot-jp opened this issue Oct 15, 2022 · 22 comments · May be fixed by #8442

Comments

@stscijgbot-jp
Collaborator

Issue JP-2922 was created on JIRA by Howard Bushouse:

Implement the long-term solution for handling NIRSpec MOS source IDs that are negative or longer than 5 digits, as outlined in JWSTDMS-709.

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Discussion regarding implementation is being tracked at https://innerspace.stsci.edu/display/jwstdms/2022-06-27+Discuss+DMS+usage+of+PPS+source+ids

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Jonathan Eisenhamer  Starting to think about how we would implement the change to designate different kinds of sources (i.e. source, background, virtual) in the file name syntax. After some experimenting I see that it would be simple to modify the associations.lib.rules_level3_base.format_product() function at https://github.com/spacetelescope/jwst/blob/master/jwst/associations/lib/rules_level3_base.py#L596 to unhardwire the "s" prefix on all formatted values of source_id. Instead, an additional argument to that function (source_type?) would indicate the choice of "source", "background", or "virtual", and the appropriate "s", "b", or "v" prefix would be added to the front of the source_id number.

The initial creation of the source-based product names for spectra takes place in the calwebb_spec3 pipeline at https://github.com/spacetelescope/jwst/blob/master/jwst/pipeline/calwebb_spec3.py#L206. This is where the source-based "cal" product names get created for each source/slitlet, and the downstream "s2d" and "x1d" product names are then copied from the "cal" name. It would be easy enough to add a check to this piece of code that looks for negative source_id values, converts them to positive numbers to store in the file name, and chooses the "v" source type in the call to format_product. What's not so simple is determining (at that spot in the pipeline code) whether a given source/slitlet is a background type. You can't tell from the source_id value, because background slitlets get reassigned from an initial value of 0 in the MSA metadata to a positive value greater than those of the normal sources. The only way to tell is to dig a little more into the metadata for each source/slitlet and look for values of source_name that begin with "background_".
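A minimal sketch of the check described above, assuming only the two signals mentioned in this comment (a negative source_id for virtual slits, and a source_name beginning with "background_" for background slits). The function name `source_prefix` is hypothetical and is not part of calwebb_spec3 or format_product.

```python
def source_prefix(source_id: int, source_name: str) -> str:
    """Return a one-letter prefix for a slit.

    'v' = virtual slit, 'b' = background slit, 's' = regular source.
    Hypothetical helper; the rules are taken from the discussion above.
    """
    # Virtual slits are the only ones with a negative source_id.
    if source_id < 0:
        return "v"
    # Background slits cannot be recognized from source_id alone,
    # so fall back on the naming convention of source_name.
    if source_name.startswith("background_"):
        return "b"
    return "s"


print(source_prefix(-3, "1345_-3"))
print(source_prefix(100012, "background_87"))
print(source_prefix(11631, "1345_11631"))
```

The order of the checks matters only if a virtual slit could also carry a "background_" name; the sketch gives the negative source_id precedence, which is an assumption.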

Thoughts?

@stscijgbot-jp
Collaborator Author

Comment by Jonathan Eisenhamer on JIRA:

Concerning the naming templates: I would say, simply remove the hardwired "s" prefix, and that is it. There is no particular reason to further differentiate between "source_type" and "source_id" at this high a level. Just leave it as "source_id", and let the code fill that in as needed, i.e. with the "s", "b", "v", etc. Conceptually, the source id includes whatever characters it needs to create a usable source identification.

About how to figure out the source type: Is there something that can be done, as a separate Step maybe, to do the slit type identification? Or is this all twisted up in deep-down processing?

In reality, IMHO, this should be solved much earlier upstream.

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Agree that a separate argument is not needed for the source type field (s, b, v). The prefix can instead be added by calwebb_spec3 as part of the creation of the source_id field used in the product name.

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Putting this on hold for now, because it'll take other DMS components a little while to make necessary changes in their code to accommodate the new file name syntax, which they can't do in time for B9.1. Will also wait for results of INS discussion of the broader issue of whether data for multiple sources should be bundled into a single file.

@stscijgbot-jp
Collaborator Author

Comment by Katie Kaleida on JIRA:

Howard Bushouse is this a candidate for DMS 10.1? 

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Katie Kaleida  This has been on hold ever since the discussion was started amongst other subsystems regarding the potential "bundling" of data from all sources into a single product at level 3, which would eliminate the use of source_id values in the product names altogether. If someone is willing to declare that discussion complete, with a decision NOT to rework the way we store spectroscopic data in all level 3 products, then we can go ahead and restart the work to implement this. I'm still not sure it would make it into B10.1 at this point, but it's possible (and hence a candidate).

@stscijgbot-jp
Collaborator Author

stscijgbot-jp commented Feb 19, 2024

Comment by Katie Kaleida on JIRA:

Just adding a note that Howard's comment above is still the latest status on this ticket.  We need guidance on how we will implement this ticket before we can determine which DMS build it can be worked in. 

Meeting minutes with relevant details can be found here:  https://innerspace.stsci.edu/pages/viewpage.action?spaceKey=jwstdms&title=2022-06-27+Discuss+DMS+usage+of+PPS+source+ids

@stscijgbot-jp
Collaborator Author

Comment by James Muzerolle on JIRA:

The consensus of the NIRSpec team is to go ahead with the source ID handling approach. We strongly recommend against bundling the level 3 data as that would result in very large files with many hundreds if not thousands of extensions, which goes against our original "source-based" concept and would significantly impact an efficient workflow for users.

@stscijgbot-jp
Collaborator Author

stscijgbot-jp commented Apr 16, 2024

Comment by Howard Bushouse on JIRA:

Tracking info for Cal software updates that are needed:

calwebb_spec3.py has code to a) check for source IDs that are negative and change them to unused positive values, and b) check for source IDs that are > 99,999 and change them to unused values between 1 and 99,999, in order to fit the 5-digit ID limit. Most, if not all, of this code should be removable or at least replaceable with logic that fits the new scheme.

jwst/associations/lib/rules_level3_base contains the "format_product" function that's used to create the proper file name for all source-based products. It is currently hardwired to always use the letter "s" as the source_id prefix, followed by a 5-digit number. This will need updating to fit the new scheme.

The /jwst/assign_wcs/nirspec.py function "get_open_msa_slits" has logic to detect dedicated background slitlets based on the fact that all shutters contained in the slitlet have "PRIMARY_SOURCE" set to "N" (i.e. no shutters containing the primary source). When these are found, the slitlet is assigned an unused positive source_id value and a source_name of "background_{slitlet_id}". Any slitlets found with a slitlet_id of -1 are currently ignored (skipped). All of this logic will need updating to accommodate the new background and virtual slits scheme.
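As an illustration of the background-detection rule described above (every shutter in the slitlet has PRIMARY_SOURCE set to "N"), here is a minimal sketch. It is not the code in get_open_msa_slits: the function name `find_background_slitlets`, the dict-based row layout, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


def find_background_slitlets(shutter_rows):
    """Return the sorted slitlet_ids whose shutters all have primary_source == 'N'.

    shutter_rows: iterable of dicts with 'slitlet_id' and 'primary_source'
    keys (a hypothetical stand-in for rows of the MSA shutter table).
    """
    flags_by_slitlet = defaultdict(list)
    for row in shutter_rows:
        flags_by_slitlet[row["slitlet_id"]].append(row["primary_source"])
    # A slitlet is background-only when no shutter holds the primary source.
    return sorted(
        slitlet_id
        for slitlet_id, flags in flags_by_slitlet.items()
        if all(flag == "N" for flag in flags)
    )


# Made-up rows: slitlet 12 contains a primary source, slitlet 87 does not.
rows = [
    {"slitlet_id": 12, "primary_source": "N"},
    {"slitlet_id": 12, "primary_source": "Y"},
    {"slitlet_id": 87, "primary_source": "N"},
    {"slitlet_id": 87, "primary_source": "N"},
]
backgrounds = find_background_slitlets(rows)
names = [f"background_{slitlet_id}" for slitlet_id in backgrounds]
print(backgrounds, names)
```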

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Found some examples of virtual slits defined in jw01181009001_01_msa.fits. Here are example MSA file entries:
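| slitlet_id | msa_id | quad | row | col | source_id | bkg | state | xpos | ypos | dither | primary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 95 | 1 | 3 | 148 | 31 | -1 | Y | OPEN | nan | nan | 1 | N |
| 95 | 1 | 3 | 148 | 31 | -1 | N | OPEN | 0.495 | 0.516 | 2 | Y |
| 95 | 1 | 3 | 148 | 31 | -1 | Y | OPEN | nan | nan | 3 | N |
| 95 | 1 | 3 | 148 | 32 | -1 | N | OPEN | 0.5 | 0.5 | 1 | Y |
| 95 | 1 | 3 | 148 | 32 | -1 | Y | OPEN | nan | nan | 2 | N |
| 95 | 1 | 3 | 148 | 32 | -1 | Y | OPEN | nan | nan | 3 | N |
| 95 | 1 | 3 | 148 | 33 | -1 | Y | OPEN | nan | nan | 1 | N |
| 95 | 1 | 3 | 148 | 33 | -1 | Y | OPEN | nan | nan | 2 | Y |
| 95 | 1 | 3 | 148 | 33 | -1 | N | OPEN | 0.506 | 0.484 | 3 | N |
| | | | | | | | | | | | |
| 95 | 141 | 3 | 148 | 31 | -2 | Y | OPEN | nan | nan | 1 | N |
| 95 | 141 | 3 | 148 | 31 | -2 | N | OPEN | 0.495 | 0.516 | 2 | Y |
| 95 | 141 | 3 | 148 | 31 | -2 | Y | OPEN | nan | nan | 3 | N |
| 95 | 141 | 3 | 148 | 32 | -2 | N | OPEN | 0.5 | 0.5 | 1 | Y |
| 95 | 141 | 3 | 148 | 32 | -2 | Y | OPEN | nan | nan | 2 | N |
| 95 | 141 | 3 | 148 | 32 | -2 | Y | OPEN | nan | nan | 3 | N |
| 95 | 141 | 3 | 148 | 33 | -2 | Y | OPEN | nan | nan | 1 | N |
| 95 | 141 | 3 | 148 | 33 | -2 | Y | OPEN | nan | nan | 2 | Y |
| 95 | 141 | 3 | 148 | 33 | -2 | N | OPEN | 0.506 | 0.484 | 3 | N |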

So virtual slits all have negative source_id values that are sequential, starting with -1 and decreasing to more negative values. Other than that, they have all the same characteristics as a regular slit that contains a point source: 1 shutter is assigned as containing the primary source, with numerical values for the estimated x/y position in the shutter (as opposed to NaN), and the background value for each shutter is set as would be expected for a slit containing a point source in only 1 shutter.

So all of the shutter characteristics are consistent with a virtual POINT source, as opposed to EXTENDED. Given that there's no corresponding entry in the source table from which we can retrieve a STELLARITY value, the remaining question is whether virtual slits should be treated by default as if they contain a POINT source or an EXTENDED source.

James Muzerolle Diane Karakla Can you confirm whether we should be treating virtual slits as POINT or EXTENDED?

@stscijgbot-jp
Collaborator Author

Comment by James Muzerolle on JIRA:

Howard Bushouse after conferring with Diane, we think the default should be EXTENDED, as that's likely the most common use case for virtual slits/sources (e.g., observations with the long slit configurations, or master background slitlets that weren't marked properly in MPT).

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

James Muzerolle Thanks. That's easy enough to implement. The virtual slits are recognized by the fact that they have source_id < 0 in the MSA metadata file, regardless of any other characteristics. We then simply need to set the default STELLARITY value for the virtual slits to 0.0, which will cause them to be treated as EXTENDED sources downstream (which is exactly the same as what we do for BACKGROUND slits).
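A minimal sketch of that rule, assuming the source table is available as a plain mapping from source_id to stellarity. The helper name `stellarity_for_slit` and the mapping are hypothetical, and how the stellarity value is later turned into POINT or EXTENDED is left to the downstream steps.

```python
# Default used for slits with no entry in the MSA source table.
VIRTUAL_SLIT_STELLARITY = 0.0


def stellarity_for_slit(source_id, source_table):
    """Return the stellarity to attach to a slit.

    source_table: dict mapping source_id -> stellarity (hypothetical
    stand-in for the MSA source table).
    """
    # Virtual slits are recognized purely by source_id < 0.
    if source_id < 0:
        return VIRTUAL_SLIT_STELLARITY
    return source_table[source_id]


# Stellarity values for two regular sources, as in a typical source table.
table = {719: 1.0, 1023: 1.0}
print(stellarity_for_slit(-1, table))
print(stellarity_for_slit(719, table))
```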

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

James Muzerolle Diane Karakla Nadia Dencheva and any other interested NIRSpec stakeholders:

I've got code modifications for this working in #8442 but have a couple of questions about implementation details for which I'd like opinions. It focuses on what source_id value to assign to background and virtual slits for carrying along internally in the datamodels from level-2b to level-3 processing. As a reminder, the entries in the MSA metadata file are such that all background-only slits have source_id=0 and all virtual slits have a negative, but unique, source_id value (e.g. -1, -2, -3, -4, ...).

First, regarding the background slits, we can't carry along a source_id of zero for all of the background slits all the way to the end of level-3 processing, because the final results get saved to different files and the file names have the source_id in them, so we don't want to end up with multiple files all having "b000000000" in their source_id portion of their file names. We have to have a unique 9-digit number for each of them. My solution so far has been to assign the slitlet_id value from each background slit to their source_id value, thus replacing the source_id=0 from the MSA meta file and giving us unique level-3 file names. I figure that using the slitlet_id value at least gives users a value that can be traced back to the MSA meta file to figure out which background slit the data came from (as opposed to say randomly assigning source_id values to background slits starting with 1 and counting upwards, i.e. b000000001, b000000002, b000000003, etc.). Does that sound reasonable to you?

Second, regarding the virtual slits that have source_id -1, -2, -3, ... in the MSA meta file, we could potentially carry along these negative values in the meta data in the datamodels during level2b processing (i.e. their SOURCEID keyword values would retain these negative numbers), in order to preserve traceability back to the original entries in the MSA meta file. Of course those negative values would have to be changed when creating the file names for level-3 outputs, such that source_id portion of their file names would be "v000000001", "v000000002", etc. Users would need to realize that the numerical value used here in the file name is the absolute value of the original source_id listed in the MSA meta file. Alternatively, we could take the absolute value of the virtual source_id's already when the data are loaded from the MSA meta file during level-2b processing, in which case their SOURCEID keyword values would be positive already in the "cal" files created by level-2b processing and would carry over into the level 3 products.

So, to summarize, here are examples of the two scenarios: either we change the source_id values for background and virtual slits as soon as the MSA metadata are loaded by assign_wcs in level-2b processing (scenario 1), or we modify them only later, when it comes time to create level-3 output file names (scenario 2).

Scenario 1: Source slits keep all of their original source_id, SOURCEID, SLITID, SRCNAME, and SRCALIAS values, as determined from the MSA meta file. Background slits have SOURCEID copied from SLITID and made-up SRCNAME/SRCALIAS values that use SLITID/SOURCEID. Virtual slits have their original SLITID from the MSA meta file, but SOURCEID is abs(MSA source_id), with made-up SRCNAME/SRCALIAS values that use SOURCEID.
| | Source slit | Background slit | Virtual slit |
|---|---|---|---|
| MSA source_id | 11631 | 0 | -71 |
| SOURCEID keyword | 11631 | 87 | 71 |
| SLITID keyword | 83 | 87 | 29 |
| SRCNAME keyword | 1345_11631 | background_87 | virtual_71 |
| SRCALIAS keyword | 11631 | bkg_87 | vrt_71 |
| file name sourceid field | s000011631 | b000000087 | v000000071 |

Scenario 2: Almost everything is the same as in scenario 1, except that the original MSA source_id values are retained in the SOURCEID keyword values. This means multiple background slits would all have SOURCEID=0, but would have unique SLITID values.
| | Source slit | Background slit | Virtual slit |
|---|---|---|---|
| MSA source_id | 11631 | 0 | -71 |
| SOURCEID keyword | 11631 | 0 | -71 |
| SLITID keyword | 83 | 87 | 29 |
| SRCNAME keyword | 1345_11631 | background_87 | virtual_71 |
| SRCALIAS keyword | 11631 | bkg_87 | vrt_71 |
| file name sourceid field | s000011631 | b000000087 | v000000071 |
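The two scenarios can be written down as a small function, which makes it easier to see that they differ only in the SOURCEID keyword and agree on SRCNAME and on the file name field. This is a sketch under the rules stated in the tables; `describe_slit` and its arguments are hypothetical names, not the code in the draft PR.

```python
def describe_slit(msa_source_id, slit_id, msa_source_name, scenario):
    """Return SOURCEID, SRCNAME and the file name source field for one slit.

    scenario 1: change SOURCEID when the MSA metadata are loaded.
    scenario 2: keep the MSA source_id in SOURCEID until file naming.
    """
    if msa_source_id > 0:
        # Regular source: everything comes straight from the MSA meta file.
        sourceid = msa_source_id
        srcname = msa_source_name
        field = f"s{msa_source_id:09d}"
    elif msa_source_id == 0:
        # Background slit: borrow the slitlet id to get a unique number.
        sourceid = slit_id if scenario == 1 else 0
        srcname = f"background_{slit_id}"
        field = f"b{slit_id:09d}"
    else:
        # Virtual slit: the file name always uses the absolute value.
        sourceid = abs(msa_source_id) if scenario == 1 else msa_source_id
        srcname = f"virtual_{abs(msa_source_id)}"
        field = f"v{abs(msa_source_id):09d}"
    return {"SOURCEID": sourceid, "SRCNAME": srcname, "file_field": field}


for scenario in (1, 2):
    print(scenario, describe_slit(11631, 83, "1345_11631", scenario))
    print(scenario, describe_slit(0, 87, None, scenario))
    print(scenario, describe_slit(-71, 29, "1345_-71", scenario))
```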

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

P.S. The exp_to_source step that's used at the beginning of the calwebb_spec3 pipeline has also been modified (in the draft PR) to do its sorting based on SRCNAME, instead of SOURCEID, because some of the background and virtual slit SOURCEID's could have the same value as regular slits that contain a real source. So the new sorting works with either of the 2 scenarios presented above, because all of the SRCNAME values are unique across all slits.
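To illustrate why the sort key matters, here is a minimal sketch that groups slits from two exposures first by SOURCEID and then by SRCNAME. It is not the exp_to_source implementation; `group_slits`, the dict layout, and the sample values (a real source 87 next to a background slit that inherited SOURCEID 87 from its slitlet_id) are assumptions for the example.

```python
from collections import defaultdict


def group_slits(slits, key):
    """Group exposure names by the given slit attribute (hypothetical helper)."""
    groups = defaultdict(list)
    for slit in slits:
        groups[slit[key]].append(slit["exposure"])
    return dict(groups)


# Two exposures, each with a real source 87 and a background slit whose
# SOURCEID was copied from slitlet_id 87.
slits = [
    {"exposure": "exp1", "SOURCEID": 87, "SRCNAME": "1345_87"},
    {"exposure": "exp1", "SOURCEID": 87, "SRCNAME": "background_87"},
    {"exposure": "exp2", "SOURCEID": 87, "SRCNAME": "1345_87"},
    {"exposure": "exp2", "SOURCEID": 87, "SRCNAME": "background_87"},
]

by_id = group_slits(slits, "SOURCEID")    # the two slits are merged
by_name = group_slits(slits, "SRCNAME")   # the two slits stay separate
print(len(by_id), len(by_name))
```

Grouping on SOURCEID folds the source and the background slit into one product, while grouping on SRCNAME keeps them apart.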

@stscijgbot-jp
Collaborator Author

Comment by Diane Karakla on JIRA:

Howard Bushouse James Muzerolle I like your solution for the background slit naming where the SOURCEID keyword inherits the SLITID keyword value, to prevent too many files with b000000000 names. But I am concerned about the solution for the virtual slits where the SOURCEID becomes the absolute value of the MSA source_id value. I'm worried that there could already be source IDs with the same value. Wouldn't that potentially confuse things? Or am I missing something?

 

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Diane Karakla No, you're not missing anything. There could end up being a "real" slit (i.e. with a source) and a virtual slit with the same SOURCEID value (e.g. real source of 42 and virtual source of abs(-42)). They would still have unique SLITID values, but would have the same SOURCEID value. So that is an argument for maintaining the negative source_id value for virtual slits (except of course in the final file names, where it'd be "s000000042" vs. "v000000042").

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

The code I have implemented for now uses the slitlet_id value to assign to source_id for background slits, and maintains the negative source_id value (in the meta data) for virtual slits, so that they don't get confused with regular source slits having the same (positive) source_id value. Only when the level-3 output file name for virtual slits is constructed is the evidence of the negative value removed. But they will still have the "v" prefix on the source_id in the file name, to keep it unique from a regular "s" slit with the same positive source_id value in its file name.

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Diane Karakla I've noticed in the MSA metadata file that we have for program 1345, which includes many background and virtual slits, that the source "alias" created for virtual sources appears to be an attempt to create a name based on the source RA+Dec. The formatting of the alias name though appears to have issues. Here, for example, are a few rows of the MSA source table for several virtual sources showing (left to right) program number, source_id, source_name, source alias, RA, Dec, preimage_id, and stellarity:


          (1345,    -7, '1345_-7', '-09:-039:0.0000 +53:04:3.28', 215.23415775, 53.06757769, 'None', 0.),
          (1345,    -6, '1345_-6', '-09:-039:0.0000 +53:04:7.02', 215.21956386, 53.06861725, 'None', 0.),
          (1345,    -5, '1345_-5', '-09:-039:0.0000 +53:04:7.78', 215.23885807, 53.06882654, 'None', 0.),
          (1345,    -4, '1345_-4', '-09:-039:0.0000 +53:04:10.45', 215.24506469, 53.06956846, 'None', 0.),
          (1345,    -3, '1345_-3', '-09:-039:0.0000 +53:04:13.51', 215.17723798, 53.07041817, 'None', 0.),
          (1345,    -2, '1345_-2', '-09:-039:0.0000 +53:04:18.93', 215.21460355, 53.0719244 , 'None', 0.),
          (1345,    -1, '1345_-1', '-09:-039:0.0000 +53:04:22.82', 215.15541968, 53.07300639, 'None', 0.),
          (1345,   719, '1345_719', '719', 215.1600523 , 53.0645219 , 'None', 1.),
          (1345,  1023, '1345_1023', '1023', 215.1884129 , 53.0336473 , 'None', 1.),
          (1345,  1029, '1345_1029', '1029', 215.2187624 , 53.0698619 , 'None', 1.),

Notice how all of the virtual sources, identified by negative values in the 2nd (source_id) column, have odd looking aliases in column 4, like "-09:-039:0.0000 +53:04:22.82". The Declination values in the 2nd half of the alias string seem to be OK, but the RA values are all messed up. First, there's no such thing as negative RA values, and even if there were you wouldn't put the negative sign on both the hour and minute value. Second, they all seem to have the same (bad) RA value, while the Dec values are unique for each source. As far as I know these values are just read directly from some PPS DB table, hence I assume the values are bad already in that db table.
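For comparison, here is a sketch of what a well-formed RA+Dec alias would look like for the source_id -1 row, computed from the RA and Dec columns in decimal degrees. The exact alias format PPS intends is not known here; the "HH:MM:SS.ss +DD:MM:SS.ss" layout and the helper names are assumptions based on the Dec half of the existing strings.

```python
def to_sexagesimal(value, decimals=2):
    """Split a non-negative value into 'XX:MM:SS.ss' (hours or degrees)."""
    scale = 10 ** decimals
    # Work in integer units of 1/scale seconds so rounding never yields 60.00.
    total = round(value * 3600 * scale)
    whole_seconds, fraction = divmod(total, scale)
    whole_minutes, seconds = divmod(whole_seconds, 60)
    major, minutes = divmod(whole_minutes, 60)
    return f"{major:02d}:{minutes:02d}:{seconds:02d}.{fraction:0{decimals}d}"


def radec_alias(ra_deg, dec_deg):
    """Build an alias string from RA and Dec given in decimal degrees."""
    ra_hours = (ra_deg % 360.0) / 15.0  # RA is never negative
    sign = "-" if dec_deg < 0 else "+"  # a single sign, on the Dec only
    return f"{to_sexagesimal(ra_hours)} {sign}{to_sexagesimal(abs(dec_deg))}"


# RA and Dec of the source_id = -1 row in the table above.
alias = radec_alias(215.15541968, 53.07300639)
print(alias)  # 14:20:37.30 +53:04:22.82
```

The Dec half reproduces the "+53:04:22.82" already present in the alias for that row, which suggests that only the RA conversion is going wrong upstream.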

@stscijgbot-jp
Collaborator Author

Comment by Diane Karakla on JIRA:

Geez,  Haha. Thanks for pointing it out.  I can file an APT ticket to fix it, or do you want to?

@stscijgbot-jp
Collaborator Author

Comment by Howard Bushouse on JIRA:

Diane Karakla I'll let you handle the APT ticket.

@stscijgbot-jp
Collaborator Author

Comment by Diane Karakla on JIRA:

I filed https://jira.stsci.edu/browse/APT-93956 for the alias name fix.
