-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-40563: Persist ObservationIdentifiers #33
Conversation
73753dc
to
e584e5b
Compare
visit=visit, | ||
detector=link_row["detector"], | ||
day_obs=visit_dict[visit].day_obs, | ||
physical_filter=visit_dict[visit].physical_filter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have gone a bit too far in creating and using the VisitRecord
dataclass to avoid any positional argument. visit_dict
its scope only in this function and is not persisted. So I'd be okay dropping this.
) | ||
cell_recarray = np.rec.fromrecords( | ||
recList=cell_records, | ||
formats=None, # formats has specified to please mypy. See numpy#26376. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someday when my fix upstream makes its way to our rubin-env
we can remove it specifying formats=None
. Today is not that day.
@@ -85,8 +85,7 @@ def __init__( | |||
self._common = common | |||
# Remove any duplicate elements in the input, sorted them and make | |||
# them an immutable sequence. | |||
# TODO: Remove the conditioning in DM-40563. | |||
self._inputs = tuple(sorted(set(inputs))) if inputs else () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm intentionally dropping support for files generated with previous versions (since we are still in version 0.x) since I feel that it's easier to drop support for no inputs rather than maintaining this until we hit 1.0
.
python/lsst/cell_coadds/_fits.py
Outdated
@@ -241,12 +259,47 @@ def readAsMultipleCellCoadd(self) -> MultipleCellCoadd: | |||
outer_cell_size = Extent2I(header["OCELL1"], header["OCELL2"]) | |||
psf_image_size = Extent2I(header["PSFSIZE1"], header["PSFSIZE2"]) | |||
|
|||
# Attempt to get inputs for each cell. | |||
inputs = GridContainer[list[ObservationIdentifiers]](shape=grid.shape) | |||
if written_version >= "0.3": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of lexicographic comparison is not going to work if you ever hit 0.10
. It might be better to switch all the version variables to tuple[int, int]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use packaging.version
which is designed for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to packaging.version
. I had wanted to do this when having a patch number was under consideration but had dropped the idea since then.
python/lsst/cell_coadds/_fits.py
Outdated
visit = link_row["visit"] | ||
obs_id = ObservationIdentifiers( | ||
instrument=header["INSTRUME"], | ||
packed=link_row["packed"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably out of scope for this ticket, but I would like to drop "packed" from both the in-memory and on-disk forms. I think it'll end up more of a maintenance headache than a convenience.
assert len(instrument_set) == 1, "All cells must have the same instrument." | ||
instrument = instrument_set.pop() | ||
|
||
visit_recarray = np.rec.fromrecords( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several years ago, I remember learning that numpy record arrays had some surprising performance overheads compared to using a regular numpy.ndarray
with a structured dtype, and I've avoided them ever since, since the difference is just whether you use attribute access vs. item access (i.e. row["field_name"]
). I have no idea if that's still the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't care as much about the attribute access vs. item access, but avoiding positional index access and also having to create several fits.Column
objects. I'm inclined to leave this as is, and change it in the future if there is a noticeable overhead. Fortunately, this is purely internal and can be changed without affecting FILE_FORMAT_VERSION
at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, it sounds like astropy does something special with numpy.recarray
, which is not something I realized.
f9a0244
to
ea41368
Compare
when inputs is None.
ea41368
to
eb20205
Compare
Checklist
doc/changes
python/lsst/cell_coadds/_fits.py
was modified)