-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IntervalList
cautious insert
#943
base: master
Are you sure you want to change the base?
Conversation
@lfrank @khl02007 @samuelbray32 I just wanted to collect input on this change as it could be a big one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple high-level comments:
- I think another enhancement that fits with this PR would be a
IntervalList.cleanup(restriction)
function.- Purpose would be to check if interval(s) matching the restriction are currently referenced anywhere in the database and either:
- delete them if not
- return a list of referenced locations if they are
- Could potentially overwrite the Mixin class's
delete
function to run this cleanup call ifinterval_list_name
is in the table's keys. This would help solve issues of orphaned entries
- Purpose would be to check if interval(s) matching the restriction are currently referenced anywhere in the database and either:
) | ||
|
||
if not exists: | ||
self.insert1(key, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this do a check if the exact key is already present whether the interval lists match? Would resolve some of the comments I have in LFP and LFPBand below
lfp_band_valid_times, new_timestamps | ||
) | ||
# check that the valid times are the same | ||
assert np.isclose( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cautious insert could still create errors without doing this explicit check.
- Say that this interval list name has been previously inserted.
- the band was deleted the interval list stays
- parameters were changed in the band that results in a slightly different interval list
- cautious_insert finds no matches on times, tries to insert with this name
- We would want an error in the case that it's different
}, | ||
replace=True, | ||
) | ||
elif not np.allclose(tmp_valid_times[0], lfp_valid_times): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar concern about still needing to check for matching as in comment for LFPBand
replace=True, | ||
) | ||
approx_name=True, | ||
) # removed replace=True 2024-04-22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could make redoing this step tricky for users. They would have to know they need to manually delete the interval list when deleting here
Co-authored-by: Samuel Bray <sam.bray@ucsf.edu>
The data collection I did for #778 suggested there were relatively few orphans to worry about. I've added cleanup/nightly_cleanup functions to mirror those on the nwbfile tables |
Thanks for your review, @samuelbray32! It sounds like Under the current model of 1 IntervalList entry per downstream entry, we assume the key we're using is unique and not used elsewhere when we replace. I'm concerned about the possible edge case in which pipeline B replaces the times for pipeline A's data, violating data integrity, so I would want to avoid |
Agreed, replace seems exceedingly dangerous under this model. I'm thinking of the case where the user (or a table) tries to insert an entry that matches an existing primary key but differs in the actual interval values. |
Description
We have previously discussed redundancy in
IntervalList
in #778. This PR makes pervasive edits to the codebase to check existing entries before inserting new ones. This comes with the drawback of future mismatches between substrings ofinterval_list_name
and other features of a given key. I would argue that substrings should not be used in this manner, and the confusion presented will be minor, as the other contents of the key takes precedence. The use ofnwb_file_name
inIntervalList
prevents cross-session reuse of keys, so confusion should be minimized in that respect.I've marked as draft pending the addition of documentation on this change. I welcome feedback on the structure of
cautious_insert
and its inclusion across the codebaseChecklist:
CITATION.cff
alter
snippet for release notes.CHANGELOG.md
with PR number and description.