`IntervalList` cautious insert #943

CBroz1 · 2024-04-23T21:26:06Z

Description

We previously discussed redundancy in IntervalList in #778. This PR makes pervasive edits to the codebase to check existing entries before inserting new ones. The drawback is that substrings of interval_list_name may no longer match other features of a given key. I would argue that substrings should not be used in this manner, and that the resulting confusion will be minor because the other contents of the key take precedence. The use of nwb_file_name in IntervalList prevents cross-session reuse of keys, so confusion should be minimized in that respect.

I've marked this as a draft pending the addition of documentation on this change. I welcome feedback on the structure of cautious_insert and its inclusion across the codebase.

Checklist:

This PR should be accompanied by a release: (yes/no/unsure)
If release, I have updated the CITATION.cff
This PR makes edits to table definitions: (yes/no)
If table edits, I have included an alter snippet for release notes.
I have updated the CHANGELOG.md with PR number and description.
I have added/edited docs/notebooks to reflect the changes


edeno · 2024-04-24T21:37:13Z

@lfrank @khl02007 @samuelbray32 I just wanted to collect input on this change as it could be a big one.

samuelbray32

A couple high-level comments:

I think another enhancement that fits with this PR would be an IntervalList.cleanup(restriction) function.
- Purpose would be to check if interval(s) matching the restriction are currently referenced anywhere in the database and either:
  - delete them if not
  - return a list of referenced locations if they are
- Could potentially override the Mixin class's delete function to run this cleanup call if interval_list_name is in the table's keys. This would help solve issues of orphaned entries.
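
As a rough sketch of the cleanup idea above, using plain Python dicts as stand-ins for the tables (this is not DataJoint or Spyglass code; the function signature, the dict layout, and the callable `restriction` are all assumptions made for illustration):

```python
def cleanup(interval_lists, downstream_tables, restriction):
    """Delete unreferenced interval lists; report where the rest are used.

    All arguments are hypothetical stand-ins for real tables:
      interval_lists:    dict of (nwb_file_name, interval_list_name) -> valid_times
      downstream_tables: dict of table name -> set of interval list keys it references
      restriction:       callable taking a key, returning True if it matches
    """
    still_referenced = {}
    # Copy the matching keys first so we can delete while looping.
    for key in [k for k in interval_lists if restriction(k)]:
        users = sorted(
            name for name, refs in downstream_tables.items() if key in refs
        )
        if users:
            # Referenced somewhere: keep the entry and report the locations.
            still_referenced[key] = users
        else:
            # Orphan: nothing downstream points at it, so remove it.
            del interval_lists[key]
    return still_referenced
```

Calling it with a restriction on one session would delete that session's orphans and return a mapping from each surviving key to the tables that still use it.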

samuelbray32 · 2024-05-02T19:17:34Z

src/spyglass/common/common_interval.py

+        )
+
+        if not exists:
+            self.insert1(key, *args, **kwargs)


Should this check, when the exact key is already present, whether the interval lists match? That would resolve some of the comments I have in LFP and LFPBand below.

samuelbray32 · 2024-05-02T19:18:45Z

src/spyglass/lfp/analysis/v1/lfp_band.py

-                lfp_band_valid_times, new_timestamps
-            )
-            # check that the valid times are the same
-            assert np.isclose(


The cautious insert could still create errors without doing this explicit check.

Say that this interval list name has been previously inserted:

- The band was deleted, but the interval list stays.
- Parameters were changed in the band, resulting in a slightly different interval list.
- cautious_insert finds no matches on times and tries to insert with this name.

We would want an error in the case that it's different.
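
To make the requested behavior concrete, here is a minimal sketch using a plain dict in place of the IntervalList table. It is not the PR's implementation: `cautious_insert_sketch`, `times_match`, the tolerance, and the dict layout are hypothetical, and the real method's signature differs.

```python
import math


def times_match(a, b, tol=1e-9):
    """Return True if two lists of (start, stop) pairs agree within tol."""
    return len(a) == len(b) and all(
        math.isclose(s1, s2, abs_tol=tol) and math.isclose(e1, e2, abs_tol=tol)
        for (s1, e1), (s2, e2) in zip(a, b)
    )


def cautious_insert_sketch(table, key, valid_times):
    """Insert only if needed; return the key the caller should reference.

    table is a hypothetical stand-in:
      dict of (nwb_file_name, interval_list_name) -> list of (start, stop)
    """
    existing = table.get(key)
    if existing is not None:
        if times_match(existing, valid_times):
            return key  # exact key already present with the same times
        # Same name, different times: the stale-orphan case described above.
        raise ValueError(f"{key} exists with different valid_times")
    # No entry under this name: reuse any same-session entry with equal times.
    for other_key, other_times in table.items():
        if other_key[0] == key[0] and times_match(other_times, valid_times):
            return other_key
    table[key] = valid_times
    return key
```

The ordering matters in this sketch: the name collision is checked before the search on times, so a leftover entry with stale times raises instead of being silently shadowed.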

samuelbray32 · 2024-05-02T19:21:21Z

src/spyglass/lfp/v1/lfp.py

-                },
-                replace=True,
-            )
-        elif not np.allclose(tmp_valid_times[0], lfp_valid_times):


Similar concern about still needing to check for matching, as in the comment for LFPBand.

samuelbray32 · 2024-05-02T19:24:30Z

src/spyglass/spikesorting/v0/spikesorting_recording.py

-            replace=True,
-        )
+            approx_name=True,
+        )  # removed replace=True 2024-04-22


This could make redoing this step tricky for users: they would have to know to manually delete the interval list when deleting here.

CBroz1 · 2024-05-06T21:04:43Z

> I think another enhancement that fits with this PR would be a `IntervalList.cleanup(restriction)` function.

The data collection I did for #778 suggested there were relatively few orphans to worry about. I've added cleanup/nightly_cleanup functions to mirror those on the nwbfile tables.

CBroz1 · 2024-05-06T21:30:37Z

Thanks for your review, @samuelbray32! It sounds like replace=True is currently serving the purpose of freeing the user from the need to delete an existing orphan during a re-insert. I currently only check valid_times and, if another key exists, run the fk-ref to that key. By doing a deeper check of the key against existing entries, we could infer the user is in this replace case and return the existing key. Is that right?

Under the current model of 1 IntervalList entry per downstream entry, we assume the key we're using is unique and not used elsewhere when we replace. I'm concerned about the possible edge case in which pipeline B replaces the times for pipeline A's data, violating data integrity, so I would want to avoid replace altogether. My new feature of 'reuse existing entries' makes the data integrity piece all the more important.

samuelbray32 · 2024-05-09T20:28:40Z

> I'm concerned about the possible edge case in which pipeline B replaces the times for pipeline A's data, violating data integrity, so I would want to avoid replace altogether.

Agreed, replace seems exceedingly dangerous under this model. I'm thinking of the case where the user (or a table) tries to insert an entry that matches an existing primary key but differs in the actual interval values. replace would have handled this before, but maybe there should be some analogous flag for the insert call. In this partially-matching case, it would not replace the existing entry; instead, it would add some hash to the key you're trying to enter and insert that.
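
A minimal sketch of that hash-suffix idea, again with a plain dict standing in for the table. The function name, the `_<hash>` naming scheme, the 8-character SHA-1 prefix, and the use of JSON to serialize the times are all assumptions for illustration, not anything defined in this PR.

```python
import hashlib
import json


def insert_with_hash_on_conflict(table, key, valid_times):
    """Never replace: on a name collision with different times, insert
    under a hash-suffixed name and return that new key instead.

    table is a hypothetical stand-in:
      dict of (nwb_file_name, interval_list_name) -> list of [start, stop]
    """
    existing = table.get(key)
    if existing is None:
        table[key] = valid_times
        return key
    if existing == valid_times:
        return key  # full match: nothing to do
    # Partial match: derive a short, deterministic suffix from the times.
    digest = hashlib.sha1(json.dumps(valid_times).encode()).hexdigest()[:8]
    new_key = (key[0], f"{key[1]}_{digest}")
    # setdefault keeps this idempotent if the same times are inserted twice.
    table.setdefault(new_key, valid_times)
    return new_key
```

Because the suffix is derived from the interval values, repeating the same insert returns the same key, and the original entry is left untouched for whichever pipeline already depends on it.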

CBroz1 added 3 commits April 23, 2024 16:20

WIP: Cautious Insert on Inte

9dd23e5

WIP: fix failing tests

69f7866

Merge branch 'master' of https://github.com/LorenFrankLab/spyglass in…

9f364a8

…to ilc

edeno requested review from khl02007, lfrank, edeno and samuelbray32 April 24, 2024 21:34

samuelbray32 requested changes May 2, 2024

View reviewed changes

CBroz1 and others added 3 commits May 6, 2024 14:09

Merge branch 'master' of https://github.com/LorenFrankLab/spyglass in…

a3762a8

…to ilc

Add IntervalList.cleanup

3f21dd6

Apply suggestions from code review

cb4417e

Co-authored-by: Samuel Bray <sam.bray@ucsf.edu>

CBroz1 added 2 commits May 6, 2024 16:12

Revert LFPBand censor logic

625e948

Revert formatting

0c282eb

Check exact

71b4620

CBroz1 mentioned this pull request May 7, 2024

IntervalList is hard to use and highly redundant in contents #778

Open
