FIX: Make lists and dicts hashable #684

oesteban · 2020-11-25T20:54:07Z

Resolves: #683.

bids/utils.py

tyarkoni

LGTM, thanks. I'm wondering if we should kill off the 'dir' and 'id' return types in the next major (breaking) release... they seem to cause a lot of problems, and the efficiency gains are probably not meaningful compared to just looping over objects. But we can take that up elsewhere.

oesteban · 2021-01-12T21:56:13Z

Rebased on top of #695 - please merge that one first.

tyarkoni · 2021-03-03T15:30:55Z

@effigies @oesteban what did we decide to do about this? I'm okay merging it, but have a vague recollection there might have been some concern (possibly on my side, but if so, I can no longer reconstruct it).

effigies · 2021-03-03T15:42:59Z

I think #694 was the discussion that postponed this decision. I've also forgotten some context here...

tyarkoni · 2021-03-03T15:52:00Z

Oh, right, thanks. I'll try to work up the proposal I suggested there.

adelavega · 2021-07-02T21:43:23Z

@oesteban, @effigies and I chatted and decided that the frozendicts may cause problems for users, since its unfamiliar.
Implementing flattening dicts is probably a better way to resolve this, and enable querying for nested dicts. We can put it on the roadmap.

Do you agree? If so I think we can close this PR.

oesteban · 2021-07-03T07:04:10Z

First things first, and as I told @tyarkoni on one of the related issues, I'm likely not going to maintain this. So you shouldn't take it if you think this will be too onerous to maintain (although, honestly, I don't see a big burden - but I'm not to decide).

That said, I think the flattening solution only addresses the inner part of the problem and conflates two different indefinitions of BIDS: i) the one about whether one can define dicts as metadata values; and ii) whether it is allowed to name a metadata piece with the same name of an entity. Both problems arise in pybids under this unhashable type (ii only if there are dicts defined, but there's the question of whether pybids returns the adequate values when it doesn't crash in this case).

Flattening dictionaries would address i) and incidentally solve ii) partially. But it is just a data representation problem of which the user doesn't need to know. This means, are we going to "unflatten" when returning these values? Because I bet users will be more comfortable receiving a dict RatherThanSomeCamelCaseFlatteningOfSomething (which we don't know yet the whole domain of possible values nested and number of nestings, btw). Nipype's results file had to deal with a similar situation, and the resulting code is a bit brittle and hard to maintain. I wonder if alquemy would take frozen dicts right away.

Then there is the question of having to return unique values of a given entity. I honestly think that returning something that looks like a dict, behaves like a dict and (if we cast the dict back before returning) is sometimes an actual dict is more convenient/intuitive to the user than the flattened values. This also ensures the sanity of the values in all get_(metadata)s. Say we have sessions with the entity key overloaded in some TSV+JSON files. In such a situation, we may get:

>>> get_sessions()
['01', '02', {'Time': '00:00', 'Duration': 30}]

In this case, to give the user something useful, we don't even need to apply the entity's regexp to return the right thing. It would be more difficult when the "session" in the TSV+JSON file is some string value:

>>> get_sessions()
['01', '02', 'comes-from-metadata']

But none of the other solutions would solve this problem, as it stems directly from the indefinition of BIDS.

I admit though that this solution does nothing to fix the problem of inserting dicts in the db. Unless, inserting frozen dicts would be an acceptable option, which would doubly justify this PR.

And this is my 2ct.

adelavega · 2021-07-03T19:07:52Z

Oscar, thanks for the spirited defense.

To be clear, my suggestion to solve (ii) with respect to "shorthand" functions for BIDS entity is primarily dealt by #749. I think this PR (or flattening dicts) takes care of (i) (how to represent hashable types).

> >>> get_sessions()
> ['01', '02', {'Time': '00:00', 'Duration': 30}]

For this example, I prefer the results of #749:

>>> get_sessions()
['01', '02']

For shorthand functions (i.e. get_(entity)s, where entity is a proper BIDS entity), get would only look at values coming from filenames. I believe according to BIDS, entities are by definition only defined in filenames, not meta-data sidecars.

Arguably, get_sessions should return all values it finds, even the entity-like value session defined in meta-data. But I think it's fair to just say that shorthand functions for Entities, should only look at values defined in filenames. I don't see a scenario where getting {'Time': '00:00', 'Duration': 30} as a result is useful, and it could cause problems for automated tools that rely on pybids for ingestion. I think it's fair for pybids to decide this is how get works, even if the spec has some indefinition.

Also, since filtering happens only on the get function, this value ``{'Time': '00:00', 'Duration': 30}` is associated with the corresponding file, if needed.

If we agree on #749, then the question is how to represent dicts for actual meta-data (to avoid crashing on shorthand functions such as get_SliceTiming). My main bias for flattening was simplicity but I'm open to your solution. I agree for arbitrarily nested dicts, frozendicts may be a nicer result.

Three questions remain for me about frozendicts:

Do they cause any problems in the db?
Do they represent lists? get_SliceTiming crashes now bc results are lists. Flattening dicts would not fix this.
What if I wanted to query a nested element?
For example: layout.get("PipelineDescription.Version"="20.0.5")

If 1) and 2) are not problems with frozendicts then I think #749 + this PR are compatible and would solve our problems.

oesteban · 2022-04-14T10:12:06Z

I'm trying to wrap my head around rebasing this, @effigies . I thought it would be more straightforward, but the new "id" and "dir" filtering is not so clear to me yet.

Resolves: bids-standard#683.

oesteban · 2022-04-14T12:36:49Z

I think this is it. I have removed one level of branching which I hope doesn't break tests. There is one new unit test that was made the code fail with IntendedFor, but it'd be nice to also add another test with the B0FieldIdentifier and B0FieldSources (#834) to make sure it doesn't break.

cc/ @effigies

codecov · 2022-04-14T12:38:55Z

Codecov Report

Merging #684 (9a887bf) into master (5ea3103) will increase coverage by 0.03%.
The diff coverage is 78.12%.

@@            Coverage Diff             @@
##           master     #684      +/-   ##
==========================================
+ Coverage   86.03%   86.07%   +0.03%     
==========================================
  Files          35       35              
  Lines        3924     3934      +10     
  Branches     1017     1023       +6     
==========================================
+ Hits         3376     3386      +10     
- Misses        355      356       +1     
+ Partials      193      192       -1

Impacted Files	Coverage Δ
bids/layout/layout.py	`88.49% <72.72%> (+0.20%)`	⬆️
bids/utils.py	`86.66% <90.00%> (+0.35%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ea3103...9a887bf. Read the comment docs.

oesteban commented Nov 25, 2020

View reviewed changes

bids/utils.py Outdated Show resolved Hide resolved

tyarkoni approved these changes Jan 11, 2021

View reviewed changes

oesteban force-pushed the fix/issue683-more-hashable branch from 9fb388b to 0b7dcfc Compare January 12, 2021 20:30

oesteban mentioned this pull request Jan 12, 2021

Q: Flatten metadata keys? #694

Open

oesteban force-pushed the fix/issue683-more-hashable branch from 6455905 to c48076f Compare January 12, 2021 21:48

oesteban mentioned this pull request Jun 30, 2021

FIX: get unique, with conflicting meta-data #748

Merged

effigies mentioned this pull request Apr 12, 2022

get_B0FieldIdentifiers errors #834

Open

FIX: Make lists and dicts hashable

80e9d2e

Resolves: bids-standard#683.

oesteban force-pushed the fix/issue683-more-hashable branch from c48076f to 80e9d2e Compare April 14, 2022 12:33

fix: broken tests passing 'obj' as return_type

9a887bf

bpinsard mentioned this pull request Nov 10, 2022

IntendedFor is ignored by wrangler.find_estimators nipreps/sdcflows#266

Open

adelavega mentioned this pull request Feb 16, 2023

FIX: Exclude non-Hashables from set in .get(return_type='id') #948

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: Make lists and dicts hashable #684

FIX: Make lists and dicts hashable #684

oesteban commented Nov 25, 2020

tyarkoni left a comment •

edited

oesteban commented Jan 12, 2021

tyarkoni commented Mar 3, 2021

effigies commented Mar 3, 2021

tyarkoni commented Mar 3, 2021

adelavega commented Jul 2, 2021

oesteban commented Jul 3, 2021

adelavega commented Jul 3, 2021 •

edited

oesteban commented Apr 14, 2022

oesteban commented Apr 14, 2022

codecov bot commented Apr 14, 2022 •

edited

FIX: Make lists and dicts hashable #684

Are you sure you want to change the base?

FIX: Make lists and dicts hashable #684

Conversation

oesteban commented Nov 25, 2020

tyarkoni left a comment • edited

Choose a reason for hiding this comment

oesteban commented Jan 12, 2021

tyarkoni commented Mar 3, 2021

effigies commented Mar 3, 2021

tyarkoni commented Mar 3, 2021

adelavega commented Jul 2, 2021

oesteban commented Jul 3, 2021

adelavega commented Jul 3, 2021 • edited

oesteban commented Apr 14, 2022

oesteban commented Apr 14, 2022

codecov bot commented Apr 14, 2022 • edited

Codecov Report

tyarkoni left a comment •

edited

adelavega commented Jul 3, 2021 •

edited

codecov bot commented Apr 14, 2022 •

edited