Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: deprecate iteration on bare Associative, introduce PairIterator. #25013

Closed
wants to merge 1 commit into from

Conversation

andyferris
Copy link
Member

This is WIP towards modifying the iteration scheme for Associative. There have been some discussions about whether dictionaries should iterate key-value pairs, or just plain values (like arrays). For my part, I feel the latter makes most sense for operations like map (preserves keys), reduce (such as sum(dict)), and so-on, where you are using the dictionary as an indexed collection of data. Dictionary iteration has been discussed in multiple places - using values was floated by me at #24019, and iteration of dictionary-types is treated pretty extensively at #22194 (named tuples, which iterate values). There, Jeff's opening post contains this observation:

There are different perspectives on dict-like types: sometimes they are more like relations, containing pairs, sometimes they are primarily collections of keys, and sometimes they are primarily collections of values. Iteration cannot do all of these at once; indeed, iteration has little to do with the concept of "keys".

With that in mind, in this PR the iteration protocol on Associative has been deprecated. A new PairIterator is introduced, and created by pairs(dict). Users should use pairs(dict) or values(dict) (or keys(dict)) to explicitly specify iteration results, like so:

for k,v in pairs(dict)
    ...
end

for v in values(dict)
    ...
end

for k in keys(dict)
    ...
end

The in predicate also requires specifying pairs, values or keys. This PR doesn't yet fully deal with filter and map on dictionaries (and pairs/values/keys thereof), but arguably users will want to choose to map or filter with or without knowing about the key. At some point, it's probably also worth discussing how the deque protocol (push!, etc) interacts with iteration and the iterator types.

This PR also leaves open the possibility of reintroducing an iteration protocol for dictionaries at a later date. Comments very welcome. I will also note that v1.0 is coming up quickly, and we can iterate on this rapidly if we do decide to merge.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 10, 2017

Can we please not break iteration and associative arrays just before the release.

If you want to experiment with this, you're welcome to make a package and show that your new Abstract type behavior is better. Otherwise, penalizing everyone and forcing them to be explicit in their code for it to do the right thing (and work in generic code contexts like convert and construct) sounds very unfriendly to users.

@ararslan ararslan added the domain:collections Data structures holding multiple items, e.g. sets label Dec 10, 2017
@andyferris andyferris force-pushed the ajf/pairs2 branch 2 times, most recently from 813a374 to 91bbfac Compare December 11, 2017 13:53
Users should use `pairs(dict)` or `values(dict)` (or `keys(dict)`) to
disambiguate iteration result.
@ararslan ararslan added the status:triage This should be discussed on a triage call label Dec 13, 2017
@StefanKarpinski
Copy link
Sponsor Member

I'm really intrigued by the indexing unification ideas that you've been working through, @andyferris, but I'm afraid that we just don't have time to let them come to full fruition in Base Julia. I think we can continue to work through them and I can imagine a "Ferrisverse" of packages where we maybe monkey patch Base indexing behavior to have this kind of composability, but I just don't think we should undertake this so close to the 1.0 feature freeze.

@StefanKarpinski StefanKarpinski removed the status:triage This should be discussed on a triage call label Dec 14, 2017
@andyferris
Copy link
Member Author

@StefanKarpinski Understood!

(I do tend to agree that this is underbaked at this stage).

@andyferris andyferris closed this Dec 14, 2017
@yurivish
Copy link
Contributor

yurivish commented Dec 17, 2017

FWIW, I'd be in favor of the deprecation – I just had a case where I spent some time debugging a for loop after switching from Dicts to NamedTuples, expecting them both to iterate the same way.

I really would not mind deprecating bare iteration on associatives and being clear about what it is I want to iterate in the short term in exchange for a powerful and unified treatment of iteration and maps over collections in 1.x, rather than 2.0.

@StefanKarpinski
Copy link
Sponsor Member

Looking at this again, I take it back since it seems that all this does is require using pairs to iterate dictionaries, which I am in favor of at this point. For some reason, I thought this was trying to do more than that.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 18, 2017

Why? That sounds like a really terrible user experience.

@oscardssmith
Copy link
Member

As I understand it is that different people want dict iteration to be over pairs, values, or keys. By forcing people to specify, it makes it clear what the intent was, and prevents unexpected bugs when the actual does not match what people expected.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 18, 2017

prevents unexpected bugs when the actual does not match what people expected

Julia usually tries not to define APIs in this way. This is somewhat reminiscent of the decision in #23233 not to make you specify expm instead of exp when you want to call exp on a matrix.

@StefanKarpinski
Copy link
Sponsor Member

Why? That sounds like a really terrible user experience.

Yes, you've made it clear that this is your opinion. However, we're already getting reports (e.g. from @yurivish) that the difference in how named tuples and dicts iterate has caused confusion and bugs. Forcing people to be explicit about this is a better user experience than that.

@andyferris
Copy link
Member Author

andyferris commented Dec 18, 2017

To me, the point of this deprecation is to consider in the future having dictionaries behave as NamedTuples do (iterate values). I do realize this is contentious (it seems clearly beneficial to me but I am only one user).

Doing this would be breaking, so deprecation is necessary. The advantage of v0.7/v1.0 is that iteration won't have to stay deprecated for long, whereas if we decide to do this in the future I'm guessing we'll need an entire 2.x release where this is deprecated. (There's also the possibility of deprecating now and backtracking in v1.1 if we decide iterating anything other than pairs by default is crazy).

The work to make the value iteration scheme sensible everywhere would require a bit more thought - for instance, we currently push! pairs onto the dictionary, construct it from lists of pairs, etc. It's likely we'll need to tweak the API in several places before introducing value iteration (this might involve backtracking on using Pairs so much in the API, considering whether we really want to push! onto things that don't pop! in the same order, etc). It's likely the quickest way of figuring out all these tweaks is to do as Jameson suggested, and make a package that prototypes this.

And yes, there's also the problem that the status quo is confusing.

@Sacha0 Sacha0 added the status:triage This should be discussed on a triage call label Dec 18, 2017
@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 18, 2017

that the difference in how named tuples and dicts iterate has caused confusion and bugs

And I was very strongly against this, partly for this reason. I was told repeated that they shouldn't behave the same because a NT is nothing like a Dict. Why is this suddenly a surprise now?

Forcing people to be explicit about this is a better user experience than that.

Why is having having two types, one that can do everything the old type used to be able to do (PairIterator) and one that can't (Dict) a better user experience than continuing to use one type (Dict) that does everything?

@JeffBezanson
Copy link
Sponsor Member

To me, the point of this would not be to make dicts and named tuples behave the same. The argument is just that while iterating pairs is useful in some contexts, in other contexts (e.g. maximum) it's not helpful.

@JeffBezanson
Copy link
Sponsor Member

The other way to look at this is synchronizing the APIs of "indexed collections", i.e. anything that maps indices to values. However I believe that is somewhat blocked on an inherent ambiguity in arrays, where they are sometimes considered to be a simple "series" or "list" of things, and other times considered to be mappings of indices to values.

@StefanKarpinski StefanKarpinski removed the status:triage This should be discussed on a triage call label Jan 4, 2018
@yurivish
Copy link
Contributor

yurivish commented Jan 7, 2018

What were the results of the latest round of triaging?

I believe a change has been merged to master that turns keyword arguments back into something that is not a NamedTuple, which was recently discussed by @c42f and @andyferris in #gripes on Slack.

@JeffBezanson
Copy link
Sponsor Member

This wasn't discussed in the last triage call (or two?).

@yurivish
Copy link
Contributor

yurivish commented Jan 7, 2018

Oh, sorry. I assumed something was talked about that caused the "triage" label to be removed two days ago.

@andyferris
Copy link
Member Author

Hmm... so it remains a bit ambiguous to me what the triage outcome was here (or if there was a triage outcome / discussion). If there's some appetite for this I can try finish it off - I mostly bring this up because this is one of those things that may be a bit painful to change for v2.0+

@StefanKarpinski StefanKarpinski added the status:triage This should be discussed on a triage call label Jan 18, 2018
@JeffBezanson JeffBezanson removed the status:triage This should be discussed on a triage call label Feb 1, 2018
@StefanKarpinski StefanKarpinski removed the status:triage This should be discussed on a triage call label Feb 1, 2018
@andyferris
Copy link
Member Author

Here's an odd thought somewhat related to what motivated me with this (being able to do stuff with the values of dictionaries):

  • map tends to follow the iteration protocol
  • broadcast tends to deal with indices and their values

Would it be ridiculous to suggest that broadcast(f, dict) preserves the indices of the dict and maps the values of the dict? (While map works on the pairs)? (See the last comment in #24019 (comment) for one motivation)

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 5, 2018

We already have map(f, values(dict)) / broadcast(f, values(dict)), which could plausibly preserve indices while calling f on the the values (although I didn't test what they do now – I suspect it's not implemented, since I don't think we currently have getindex for the ValueIterator type implemented). It seems pointless to me to have broadcast(f, pairs(dict)) mean the same thing (since pairs(dict) === dict). But maybe related, there's debate in #18618 about whether we should make broadcast more consistent in this regard to work with any iterable, or continue to only work with types which define a custom broadcast API.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 5, 2018

Just as a followup note to this PR, we now have this type in base, with a slightly shortened name: Iterators.Pairs (née IndexValue)

With it, I think we've actually nearly arrived at a unification of the dual nature of Arrays and Dictionaries (via values / pairs), but without needing to cause breakage. If you swap around a few of the names, I think you can get back to essentially this PR: Associative / Dict / Pairs -> PairsIterator / Pairs, ValueIterator -> Dict / Associative / ValueIterator.

@andyferris
Copy link
Member Author

As a another followup note to this PR, I thought it might be relevant to reference Dictionaries.jl which was recently released, and implements a dictionary interface that iterates values (and supports broadcasting, map, reduce, filter etc of values).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants