Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionary protocol and conversions between dictionary types #293

Open
JaapWijnen opened this issue Jun 5, 2023 · 5 comments
Open

Dictionary protocol and conversions between dictionary types #293

JaapWijnen opened this issue Jun 5, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@JaapWijnen
Copy link

JaapWijnen commented Jun 5, 2023

Hey everyone! Two ideas for Dictionary types:

  • Could we add a DictionaryProtocol containing the shared api between all dictionary types? Especially with the new SortedDictionary there's multiple Dictionary types and this is becoming more relevant. That way we can make functions generic over the type of dictionary that is used.
  • Another painpoint I've experienced is converting between Dictionary types. With the proposed protocol this is not needed as much since we can make functions generic over the dictionary type but it would be nice to have a conversion from an OrderedDictionary to a regular Dictionary for example (the reverse I'm not sure how that would be defined/if it's useful) Similar to the initializers on all the FloatingPoint types.

Let me know your thoughts! Would this be useful to you? Potential problems you envision or reasons to not do this in general? Would love to hear it, thanks!

These might also be interesting uses for the different types of Set that exist

@JaapWijnen JaapWijnen added the enhancement New feature or request label Jun 5, 2023
@lorentey
Copy link
Member

Yes! Now that we have Dictionary, OrderedDictionary, TreeDictionary (and soon SortedDictionary, which isn't quite ready for use yet), I think we have enough examples of dictionary types to start forming the protocol.

The protocol should ultimately live in the Standard Library, but we could prototype it within this package. (Transitioning it into the stdlib later is going to be painful, though.)

The tests include a couple of protocols to test that core Dictionary API is implemented consistently (DictionaryAPIChecker and DictionaryAPIExtras), but those are just for ensuring API parity, and they aren't suitable for use as DictionaryProtocol.

Design decisions:

  • Would DictionaryProtocol refine Sequence, or would it leave that up to each use case? (FWIW, DictionaryProtocol cannot refine Collection because OrderedDictionary isn't one.)
  • Do we want to model the Keys and Values views as protocol requirements?
  • Should the protocol model a mutable Dictionary, or would it be better to allow read-only types to conform to it? (Do we want to spin off the mutations into a separate MutableDictionaryProtocol?) I think most API that would take a DictionaryProtocol would only need quick lookups, and it would be desirable to allow passing read-only dictionaries to them. (Such as a constant dictionary that implements perfect hashing, constructed at build time.)
  • If mutations will be part of the protocol, do we want it to include any semantic guarantees on index invalidation? (TreeDictionary invalidates all indices on every mutation, including assignments via the Values view. If we don't change this, then that sort of settles the issue -- the protocol will need to follow suit.)

Random notes:

  • For conversions between dictionaries, I think it would be reasonable for (the mutable variant of) the protocol to include a requirement for init(_ other: some DictionaryProtocol<Key, Value>).
  • mapValues/compactMapValues are interesting. They ought to be protocol requirements, because customizing their implementation enables significant improvements (by avoiding re-hashing or re-sorting keys, for example), but their shape isn't easily expressible in a protocol. Perhaps we could shoehorn them in as extension methods forwarding to a new init<T>(mappingValuesOf dict: some DictionaryProtocol<Key, T>, by transform: (T) -> Value throws) rethrows requirement, which individual types can manually specialize for typeof(dict) == Self.self.)

@JaapWijnen
Copy link
Author

JaapWijnen commented Jun 17, 2023

Hey! Thanks for the thorough response.

The protocol should ultimately live in the Standard Library, but we could prototype it within this package. (Transitioning it into the stdlib later is going to be painful, though.)

What direct difficulties do you expect? I do see the general challenge of adding a protocol to such a fundamental type in the standard library. And perhaps there not being direct other types that would profit from this protocol in the stdlib.

Would DictionaryProtocol refine Sequence, or would it leave that up to each use case? (FWIW, DictionaryProtocol cannot refine Collection because OrderedDictionary isn't one.)

I would definitely say DictionaryProtocol would refine Sequence. Since we're not really interested in the way the elements are stored at the time of using a protocol, one of the big uses would be to be able to loop over the elements of the Dictionary I would think. Are there any drawbacks to refining Sequence? Just to double check by refining you mean the following?

protocol DictionaryProtocol: Sequence {}

Do we want to model the Keys and Values views as protocol requirements?

I would think so. I'm not too familiar with the other newer implementations but this does seem an interesting addition to the protocol. If we had a SetProtocol we could restrict our keys associated type to that too perhaps?

Should the protocol model a mutable Dictionary, or would it be better to allow read-only types to conform to it? (Do we want to spin off the mutations into a separate MutableDictionaryProtocol?) I think most API that would take a DictionaryProtocol would only need quick lookups, and it would be desirable to allow passing read-only dictionaries to them. (Such as a constant dictionary that implements perfect hashing, constructed at build time.)

This is interesting, it might be worthwhile keeping them seperate for a first pass and find out if there's clear benefits? They're not directly obvious to me. But in a way we would be mirroring Collection and MutableCollection which does make sense to me in terms of structuring. I'd say we there should be a clear reason to split them otherwise we're just adding complexity perhaps.

If mutations will be part of the protocol, do we want it to include any semantic guarantees on index invalidation? (TreeDictionary invalidates all indices on every mutation, including assignments via the Values view. If we don't change this, then that sort of settles the issue -- the protocol will need to follow suit.)

No direct ideas/opinions on this, but that is a good point to keep in mind.

For conversions between dictionaries, I think it would be reasonable for (the mutable variant of) the protocol to include a requirement for init(_ other: some DictionaryProtocol<Key, Value>).

Yeah especially if we would have something like OrderedDictionary's elements view in the protocol that should be easy to have a default implementation for perhaps?

mapValues/compactMapValues are interesting. They ought to be protocol requirements, because customizing their implementation enables significant improvements (by avoiding re-hashing or re-sorting keys, for example), but their shape isn't easily expressible in a protocol. Perhaps we could shoehorn them in as extension methods forwarding to a new init(mappingValuesOf dict: some DictionaryProtocol<Key, T>, by transform: (T) -> Value throws) rethrows requirement, which individual types can manually specialize for typeof(dict) == Self.self.)

Hm not sure about this I'd have to take another look!

What are your thoughts for something similar for Set?
What are the current plans for this package? I see a lot of changes since the last tag! Any plans for a next version or is there a clear roadmap somewhere?
How would you like to proceed? I wouldn't mind to get a first (probably basic) version up in an MR and we can discuss further?

@JaapWijnen
Copy link
Author

JaapWijnen commented Jul 20, 2023

Sorry taking a little longer than expected since I'm a little swamped with work :) I have a basic thing that might make sense to share soon so we have something tangible to discuss and improve on!
I'm running into one inconsistency unfortunately which is that sometimes stdlib's dictionary uses a labeled or unlabelled tuple for its element in api's.
Example:
The standard library's Dictionary type defines an type alias Element = (key: Key, value: Value)
But one initialiser for example has the following signature:

@inlinable
  public init<S: Sequence>(
    uniqueKeysWithValues keysAndValues: __owned S
  ) where S.Element == (Key, Value) {

whereas OrderedDictionary has:

@inlinable
  public init<S: Sequence>(
    uniqueKeysWithValues keysAndValues: S
  ) where S.Element == (key: Key, value: Value) { // or Element since it has a typealias

What are your thoughts on this @lorentey? I'd argue that the OrderedCollections implementation is more correct. But I don't see that std lib api changing. I don't imagine this would be accepted as a breaking change for Swift 6 for example.

Edit:
Only later noticed that OrderedDictionary also contains a similar initialiser that takes an unlabelled tuple as Element. Maybe there's a difference between these initialisers that I don't fully comprehend?

@JaapWijnen
Copy link
Author

Hey @lorentey ! Just wanted to check if you have any thoughts on the current PR I submitted. Would love to move this forward :)

@JaapWijnen
Copy link
Author

Put some more thought into this and took a look at SortedDictionary. Unfortunately none of the initialisers there match with the Dictionary and OrderedDictionary ones. Making it hard to get an initialiser defined in the protocol itself. That in turn would make incorporating a mapping between dictionary types in the protocol hard to do too. Have not looked at the TreeDictionary type yet since unifying a protocol over SortedDictionary, OrderedDictionary and plain Dictionary is already proving quite hard.

SortedDictionary has:

  • .init(keysWithValues:)
  • .init(sortedKeysWithValues:)

OrderedDictionary has:

  • .init(uniqueKeysWithValues:)
  • .init(uniqueKeys:values:)
  • .init(_ keysAndValues: uniquingKeysWith:)

Which so we can't have a general initialiser in the protocol. Any thoughts on this?
One thought I had was that SortedDictionary in a way could also adopt the.init(uniqueKeysWithValues:) initialiser since technically it's keys are all unique they're just not required to have a unique hash like Dictionary or OrderedDictionary
If you agree that works then .init(_ keysAndValues: uniquingKeysWith:) can also be included where the decision of wether to overwrite the existing value (as is done in the SortedDictionary.init(keysWithValues:)) or not would instead be handled by the uniquingKeysWith: (Value, Value) throws -> Value closure. This could be an interesting initialiser anyway when feeding an unordered sequence into SortedDictionary.init(keysWithValues:) since there's no guarantee for which key value will end up in the SortedDictionary for duplicate keys in the sequence.

Would love to hear your thoughts so I can move further on the implementation of this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants