Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support retrieval of normalize IANA names, including aliases #4031

Closed
sffc opened this issue Sep 13, 2023 · 2 comments · Fixed by #4774
Closed

Support retrieval of normalize IANA names, including aliases #4031

sffc opened this issue Sep 13, 2023 · 2 comments · Fixed by #4774
Assignees
Labels
C-time-zone Component: Time Zones S-small Size: One afternoon (small bug fix or enhancement) U-temporal User: Temporal

Comments

@sffc
Copy link
Member

sffc commented Sep 13, 2023

#4024 allows for accessing the canonical IANA identifier for a given time zone. However, it does not allow case normalization of non-canonical time zones. For example, Temporal requires that "asia/calcutta" be echoed back to the user as "Asia/Calcutta".

To implement this, add another data key that contains the normalized strings for all IANA names that are non-canonical. Sort them by their lowercase value, then look them up with a binary search.

Expose this as an API such as

pub struct AllInOneIanaBcp47Thingy {
    // should contain all 3 data keys
}

impl AllInOneIanaBcp47ThingyBorrowed {
    /// Returns the BCP-47 ID and the normalized IANA name of the input
    pub fn lookup_iana(&self, iana: &str) -> Option<(TimeZoneBcp47Id, String)> { ... }
}
@sffc sffc added U-ecma402 User: ECMA-402 compatibility C-time-zone Component: Time Zones U-temporal User: Temporal S-small Size: One afternoon (small bug fix or enhancement) and removed U-ecma402 User: ECMA-402 compatibility labels Sep 13, 2023
@sffc sffc self-assigned this Sep 13, 2023
@robertbastian
Copy link
Member

Can we use the AsciiTrie itself to store the canonical identifiers? IIUC it currently stores lowercase identifiers, and we convert requests to lowercase before doing a lookup. Instead, it could store the canonical identifiers and provide a case-insensitive lookup function. To do this, at every node it checks the input character first, and tries the other case if there's no match.

@robertbastian
Copy link
Member

Discussion:

  • @sffc I have considered that, but you'd need to backtrack if you take the wrong branch and it gets complicated
  • @robertbastian I'd just disallow two different casings of the same character at the same level
  • @sffc there might be collisions, let me check the data... I think it could work and shouldn't break in the future because new zones are all titlecased
  • @sffc it does not solve the normalization problem, and we need to allocate a string during lookup instead of returning a slice

Conclusion:
@sffc to think about this a little bit more tonight, maybe he can make this work, otherwise go ahead as originally proposed

@sffc sffc added this to the Priority Backlog ⟨P3⟩ milestone Sep 21, 2023
robertbastian added a commit that referenced this issue May 23, 2024
Fixes #4031
Replaces #4548

This new all-in-one type supports all of the time zone ID operations we
need, with some operations being asymptotically faster than others.

Not sure if we want to deprecate the old ones or keep them around, at
least IanaBcp47RoundTripMapper. It's not completely obsolete since it
has different performance characteristics.

---------

Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-time-zone Component: Time Zones S-small Size: One afternoon (small bug fix or enhancement) U-temporal User: Temporal
Projects
Status: Done
2 participants