Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement IANA time zone ID normalization #4548

Closed
wants to merge 15 commits into from
Closed

Conversation

sffc
Copy link
Member

@sffc sffc commented Jan 24, 2024

Fixes #4031

@sffc
Copy link
Member Author

sffc commented Feb 29, 2024

Blocked on #4562

@sffc
Copy link
Member Author

sffc commented Mar 5, 2024

Blocked on #4562

Not really blocked on this, but we should resolve #4562 before shipping 1.5

@robertbastian robertbastian marked this pull request as ready for review March 5, 2024 15:28
}
}

/// Converts a string to [ypotryll case](https://stackoverflow.com/a/54524664/1407170).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just call this iana_titlecase, instead of introducing a completely unknown name.

You should also define how this behaves on non-ASCII inputs. What's the expected output for Europe/Düsseldorf? It will currently become Europe/DüSseldorf, but it should either become Europe/Dsseldorf, Europe/D??sseldorf, or be an error.

pub fn normalize_iana<'a>(&'a self, iana_id: &'a str) -> Cow<'a, str> {
if let Some(bcp47_id) = self.iana_to_bcp47(iana_id) {
if let Some(iana_roundtrip) = self.bcp47_to_iana(bcp47_id) {
if iana_id.eq_ignore_ascii_case(iana_roundtrip) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why explicitly avoid the canonicalisation? If you really just want normalization, why do the lookup in the first place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please sync

Comment on lines 44 to +45
icu_timezone_data::impl_time_zone_iana_to_bcp47_v1!(Baked);
icu_timezone_data::impl_time_zone_iana_to_bcp47_v2!(Baked);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the v2 struct supports all v1 functionality, we should not include baked data for both

#[cfg_attr(feature = "serde", derive(serde::Deserialize))]
pub struct IanaToBcp47MapV2<'data> {
/// A map from IANA time zone identifiers to indexes of BCP-47 time zone identifiers.
/// The lowest bit is 1 if the name is canonical and 0 if it is not canonical.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this means. Please phrase this in a way that does not require knowing ZeroTrie internals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #4718

pub map: ZeroAsciiIgnoreCaseTrie<ZeroVec<'data, u8>>,
/// A sorted list of BCP-47 time zone identifiers.
#[cfg_attr(feature = "serde", serde(borrow))]
// Note: this is 9739B as ZeroVec<TinyStr8> and 9335B as VarZeroVec<str>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how much as ZeroVec<'data, TimeZoneBcp47Id>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #4718

@sffc
Copy link
Member Author

sffc commented Mar 21, 2024

Putting this back in draft and moving the data part to #4718 so I can rework the API a bit more.

@sffc sffc marked this pull request as draft March 21, 2024 10:44
sffc added a commit that referenced this pull request Mar 23, 2024
@Manishearth Manishearth removed the request for review from a team April 4, 2024 16:08
@Manishearth Manishearth removed their request for review April 4, 2024 16:08
robertbastian added a commit that referenced this pull request May 23, 2024
Fixes #4031
Replaces #4548

This new all-in-one type supports all of the time zone ID operations we
need, with some operations being asymptotically faster than others.

Not sure if we want to deprecate the old ones or keep them around, at
least IanaBcp47RoundTripMapper. It's not completely obsolete since it
has different performance characteristics.

---------

Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
@sffc
Copy link
Member Author

sffc commented May 23, 2024

Replaced by #4774

@sffc sffc closed this May 23, 2024
@sffc sffc deleted the iana_canon branch May 23, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support retrieval of normalize IANA names, including aliases
2 participants