New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement IANA time zone ID normalization #4548
Conversation
left: "America/Argentina/Comodrivadavia" right: "America/Argentina/ComodRivadavia"
Blocked on #4562 |
} | ||
} | ||
|
||
/// Converts a string to [ypotryll case](https://stackoverflow.com/a/54524664/1407170). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just call this iana_titlecase
, instead of introducing a completely unknown name.
You should also define how this behaves on non-ASCII inputs. What's the expected output for Europe/Düsseldorf
? It will currently become Europe/DüSseldorf
, but it should either become Europe/Dsseldorf
, Europe/D??sseldorf
, or be an error.
pub fn normalize_iana<'a>(&'a self, iana_id: &'a str) -> Cow<'a, str> { | ||
if let Some(bcp47_id) = self.iana_to_bcp47(iana_id) { | ||
if let Some(iana_roundtrip) = self.bcp47_to_iana(bcp47_id) { | ||
if iana_id.eq_ignore_ascii_case(iana_roundtrip) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why explicitly avoid the canonicalisation? If you really just want normalization, why do the lookup in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please sync
icu_timezone_data::impl_time_zone_iana_to_bcp47_v1!(Baked); | ||
icu_timezone_data::impl_time_zone_iana_to_bcp47_v2!(Baked); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the v2 struct supports all v1 functionality, we should not include baked data for both
#[cfg_attr(feature = "serde", derive(serde::Deserialize))] | ||
pub struct IanaToBcp47MapV2<'data> { | ||
/// A map from IANA time zone identifiers to indexes of BCP-47 time zone identifiers. | ||
/// The lowest bit is 1 if the name is canonical and 0 if it is not canonical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this means. Please phrase this in a way that does not require knowing ZeroTrie internals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in #4718
pub map: ZeroAsciiIgnoreCaseTrie<ZeroVec<'data, u8>>, | ||
/// A sorted list of BCP-47 time zone identifiers. | ||
#[cfg_attr(feature = "serde", serde(borrow))] | ||
// Note: this is 9739B as ZeroVec<TinyStr8> and 9335B as VarZeroVec<str> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And how much as ZeroVec<'data, TimeZoneBcp47Id>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in #4718
Putting this back in draft and moving the data part to #4718 so I can rework the API a bit more. |
Fixes #4031 Replaces #4548 This new all-in-one type supports all of the time zone ID operations we need, with some operations being asymptotically faster than others. Not sure if we want to deprecate the old ones or keep them around, at least IanaBcp47RoundTripMapper. It's not completely obsolete since it has different performance characteristics. --------- Co-authored-by: Robert Bastian <4706271+robertbastian@users.noreply.github.com>
Replaced by #4774 |
Fixes #4031