Draft plan to align canonical time zone IDs across implementations #806

Open
justingrant opened this issue Jul 13, 2023 · 20 comments
Labels
c: datetime Component: dates, times, timezones s: blocked Status: the issue is blocked on upstream

@justingrant
Contributor

justingrant commented Jul 13, 2023

This issue proposes some ideas and a draft plan for how implementations can align on a common set of canonical time zone IDs, in order to fix problems like:

  • Variation in canonicalization behavior between Chrome/Safari vs Firefox
  • Chrome and Safari returning outdated IDs like Asia/Calcutta and Europe/Kiev, which causes many user complaints.

There are three questions to answer:

  1. Which time zone IDs should be primary?
  2. How should we get these IDs into implementations?
  3. When should we ship these changes?

This is an early draft, so please let me know if I made mistakes below or if you see a better way to achieve the goal of using up-to-date canonical IDs in ECMAScript. Note that the plan below is complementary to, but separate from, the now-Stage 3 proposal-canonical-tz proposal.

ECMA-262 currently uses the terms "primary time zone identifier" and "non-primary time zone identifier" instead of "canonical" and "non-canonical". I'm mostly using the newer terms in this issue, but for clarity I use "canonical" when referring to ICU's output, because that's what ICU calls it.

Feedback is welcome, especially from @sffc @FrankYFTang @anba @Constellation @gibson042 @dminor.

1. Which time zone IDs should be primary?

To avoid messy geopolitical judgement calls, I recommend that we defer to the IANA Time Zone Database to decide which IDs should be canonical, using the following simple rules:

  • a) Every ID in zone.tab should be a primary time zone identifier in ECMAScript. Because zone.tab includes at least one unique time zone for each ISO 3166-1 country code, if all zone.tab IDs are canonical then time zone changes in a country will not affect any other country.
  • b) If CLDR considers an ID in zone.tab to be non-canonical, then the zone.tab ID should be primary in ECMAScript, and CLDR's outdated canonical ID should be a non-primary time zone identifier that resolves to the zone.tab ID. This will fix cases where ICU currently returns an outdated ID like Asia/Calcutta or Europe/Kiev.

Chrome and Safari, which return ICU's canonical IDs as-is, currently have 19 IDs that use outdated ICU canonical identifiers. Firefox, which overrides ICU's canonicalization, currently has 11 non-primary IDs that resolve to another country's primary ID, like Europe/Bratislava resolving to Europe/Prague. This proposal would change those engines' behavior to follow the rules above.
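As a rough illustration of rule (a), here is a minimal sketch of collecting the IDs from zone.tab. It assumes the file's tab-separated layout (country code, coordinates, TZ, optional comment); the sample lines are a hand-picked excerpt, and parseZoneTab is a hypothetical helper name, not part of any engine or library.

```javascript
// Hand-picked excerpt in zone.tab's tab-separated layout:
// country code, coordinates, TZ, optional comment. Not the full file.
const sampleZoneTab = [
  "# comment lines start with '#'",
  "CZ\t+5005+01426\tEurope/Prague",
  "IN\t+2232+08822\tAsia/Kolkata",
  "SK\t+4809+01707\tEurope/Bratislava",
  "UA\t+5026+03031\tEurope/Kyiv\tmost of Ukraine",
].join("\n");

// Hypothetical helper: returns a Map from TZ ID to its country code.
function parseZoneTab(text) {
  const idToCountry = new Map();
  for (const line of text.split("\n")) {
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    const [country, , id] = line.split("\t");
    idToCountry.set(id, country);
  }
  return idToCountry;
}

const zoneTabIds = parseZoneTab(sampleZoneTab);
// Under rule (a), every key of zoneTabIds would be a primary identifier.
console.log([...zoneTabIds.keys()]);
```

Keeping the country code next to each ID makes it easy to check later whether a canonicalization merges two countries.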

In actual implementation pseudocode, what I'm proposing is this:

// Build a map between ICU and zone.tab for all IDs where ICU is using an outdated name
const outdatedIdMap = new Map();
for (const id of idsFromZoneTabFile) {
  const icuCanonical = getCanonicalIdAccordingToICU(id);
  if (icuCanonical !== id) outdatedIdMap.set(icuCanonical, id);
}

function getPrimaryId(id) {
  const candidate = getCanonicalIdAccordingToICU(id);
  const outdatedIdFixup = outdatedIdMap.get(candidate);
  return outdatedIdFixup ?? candidate;
}

The JSON objects below were generated by a simple JS app using code that's similar to the pseudocode above. You can run and edit it at https://codesandbox.io/s/zone-tab-mismatches-mlf93j.
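To try the pseudocode without ICU or the real zone.tab file, it can be run against stubs. In this sketch both icuCanonicalStub and the contents of idsFromZoneTabFile are tiny hand-written stand-ins (assumptions for illustration), so the output only demonstrates the control flow, not real data.

```javascript
// Stand-in for ICU's canonicalization (hypothetical, tiny excerpt).
const icuCanonicalStub = new Map([
  ["Asia/Kolkata", "Asia/Calcutta"],
  ["Europe/Kyiv", "Europe/Kiev"],
]);
function getCanonicalIdAccordingToICU(id) {
  return icuCanonicalStub.get(id) ?? id;
}

// Stand-in for the TZ column of zone.tab (excerpt).
const idsFromZoneTabFile = ["Asia/Kolkata", "Europe/Kyiv", "Europe/Paris"];

// Map each outdated ICU canonical ID to the zone.tab ID it should become.
const outdatedIdMap = new Map();
for (const id of idsFromZoneTabFile) {
  const icuCanonical = getCanonicalIdAccordingToICU(id);
  if (icuCanonical !== id) outdatedIdMap.set(icuCanonical, id);
}

function getPrimaryId(id) {
  const candidate = getCanonicalIdAccordingToICU(id);
  return outdatedIdMap.get(candidate) ?? candidate;
}

console.log(getPrimaryId("Asia/Calcutta")); // "Asia/Kolkata"
console.log(getPrimaryId("Europe/Paris")); // "Europe/Paris"
```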

For Chrome and Safari, the object below lists IDs from zone.tab where ICU uses an outdated ID. The key is the ICU ID and the value is what should be primary.

{
  "Africa/Asmera": "Africa/Asmara",
  "America/Buenos_Aires": "America/Argentina/Buenos_Aires",
  "America/Catamarca": "America/Argentina/Catamarca",
  "America/Cordoba": "America/Argentina/Cordoba",
  "America/Jujuy": "America/Argentina/Jujuy",
  "America/Mendoza": "America/Argentina/Mendoza",
  "America/Coral_Harbour": "America/Atikokan",
  "America/Indianapolis": "America/Indiana/Indianapolis",
  "America/Louisville": "America/Kentucky/Louisville",
  "America/Godthab": "America/Nuuk",
  "Asia/Saigon": "Asia/Ho_Chi_Minh",
  "Asia/Katmandu": "Asia/Kathmandu",
  "Asia/Calcutta": "Asia/Kolkata",
  "Asia/Rangoon": "Asia/Yangon",
  "Atlantic/Faeroe": "Atlantic/Faroe",
  "Europe/Kiev": "Europe/Kyiv",
  "Pacific/Truk": "Pacific/Chuuk",
  "Pacific/Enderbury": "Pacific/Kanton",
  "Pacific/Ponape": "Pacific/Pohnpei"
}

For Firefox, the keys of the object below are zone.tab IDs that are not canonical in Firefox. Unlike Chrome/Safari discussed above, Firefox overrides ICU's canonicalization using the TZDB backward file.

These overrides solve the outdated IDs problem that Chrome and Safari have, but they introduce a new problem: some IDs merge multiple ISO 3166-1 country codes. For example, Slovakia's time zone resolves in Firefox to Europe/Prague in the Czech Republic, but Europe/Bratislava should also be primary. Using zone.tab instead of backward to power the overrides should fix this problem.

{
  "America/Kralendijk": "America/Curacao",
  "America/Lower_Princes": "America/Curacao",
  "America/Marigot": "America/Port_of_Spain",
  "America/St_Barthelemy": "America/Port_of_Spain",
  "Arctic/Longyearbyen": "Europe/Oslo",
  "Europe/Bratislava": "Europe/Prague",
  "Europe/Busingen": "Europe/Zurich",
  "Europe/Mariehamn": "Europe/Helsinki",
  "Europe/Podgorica": "Europe/Belgrade",
  "Europe/San_Marino": "Europe/Rome",
  "Europe/Vatican": "Europe/Rome"
}
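One way to detect this kind of cross-country merge mechanically is to compare the country code of each ID with the country code of the ID it resolves to. The sketch below uses a small hand-written excerpt; countryOf, primaryOf and findCrossCountryMerges are hypothetical names, and the country codes are the ones I'd expect zone.tab to list for these IDs.

```javascript
// Country codes for a few zone.tab IDs (excerpt, assumed for illustration).
const countryOf = new Map([
  ["Europe/Bratislava", "SK"],
  ["Europe/Prague", "CZ"],
  ["Europe/Vatican", "VA"],
  ["Europe/Rome", "IT"],
  ["Europe/Paris", "FR"],
]);

// Excerpt of a canonicalization table: ID -> primary ID.
const primaryOf = new Map([
  ["Europe/Bratislava", "Europe/Prague"],
  ["Europe/Vatican", "Europe/Rome"],
  ["Europe/Paris", "Europe/Paris"],
]);

// Hypothetical helper: lists IDs whose primary ID belongs to another country.
function findCrossCountryMerges(primaryOf, countryOf) {
  const merges = [];
  for (const [id, primary] of primaryOf) {
    const from = countryOf.get(id);
    const to = countryOf.get(primary);
    if (from !== undefined && to !== undefined && from !== to) {
      merges.push({ id, primary, from, to });
    }
  }
  return merges;
}

console.log(findCrossCountryMerges(primaryOf, countryOf));
```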

2. How should these canonicalization changes get into implementations?

@sffc and others consider CLDR and ICU to be the right long-term home for all time zone info, including canonicalization. Although CLDR is currently designing a solution to expose IANA canonical IDs, it's unlikely a solution in CLDR and ICU will ship until 2024 at the earliest.

For V8 and JSC, there are only 19 outdated names, and new renames are very rare: only 4 in the last 8 years. Should we hard-code these 19 mappings until CLDR and ICU deliver the long-term solution? If not, is there another way to speed up these changes?
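To make the hard-coding option concrete, here is a minimal userland-style sketch of what such a fixup table could look like. The table is the 19-entry object listed earlier in this issue; modernizeId and resolvedTimeZone are hypothetical names, and a real engine would apply the mapping internally rather than wrap Intl.DateTimeFormat like this.

```javascript
// The 19 mappings listed above: outdated ICU canonical ID -> zone.tab ID.
const OUTDATED_TO_PRIMARY = new Map(Object.entries({
  "Africa/Asmera": "Africa/Asmara",
  "America/Buenos_Aires": "America/Argentina/Buenos_Aires",
  "America/Catamarca": "America/Argentina/Catamarca",
  "America/Cordoba": "America/Argentina/Cordoba",
  "America/Jujuy": "America/Argentina/Jujuy",
  "America/Mendoza": "America/Argentina/Mendoza",
  "America/Coral_Harbour": "America/Atikokan",
  "America/Indianapolis": "America/Indiana/Indianapolis",
  "America/Louisville": "America/Kentucky/Louisville",
  "America/Godthab": "America/Nuuk",
  "Asia/Saigon": "Asia/Ho_Chi_Minh",
  "Asia/Katmandu": "Asia/Kathmandu",
  "Asia/Calcutta": "Asia/Kolkata",
  "Asia/Rangoon": "Asia/Yangon",
  "Atlantic/Faeroe": "Atlantic/Faroe",
  "Europe/Kiev": "Europe/Kyiv",
  "Pacific/Truk": "Pacific/Chuuk",
  "Pacific/Enderbury": "Pacific/Kanton",
  "Pacific/Ponape": "Pacific/Pohnpei",
}));

// Replace an outdated ID with its modern name; leave other IDs untouched.
function modernizeId(id) {
  return OUTDATED_TO_PRIMARY.get(id) ?? id;
}

// Ask the engine to resolve the ID, then apply the fixup table on top.
function resolvedTimeZone(id) {
  const resolved = new Intl.DateTimeFormat("en", { timeZone: id })
    .resolvedOptions().timeZone;
  return modernizeId(resolved);
}

console.log(resolvedTimeZone("Asia/Calcutta")); // "Asia/Kolkata"
```

Because the fixup runs after the engine's own resolution, the result is the same whether the engine reports Asia/Calcutta or Asia/Kolkata.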

For Firefox, the change would be to use zone.tab instead of backward.

3. When should we ship these changes?

Here are a few options for when to ship these changes. Which do you prefer?

  • a) ASAP, don't wait for Temporal
  • b) At the same time as Temporal.TimeZone ships. It'll include proposal-canonical-tz to stop canonicalizing user-inputted IDs, so less userland code should be affected by the primary ID changes.
  • c) Wait until Temporal is in wider usage. Not sure why we'd want to do this, but including it for discussion.
  • d) Never, it will be too disruptive for existing apps

My preference would be for (b), because it seems to carry less risk of breaking the web than (a). But I could also be convinced that (a) is OK, especially if we're able to run tests beforehand on a small % of users before it's rolled out to everyone. Do browsers have a way to do tests like that?

I'd support (d) if we're able to verify through testing of real apps that these changes would be too disruptive.

Notes

We were originally hoping to tackle this plan as part of proposal-canonical-tz, but that proposal just reached Stage 3 so we're moving the IDs plan into ECMA-402 because the scope of the proposal is now locked down.

@justingrant justingrant added the c: datetime Component: dates, times, timezones label Jul 13, 2023
@justingrant justingrant added this to Priority Issues in ECMA-402 Meeting Topics Jul 13, 2023
@anba
Contributor

anba commented Jul 14, 2023

For Firefox, we first need to know more about why 11 zones are merged into zones from another ISO-3166-1 country code, and in particular why these 11 were chosen. @anba, do you know? Once we have that understanding, we can figure out if it makes sense to un-merge these or leave them as-is.

Firefox time zone canonicalisation always returns an IANA tzdata Zone, potentially using a Zone entry from backzone, but never an IANA tzdata Link entry. And all eleven entries are IANA tzdata Links, so they get resolved to the corresponding Zone per CanonicalizeTimeZoneName. Basically we implement step 2 as if:

  1. If ianaTimeZone is a Link name, let ianaTimeZone be the String value of the corresponding Zone name as specified in the file backward of the IANA Time Zone Database.

(Mentioning only backward is a long known spec bug, see https://tc39.es/archives/bugzilla/1892/ and #272.)

The time zone canonicalisation overrides in Firefox don't take zone.tab into account, but instead only use backzone. This also leads to other differences like:

js> new Intl.DateTimeFormat("en", {timeZone: "Asia/Chungking"}).resolvedOptions().timeZone
"Asia/Chongqing"

whereas V8/JSC return "Asia/Shanghai". (This case is also mentioned in #272.)

But we only use backzone data when there's a corresponding Link outside of backzone. This restriction applies only to a single Zone, namely Asia/Hanoi:

js> new Intl.DateTimeFormat("en", {timeZone: "Asia/Hanoi"}).resolvedOptions().timeZone     
typein:1:1 RangeError: invalid time zone in DateTimeFormat(): Asia/Hanoi

@justingrant
Contributor Author

justingrant commented Jul 14, 2023

@anba Looking at #272, it seems that the main goal of using backzone is to avoid geopolitically awkward Links like Europe/Oslo=>Europe/Berlin and/or Links that have a reasonably high likelihood to deviate in the future like Atlantic/Reykjavik=>Africa/Abidjan.

Is that correct? If so, then would using zone.tab instead of backzone achieve the same goals? (Where "using zone.tab" would mean that every ID in zone.tab would be a canonical ID.)

I'm asking because zone.tab seems to solve the same problem in the same way, but without putting different countries into the same canonical ID, e.g. Europe/Bratislava => Europe/Prague.

@justingrant
Contributor Author

I updated the OP with anba's info, and added some pseudocode to clarify what "use zone.tab" would mean.

@anba
Contributor

anba commented Jul 15, 2023

@anba Looking at #272, it seems that the main goal of using backward is to avoid geopolitically awkward Links like Europe/Oslo=>Europe/Berlin and/or Links that have a reasonably high likelihood to deviate in the future like Atlantic/Reykjavik=>Africa/Abidjan.

backzone, not backward. Can you update your updates here and the comment in tc39/proposal-canonical-tz#8 (comment) to mention backzone. Otherwise it's kind of confusing which file is meant.

And yes, the main reason for using backzone was to avoid geopolitically awkward Links.

Is that correct? If so, then would using zone.tab instead of backward achieve the same goals? Edit: "using zone.tab" would mean that every ID in zone.tab would be a canonical ID.

That means that for example America/Pangnirtung is no longer a canonical/primary ID, but instead is resolved to America/Iqaluit per the corresponding backward entry, right? IOW when using ICU it's necessary to not only handle the ICU time zones overrides from https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/icuzones, but also the region overrides from https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/icuregions.

It seems like sometimes it's still necessary to look into backzone: For example America/Virgin should be resolved to America/St_Thomas per backzone, because America/St_Thomas is in zone.tab. (Does this only apply to the TARGET1 cases in backward?)


Hmm, apropos ICU region overrides: We have to watch out how strictly we make zone.tab normative. For example TimeZonesOfLocale with strict reference to zone.tab places Europe/Simferopol into Russia, whereas the ICU region overrides put it into Ukraine:

@justingrant
Contributor Author

justingrant commented Jul 15, 2023

backzone, not backward. Can you update your updates here and the comment in tc39/proposal-canonical-tz#8 (comment) to mention backzone. Otherwise it's kind of confusing which file is meant.

Oops, updated now.

That means that for example America/Pangnirtung is no longer a canonical/primary ID, but instead is resolved to America/Iqaluit per the corresponding backward entry, right?

Yes. And, using a more populous zone as another example, it'd mean that Europe/Bratislava would be a primary ID, instead of the current state in Firefox where Europe/Bratislava is a non-primary ID that resolves to Europe/Prague.

It seems like sometimes it's still necessary to look into backzone: For example America/Virgin should be resolved to America/St_Thomas per backzone, because America/St_Thomas is in zone.tab. (Does this only apply to the TARGET1 cases in backward?)

There are two questions here: how should we determine which IDs are primary, and what time zone rules should be used?

For the first question of which IDs are primary, AFAICT (although not 100% sure, so please let me know if this is a wrong assumption) that only backward is needed, using the following algorithm for resolving any ID to its primary ID:

  1. Assert: id is an ASCII-case-insensitive match for a Zone or Link name in the IANA Time Zone Database.
  2. Set id to the Zone or Link name in the IANA Time Zone Database that is an ASCII-case-insensitive match for id.
  3. While id is not present in the TZ column of zone.tab and is not listed as a Zone in the etcetera or factory files of the IANA Time Zone Database, do
    a. Let target be the TARGET column value and let target1 be the TARGET1 column value from the line of backward of the IANA Time Zone Database where the LINK-NAME column value is id.
    b. If target1 is present, set id to target1.
    c. Else, set id to target.

Will this work? Or are there cases I'm not thinking of?
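The steps above could be sketched roughly as follows. The db object is a hand-written excerpt standing in for parsed tzdata (an assumption, not authoritative data), and resolvePrimaryId, allNames and asciiLower are hypothetical names. Throwing when no backward entry exists is my own choice to avoid an endless loop; the steps themselves don't say what should happen in that case.

```javascript
// Hand-written excerpt standing in for parsed tzdata (assumed, not real data).
const db = {
  zoneTabIds: new Set([
    "Asia/Kolkata", "America/Iqaluit", "America/St_Thomas", "Europe/Bratislava",
  ]),
  etcAndFactoryZones: new Set(["Etc/UTC", "Factory"]),
  // LINK-NAME -> { target, target1 }, mirroring the columns of `backward`.
  backward: new Map([
    ["Asia/Calcutta", { target: "Asia/Kolkata" }],
    ["America/Pangnirtung", { target: "America/Iqaluit" }],
    ["America/Virgin", { target: "America/Puerto_Rico", target1: "America/St_Thomas" }],
  ]),
};

// Lowercase only ASCII letters, for ASCII-case-insensitive matching.
const asciiLower = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());

function allNames(db) {
  return [...db.zoneTabIds, ...db.etcAndFactoryZones, ...db.backward.keys()];
}

function resolvePrimaryId(id, db) {
  // Steps 1-2: find the database spelling that matches case-insensitively.
  let current = allNames(db).find((name) => asciiLower(name) === asciiLower(id));
  if (current === undefined) throw new RangeError(`unknown time zone: ${id}`);
  // Step 3: follow backward until reaching a zone.tab ID or an Etc/Factory Zone.
  while (!db.zoneTabIds.has(current) && !db.etcAndFactoryZones.has(current)) {
    const entry = db.backward.get(current);
    if (entry === undefined) throw new Error(`no backward entry for ${current}`);
    current = entry.target1 ?? entry.target; // steps 3.b and 3.c
  }
  return current;
}

console.log(resolvePrimaryId("asia/calcutta", db)); // "Asia/Kolkata"
console.log(resolvePrimaryId("America/Virgin", db)); // "America/St_Thomas"
```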

For the second question of which time zone rules to use, I think there are two options, and whichever we pick should apply to all time zones:

  1. All time zones should use backzone rules. This would mean, for example, that Europe/Oslo would use different rules than Europe/Berlin.
  2. No time zones should use backzone rules. This would mean, for example, that Europe/Oslo would use the same rules as Europe/Berlin.

AFAIK, all major browsers seem to use option (2), so I'd be OK to leave this as-is unless it attracts a lot of user complaints.

IOW when using ICU it's necessary to not only handle the ICU time zones overrides from https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/icuzones

I don't fully understand how those ICU overrides fit into my "use zone.tab" proposal, but from an initial look, it seems that these could be handled using the same algorithm noted above: if the IDs are in zone.tab then they're primary, otherwise follow Links until we get to a Zone.

but also the region overrides from https://github.com/unicode-org/icu/blob/main/icu4c/source/tools/tzcode/icuregions.

So far I've only been thinking of using zone.tab to determine which IDs are primary, not for region mapping where zone.tab is problematic because it can only associate a time zone to one country code. I assume we'd either want ICU overrides to handle this, or we'd need some other region=>Zone resolution mechanism.

Do you have an idea for how region=>Zone resolution could work if all zone.tab IDs are primary?

Hmm, apropos ICU region overrides: We have to watch out how strictly we make zone.tab normative. For example TimeZonesOfLocale with strict reference to zone.tab places Europe/Simferopol into Russia, whereas the ICU region overrides put it into Ukraine:

Agreed. Although the spec text you linked above doesn't seem to be very limiting, because it just says "of those in common use in region" without defining exactly what "in common use" means. If we can come up with a good algorithm for region=>Zone mapping, then should we make that spec text more explicit so that implementations will remain more consistent?

@anba
Contributor

anba commented Jul 17, 2023

Yes. And, using a more populous zone as another example, it'd mean that Europe/Bratislava would be a primary ID, instead of the current state in Firefox where Europe/Bratislava is a non-primary ID that resolves to Europe/Prague.

I've picked America/Pangnirtung as the example, because America/Pangnirtung is currently treated as a Zone in all browsers. With the proposed changes America/Pangnirtung will be treated as a Link whose target is America/Iqaluit.

For the first question of which IDs are primary, AFAICT (although not 100% sure, so please let me know if this is a wrong assumption) that only backward is needed, using the following algorithm for resolving any ID to its primary ID:

There are two issues with the proposed algorithm:
(1) It doesn't work for the non-region Zones like PST8PDT (#778). This approach may work better:

  1. Assert: id is an ASCII-case-insensitive match for a Zone or Link name in the IANA Time Zone Database.
  2. Set id to the Zone or Link name in the IANA Time Zone Database that is an ASCII-case-insensitive match for id.
  3. Repeat, while id is a Link (ignoring backzone),
    a. If id is present in the TZ column of zone.tab, return id.
    b. Let target be the TARGET column value and let target1 be the TARGET1 column value from the line of backward of the IANA Time Zone Database where the LINK-NAME column value is id.
    c. If target1 is present, set id to target1.
    d. Else, set id to target.
  4. Return id.

Assuming IANA tzdata files are parsed in Vanguard format. When parsing in Rearguard format, the GMT Link in etcetera needs to be handled, too.

(2) Some Links won't get resolved to the expected Zone. For example Africa/Timbuktu will get resolved to Africa/Abidjan per backward, but Africa/Bamako (from zone.tab) is probably a better target.
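The revised steps under (1), and the Africa/Timbuktu outcome described in (2), can be sketched like this. The db object is again a hand-written excerpt (an assumption for illustration), PST8PDT is treated as a Zone with no backward entry as in the example from #778, and resolvePrimaryId is a hypothetical name.

```javascript
// Hand-written excerpt standing in for parsed tzdata (assumed, not real data).
const db = {
  zoneTabIds: new Set([
    "Africa/Abidjan", "Africa/Bamako", "Europe/Bratislava", "Europe/Prague",
  ]),
  zones: new Set(["Africa/Abidjan", "Europe/Prague", "PST8PDT"]),
  // LINK-NAME -> { target, target1 }, mirroring the columns of `backward`.
  links: new Map([
    ["Africa/Bamako", { target: "Africa/Abidjan" }],
    ["Africa/Timbuktu", { target: "Africa/Abidjan" }],
    ["Europe/Bratislava", { target: "Europe/Prague" }],
  ]),
};

// Lowercase only ASCII letters, for ASCII-case-insensitive matching.
const asciiLower = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());

function resolvePrimaryId(id, db) {
  // Steps 1-2: find the database spelling that matches case-insensitively.
  const names = [...db.zones, ...db.links.keys()];
  let current = names.find((name) => asciiLower(name) === asciiLower(id));
  if (current === undefined) throw new RangeError(`unknown time zone: ${id}`);
  // Step 3: only Links are followed, so non-region Zones fall through.
  while (db.links.has(current)) {
    if (db.zoneTabIds.has(current)) return current; // step 3.a
    const { target, target1 } = db.links.get(current);
    current = target1 ?? target; // steps 3.c and 3.d
  }
  return current; // step 4
}

console.log(resolvePrimaryId("PST8PDT", db)); // "PST8PDT"
console.log(resolvePrimaryId("Africa/Timbuktu", db)); // "Africa/Abidjan"
console.log(resolvePrimaryId("Africa/Bamako", db)); // "Africa/Bamako"
```

The second call shows issue (2): Africa/Timbuktu lands on Africa/Abidjan because backward points there directly, even though Africa/Bamako is the zone.tab ID for the same country.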

Here's a detailed list of Link names, the proposed target when using only backward and zone.tab, the current target in Firefox, and the current target in Safari/Chrome. (The list was generated by using all Zones, including data from backzone, and then comparing for differences.)

| Link | Target | Firefox | Safari/Chrome |
|---|---|---|---|
| Africa/Timbuktu | Africa/Abidjan | Africa/Timbuktu | Africa/Bamako |
| America/Argentina/ComodRivadavia | America/Argentina/Catamarca | America/Argentina/ComodRivadavia | America/Catamarca |
| America/Coral_Harbour | America/Panama | America/Coral_Harbour | America/Coral_Harbour |
| America/Ensenada | America/Tijuana | America/Ensenada | America/Tijuana |
| America/Montreal | America/Toronto | America/Montreal | America/Montreal |
| America/Nipigon | America/Toronto | America/Nipigon | America/Nipigon |
| America/Pangnirtung | America/Iqaluit | America/Pangnirtung | America/Pangnirtung |
| America/Rainy_River | America/Winnipeg | America/Rainy_River | America/Rainy_River |
| America/Rosario | America/Argentina/Cordoba | America/Rosario | America/Cordoba |
| America/Thunder_Bay | America/Toronto | America/Thunder_Bay | America/Thunder_Bay |
| America/Yellowknife | America/Edmonton | America/Yellowknife | America/Yellowknife |
| Asia/Chongqing | Asia/Shanghai | Asia/Chongqing | Asia/Shanghai |
| Asia/Harbin | Asia/Shanghai | Asia/Harbin | Asia/Shanghai |
| Asia/Kashgar | Asia/Urumqi | Asia/Kashgar | Asia/Urumqi |
| Asia/Tel_Aviv | Asia/Jerusalem | Asia/Tel_Aviv | Asia/Jerusalem |
| Atlantic/Jan_Mayen | Europe/Berlin | Atlantic/Jan_Mayen | Arctic/Longyearbyen |
| Australia/Currie | Australia/Hobart | Australia/Currie | Australia/Currie |
| Europe/Belfast | Europe/London | Europe/Belfast | Europe/London |
| Europe/Tiraspol | Europe/Chisinau | Europe/Tiraspol | Europe/Chisinau |
| Europe/Uzhgorod | Europe/Kyiv | Europe/Uzhgorod | Europe/Uzhgorod |
| Europe/Zaporozhye | Europe/Kyiv | Europe/Zaporozhye | Europe/Zaporozhye |
| Pacific/Enderbury | Pacific/Kanton | Pacific/Enderbury | Pacific/Enderbury |
| Pacific/Johnston | Pacific/Honolulu | Pacific/Johnston | Pacific/Johnston |

In addition to the aforementioned Africa/Timbuktu, America/Coral_Harbour and Atlantic/Jan_Mayen are also resolved to unexpected Zones. All three cases can be fixed when using the #PACKRATLIST zone.tab data from backzone, though.

#PACKRATLIST zone.tab Link Africa/Bamako Africa/Timbuktu
#PACKRATLIST zone.tab Link America/Atikokan America/Coral_Harbour
#PACKRATLIST zone.tab Link Europe/Oslo Atlantic/Jan_Mayen
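A minimal sketch of how these commented-out lines could be turned into link overrides, assuming each one has the shape "#PACKRATLIST zone.tab Link TARGET LINK-NAME" separated by whitespace. parsePackratLinks is a hypothetical helper, and the input is just the three lines quoted above.

```javascript
// The three lines quoted above, as they might appear inside backzone.
const backzoneExcerpt = [
  "#PACKRATLIST zone.tab Link Africa/Bamako Africa/Timbuktu",
  "#PACKRATLIST zone.tab Link America/Atikokan America/Coral_Harbour",
  "#PACKRATLIST zone.tab Link Europe/Oslo Atlantic/Jan_Mayen",
  "# an ordinary comment that should be ignored",
].join("\n");

// Matches "#PACKRATLIST zone.tab Link TARGET LINK-NAME" with any whitespace.
const PACKRAT_LINK = /^#PACKRATLIST\s+zone\.tab\s+Link\s+(\S+)\s+(\S+)/;

// Hypothetical helper: returns a Map from LINK-NAME to its override target.
function parsePackratLinks(text) {
  const overrides = new Map();
  for (const line of text.split("\n")) {
    const match = PACKRAT_LINK.exec(line);
    if (match !== null) overrides.set(match[2], match[1]);
  }
  return overrides;
}

const overrides = parsePackratLinks(backzoneExcerpt);
console.log(overrides.get("Africa/Timbuktu")); // "Africa/Bamako"
```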

And a list of all Links, including backzone, which will be treated as Zones when using zone.tab. (Safari/Chrome already treat them as Zones.)

| Link as Zone | Firefox |
|---|---|
| America/Kralendijk | America/Curacao |
| America/Lower_Princes | America/Curacao |
| America/Marigot | America/Port_of_Spain |
| America/St_Barthelemy | America/Port_of_Spain |
| Arctic/Longyearbyen | Europe/Oslo |
| Europe/Bratislava | Europe/Prague |
| Europe/Busingen | Europe/Zurich |
| Europe/Mariehamn | Europe/Helsinki |
| Europe/Podgorica | Europe/Belgrade |
| Europe/San_Marino | Europe/Rome |
| Europe/Vatican | Europe/Rome |

And finally a list of Links, including backzone, the new proposed Target, and the current Target in Firefox resp. Safari/Chrome:

| Link | Target | Firefox | Safari/Chrome |
|---|---|---|---|
| Antarctica/South_Pole | Pacific/Auckland | Antarctica/McMurdo | Pacific/Auckland |
| Asia/Chungking | Asia/Shanghai | Asia/Chongqing | Asia/Shanghai |
| Pacific/Yap | Pacific/Port_Moresby | Pacific/Chuuk | Pacific/Truk |

The only problematic new proposed target is resolving Pacific/Yap to Pacific/Port_Moresby. Here it seems better to follow backzone and use Pacific/Chuuk. (Pacific/Chuuk is in zone.tab)

Do you have an idea for how region=>Zone resolution could work if all zone.tab IDs are primary?

No, I haven't yet looked into that.

@justingrant
Contributor Author

Great conversation here, thanks. FYI, CLDR is proposing to add the IDs from zone.tab into CLDR data in cases where the CLDR canonical ID (the first ID in the list) is not the one listed in zone.tab. See unicode-org/cldr#3105. I spot-checked Yoshito's work in that PR and it looks like every problematic ID has an iana attribute, which is great. Also, intra-country Zones like America/Montreal have been deprecated in that PR, which I guess is OK?

Assuming that PR lands, do you think we should simplify the spec by simply referring to CLDR as ECMAScript's source of IDs (including which ones are primary vs. non-primary), instead of trying to define the algorithm for how we interpret the IANA time zone database? It seems like (with Yoshito's PR landed) CLDR may be closer in use cases and intent than TZDB which seems to be diverging quite a bit from what ECMAScript wants, at least in terms of supporting the at-least-one-zone-per-country model that ECMAScript prefers.

Do you know if any IDs are missing from CLDR data? (https://github.com/unicode-org/cldr/blob/main/common/bcp47/timezone.xml)

Assuming IANA tzdata files are parsed in Vanguard format. When parsing in Rearguard format, the GMT Link in etcetera needs to be handled, too.

GMT is already special-cased in the spec: https://tc39.es/ecma402/#sec-canonicalizetimezonename. Does that mean it won't matter if we use vanguard or rearguard?

Regardless, do you have a preference for whether ECMAScript implementations should use vanguard or rearguard?

All three cases can be fixed when using the #PACKRATLIST zone.tab data from backzone, though.

Weirdly, despite the text in the comment, in order to get the output of the TZDB makefile to include these three lines, you need to use make PACKRATLIST=zone.tab PACKRATDATA=backzone. Using make PACKRATLIST=zone.tab doesn't include these three lines.

The only problematic new proposed target is resolving Pacific/Yap to Pacific/Port_Moresby. Here it seems better to follow backzone and use Pacific/Chuuk. (Pacific/Chuuk is in zone.tab)

Agreed. In addition (much lower priority) I think that Antarctica/South_Pole should resolve to Antarctica/McMurdo not Pacific/Auckland. Note that the CLDR data in Yoshito's PR would enable this mapping too (in addition to Chuuk).

@justingrant
Contributor Author

FYI, there's now a proposed ICU API that will expose the CLDR data linked above. See https://sourceforge.net/p/icu/mailman/message/37881038/ for API details.

@anba
Contributor

anba commented Aug 25, 2023

Assuming IANA tzdata files are parsed in Vanguard format. When parsing in Rearguard format, the GMT Link in etcetera needs to be handled, too.

GMT is already special-cased in the spec: https://tc39.es/ecma402/#sec-canonicalizetimezonename. Does that mean it won't matter if we use vanguard or rearguard?

Ah, right. GMT will be handled through the UTC special case even when the etcetera link is ignored.

Regardless, do you have a preference for whether ECMAScript implementations should use vanguard or rearguard?

I don't think it matters right now, because vanguard or rearguard is mostly about supporting negative daylight saving time. And as long as we don't have a method which returns the difference from the current time zone offset to the standard time zone offset, cf. rawOffset and dstOffset out-params in icu::TimeZone::getOffset, it should be fine to use either format.

@mattjohnsonpint

@justingrant - this is great. I'm hoping it will also fix Intl.supportedValuesOf('timeZone')? Is that on your radar?

See also: https://stackoverflow.com/questions/77214200/intl-supportedvaluesoftimezone-doesnt-provide-the-latest-timezone-informati
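For anyone who wants to check their own engine, here is a small sketch that reports which outdated names appear in Intl.supportedValuesOf('timeZone'). The list of outdated IDs is a short excerpt from the table in the OP, listOutdatedSupportedIds is a hypothetical helper, and the result depends on the engine and its ICU/tzdata version.

```javascript
// Short excerpt of the outdated IDs listed in the OP.
const OUTDATED_IDS = ["Asia/Calcutta", "Europe/Kiev", "Asia/Saigon", "Asia/Rangoon"];

// Returns the outdated IDs that the engine still reports as supported values,
// or null when Intl.supportedValuesOf is not available.
function listOutdatedSupportedIds(outdatedIds) {
  if (typeof Intl.supportedValuesOf !== "function") return null;
  const supported = new Set(Intl.supportedValuesOf("timeZone"));
  return outdatedIds.filter((id) => supported.has(id));
}

// The output varies by engine and version, so none is shown here.
console.log(listOutdatedSupportedIds(OUTDATED_IDS));
```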

Thanks.

@FrankYFTang
Contributor

New ICU API:

    /**
     * Returns the preferred time zone ID in the IANA time zone database for the given time zone ID.
     * There are two types of preferred IDs. The first type is the one defined in zone.tab file,
     * such as "America/Los_Angeles". The second types is the one defined for zones not associated
     * with a specific region, but not defined with "Link" syntax such as "Etc/GMT+10".
     *
     * <p>Note: For most of valid time zone IDs, this method returns an ID same as getCanonicalID().
     * getCanonicalID() is based on canonical time zone IDs defined in Unicode CLDR.
     * These canonical time zone IDs in CLDR were based on very old version of the time zone database.
     * In the IANA time zone database, some IDs were updated since then. This API returns a newer
     * time zone ID. For example, CLDR defines "Asia/Calcutta" as the canonical time zone ID. This
     * method returns "Asia/Kolkata" instead.
     * <p> "Etc/Unknown" is a special time zone ID defined by CLDR. There are no corresponding zones
     * in the IANA time zone database. Therefore, this API returns U_ILLEGAL_ARGUMENT_ERROR when the
     * input ID is "Etc/Unknown".
     *
     * @param id        The input time zone ID.
     * @param ianaID    Receives the preferred time zone ID in the IANA time zone database. When
     *                  the given time zone ID is not a known time zone ID, this method sets an
     *                  invalid (bogus) string.
     * @param status    Receives the status.  When the given time zone ID is not a known time zone
     *                  ID, U_ILLEGAL_ARGUMENT_ERROR is set.
     * @return  A reference to the result.
     * @draft ICU 74
     */
    static UnicodeString& U_EXPORT2 getIanaID(const UnicodeString&id, UnicodeString& ianaID,
        UErrorCode& status);

see https://github.com/unicode-org/icu/blob/main/icu4c/source/i18n/unicode/timezone.h

@sffc
Contributor

sffc commented Dec 14, 2023

My understanding of the state of this issue is:

  1. We don't want to make changes until Temporal lands
  2. After Temporal lands, we introduce spec text and tests to recommend the 2-year transition period for new time zone IDs

Does that sound right @justingrant ?

@sffc sffc added s: comment Status: more info is needed to move forward s: blocked Status: the issue is blocked on upstream and removed s: comment Status: more info is needed to move forward labels Dec 14, 2023
@sffc sffc added this to the ES 2024 milestone Dec 14, 2023
@justingrant
Contributor Author

We don't want to make changes until Temporal lands

In the meantime, do we want V8 and JSC to use the new ICU API to be able to return modern IDs like Asia/Kolkata from new Intl.DateTimeFormat().resolvedOptions().timeZone?

After Temporal lands, we introduce spec text and tests to recommend the 2-year transition period for new time zone IDs

Yep, this sounds right.

@sffc
Contributor

sffc commented Dec 21, 2023

We don't want to make changes until Temporal lands

In the meantime, do we want V8 and JSC to use the new ICU API to be able to return modern IDs like Asia/Kolkata from new Intl.DateTimeFormat().resolvedOptions().timeZone?

Okay, yep, that change seems positive because there's already an expectation in code that the system time zone is subject to change and new identifiers can be added at any time.

Is there some sort of PR that can be put up to recommend this behavior in ECMA-402, split from the rest of the proposal in Temporal?

@justingrant
Contributor Author

I'm not sure this necessarily needs any spec changes. Firefox already uses the modern IDs, and @anba has argued (convincingly, IMO) that the spec already requires using the latest IDs. So I think V8 and JSC can simply start using the new ICU APIs.

This won't solve all the cross-engine inconsistencies (@anba's comment highlights a few corner cases), but the most popular ones should be handled by just using ICU.

Also, CLDR's data isn't necessarily complete. See https://unicode-org.atlassian.net/browse/CLDR-17111.

So there will be mop-up work required, but IMO it will be a lot easier to mop up once Calcutta and Kiev are handled.

One thing to watch out for is that these changes may break users who are expecting the old names, so it should be carefully rolled out in Canary before releasing to everyone.
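As an example of the kind of userland code at risk, an app that compares a stored ID against the engine's output with === will start failing once Asia/Calcutta becomes Asia/Kolkata. A minimal sketch of a rename-tolerant comparison follows; RENAMES is a two-entry excerpt of the table in the OP, and normalizeId and isSameZone are hypothetical helper names.

```javascript
// Two-entry excerpt of the renames listed in the OP (old name -> new name).
const RENAMES = new Map([
  ["Asia/Calcutta", "Asia/Kolkata"],
  ["Europe/Kiev", "Europe/Kyiv"],
]);

// Map an old name to its new name; leave every other ID untouched.
function normalizeId(id) {
  return RENAMES.get(id) ?? id;
}

// Compare two IDs so that the old and new spellings count as the same zone.
function isSameZone(a, b) {
  return normalizeId(a) === normalizeId(b);
}

console.log(isSameZone("Asia/Calcutta", "Asia/Kolkata")); // true
console.log(isSameZone("Asia/Kolkata", "Europe/Kyiv")); // false
```

A comparison like this keeps working both before and after an engine switches to the newer names, at the cost of maintaining the rename table by hand.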

@sffc sffc moved this from Previously Discussed to Priority Issues in ECMA-402 Meeting Topics Dec 27, 2023
@sffc sffc moved this from Priority Issues to Previously Discussed in ECMA-402 Meeting Topics Jan 18, 2024
@justingrant
Contributor Author

In the 2024-01-18 meeting of TG2, we discussed part of this issue: whether implementations should move to use newer canonical IDs (e.g. Asia/Ho_Chi_Minh, Asia/Kolkata, Europe/Kyiv) before Temporal lands.

Consensus was that we should wait ~6 months to see if Temporal can land first, so that we can avoid changing things twice for users, but if Temporal was delayed then we can reconsider.

Looking back up at the OP in this issue, this conclusion answered question (3), and the conclusion was for (3b).

We still need to resolve (1) and (2):

  1. Which IDs should be canonical? After Temporal lands, the canonical time zones will still be used to return the system time zone (previously called DefaultTimeZone() in the spec, now called SystemTimeZoneIdentifier()).
  2. How should canonicalization changes be implemented? Can we align across engines to reduce variation?

My suggestion to resolve both (1) and (2) is that all engines (including Mozilla's SpiderMonkey used in Firefox) should switch to using ICU's new API that returns modern IDs. And if we're unhappy with the canonical values returned by that API, then we should fix the data upstream in CLDR rather than having engines override it on their own. This won't happen for popular zones like Europe/Kyiv, but there are some corner cases and smaller zones (noted earlier in this thread) where it may matter.

We can discuss this in a later TG2 meeting. Not urgent.

FYI @sffc

@sffc
Contributor

sffc commented Jan 18, 2024

My suggestion to resolve both (1) and (2) is that all engines (including Mozilla's SpiderMonkey used in Firefox) should switch to using ICU's new API that returns modern IDs.

I think this is the same as the proposal that we decided to delay until Temporal, right? Using the new ICU API would mean a user-visible change, which we should just roll out at the same time as the Temporal change.

@justingrant
Contributor Author

justingrant commented Jan 19, 2024

Not necessarily. There are differences in how V8 vs. JSC vs. SM deal with ICU vis-a-vis canonicalization. For example, Firefox currently reports the modern IDs because, AFAIK, SM doesn't use ICU for canonicalization at all and instead builds the canonical mapping separately.

I think we should ask all engines to use ICU's new API for canonicalization as part of their work to support Temporal, and if there are problems with the underlying CLDR data then we should raise them with CLDR now so that they'll be fixed in time for the release of Temporal on each engine. In practical terms, this means that we'd replace the current answer to (1) in the OP with simply "Use ICU's new API, and if the results have problems then work with CLDR to fix the data".

My understanding is that the above is what V8 is planning.

JSC is an interesting case where AFAIK it uses the OS's copy of ICU rather than bundling it into Safari like Chrome does. So there may be more lead time required to make that change than for the other engines. I'm not sure what the implications of this longer lead time are, although @Constellation may know.

For SpiderMonkey, there's minimal user impact to switching to ICU's API because the only canonicalizations that will change are obscure cases. But if those obscure cases are blockers for SM, then we should figure that out now so SM can retire the custom canonicalization implementation. @anba, do you think that ICU's new API is now close enough for you to use it?

@sffc
Contributor

sffc commented Jan 22, 2024

TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2024-01-18.md#draft-plan-to-align-canonical-time-zone-ids-across-implementations-806

Conclusion (written before @justingrant's comment above): Do not make any changes right now. Wait for Temporal and make a change then. If there is a change to Temporal's timeline, then potentially revisit this.
