Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fine-grained error types #4638

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Feb 28, 2024

This is my vision for errors in ICU4X 2.0. Instead of having one catch-call sum type per crate, we should be more exact about which methods can return which errors. Let's look at icu::calendar::Error for example: this is returned by both Chinese::new_unstable, and Date::try_new_gregorian_date, meaning the user has to handle ::Parse, ::Overflow, etc. for Chinese::new_unstable, as well as ::Data for Date::try_new_gregorian_date (of course those will never happen).

We're trying to be very explicit about data in ICU4X, yet DataErrors appear everywhere because they are a variant in the catch-all error types. So the idea is:

  • constructors only ever return DataError. If using compiled data, the client can be reasonably sure that this error is impossible
    • this is easy to do in all crates except for datetime; there the constructor currently returns variants like ::UnsupportedField (I think this is weird)
    • this really simplifies constructors that depend on other components. We no longer need DateTimeError::Plurals et al
    • we ourselves will already benefit from this, for example with properties we have a lot of code like
      let gc = icu_properties::maps::load_general_category(provider).map_err(|e| {
      let PropertiesError::PropDataLoad(e) = e else {
      unreachable!()
      };
      e
      })?;
      .
    • Data invariant violations should be modeled as DataError::custom. This lets us associate a data key with the error (i.e. https://github.com/unicode-org/icu4x/pull/4638/files#diff-39322bc0d217068e658c354db35fa09d51dea5c65707b3f28a129bb261e623b9), and compiled data does not have to care about these. The litmus test for what should be a DataError should be: does this not happen for (default) compiled data?
  • actual functionality never returns DataErrors, as the data is already loaded (this is ICU4X's model)
  • instead of catch-all error types, each method errors with a custom enum that contains exactly the error variants that this method can return
  • the custom enum is exhaustive so that the client can exhaustively handle the error
    • a method should not suddenly require to return a new error variant, that's a behaviour change that should require a new API
  • If a method returns too many different error types, it might be a sign that we should split it into a better API. An example of this is icu::properties::sets::load_for_ecma262_unstable: this currently returns either a DataError or an invalid property error. We can split this into two methods, one that parses the &str into a property, and one that loads the data for the property.

I've mocked up this change for most crates in this PR. Often, the only error variant was DataError, in which case this change was trivial, but sometimes there were other variants that required introducing custom error types. Datetime and calendar are not done, as these crates have huge error types.

@robertbastian robertbastian added the discuss Discuss at a future ICU4X-SC meeting label Feb 28, 2024
@robertbastian robertbastian added the 2.0-breaking Changes that are breaking API changes label Feb 28, 2024
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reopens a previous discussion so it needs more discussion

@sffc sffc modified the milestones: ICU4X 2.0, 1.5 Blocking ⟨P1⟩ Feb 29, 2024
@sffc
Copy link
Member

sffc commented Feb 29, 2024

Discuss with:

@sffc
Copy link
Member

sffc commented Feb 29, 2024

This PR largely undoes #2649

The style of returning a wrapped DataError was an explicit choice in #2639

Also some discussion that included @robertbastian in #153 (comment)

In general I am in favor of returning smaller error enums in functions that have very specifically documented invariants. Where I differ is in returning a wrapped vs unwrapped error. Returning a non-exhaustive error enum wrapping other variants is preferable to me because:

  1. Wrapping the error with something else using displaydoc gives more context. "Error creating ListFormatter: Missing locale" is more useful and actionable than "Missing locale".
  2. Using an enum allows us to add more variants in the future. I want to emphasize that this does not need to be done in a way that breaks the API stability. For example, a constructor that takes an options bag could gain an option that is valid only in certain circumstances, and a corresponding error variant gets added to the enum. Basically, using a non-exhaustive error enum gives us, the ICU4X team, more flexibility.
  3. It is somewhat convenient that most constructors return crate::Error (re-exported as crate::CrateError). It makes it easy to find the error type and write code to handle it such as with Anyhow/Eyre.
  4. The standard coding style for errors that I have always seen is to catch specific error variants and propagate all others. We can get more specific and more exhaustive in our error types, but people should be calling us in fallible functions anyway, so they should just propagate errors they don't understand.
  5. Although DataError is the most common error type right now, we may at some point want to add an OomError for memory allocation errors, which will show up in a lot of places, including in constructors.

@robertbastian
Copy link
Member Author

#2639 doesn't really contain any discussion, not even with the people that were linked (I was not).

In #153 we decided this for 1.0, but I'm now convinced this was the wrong decision, which is why I want to reopen this for 2.0.

Using an enum allows us to add more variants in the future. I want to emphasize that this does not need to be done in a way that breaks the API stability. For example, a constructor that takes an options bag could gain an option that is valid only in certain circumstances, and a corresponding error variant gets added to the enum. Basically, using a non-exhaustive error enum gives us, the ICU4X team, more flexibility.

Let's use Java's exceptions as a comparison here. There are checked and unchecked exceptions, checked exceptions are declared on the API and need to be caught by the user, whereas unchecked exceptions are not declared on the API and the user doesn't need to catch them (but can). #[non_exhaustive] error types are like unchecked exceptions: the user can handle them, but we can always add more variants that they are not handling without them noticing. Checked exceptions on the other hand force the user to handle them (which is a better API), and it's an API breakage to add a checked exception type to a method. This is good, a user can write an exception-free program like this.

Yes, using a non-exhaustive error enum gives us more flexibility (like throwing unchecked exceptions in Java gives you more flexibility), but it comes at the cost of users having to pass around unknown future error variants everywhere. This either leads to them being unwrapped somewhere, or users having to include fallback code for situations that do not happen.

Although DataError is the most common error type right now, we may at some point want to add an OomError for memory allocation errors, which will show up in a lot of places, including in constructors.

And this should totally be a breaking change. Users should explicitly handle these errors, not bubble them through and ignore/unwrap them somewhere. Also, such a change would probably require a bigger API redesign, such as making things generic in the allocator used.

@robertbastian
Copy link
Member Author

Also, in #153 I did say

@robertbastian - For 1.0 we should at least make sure errors are not overly nested. If there is a data error, there should be only one variant to check.

We failed to do that. In datetime we have DateTimeError::Data(DataError), DateTimeError::Plurals(PluralsError::Data(DataError)), DateTimeError::FixedDecimalError(DecimalError::Data(DataError), and more!

@Manishearth
Copy link
Member

Manishearth commented Mar 13, 2024

Proposal:

  1. Error enums are fine grained, but #[non_exhaustive] by default. This will imply some level of error wrapping
  2. At a case by case level we can make decisions on things that are truly exhaustive. Here are some cases where we predecide that something is exhaustive.
    a. For constructors that take only a provider (exhaustively return DataError)
    b. For constructors that take a provider and a locale, if the error cannot occur for default compiled data (exhaustively return DataError). See note about 2b later
    c. For narrow functions like FixedDecimal::from_f64 (?); when not currently possible, consider redesigning the API so that clients can have safe intermediate steps.
    d. We are deferring decisions on things with options right now (see comment below)
  3. When we make things exhaustive, if that allows us to collapse error wrapping, we can evaluate on a case-by-case basis. Here are some cases where we predecide.
    a. Constructors that return only an exhaustive DataError can return a DataError without wrapping
  4. Avoid returning () as the error type; return a unit struct #[non_exhaustive] struct FooError; with nice docs, non-exhaustive so that we can add context to it in the future.

Note about 2b:
What about cases where the locale has conflict with the type (e.g. CalendarMismatch)

We should always use fine-grained custom/wrapped errors for this. If such a case is introduced by CLDR/etc, it is unlikely that users would be expecting such a change and may already be unreachable!()-ing on the error, so it is probably not nice to our users to silently introduce such an error. In such a case we should add a new API that can produce this error anyway.

Note about 3a:
Most cases of wrapping tell you "which crate caused the data error", this is not particularly useful to the user, it is useful to icu4x devs but we have error logging for that and can also just debug directly. The main useful thing for the user is which key was involved since they can fix their data pipeline, but that is already a part of DataError (and it's more fine-grained than crate-level).

Agreed: @Manishearth @robertbastian @sffc

@Manishearth
Copy link
Member

For things with options, one solution we can have is requiring validation if the options ever gain invariants. We're not making the decision for now.

Either split options/validatedoptions

let validated_options = options.validate()?;
foo::new(validated_options, &provider).unwrap("data is valid")

Or add private fields:

#[non_exhaustive]
struct Options {
   pub plural_category: PC,
   weird_thing: u8, 
}

Options::new_with_weird(pc, u8) -> Result<Self>

let foo = Options::new();
foo.plural_category = PC::Cardinal;
foo = foo.with_weird(23).expect("oops");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.0-breaking Changes that are breaking API changes
Projects
Status: Small breakage (defer to end)
Development

Successfully merging this pull request may close these issues.

None yet

3 participants