Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: more granular conversion from object_store::Error to Error #2316

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

broccoliSpicy
Copy link
Contributor

@broccoliSpicy broccoliSpicy commented May 9, 2024

This PR tries to fix issue #2067 by adding a more granular conversion from object_store::Error types to Error types.

Copy link

github-actions bot commented May 9, 2024

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@broccoliSpicy broccoliSpicy changed the title fix issue#2067, more granular conversion from object_store::Error to Error. fix: more granular conversion from object_store::Error to Error. May 9, 2024
@github-actions github-actions bot added the bug Something isn't working label May 9, 2024
@broccoliSpicy broccoliSpicy changed the title fix: more granular conversion from object_store::Error to Error. fix: more granular conversion from object_store::Error to Error May 9, 2024
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 21.21212% with 26 lines in your changes are missing coverage. Please review.

Project coverage is 80.67%. Comparing base (89e8eb6) to head (cfe0ddb).

Files Patch % Lines
rust/lance-core/src/error.rs 21.21% 26 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2316      +/-   ##
==========================================
- Coverage   80.68%   80.67%   -0.02%     
==========================================
  Files         191      191              
  Lines       56695    56721      +26     
  Branches    56695    56721      +26     
==========================================
+ Hits        45746    45761      +15     
- Misses       8386     8402      +16     
+ Partials     2563     2558       -5     
Flag Coverage Δ
unittests 80.67% <21.21%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this.

I think we should probably go for a different approach. Some of these error mappings take something very generic, and map it to a very specific error. For example, object_store::Error::NotFound could be any file isn't found, but we are mapping it here to specifically mean that the dataset is not found. Or object_store::Error::InvalidPath being mapped to Error::InvalidInput, even though the path could be one that internal and not user provided, and thus a more appropriate error would be Error::Internal.

Here's what I think might be a better solution:

  1. Change Error::IO variant to hold BoxError, so it could be downcast if needed.
  2. To solve the "opening non-existent dataset should return Error::NotFound instead of Error::IO" issue, we should add the error the specific code path where we generate the error. We should also have a unit test that validates we produce an error.

If you'd like, you can just start with a PR to do (1), as I think that on it's own would be a meaningful improvement.

@broccoliSpicy
Copy link
Contributor Author

broccoliSpicy commented May 9, 2024

Thanks for the feedback, I will start with (1) and see what I can do with (2).

@broccoliSpicy
Copy link
Contributor Author

broccoliSpicy commented May 10, 2024

@wjones127
a few questions regarding (1): Change Error::IO variant to hold BoxError, so it could be downcast if needed:

if we change Error::IO from

#[derive(Debug, Snafu)]
#[snafu(visibility(pub))]
pub enum Error {
    ...
     #[snafu(display("LanceError(IO): {message}, {location}"))]
    IO { message: String, location: Location },
    ...
}

to

#[derive(Debug, Snafu)]
#[snafu(visibility(pub))]
pub enum Error {
    ...
    #[snafu(display("LanceError(IO): {source}, {location}"))]
    IO { source: BoxedError, location: Location },
    ...
}

and maybe add a helper function like this:

impl Error {
    pub fn io(message: impl Into<String>, location: Location) -> Self {
        let message: String = message.into();
        Self::IO {
            source: message.into(),
            location,
        }
    }
}

do we want Error::IO to be a wrapper of other Error enums? like this:
example source

           return Err(Error::IO {
                source: Box::new(Error::Internal {
                    message: "HashJoiner: No data".to_string(),
                    location: location!(),
                }),
                location: location!()
            });

if so, where do we expect to downcast the Error::IO to the specific Error enum?
also, the two location!() here has redundancy

or we just make everything Error::IO, like this:

             return Err(Error::io(
                    "HashJoiner: No data".to_string(),
                    location!(),
            ));

I am very new to Lance and have only read a small portion of the codebase yet so sorry if these are naive questions

@wjones127
Copy link
Contributor

do we want Error::IO to be a wrapper of other Error enums? like this:
example source

return Err(Error::IO {
    source: Box::new(Error::Internal {
      message: "HashJoiner: No data".to_string(),
      location: location!(),
  }),
  location: location!()
});

No, it's not meant to wrap our own errors. The source field will wrap other error types, like object store error.

Also, keep in mind some of the error handling in our code base is very bad. This is a good example where someone used an IO error when the should have used something else, since this error doesn't have to do with reading or writing data from some IO source/sink. So if you see something like this in our codebase, do not assume it is "the right way".

What I meant is that object store errors would be put in a box and wrapped in Error::IO.

if so, where do we expect to downcast the Error::IO to the specific Error enum?
also, the two location!() here has redundancy

The downcasts would not be in our codebase. This is something users of our library would do if they wanted to get the underlying object store error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants