Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing fun() to ra:*_query() may break when executed on a remote node #238

Open
dumbbell opened this issue Dec 19, 2023 · 5 comments · May be fixed by #242
Open

Passing fun() to ra:*_query() may break when executed on a remote node #238

dumbbell opened this issue Dec 19, 2023 · 5 comments · May be fixed by #242
Labels
bug Something isn't working

Comments

@dumbbell
Copy link
Member

Passing an anonymous function reference for execution on a remote node may break if the remote node has another version of Erlang or the module hosting the anonymous function.

It would be better to call an exported function.

Reporter by: @ansd

@dumbbell dumbbell added the bug Something isn't working label Dec 19, 2023
@dumbbell dumbbell added this to the v0.11.0 milestone Dec 19, 2023
@dumbbell
Copy link
Member Author

dumbbell commented Jan 12, 2024

I'm wondering if the use of an anonymous fun() should be denied with leader and consistent queries (i.e. only allowed for local queries). Otherwise, it might lead to rare issues that are difficult to debug.

@ansd, @dcorbacho, @the-mikedavis: Any opinion?

@ansd
Copy link
Member

ansd commented Jan 12, 2024

ra:local_query(ServerId, ...) means locally on the given ServerId, which may be a remote node. In that case, anonymous funs should still not be used.

Otherwise, it might lead to rare issues that are difficult to debug.

What are the rare issues that are difficult to debug when always using the {M,F,A} query_fun() format?

@dumbbell
Copy link
Member Author

ra:local_query(ServerId, ...) means locally on the given ServerId, which may be a remote node.

Khepri uses local queries with the local node only:

LocalServerId = {StoreId, node()},
Ret = ra:local_query(LocalServerId, QueryFun, Timeout),

What are the rare issues that are difficult to debug when always using the {M,F,A} query_fun() format?

Sorry I was unclear.

I prepared a patch so that Khepri always use MFA. We should be fine on this front. However, APIs such as khepri:fold/5 may continue to receive fun() as argument. Thus I’m wondering if Khepri should accept fun() when doing a local fold only, and reject otherwise.

dumbbell added a commit that referenced this issue Jan 12, 2024
[Why]
Using `fun()` is fragile if it has to be executed on a remote node.
Indeed, the remote node may have another version of the module hosting
the function and the function reference may be invalid.

[How]
Now, we always use an exported function.

Callers of any folding APIs should do the same.

Fixes #238.
@dumbbell dumbbell linked a pull request Jan 12, 2024 that will close this issue
@ansd
Copy link
Member

ansd commented Jan 12, 2024

Yes, I think that's a good idea.

dumbbell added a commit that referenced this issue Jan 18, 2024
[Why]
Using `fun()` is fragile if it has to be executed on a remote node.
Indeed, the remote node may have another version of the module hosting
the function and the function reference may be invalid.

[How]
Now, we always use an exported function.

Callers of any folding APIs should do the same.

Fixes #238.
@dumbbell dumbbell removed this from the v0.11.0 milestone Jan 25, 2024
@dumbbell
Copy link
Member Author

After putting more thoughts into this, I believe the problem is larger. In fact, there are two problems.

  1. The validity of a function reference on a remote node
  2. The possible difference of code between the local copy and the remote one

We could add one somewhat related issue: Ra machine versions compatibility.

The validity of a function reference on a remote node

This the one we attempt to fix here. Indeed, depending on the version of Erlang/OTP or the remote module hosting the function, the reference could be invalid and the Ra machine could crash trying to run it. This can be solved by always using exported functions and MFA as proposed in the current patch.

But this brings us to the second problem…

The possible difference of code between the local copy and the remote one

If Khepri or the application using it are updated on one of the Erlang nodes, the code of the query the caller thinks will be executed might be different or even incompatible with the code actually executed on the remote node. This could mean an unexpected result or a crash.

Ra machine versions compatibility

Speaking of upgrades, we have the same king of problem if Khepri’s Ra machine’s state changes. If the query code doesn’t know about the new or old state, again, it can give bad results or crash.

That’s why I’m pausing work on this patch and started to discuss solutions with @kjnilsson.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants