[Feature] Improve the tryCommitOnce behavior in FileStoreCommitImpl #3351

Open
2 tasks done
xiangyuf opened this issue May 20, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@xiangyuf
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

When using dedicated compaction in production, we've found that the write-only job and the compaction job fail over every 2 or 3 days, even though the remote filesystem supports atomic rename.

The main cause is the FileAlreadyExistsException:
image

Checking the recent Hadoop rename implementation, we found that by default the rename API throws FileAlreadyExistsException when the destination already exists, instead of returning false.
https://github.com/apache/hadoop/blob/branch-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java
image

IMHO, this can be improved by catching specific exceptions in tryCommitOnce and returning false to the caller.
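
A minimal sketch of that idea, not the actual FileStoreCommitImpl code: it uses java.nio.file as a stand-in for the Hadoop FileSystem API, and the class name CommitSketch and the simplified two-argument tryCommitOnce signature are hypothetical. It only shows the shape of the change, where an "already exists" failure during rename is reported as a lost commit attempt (false) rather than propagated as a job-failing exception.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; names and signature are assumptions.
class CommitSketch {

    /**
     * Tries to publish a temporary snapshot file under its final name.
     * Returns true if this attempt won, false if another writer already
     * created the target, so the caller can re-check conflicts and retry.
     */
    static boolean tryCommitOnce(Path tmpSnapshot, Path finalSnapshot) throws IOException {
        try {
            // Without REPLACE_EXISTING, Files.move fails if the target exists.
            Files.move(tmpSnapshot, finalSnapshot);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Another committer published this snapshot first: treat it as a
            // failed attempt instead of letting the exception fail the job.
            Files.deleteIfExists(tmpSnapshot);
            return false;
        }
        // Any other IOException still propagates to the caller unchanged.
    }
}
```

Only the "destination exists" case is swallowed here; which Hadoop exceptions are safe to map to false in the real implementation is a design decision for the maintainers.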

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
xiangyuf added the enhancement label May 20, 2024
@xiangyuf
Contributor Author

Hi @JingsongLi, WDYT about this?
