Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS-17518: Sync the editslog if a file is closed in the lease monitor #6809

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

ThinkerLei
Copy link
Contributor

In the lease monitor, if a file is closed, method checklease will return true, and then the edits log will not be sync. In my opinion, we should sync the edits log to avoid not synchronizing the state to the standby NameNode for a long time.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 15s #6809 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #6809
JIRA Issue HDFS-17518
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6809/3/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 00s No case conflicting files found.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 00s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 87m 17s trunk passed
+1 💚 compile 5m 56s trunk passed
+1 💚 checkstyle 4m 41s trunk passed
+1 💚 mvnsite 6m 23s trunk passed
+1 💚 javadoc 6m 00s trunk passed
+1 💚 shadedclient 142m 05s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 30s the patch passed
+1 💚 compile 3m 22s the patch passed
+1 💚 javac 3m 22s the patch passed
+1 💚 blanks 0m 01s The patch has no blanks issues.
+1 💚 checkstyle 2m 18s the patch passed
+1 💚 mvnsite 4m 01s the patch passed
+1 💚 javadoc 3m 28s the patch passed
+1 💚 shadedclient 155m 05s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 5m 13s The patch does not generate ASF License warnings.
411m 17s
Subsystem Report/Notes
GITHUB PR #6809
JIRA Issue HDFS-17518
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname MINGW64_NT-10.0-17763 b1f57066a6fa 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / 6c1e9a1
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6809/3/testReport/
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6809/3/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 00s No case conflicting files found.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 00s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 110m 34s trunk passed
+1 💚 compile 7m 49s trunk passed
+1 💚 checkstyle 6m 17s trunk passed
+1 💚 mvnsite 8m 58s trunk passed
+1 💚 javadoc 8m 27s trunk passed
+1 💚 shadedclient 181m 41s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 6m 01s the patch passed
+1 💚 compile 4m 44s the patch passed
+1 💚 javac 4m 44s the patch passed
+1 💚 blanks 0m 00s The patch has no blanks issues.
+1 💚 checkstyle 3m 22s the patch passed
+1 💚 mvnsite 5m 12s the patch passed
+1 💚 javadoc 4m 25s the patch passed
+1 💚 shadedclient 195m 47s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 7m 19s The patch does not generate ASF License warnings.
523m 52s
Subsystem Report/Notes
GITHUB PR #6809
JIRA Issue HDFS-17518
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname MINGW64_NT-10.0-17763 117457eea4e9 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / 6c1e9a1
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6809/2/testReport/
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6809/2/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@@ -626,7 +626,8 @@ private synchronized boolean checkLeases(Collection<Lease> leasesToCheck) {
}
}
// If a lease recovery happened, we need to sync later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice hack. But this will not handle the case, where actual recovery of the file is triggered and lease is reassigned.
Lease re-assignment also will have a edit log. This also should be synced as well.

I would recommend you to change the return type of internalReleaseLease() to ImmutablePair<Boolean, Boolean> to include both completed and needSync values.
needSync will be true in both cases of file closed and lease re-assignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinayakumarb Thank you for your review. Let me explain the current logic. The logic I am modifying now is as follows: if the lease is recovered or the lease is reassigned, it will return false, just like the previous logic. Then, in the checkLeases method, if the return is false and needSync is false, needSync will be reset to true. Subsequently, the edits log will be flushed by leaseMonitor. This way, when RPCs such as recoverLease call the internalReleaseLease method, they can remain consistent with the original behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend you to change the return type of internalReleaseLease() to ImmutablePair<Boolean, Boolean> to include both completed and needSync values.
needSync will be true in both cases of file closed and lease re-assignment.

+1. If we will plan to improve it, should fix it together.
BTW, what will it happen if not sync in time, LeaseManager.Monitor is one asynchronous logic, it can not be ensure to sync edits in one certain order right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, if we see the complete logic of internalReleaseLease() we need to call logSync() always.

There are two possibilities overall.

  1. File gets closed.
  2. lease recovery gets initiated with reassign of lease.

In both of above cases, there will be edit txn logged. So need to call logSync().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I totally agree to write and sync edit log here where 'write' already do that for both close and reassign lease but miss sync for some corner case. My point is do we need to sync them invoke logSync() in time? And what will it happen if sync by other write operation at following because edit from LeaseManager.Monitor is one asynchronous logic which is not have to in order IMO. Maybe it could be going to be missed if there are no other write operation later then NameNode shutdown?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hexiaoqiao Its perfectly fine if concurrently other write operations also calllogSync(). This transaction also will get included in the logSync() called by other threads. Its true for vice versa as well.

@ThinkerLei I wanted to make the if-else condition simple.

boolean completed=fsn.internalReleaseLease();
if (!needSync)) { 
    needSync = true;
} 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinayakumarb Thank you for your reply. I understand what you mean. But there is another case where calling logSync() is not required as mentioned in HDFS-17519 Do we need to consider this scenario described in HDFS-17519?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think that special case needs to be handled. If there is no txn, then also calling logSync() wont be a problem.

If there is no edit txn, logSync() will just return without doing anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned above, please change below logic to call logSync() always.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinayakumarb Thank you for your reply. How about changing the method checkLeases to return true?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants