-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run most file locking through functions that handle Ceph EIO #4924
Conversation
…o issues/4874-lock-io-error
@DailyDreaming This should be ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Minor suggestions.
else: | ||
# Something else went wrong | ||
os.close(fd) | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we close the file in either case? i.e.
except OSError as e:
os.close(fd)
if e.errno in (errno.EACCES, errno.EAGAIN):
# File is still locked by someone else.
# Look at the next file instead
continue
else:
# Something else went wrong
raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we do need a close before continuing.
else: | ||
# Something went wrong | ||
os.close(dirFD) | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
except OSError as e:
os.close(dirFD)
if not e.errno in (errno.EACCES, errno.EAGAIN):
# Not a locked file error. Something went wrong
raise
pass | ||
else: | ||
raise | ||
os.close(fd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
try:
fcntl.lockf(fd, fcntl.LOCK_UN)
except OSError as e:
if e.errno != errno.EIO:
# Sometimes Ceph produces errno.EIO . We don't need to retry because
# we're going to close the FD and after that the file can't remain
# locked by us.
raise
os.close(fd)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't leave the comment really in the right place relative to the code.
I had to change some This probably opens new ways to hang a Toil workflow if you have read access to its temp directories. |
This will fix #4874.
But it also makes us sit forever retrying locks if fcntl gives an EIO. Since officially the syscall isn't meant to be giving EIO, maybe this is fine?
Changelog Entry
To be copied to the draft changelog by merger:
Reviewer Checklist
issues/XXXX-fix-the-thing
in the Toil repo, or from an external repo.camelCase
that want to be insnake_case
.docs/running/{cliOptions,cwl,wdl}.rst
Merger Checklist