Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] Wait for run_lock when freeing it #4195

Closed
wants to merge 1 commit into from

Conversation

bruntib
Copy link
Contributor

@bruntib bruntib commented Mar 20, 2024

A store event inserts a line to the run_lock table in order to prevent concurrent storage to the same run. The run_lock record is removed at the end of storage. The insertion and deletion of the run_lock record are locked operations on the database level. In CodeChecker we used "nowait" lock which means that these insert and delete operations fail and throw an exception if they can't happen immediately.

This is too strict in case of removing the run_lock object, because the thrown exception is forwarded to the user who can't handle it anyway. For this reason the run_lock removal is waiting for the database lock to be free. In the worst case the exception will still be thrown after the configured statement_timeout, but ideally that should be an unlikely event.

A store event inserts a line to the run_lock table in order to prevent
concurrent storage to the same run. The run_lock record is removed at
the end of storage. The insertion and deletion of the run_lock record
are locked operations on the database level. In CodeChecker we used
"nowait" lock which means that these insert and delete operations fail
and throw an exception if they can't happen immediately.

This is too strict in case of removing the run_lock object, because the
thrown exception is forwarded to the user who can't handle it anyway.
For this reason the run_lock removal is waiting for the database lock
to be free. In the worst case the exception will still be thrown after
the configured statement_timeout, but ideally that should be an
unlikely event.
@bruntib bruntib added database 🗄️ Issues related to the database schema. bugfix 🔨 web 🌍 Related to the web app labels Mar 20, 2024
@bruntib bruntib added this to the release 6.24.0 milestone Mar 20, 2024
@bruntib bruntib requested a review from vodorok as a code owner March 20, 2024 17:05
@whisperity
Copy link
Member

With "nowait" option the query will be blocked until the lock is undone. In the worst case when statement_timeout is reached, the exception will be thrown anyway.

Unfortunately, this adds the requirement that the server should be configured with a STATEMENT TIMEOUT, which is not the default case. If the server operator does not configure this value, the store will hang forever, am I reading this right?

Can you please check if using SELECT ... FOR UPDATE SKIP LOCKED would be more beneficial for us? If I am reading the docs right (and I am not claiming I am!) then this would mean that in case the locking can not be done, we get back an empty result (so we use .one_or_none() to fetch the maybe-row) and we can thus deduce that another transaction happened to beat us to the punch, so there is no lock for us (the currently committed store) to remove. There is a skip_locked parameter in with_for_update().

To prevent the operation from waiting for other transactions to commit, use either the NOWAIT or SKIP LOCKED option. With NOWAIT, the statement reports an error, rather than waiting, if a selected row cannot be locked immediately. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.

@whisperity whisperity added server 🖥️ and removed web 🌍 Related to the web app labels Mar 27, 2024
Copy link
Collaborator

@vodorok vodorok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is worth to try skip locked. (Even though most real-world database applications need to be guarded with statement timeouts.)

run_lock = session.query(RunLock) \
.filter(RunLock.name == self.__name) \
.with_for_update(nowait=True).one()
.with_for_update(nowait=False).one()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should only this one usage site be modified?

@whisperity
Copy link
Member

whisperity commented Apr 11, 2024

(Even though most real-world database applications need to be guarded with statement timeouts.)

Still, guidelines are not requirements. If we have this as a requirement without which the server will misbehave, then the server startup process should query the db, check if there is a statement timeout, and show a warning to the operator.

@bruntib
Copy link
Contributor Author

bruntib commented May 21, 2024

A different solution will be provided later.

@bruntib bruntib closed this May 21, 2024
@bruntib bruntib deleted the free_runlock_nowait branch May 22, 2024 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix 🔨 database 🗄️ Issues related to the database schema. server 🖥️
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants