[fix] Wait for run_lock when freeing it #4195

bruntib · 2024-03-20T17:05:42Z

A store event inserts a row into the run_lock table to prevent concurrent stores to the same run. The run_lock record is removed at the end of the store. Inserting and deleting the run_lock record are locking operations at the database level. CodeChecker used a "nowait" lock, which means that these insert and delete operations fail and throw an exception if the lock cannot be acquired immediately.

This is too strict for removing the run_lock record, because the thrown exception is forwarded to the user, who cannot do anything about it anyway. For this reason, run_lock removal now waits for the database lock to be released. In the worst case the exception is still thrown once the configured statement_timeout elapses, but that should be an unlikely event.
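The difference between the old and new behavior can be illustrated with an in-process analogy. This is only a sketch: Python's `threading.Lock` stands in for the database row lock, and `LockNotAvailable` and `acquire_row_lock` are hypothetical names. It does not talk to a database and is not CodeChecker code.

```python
import threading
from typing import Optional


class LockNotAvailable(Exception):
    """Stands in for the database error raised when a row lock is not granted."""


def acquire_row_lock(lock: threading.Lock, nowait: bool,
                     timeout: Optional[float] = None) -> None:
    """Acquire `lock` the way a SELECT ... FOR UPDATE would (analogy only).

    nowait=True  -> give up immediately if the lock is held (FOR UPDATE NOWAIT).
    nowait=False -> wait for the holder to release it; `timeout` plays the role
                    of statement_timeout, and None means wait indefinitely.
    """
    if nowait:
        acquired = lock.acquire(blocking=False)
    elif timeout is None:
        acquired = lock.acquire()  # blocks until released, possibly forever
    else:
        acquired = lock.acquire(timeout=timeout)
    if not acquired:
        raise LockNotAvailable("could not obtain lock")


if __name__ == "__main__":
    # Another "transaction" (here: the main thread itself) already holds the lock.
    row_lock = threading.Lock()
    row_lock.acquire()
    for kwargs in ({"nowait": True}, {"nowait": False, "timeout": 0.1}):
        try:
            acquire_row_lock(row_lock, **kwargs)
        except LockNotAvailable:
            print(kwargs, "-> LockNotAvailable")
```

Both calls fail while the lock is held; the only difference is that the second one fails after waiting. Calling it with `nowait=False` and no timeout would block until the holder releases the lock.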


whisperity · 2024-03-21T08:24:21Z

Without the "nowait" option, the query blocks until the lock is released. In the worst case, when statement_timeout is reached, the exception is thrown anyway.

Unfortunately, this adds the requirement that the server be configured with a STATEMENT TIMEOUT, which is not the default. If the server operator does not configure this value, the store will hang forever; am I reading this right?

Can you please check whether using SELECT ... FOR UPDATE SKIP LOCKED would be more beneficial for us? If I am reading the docs right (and I am not claiming I am!), then if the row cannot be locked, we get back an empty result (so we would use .one_or_none() to fetch the maybe-row), and we can deduce that another transaction beat us to the punch, so there is no lock for us (the currently committed store) to remove. There is a skip_locked parameter in with_for_update().

To prevent the operation from waiting for other transactions to commit, use either the NOWAIT or SKIP LOCKED option. With NOWAIT, the statement reports an error, rather than waiting, if a selected row cannot be locked immediately. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.
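Following the SKIP LOCKED suggestion, the removal step might be sketched as below. This assumes SQLAlchemy 1.4 or later; the `RunLock` model here is a hypothetical, stripped-down stand-in (only a `name` column) and `free_run_lock` is a hypothetical helper, not CodeChecker's actual code. Note that `None` from `scalar_one_or_none()` covers two cases: the row does not exist, or another transaction holds its lock.

```python
from sqlalchemy import Column, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class RunLock(Base):
    """Hypothetical, minimal stand-in for CodeChecker's RunLock model."""
    __tablename__ = "run_locks"
    name = Column(String, primary_key=True)


def free_run_lock(session: Session, name: str) -> bool:
    """Delete the run lock row `name` if it can be locked right now.

    With skip_locked=True, PostgreSQL leaves out rows that another
    transaction has locked, so the query returns no row instead of
    waiting (nowait=False) or raising (nowait=True).

    Returns True if the row was deleted, False if there was no row to
    delete (missing, or locked by another transaction).
    """
    stmt = (
        select(RunLock)
        .where(RunLock.name == name)
        .with_for_update(skip_locked=True)
    )
    run_lock = session.execute(stmt).scalar_one_or_none()
    if run_lock is None:
        return False
    session.delete(run_lock)
    session.commit()
    return True


if __name__ == "__main__":
    # SQLite has no row locks; this only exercises the control flow.
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add(RunLock(name="my_run"))
        session.commit()
        print(free_run_lock(session, "my_run"))  # True: row existed and was deleted
        print(free_run_lock(session, "my_run"))  # False: nothing left to delete
```

SQLAlchemy's SQLite dialect renders no FOR UPDATE clause, so an in-memory SQLite engine only checks the control flow; the actual skipping behavior requires PostgreSQL.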

vodorok

Maybe it is worth trying SKIP LOCKED. (Even though most real-world database applications need to be guarded with statement timeouts.)

vodorok · 2024-04-10T20:36:30Z

web/server/codechecker_server/api/mass_store_run.py

        run_lock = session.query(RunLock) \
            .filter(RunLock.name == self.__name) \
-            .with_for_update(nowait=True).one()
+            .with_for_update(nowait=False).one()


Should only this one usage site be modified?
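To see what the one-line change in the diff does to the SQL sent to the server, the locking statement can be rendered with SQLAlchemy's PostgreSQL dialect without connecting to a database. This sketch assumes SQLAlchemy 1.4 or later; the `run_locks` table definition and the `render_for_update` helper are hypothetical.

```python
from sqlalchemy import Column, MetaData, String, Table, select
from sqlalchemy.dialects import postgresql

# Hypothetical table standing in for CodeChecker's run lock table.
run_locks = Table(
    "run_locks", MetaData(), Column("name", String, primary_key=True)
)


def render_for_update(**for_update_kwargs) -> str:
    """Compile the run lock SELECT to PostgreSQL SQL with the given
    with_for_update() options, without connecting to a database."""
    stmt = (
        select(run_locks)
        .where(run_locks.c.name == "my_run")
        .with_for_update(**for_update_kwargs)
    )
    return str(stmt.compile(dialect=postgresql.dialect()))


if __name__ == "__main__":
    for opts in ({"nowait": True}, {"nowait": False}, {"skip_locked": True}):
        print(opts)
        print(render_for_update(**opts), end="\n\n")
```

With `nowait=False` the statement ends in a plain FOR UPDATE, which waits for the conflicting transaction and is only bounded by a configured statement_timeout.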

whisperity · 2024-04-11T09:42:16Z

(Even though most real-world database applications need to be guarded with statement timeouts.)

Still, guidelines are not requirements. If this becomes a requirement without which the server misbehaves, then the server startup process should query the database, check whether a statement timeout is set, and show a warning to the operator if it is not.
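A startup check along those lines could be sketched as follows. This assumes a PostgreSQL backend reached through a DB-API 2.0 connection (for example psycopg2); `SHOW statement_timeout` reports `0` when the timeout is disabled, which is the PostgreSQL default. The helper names, logger name, and warning text are hypothetical, not part of CodeChecker.

```python
import logging
from typing import Any

LOG = logging.getLogger("server")


def statement_timeout_enabled(conn: Any) -> bool:
    """Return True if the session's statement_timeout is non-zero.

    `conn` is assumed to be a DB-API 2.0 connection to PostgreSQL.
    PostgreSQL reports a disabled timeout as "0".
    """
    cur = conn.cursor()
    try:
        cur.execute("SHOW statement_timeout")
        (value,) = cur.fetchone()
    finally:
        cur.close()
    return str(value).strip() != "0"


def warn_if_no_statement_timeout(conn: Any) -> bool:
    """Log a warning at startup if no statement_timeout is configured.

    Returns True if a timeout is set, False if the warning was emitted.
    """
    if statement_timeout_enabled(conn):
        return True
    LOG.warning(
        "The database has no statement_timeout configured; a store waiting "
        "for a run lock may block indefinitely."
    )
    return False
```

Because it only relies on `cursor()`, `execute()`, `fetchone()` and `close()`, the check can be exercised with a fake connection in tests.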

bruntib · 2024-05-21T14:20:40Z

A different solution will be provided later.

bruntib added database 🗄️, bugfix 🔨 and web 🌍 labels Mar 20, 2024

bruntib added this to the release 6.24.0 milestone Mar 20, 2024

bruntib requested review from whisperity, Szelethus and cservakt March 20, 2024 17:05

bruntib requested a review from vodorok as a code owner March 20, 2024 17:05

whisperity added server 🖥️ and removed web 🌍 labels Mar 27, 2024

vodorok reviewed Apr 10, 2024

bruntib closed this May 21, 2024

bruntib deleted the free_runlock_nowait branch May 22, 2024 07:38
