
[Question] DataRace Exception, possible bug #150

Open
bgervan opened this issue Feb 8, 2021 · 3 comments
bgervan commented Feb 8, 2021

Hi,

Currently the writer checks for a data race with:

if not is_delete then
    -- check to make sure we don't have a data race condition
    local updated = {}
    for i, pair in ipairs(cjson.decode(ARGV[13])) do
        local odata = redis.call('HGET', row_key, pair[1])
        if odata ~= pair[2] then
            table.insert(updated, pair[1])
        end
    end
    if #updated > 0 then
        return cjson.encode({race=updated})
    end
end

But in the following use case, the above will throw an exception (return a race):

  • We only write/update objects to the DB, without deletes
  • The only requirement is to push the latest data to the DB
  • Multiple services could send the very same object at the same time

So if a second writer sends an object after the first one has already saved it, the save returns an error for no good reason.
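A minimal sketch of that scenario, assuming a simple rom model (the column names are illustrative, and the exact data-race exception class depends on the rom version):

import rom

class Event(rom.Model):
    user_id = rom.Integer(index=True)
    timestamp = rom.Float()

# two writers load the same row independently
a = Event.get(1)
b = Event.get(1)

a.timestamp = 1612800000.0
a.save()            # first save succeeds

b.timestamp = 1612800001.0
b.save()            # the Lua check above sees 'timestamp' changed since b was read,
                    # so this save reports a race even though b only wants to overwrite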

Question:

Do I need this check in update-only environments? Or what would be a good check in our case?
There can be cases where both writers get stuck at some point and stop updating, even with a force save.

Wouldn't it be better to use a timestamp to identify which update is newer, so the above check is only needed when the object was deleted?

Thank you for the answers.

josiahcarlson self-assigned this Feb 12, 2021
josiahcarlson added the question and more labels Feb 12, 2021
josiahcarlson (Owner) commented

Please see: #116

Does the force_update_data() function cover your use-case? Should I add that to the library?
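For reference, a rough sketch of the kind of helper that name suggests (not necessarily the exact code in #116): on a reported race, refresh the entity, re-apply the local values, and retry the save.

def force_update_data(entity, data, attempts=5):
    # data: dict of attribute -> value to force onto the entity
    for _ in range(attempts):
        for attr, value in data.items():
            setattr(entity, attr, value)
        try:
            entity.save()
            return True
        except Exception:        # rom's data-race error; the exact class depends on rom version
            entity.refresh()     # re-read the current values, then retry with ours applied on top
    return False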

bgervan commented Feb 14, 2021

Thanks for the suggestion, but that's not exactly it; I already have a force update in my code and I still get this exception.
The force update you are referring to doesn't look optimal for a low-latency system; it will do more queries in bad situations (correct me if I am wrong).
In a use case where the user doesn't care whether another writer changed the object that is about to be updated, the above data race check throws errors without need.

Let me give you an example:
The event object for event type x:

  • user_id (unique)
  • timestamp (always updated by the writer)

The usage:

  • On event x:
      • The system checks whether event x happened within the last y seconds
      • If it did: do nothing
      • If it did not (like a timeout): save the new timestamp with the same id -- this could be done by multiple writers at the "same" time

This is a very similar mechanism to what I am using with rom (a rough sketch follows below). There are other usages where I ran into the same data race exceptions.
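A minimal sketch of that pattern, assuming an illustrative model with a get_by lookup on an indexed column (the names are made up for the example):

import time
import rom

class LastSeen(rom.Model):
    user_id = rom.Integer(required=True, index=True)
    timestamp = rom.Float(index=True)

def on_event(user_id, window=60.0):
    now = time.time()
    entity = LastSeen.get_by(user_id=user_id)
    if entity and now - entity.timestamp < window:
        return False                 # event x already happened in the last y seconds
    if entity is None:
        entity = LastSeen(user_id=user_id, timestamp=now)
    else:
        entity.timestamp = now
    entity.save()                    # two workers reaching this together trip the race check
    return True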

The goal would be:

  • Update only if the stored timestamp is lower than the writer's timestamp, which is a more general way to recognize data races.

If we had an option to skip

if not is_delete then
    -- check to make sure we don't have a data race condition

and check last-updated timestamps instead of the current data race comparison, that would help a lot.
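Something with roughly the semantics I'm after, sketched outside of rom with plain redis-py (WATCH/MULTI on the backing hash; note this bypasses rom's index maintenance, so it is only meant to illustrate the check, not to be a real replacement):

import redis

def write_if_newer(conn, row_key, timestamp, data):
    # conn: redis.Redis, row_key: the entity's backing hash key, data: dict of column -> encoded value
    with conn.pipeline() as pipe:
        while True:
            try:
                pipe.watch(row_key)
                current = pipe.hget(row_key, 'timestamp')
                if current is not None and float(current) >= timestamp:
                    pipe.unwatch()
                    return False           # stored row is already newer; skip this write
                pipe.multi()
                pipe.hset(row_key, mapping=dict(data, timestamp=timestamp))
                pipe.execute()
                return True
            except redis.WatchError:
                continue                   # another writer got in between WATCH and EXEC; retry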

I can try to put together a small example project if it helps. The reason I'm not just deleting that check myself is that I might get bad side effects in real data race conditions.

josiahcarlson (Owner) commented

To sum up: "sometimes you don't care about what is there, and you just want to overwrite".

Indeed, in cases where you don't care about overwriting data (because you don't care, or because you know there isn't a race condition), updating your local entity repeatedly will do more round trips than necessary, and will induce more latency, especially for commonly accessed entities.

use a lock (what Josiah would do)

However, I'm not feeling great about adding a feature to disable the safety I added in order to prevent data and index corruption. If you wanted to ensure that this never happens, you could use an explicit lock. Lock around the object, fetch/refresh, update local object, write, unlock. That guarantees that your round trips / delays are all up front in the lock, not in the write / update. It also gives you the ability to "deadline" the changes, so if it takes longer than a certain period of time, bail out. That makes sure that everything that I care about WRT data integrity is upheld. That's what I would recommend.

with rom.util.EntityLock(entity, acquire_timeout, lock_timeout) as lock:
    entity.refresh()
    # data: iterable of (attr, value) pairs to apply
    for attr, value in data:
        setattr(entity, attr, value)
    entity.save()

With a little effort, you can turn the above 5x IOs (get + lock acquire + refresh + save + release) into 4 (lock acquire, get, save, release).
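For example, something along these lines (a sketch, not library code: it uses redis-py's built-in lock keyed off the model name and id, since the entity isn't loaded yet when the lock is taken, and it assumes the default global connection from rom.util.get_connection()):

import rom

def locked_update(model, id, data, acquire_timeout=5, lock_timeout=10):
    conn = rom.util.get_connection()                      # assumed: the shared rom connection
    lock_name = 'lock:%s:%s' % (model.__name__, id)       # illustrative lock key, not rom's own
    with conn.lock(lock_name, timeout=lock_timeout, blocking_timeout=acquire_timeout):
        entity = model.get(id)                            # 1 round trip, done under the lock
        for attr, value in data.items():
            setattr(entity, attr, value)
        entity.save()                                     # 1 round trip
    # lock acquire + get + save + release = 4 round trips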

OR ignore most of the internals (you can do this, but you might break stuff)

Alternatively, if you're not indexing this data at all, and don't care about data integrity (seems like it), you can always do the following on some encoded data:

def unsafe_update_anything(model, id, dct):
    entity = model.get(id)
    entity._connection.hmset(entity._pkey, dct)
    # entity is now updated, took 2 round trips
    # with effort can be turned into just the hmset call for 1 round trip

This does 2 round trips. With a bit more effort, you can get that primary key in advance for anything, and call the model's ._connection.hmset(pkey, dct) directly for 1 round trip.
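Roughly like this (the 'ModelName:id' key layout is an assumption; check entity._pkey on a loaded entity for your rom version, note that this skips index maintenance just like the version above, and that newer redis-py prefers hset(name, mapping=...) over hmset):

def unsafe_write_anything(model, id, dct):
    pkey = '%s:%s' % (model.__name__, id)      # assumed _pkey layout; verify for your rom version
    model._connection.hmset(pkey, dct)         # 1 round trip, no data race check, no index updates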

So, 1 - 5 round trips, depending on safety level / perf optimizations. Does that work?
