
[Question] DataRace Exception, possible bug #150

Open
bgervan opened this issue Feb 8, 2021 · 3 comments
bgervan commented Feb 8, 2021

Hi,

Currently the writer checks for a data race with:

if not is_delete then
    -- check to make sure we don't have a data race condition
    local updated = {}
    for i, pair in ipairs(cjson.decode(ARGV[13])) do
        local odata = redis.call('HGET', row_key, pair[1])
        if odata ~= pair[2] then
            table.insert(updated, pair[1])
        end
    end
    if #updated > 0 then
        return cjson.encode({race=updated})
    end
end

But in the following use case, the above will throw an exception (return a race):

  • We only write/update objects to the DB, without deletes
  • The only requirement is to push the latest data to the DB
  • Multiple services could send the very same object at the same time

So if a second writer sends an object after the first one has already saved it, the save returns an error for no good reason.
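A minimal sketch of that scenario, assuming a simple rom model (the column names are illustrative, and the exact data-race exception class depends on the rom version):

import rom

class Event(rom.Model):
    user_id = rom.Integer(index=True)
    timestamp = rom.Float()

# two writers load the same row independently
a = Event.get(1)
b = Event.get(1)

a.timestamp = 1612800000.0
a.save()            # first save succeeds

b.timestamp = 1612800001.0
b.save()            # the Lua check above sees 'timestamp' changed since b was read,
                    # so this save reports a race even though b only wants to overwrite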

Question:

Do I need this check in update-only environments? Or what would be a good check in our case?
There can be cases where both writers get stuck at some point and stop updating, even with a force save.

Wouldn't it be better to use a timestamp to identify which update is newer, so the above check is only needed when the object was deleted?

Thank you for the answers.

josiahcarlson self-assigned this Feb 12, 2021
josiahcarlson added the question and more labels Feb 12, 2021
josiahcarlson (Owner) commented

Please see: #116

Does the force_update_data() function cover your use-case? Should I add that to the library?
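For reference, a rough sketch of the kind of helper that name suggests (not necessarily the exact code in #116): on a reported race, refresh the entity, re-apply the local values, and retry the save.

def force_update_data(entity, data, attempts=5):
    # data: dict of attribute -> value to force onto the entity
    for _ in range(attempts):
        for attr, value in data.items():
            setattr(entity, attr, value)
        try:
            entity.save()
            return True
        except Exception:        # rom's data-race error; the exact class depends on rom version
            entity.refresh()     # re-read the current values, then retry with ours applied on top
    return False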

bgervan commented Feb 14, 2021

Thanks for the suggestion, but that's not exactly it; I already have a force update in my code and I still get this exception.
The force update you are referring to doesn't look optimal for a low-latency system; it will do more queries in bad situations (correct me if I am wrong).
In a use case where the user doesn't care whether another writer changed the object that is about to be updated, the above data race check throws errors without need.

Let me give you an example:
The event object for event type x:

  • user_id (unique)
  • timestamp (always updated by the writer)

The usage:

  • On event x:
      • The system checks whether event x happened within the last y seconds
      • If it did: do nothing
      • If it did not (like a timeout): save the new timestamp with the same id -- this could be done by multiple writers at the "same" time

This is a very similar mechanism to what I am using with rom (a rough sketch follows below). There are other usages where I ran into the same data race exceptions.
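A minimal sketch of that pattern, assuming an illustrative model with a get_by lookup on an indexed column (the names are made up for the example):

import time
import rom

class LastSeen(rom.Model):
    user_id = rom.Integer(required=True, index=True)
    timestamp = rom.Float(index=True)

def on_event(user_id, window=60.0):
    now = time.time()
    entity = LastSeen.get_by(user_id=user_id)
    if entity and now - entity.timestamp < window:
        return False                 # event x already happened in the last y seconds
    if entity is None:
        entity = LastSeen(user_id=user_id, timestamp=now)
    else:
        entity.timestamp = now
    entity.save()                    # two workers reaching this together trip the race check
    return True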

The goal would be:

  • Update only if the stored timestamp is lower than the writer's timestamp, which is a more general way to recognize data races.

If we had an option to skip

if not is_delete then
    -- check to make sure we don't have a data race condition

and check last-updated timestamps instead of the current data race comparison, that would help a lot.
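Something with roughly the semantics I'm after, sketched outside of rom with plain redis-py (WATCH/MULTI on the backing hash; note this bypasses rom's index maintenance, so it is only meant to illustrate the check, not to be a real replacement):

import redis

def write_if_newer(conn, row_key, timestamp, data):
    # conn: redis.Redis, row_key: the entity's backing hash key, data: dict of column -> encoded value
    with conn.pipeline() as pipe:
        while True:
            try:
                pipe.watch(row_key)
                current = pipe.hget(row_key, 'timestamp')
                if current is not None and float(current) >= timestamp:
                    pipe.unwatch()
                    return False           # stored row is already newer; skip this write
                pipe.multi()
                pipe.hset(row_key, mapping=dict(data, timestamp=timestamp))
                pipe.execute()
                return True
            except redis.WatchError:
                continue                   # another writer got in between WATCH and EXEC; retry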

I can try to put together a small example project if it helps. The reason I'm not just deleting that check myself is that I might get bad side effects in real data race conditions.

josiahcarlson (Owner) commented

To sum up: "sometimes you don't care about what is there, and you just want to overwrite".

Indeed, in cases where you don't care about overwriting data (because you don't care, or because you know there isn't a race condition), updating your local entity repeatedly will do more round trips than necessary, and will induce more latency, especially for commonly accessed entities.

use a lock (what Josiah would do)

However, I'm not feeling great about adding a feature to disable the safety I added in order to prevent data and index corruption. If you wanted to ensure that this never happens, you could use an explicit lock. Lock around the object, fetch/refresh, update local object, write, unlock. That guarantees that your round trips / delays are all up front in the lock, not in the write / update. It also gives you the ability to "deadline" the changes, so if it takes longer than a certain period of time, bail out. That makes sure that everything that I care about WRT data integrity is upheld. That's what I would recommend.

with rom.util.EntityLock(entity, acquire_timeout, lock_timeout) as lock:
    entity.refresh()
    # data: iterable of (attr, value) pairs to apply
    for attr, value in data:
        setattr(entity, attr, value)
    entity.save()

With a little effort, you can turn the above 5x IOs (get + lock acquire + refresh + save + release) into 4 (lock acquire, get, save, release).
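For example, something along these lines (a sketch, not library code: it uses redis-py's built-in lock keyed off the model name and id, since the entity isn't loaded yet when the lock is taken, and it assumes the default global connection from rom.util.get_connection()):

import rom

def locked_update(model, id, data, acquire_timeout=5, lock_timeout=10):
    conn = rom.util.get_connection()                      # assumed: the shared rom connection
    lock_name = 'lock:%s:%s' % (model.__name__, id)       # illustrative lock key, not rom's own
    with conn.lock(lock_name, timeout=lock_timeout, blocking_timeout=acquire_timeout):
        entity = model.get(id)                            # 1 round trip, done under the lock
        for attr, value in data.items():
            setattr(entity, attr, value)
        entity.save()                                     # 1 round trip
    # lock acquire + get + save + release = 4 round trips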

OR ignore most of the internals (you can do this, but you might break stuff)

Alternatively, if you're not indexing this data at all, and don't care about data integrity (seems like it), you can always do the following on some encoded data:

def unsafe_update_anything(model, id, dct):
    entity = model.get(id)
    entity._connection.hmset(entity._pkey, dct)
    # entity is now updated, took 2 round trips
    # with effort can be turned into just the hmset call for 1 round trip

This does 2 round trips. With a bit more effort, you can get that primary key in advance for anything, and call the model's ._connection.hmset(pkey, dct) directly for 1 round trip.
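Roughly like this (the 'ModelName:id' key layout is an assumption; check entity._pkey on a loaded entity for your rom version, note that this skips index maintenance just like the version above, and that newer redis-py prefers hset(name, mapping=...) over hmset):

def unsafe_write_anything(model, id, dct):
    pkey = '%s:%s' % (model.__name__, id)      # assumed _pkey layout; verify for your rom version
    model._connection.hmset(pkey, dct)         # 1 round trip, no data race check, no index updates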

So, 1 - 5 round trips, depending on safety level / perf optimizations. Does that work?
