
Support saving of entities to memcache in putMulti() #58

Open
trakhimenok opened this issue Nov 7, 2017 · 7 comments
@trakhimenok

trakhimenok commented Nov 7, 2017

If I understand correctly, entities are currently not stored to memcache on Put().

Suppose we run in a transaction and need to update some entities. The current execution order is:

  1. User code calls nds.RunInTransaction()
  2. Get entity by key from datastore (not stored to memcache)
  3. Change entities
  4. User code calls nds.PutMulti() with transactional context
  5. nds.putMulti() adds items to tx.lockMemcacheItems (memcache not changed)
  6. nds.putMulti() calls datastorePutMulti()
  7. nds.putMulti() executes a deferred function that removes tx.lockMemcacheItems from memcache.
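
The order above can be replayed with a small sketch. The function names below are illustrative stand-ins for the nds internals, not the real API:

```go
package main

import "fmt"

// transactionTrace replays the execution order described above using
// illustrative stand-ins for the nds internals (not the real API).
func transactionTrace() []string {
	var trace []string
	runInTransaction := func(f func()) {
		f()
		// step 7: the deferred cleanup removes tx.lockMemcacheItems from memcache
		trace = append(trace, "remove tx.lockMemcacheItems from memcache")
	}
	runInTransaction(func() {
		trace = append(trace, "datastore.Get")                  // step 2: memcache untouched
		trace = append(trace, "append to tx.lockMemcacheItems") // step 5: memcache unchanged
		trace = append(trace, "datastorePutMulti")              // step 6
	})
	return trace
}

func main() {
	fmt.Println(transactionTrace())
}
```

The point of the sketch is that memcache is only ever touched at the very end, and only to remove items.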

First of all, it's not clear why we create lockItem() at line 129, as it seems it is not used further in the current implementation.

Second, just removing items from memcache results in an unnecessary paid Get() operation on the following request, while the results could potentially be retrieved from the "free" memcache. For example, I have a chat application, and whenever a user writes to the bot and the bot sends a response, I do the following:

  1. On request, the Chat entity is retrieved outside of a transaction
  2. On response, the Chat entity is updated within a transaction (if its state changed)

Currently the first Get() never leverages memcache and always misses. This increases response time and costs.

So the suggestion is:

  1. Before putting, nds should try to lock the items in memcache.
  2. If the pre-put locks were successful, nds tries to put the items to memcache using CAS.
  3. If locking or putting to memcache fails, delete from memcache as is done currently.
  4. Unrelated improvement: if the delete from memcache fails, create a deferred task to remove the items from memcache. That could mitigate temporary memcache unavailability.
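
The suggested order can be sketched with a tiny in-memory stand-in for memcache. The real nds code would use the App Engine memcache package with CompareAndSwap; all names here are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated stands in for a failed datastore write in this sketch.
var errSimulated = errors.New("simulated datastore failure")

// cache is a minimal in-memory stand-in for memcache.
type cache struct{ items map[string]string }

func newCache() *cache { return &cache{items: map[string]string{}} }

// lock implements step 1: a pre-put lock that fails if any item exists.
func (c *cache) lock(key, nonce string) bool {
	if _, ok := c.items[key]; ok {
		return false
	}
	c.items[key] = "lock:" + nonce
	return true
}

// casSet implements step 2: swap our own lock for the entity value;
// it fails if the lock was evicted or replaced in the meantime.
func (c *cache) casSet(key, nonce, value string) error {
	if c.items[key] != "lock:"+nonce {
		return errors.New("lock lost")
	}
	c.items[key] = value
	return nil
}

func (c *cache) delete(key string) { delete(c.items, key) }

// putEntity follows the suggested order: pre-put lock, datastore put,
// then CAS the value into the cache; fall back to delete (step 3) on
// any failure so stale data is never served.
func putEntity(c *cache, key, value, nonce string, datastorePut func() error) error {
	locked := c.lock(key, nonce)
	if err := datastorePut(); err != nil {
		c.delete(key)
		return err
	}
	if !locked {
		c.delete(key) // could not lock: fall back to current behaviour
		return nil
	}
	if err := c.casSet(key, nonce, value); err != nil {
		c.delete(key) // lock lost: fall back to delete
	}
	return nil
}

func main() {
	c := newCache()
	_ = putEntity(c, "Chat|123", "entity-bytes", "nonce-1", func() error { return nil })
	fmt.Println(c.items["Chat|123"])

	c2 := newCache()
	_ = putEntity(c2, "Chat|123", "entity-bytes", "nonce-1", func() error { return errSimulated })
	_, ok := c2.items["Chat|123"]
	fmt.Println(ok)
}
```

On success the entity ends up cached; on a failed datastore put the lock is deleted and nothing stale is cached.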

I know caching is hard to get right and probably I'm missing something.

edit: fixed urls to permalinks

@jongillham
Member

Hi @astec and thanks for your interest. You are right, caching is hard and it's going to take me a while to get the nds cache strategy back in my head so please forgive me if the following is not 100% accurate.

First of all I think your analysis of step 7 is incorrect which might explain your issue with the nds strategy:

  1. nds.putMulti() executes a deferred function that removes tx.lockMemcacheItems from memcache.

I think this statement is wrong. Within a transaction a lock is only placed here when we are sure the transaction will succeed, and it is never removed, although it expires after memcacheLockTime, which happens to be 32 seconds (2 seconds longer than the maximum time a transaction can last).

However, outside a transaction the memcache lock is removed in the putMulti method, as you suggest. Removing the lock from within the nds.putMulti method within a transaction would not work for a couple of reasons, one being that a subsequent function within the transaction might update the same entity but we would have already removed the lock.

The reason we create memcache lock items in the first place is to ensure that any stale cache is guaranteed to be removed before a subsequent Put and not returned by Get. You can see the lock being used in the lockMemcache and loadMemcache functions.

If we used your suggestion and the actual datastore.Put failed but the memcache.Set did not, then we would have updated the items in memcache but not the datastore, breaching the consistency guarantees.

I hope this helps. It's fun to think about all the edge cases and error pathways in order to keep cache consistency. Maybe/hopefully you can come up with a better strategy!

@trakhimenok
Author

trakhimenok commented Nov 7, 2017

@jongillham,

First of all, sorry: I made some links to my fork instead of your repo, and some led to wrong lines due to my recent commit - edited to permalinks.

I agree that statement 7 is incorrect; I overlooked the if condition.

I still believe you can save entities to memcache on Put() consistently within nds.RunInTransaction().

Consider this flow:

  1. Instantiate tx before calling datastore.RunInTransaction(), at line 28 instead of line 30.
  2. During PutMulti() within the transaction, lock the items in memcache (the same as is done for the non-transactional flow now).
  3. Instead of returning immediately at line 29, check for an error and, if nil, call memcacheSetMulti for the successfully locked items.
  4. If the transaction or memcacheSetMulti() failed, probably try to delete the locked items?
  5. Profit!?

If the put/transaction failed, we don't save to memcache. While the tx is in progress, other processes will not overwrite the items because they are locked. If the memcache lock was not successful, we simply do not put the entities to memcache and basically fall back to the current behaviour. If everything works as intended (which I guess would be 99.99% of the time in my use case), we gain 50% fewer datastore reads for read-tx_read-tx_put scenarios.

Am I missing anything?
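
The flow above can be sketched like this; the names are illustrative stand-ins for nds/datastore/memcache, not the actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a datastore commit failure in this sketch.
var errConflict = errors.New("transaction conflict")

// txState mirrors point 1: it is created before the datastore transaction
// so it outlives it. Names are illustrative, not the actual nds types.
type txState struct{ locked []string }

// runAndCache sketches the proposed flow: run f (standing in for
// datastore.RunInTransaction, during which PutMulti records the memcache
// keys it locked, point 2), then touch memcache only after the transaction
// has returned: set on success (point 3), delete the locks on failure (point 4).
func runAndCache(f func(*txState) error, setMulti, deleteMulti func([]string)) error {
	tx := &txState{}
	err := f(tx)
	if err == nil {
		setMulti(tx.locked)
	} else {
		deleteMulti(tx.locked)
	}
	return err
}

func main() {
	setMulti := func(keys []string) { fmt.Println("memcacheSetMulti:", keys) }
	deleteMulti := func(keys []string) { fmt.Println("memcacheDeleteMulti:", keys) }

	_ = runAndCache(func(tx *txState) error {
		tx.locked = append(tx.locked, "Chat|123") // PutMulti locks during the tx
		return nil                                // commit succeeds
	}, setMulti, deleteMulti)
}
```

The key design point is that memcacheSetMulti is only reachable after the datastore transaction has already returned nil.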

@jongillham
Member

jongillham commented Nov 8, 2017

I like that idea, and from what I can see it would work 99.99% of the time as you suggest. However, there is a 0.01% race condition that would lead to cache inconsistency. I imagine that under high instance and memcache load it could become an issue and incredibly difficult to debug.

Consider the scenario that will cause cache inconsistency:
t = 0s: A transaction is created and a Put is called within it. A memcache lock as you suggest in point 2 is created. This lock lasts for 32 seconds at most but can be shorter if it is purged.
t = 30s: Processing within the transaction commits as with your point 3.
t = 30s - 32s: The instance decides to run another goroutine without memcacheSetMulti being called.
t = 32s: The original lock at t = 0s expires from memcache in the best case.
t = 33s: Another datastore Put updates the entity referenced at t = 0 with a new value.
t = 34s: Finally point 4 gets run because the instance has started running our original goroutine again and memcacheSetMulti is called; it now replaces the new memcache value set at t = 33s with the stale one that was originally added at t = 0.

I often see long pauses and/or memcache purges between App Engine RPC calls that could make this race condition a reality.

@trakhimenok
Author

At first I thought it was a very good point and we were doomed.

But after consideration I think there is no race, or it can be mitigated.

Basically we don't care about evicted or expired lock items in memcache. We just need to know whether the lock was successful in the first place, so we can try to do a CompareAndSwap later, on transaction success.

If the lock is evicted or expires while the 1st transaction is in progress and the 2nd transaction overwrites the entity with another value, then:

  1. If the 2nd datastore transaction was successful, the 1st one will fail on commit (thanks to the datastore), so the 1st one would not write to memcache at all. (So no race?)

  2. Even if it doesn't fail, it still should not be an issue, as we should use memcache.CompareAndSwapMulti(). That seems to require another memcache.GetMulti() right after the transaction succeeds. When we get the items before the CompareAndSwap, we should check that they are our own locks (the mechanism you already have with random numbers). If they are our own, CompareAndSwap; if not (I don't see how we could get here), delete just in case.
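
The own-lock check in point 2 can be sketched with an in-memory map standing in for memcache. The names and the "lock:" marker are illustrative, not the nds or memcache API:

```go
package main

import "fmt"

// item models a memcache item whose value is either our lock marker
// ("lock:" plus the random nonce nds already uses) or real entity bytes.
type item struct{ value string }

type cache map[string]*item

// swapIfOurLock implements the check described above: after the
// transaction succeeds, get the item, verify it is still our own lock
// (matched by the random nonce), and only then CompareAndSwap the
// entity value in; anything unrecognised is deleted just in case.
func swapIfOurLock(c cache, key, ourNonce, entity string) string {
	it, ok := c[key]
	if !ok {
		return "lock evicted: nothing to swap"
	}
	if it.value == "lock:"+ourNonce {
		it.value = entity // CompareAndSwap succeeds against our own lock
		return "swapped"
	}
	delete(c, key) // not our lock: delete just in case
	return "deleted foreign item"
}

func main() {
	c := cache{"Chat|123": {value: "lock:42"}}
	fmt.Println(swapIfOurLock(c, "Chat|123", "42", "entity-bytes"))
	fmt.Println(c["Chat|123"].value)
}
```

Because the swap only ever replaces our own nonce-tagged lock, a value written by anyone else after the lock expired can never be overwritten with stale data.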

@jongillham
Member

  1. If the 2nd datastore transaction was successful, the 1st one will fail on commit (thanks to the datastore), so the 1st one would not write to memcache at all. (So no race?)

Unfortunately, the datastore transaction contention semantics will not help us at this point, because it would have already committed the transaction at this line, which you mention in point 3 in the comment above.

  1. Even if it doesn't fail, it still should not be an issue, as we should use memcache.CompareAndSwapMulti(). That seems to require another memcache.GetMulti() right after the transaction succeeds. When we get the items before the CompareAndSwap, we should check that they are our own locks (the mechanism you already have with random numbers). If they are our own, CompareAndSwap; if not (I don't see how we could get here), delete just in case.

I currently can't think of a reason why this wouldn't work! I'm going to have to think about its interaction with Get and Delete datastore operations some more.

@trakhimenok
Author

Good to know we have hope.

I don't get your point regarding the datastore transaction semantics. As I stated before, the memcache check & put should happen outside of the datastore transaction.

Instead of "return datastore.RunInTransaction()" you should do "err = datastore.RunInTransaction()" and, if err is nil, do the memcache checks & put. If the 2nd transaction committed, the 1st should fail; we get err != nil and, instead of saving to memcache, delete from memcache (to clear the locks in case it failed for a non-conflict reason). They can't both return err == nil while updating the same entity in parallel, can they? If they do, it means they are not overlapping and we are fine to put to memcache. Of course the developer should first call Get() within the transaction, but that's the same requirement as for nds-less code.

So I still don’t see race here.

P.S.
Sorry for formatting, writing from phone.

@trakhimenok
Author

By the way, tx.lockMemcacheItems should probably be a map keyed by entity key rather than a slice, to handle transaction auto-retries gracefully.

I suspect that if f() gets executed multiple times you can get duplicates in tx.lockMemcacheItems and unpredictable behavior.
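
A map-keyed version can be sketched as below; the type and field names are illustrative, not the actual nds fields:

```go
package main

import "fmt"

// lockItem pairs an entity key with the random lock nonce.
type lockItem struct{ key, nonce string }

// lockSet keys lock items by entity key, so a retried transaction
// function overwrites its earlier entry instead of appending a
// duplicate the way a slice would.
type lockSet map[string]lockItem

func (s lockSet) add(it lockItem) { s[it.key] = it }

func main() {
	locks := lockSet{}
	// Simulate f() running twice due to a datastore auto-retry:
	// the same entity key is added on each attempt.
	for attempt := 1; attempt <= 2; attempt++ {
		locks.add(lockItem{key: "Chat|123", nonce: fmt.Sprint(attempt)})
	}
	fmt.Println(len(locks)) // one entry per key; the latest attempt wins
}
```

With a slice, the retried attempt would leave two entries for the same key and the deferred cleanup would process both.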
