Support SpreadSASLMemcachedCache: sharding values > 1MB #32

Open

reubano opened this issue Sep 21, 2013 · 12 comments

reubano commented Sep 21, 2013

Does this library support spreadsaslmemcachedcache? See http://pythonhosted.org/Flask-Cache/.

SpreadSASLMemcachedCache – spreadsaslmemcachedcache

Same as SASLMemcachedCache; however, it has the ability to spread a value across multiple keys if it is bigger than the memcached threshold, which by default is 1MB. Uses pickle.

alevy commented Sep 21, 2013

I mean... no. This would be very difficult (and probably not desirable) to replicate exactly, since pickle is very Python-specific. However, doing something similar, where values are broken up, would be relatively straightforward (I've done it for Dalli). Not sure if this makes sense as a separate library/code snippet. If you get around to this I'll definitely appreciate a pull request. Otherwise, I might get around to it.

In general, there are some tricky tradeoffs among the strategies you might use to spread values across multiple keys, and I'm not sure how best to generalize them (as opposed to giving code snippets for writing a custom thing for each app, or a higher-level library with multiple options). But I'm very interested in seeing what can be done.


reubano commented Sep 22, 2013

Can you post the code you used? The implementation doesn't matter (pickle vs. something else) as long as I can replicate the result.

reubano commented Nov 8, 2013

Bump

alevy closed this as completed Sep 11, 2014

reubano commented Sep 12, 2014

Just curious why you closed this since the issue isn't solved.


alevy commented Sep 12, 2014

Hrm... mostly I was just doing cleanup and hadn't touched this issue in almost a year... but...

I wasn't convinced this is a good feature to have at this level of library, since it would need to make one specific choice among a set of imperfect semantics: What happens when one of the keys is evicted? Do you store keys sequentially (e.g. mykey_1, mykey_2, etc.) and read until you get a miss, or keep a base key that contains a list of subkeys? Do you split up a value automatically when it's too large to fit, or provide a separate command?

My hunch is that each of those answers is right for some people, and I'm not sure there is a "good" one for most.

However, I can describe what I've done in one app algorithmically (a sketch follows the two lists):

set:

  1. Split the data into chunks < 1MB.
  2. Set the base key to the number of chunks, retrieving the version of the base key (this is not supported in memjs, but could and should be).
  3. Do a compare-and-swap with the version number from the base key on each of the subsequent keys. If it fails with an earlier version number, redo it using the new version number. If it fails with a later version number, abort -- another thread has already overwritten your work.

get:

  1. Get the base key along with its version number.
  2. Get all referenced subkeys 0-n, checking that each one's version matches. If a mismatched version is higher, start over, since someone has overwritten the value.
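
For concreteness, here is a minimal JavaScript sketch of that algorithm. It is not working memjs code: memjs doesn't currently expose versions or compare-and-swap (exactly the gap noted in step 2), so setWithVersion, getWithVersion, and casSet below are hypothetical client methods, and split() is the chunking helper sketched later in this thread.

const CHUNK_SIZE = 1024 * 1024 - 1024; // stay safely under memcached's default 1MB limit

async function spreadSet(client, key, value) {
  const parts = split(value, CHUNK_SIZE); // value is a Buffer

  // set, step 2: store the chunk count under the base key, keeping its version.
  const { version } = await client.setWithVersion(key, String(parts.length));

  // set, step 3: CAS each sub-key against the base key's version. The
  // "redo with the newer version number" branch is elided for brevity.
  for (let i = 0; i < parts.length; i++) {
    const ok = await client.casSet(key + '_' + i, parts[i], version);
    if (!ok) throw new Error('a concurrent writer won; aborting');
  }
}

async function spreadGet(client, key) {
  // get, step 1: the base key holds the chunk count and the version to match.
  const base = await client.getWithVersion(key);
  if (base.value === null) return null;

  // get, step 2: read sub-keys 0..n-1, requiring every version to match.
  const count = parseInt(String(base.value), 10);
  const parts = [];
  for (let i = 0; i < count; i++) {
    const chunk = await client.getWithVersion(key + '_' + i);
    if (chunk.value === null || chunk.version !== base.version) {
      return null; // evicted or overwritten mid-read; the caller can retry
    }
    parts.push(chunk.value); // chunk.value is a Buffer
  }
  return Buffer.concat(parts);
}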


reubano commented Sep 12, 2014

Well, I'm far from an expert on this topic, but why not just do it the way the Python version does? There has to be a Node.js equivalent of pickle, right? At the least, can you post code showing how you worked around this issue before?


reubano commented Sep 12, 2014

And to answer one of your questions: you would select a spread mode when instantiating memjs, and in that mode any value larger than the max size would be split up automatically. Also, what's wrong with reading until you get a miss? Couldn't you just put it in a loop inside a try/except block?


alevy commented Sep 12, 2014

pickle doesn't help much here; it's just a serialization format, and in any case memjs only deals with values of type String or Buffer, which are already serialized. So as far as splitting things up goes, that's just:

// largeValue is a `Buffer` to split, maxSize is the maximum size of any part
function split(largeValue, maxSize) {
  var parts = [];
  while (largeValue.length > 0) {
    parts.push(largeValue.slice(0, maxSize));
    largeValue = largeValue.slice(maxSize);
  }
  return parts;
}
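
Reassembly is just the inverse via Node's standard Buffer.concat. A quick round-trip check of the split() above (Buffer.alloc needs a reasonably modern Node; it's used here only for the demo):

// Reassemble the parts produced by split() into a single Buffer.
function join(parts) {
  return Buffer.concat(parts);
}

// Round-trip check: a 3MB buffer splits into three 1MB parts.
var big = Buffer.alloc(3 * 1024 * 1024, 'x');
var parts = split(big, 1024 * 1024);
console.log(parts.length);            // 3
console.log(join(parts).equals(big)); // true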

Here is a gist of some Ruby skeleton code for spreading values across the cache like this with Dalli: https://gist.github.com/alevy/5512907


alevy commented Sep 12, 2014

The problem with just iterating until you get a miss is twofold:

  1. When you actually get a miss, how do you know whether it's because you finished reading all the parts or because that sub-key was evicted from the cache? So you need some sort of terminator sequence; but then what if that sequence happens to occur in the value itself? So you need to escape it in the value, etc.
  2. It's very slow to retrieve a long value: you have to wait for a response from each subkey before proceeding to the next one (you could mitigate this by batching, say, 5 at a time, but that's wasteful for values smaller than 5 sub-keys). Ideally, you'd run most of the requests "quietly" so that you're not waiting a full round trip on each request before sending the next one; see the sketch below. This is the difference between retrieving a large value in <10ms vs. many hundreds of ms.
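
To illustrate point 2, here is a rough sketch of sequential vs. concurrent sub-key reads. It assumes a promise-returning client.get(key) that resolves to { value, flags } (as in current memjs; treat the exact shape as an assumption), and whether concurrent gets truly share round trips depends on the client pipelining requests on its connection.

// Sequential: each get waits a full round trip before the next is sent,
// so n sub-keys cost roughly n round trips.
async function getSequential(client, keys) {
  const parts = [];
  for (const key of keys) {
    const res = await client.get(key); // ~1ms round trip per key
    if (res.value === null) throw new Error('missing sub-key: ' + key);
    parts.push(res.value);
  }
  return Buffer.concat(parts);
}

// Concurrent: all gets are issued up front; if the client pipelines them,
// the total cost is close to a single round trip.
async function getConcurrent(client, keys) {
  const results = await Promise.all(keys.map((key) => client.get(key)));
  if (results.some((res) => res.value === null)) {
    throw new Error('missing sub-key');
  }
  return Buffer.concat(results.map((res) => res.value));
}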

I think spreading values automatically is problematic because it breaks performance assumptions. I expect that when I call a memcache function in a low-level library, I get the performance of a single request (on the order of 1ms). If I suddenly take a 2x-or-more performance hit because my value is slightly larger, that's less intuitive behavior (for a performance-sensitive library), IMO, than an error.


alevy commented Sep 12, 2014

Having said all that, I think this discussion has helped highlight some missing lower-level features that would make doing this easier and would also be useful on their own (specifically, a way to get at the version in get and set responses, as well as support for compare-and-swap). I'll open new issues for those (and /cc you). I don't mind at all including a utility function (although probably in a different namespace) to split up values in a particular way -- we've basically already pseudo-coded the algorithm in this thread :).


reubano commented Oct 1, 2014

I think you've misunderstood me. Spreading values wouldn't happen automatically; it would be an option in the setup, e.g., client = memjs.Client.create({spread: true}). By default, spread would be false.
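
Spelled out, the proposed shape would look something like this (spread is the hypothetical opt-in flag being suggested, not a real memjs option; the server string follows memjs.Client.create's usual signature):

const memjs = require('memjs');

// `spread: true` is the hypothetical opt-in: off by default, so small
// values keep single-request performance; when enabled, set()/get() would
// shard oversized values across sub-keys transparently.
const client = memjs.Client.create('localhost:11211', { spread: true });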


alevy commented Oct 2, 2014

Oh, interesting. OK, I don't mind that as much... reopening.

alevy reopened this Oct 2, 2014
dterei changed the title from "support spreadsaslmemcachedcache" to "Support SpreadSASLMemcachedCache: sharding values > 1MB" Apr 19, 2016