
exposing a region's configuration/lock for custom functionality #125

Open

sqlalchemy-bot opened this issue Jul 3, 2018 · 10 comments
Migrated issue, originally created by jvanasco (jvanasco)

With some advanced Redis usage, I need to handle caching certain keys in ways that are incompatible with dogpile.cache as-is, and must use the dogpile.core lock directly. At the same time, I still want to leverage my dogpile region and its configuration.

By "incompatible", I mean various mechanisms that can't be handled with a custom backend or Proxy. In one example, a key corresponds to a Redis hash with messages for a user and contains at least 3 values (count_unread, count_read, cached_ids):

messages|{user_id} count_read  = 10
messages|{user_id} count_unread = 5
messages|{user_id} cached_ids = (1,2,3,4,5,6,7,8,9,10)
messages|{user_id} msg-1 = {foo: bar}
messages|{user_id} msg-2 = {foo: bar}
messages|{user_id} msg-3 = {foo: bar}

Getting/updating the message counts requires writing/validating two fields at once (count_read, count_unread). Various functions need to happen within a Redis pipeline as well.
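A minimal sketch of that two-counter constraint, using a plain dict and a threading.Lock as stand-ins for the Redis hash and pipeline (the field names mirror the layout above; this is illustrative, not the actual deployment code):

```python
import threading

def mark_read(msg_store, lock):
    """Move one message from unread to read.

    Both counters must change inside one critical section; a cache
    backend's single-key get/set cannot express this pairing, which
    is why the raw lock plus a Redis pipeline is needed.
    """
    with lock:
        msg_store["count_unread"] -= 1
        msg_store["count_read"] += 1
        return msg_store["count_read"], msg_store["count_unread"]

# stand-in for the messages|{user_id} hash shown above
store = {"count_read": 10, "count_unread": 5}
print(mark_read(store, threading.Lock()))  # (11, 4)
```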

To handle this sort of operation, I am essentially leveraging the code below, which is based on the getcreate logic.

Would it make sense to integrate something like this into the CacheRegion itself?

#!python

import logging
log = logging.getLogger(__name__)

from dogpile import Lock, NeedRegenerationException
from dogpile.cache.api import NO_VALUE


def raw_getcreate(region, key, f_get, f_gen, f_async=None):
    # `key` is needed only to derive the lock's mutex

    def _get_value():
        if __debug__:
            log.debug("raw_getcreate._get_value")
        value = f_get()
        if value in (None, NO_VALUE):
            raise NeedRegenerationException()
        return value, -1

    def _gen_value():
        if __debug__:
            log.debug("raw_getcreate._gen_value")
        try:
            if __debug__:
                log.debug("raw_getcreate._gen_value | first, try fetching")
            return _get_value()
        except NeedRegenerationException:
            pass
        if __debug__:
            log.debug("raw_getcreate._gen_value | create")
        created_value = f_gen()
        return created_value, -1

    with Lock(
        region._mutex(key),
        _gen_value,
        _get_value,
        None,     # expiration_time; never expire
        f_async,  # async creator, unused
    ) as value:
        return value


reg = CACHE_REGIONS['foo']
key = "testkey-2"

def f_get():
    return reg.backend_actual.get(key)

def f_gen():
    created_value = "NEWDATA"
    reg.backend_actual.set(key, created_value)
    return created_value

print(raw_getcreate(reg, key, f_get, f_gen))

Michael Bayer (zzzeek) wrote:

Are you still identifying all those things by one "key" at the top? If there are still "get" and "set" operations of some kind, why can't the custom backend look up all keys, combine them together, call whatever, etc.? The CacheRegion is already oriented around having all the real options dependency-injected.


jvanasco (jvanasco) wrote:

This type of functionality doesn't really work well as a method of a custom region or backend once you scale the number of regions beyond a certain point. I'm not sure what that point is, but one deployment has 23 regions using 3 backends right now; implementing the raw lock access for custom getcreates into custom regions would just be a nightmare to manage.

In typical usage, I've got about 5 unique methods/keys which will share a single key for locking. In the example above, editing any hash field under messages|{user_id} will lock the key messages|{user_id}. [This is newly a hash; an earlier version used flat keys like messages|{user_id}|ids, but I needed to squeeze some more performance out of Redis.]

This is definitely an advanced case and I don't need a solution for myself - I have a functional workaround.

The getcreate logic in dogpile's region and lock is really brilliant, but it seems limited in use because it's not pluggable.


Michael Bayer (zzzeek) wrote:

So the problem is the number of regions, and that's why you need to pass the functions in at get() time? Clarifying: the region is fully pluggable, just not on a per-get basis? Or is there a new kind of functionality that can't be done here, period, with a plugin? I don't understand what the number of regions has to do with anything.


jvanasco (jvanasco) wrote:

tl;dr: the barrier to simply using a custom backend is the need to use different keys for the Lock and the cached values.

or is there a new kind of functionality that can't be done here, period, with a plugin.

Mostly this.

I don't understand what the number of regions has to do with anything.

In a given region, a getcreate on 5 different backend keys [Key1, Key2 ... Key5] needs to create a lock on the same key, KeyA.

Option 1: use 5 identically configured regions, each of which getcreates on KeyA to make the lock, and use 5 different custom backends to handle the get/set.

Option 2: use 5 custom regions, which getcreate on Key1-Key5 but share a lock on KeyA, and use 1 standard backend.

Because the backends handle pluggability of the underlying datastore, it would make more sense to implement this with a custom CacheRegion.


Michael Bayer (zzzeek) wrote:

I don't understand yet why the custom backend can't do all that work, delegating to a real backend to get to the cache itself. Your backend gets the key "messages|user_id" and then does what it needs to get all those things.

Is it because the individual getcreate() has to call into additional independent getcreate() steps? E.g., one lock operation has to actually involve more dogpile locks? The region implementation can call back to the owning cache region itself if it wanted to, or into another one.

Put another way, the API you ask for is:

region.get(key, f_get, f_gen)

and it can't possibly work if it were:

region = CacheRegion(f_get=f_get)
region.get(key, f_gen=f_gen)

right? That's essentially what this seems to ask.


jvanasco (jvanasco) wrote:

It's because multiple backend getcreates need to map onto a single shared key when locking:

value = reg.getcreate("key1")
value = reg.getcreate("key2")
value = reg.getcreate("key3")

On a cache miss, these all need to use the same getcreate lock, keyed on KeyA:

 - with Lock(key="key1", get("key1"), fgen("key1")) as value:
 + with Lock(key="KeyA", get("key1"), fgen("key1")) as value:

 - with Lock(key="key2", get("key2"), fgen("key2")) as value:
 + with Lock(key="KeyA", get("key2"), fgen("key2")) as value:

 - with Lock(key="key3", get("key3"), fgen("key3")) as value:
 + with Lock(key="KeyA", get("key3"), fgen("key3")) as value:

Within CacheRegion.get_or_create, get_value, gen_value, and the lock's _mutex all use the same key.

Does this make sense? It's not an issue with get or set, but with being able to use a specific string for the lock on getcreate.


Michael Bayer (zzzeek) wrote:

OK, you need the locks to share a key. That can be passed in as "lock_key"; very easy. Want to do that?


Michael Bayer (zzzeek) wrote:

I mean, basically the answer is no - I really don't want to blow up the existing get_or_create as far as its structure goes; it's complicated enough. To explode that structure even more would mean we really have to have the use case nailed down very clearly, because someone else might have another use case we aren't hitting.


jvanasco (jvanasco) wrote:

I'm willing to PR a custom lock_key override, or anything else, if it seems like a good idea for the general public.

Maybe just leave this ticket open for a while and see if any other use-cases bubble up first?
