
GC support #138

Open · wants to merge 3 commits into main
Conversation

@amsukdu (Author) commented Jun 16, 2019

  1. Added a Gc class to sync.py
  2. Mounted it to ExpirableDictStorage and PersistentDictStorage
  3. Changed the dict interface

Solves issue #115.

  • Although the Python collection blocks while GC-ing, I decided to go with multithreading. Thread creation is mutexed, so only one thread will be created at a time. I tested this carefully and it works fine.
  • The DEBUG variable in the Gc class is for debug logging. If you know a better way, please let me know.
  • I left the documentation update to a more suitable person 😄
  • Quick question: is this strategy commonly used in CS? I'm not a GC expert and only know a couple of strategies such as mark & sweep. I was wondering whether, with your strategy, "live" entries can be removed instead of the "expired" ones. That seems quite unfair to me. I could build a PriorityQueue to remove expired entries first; build time would be between O(n) and O(n·log(n)), so it wouldn't be that harmful (see the sketch below).
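
A minimal sketch of that PriorityQueue idea, with hypothetical helper names. It assumes every entry in the backend is an (expired_time, value) pair with a numeric expired_time, as in the expirable dict storage; this is an illustration, not the PR's code:

import heapq
import time

def build_expiry_heap(backend):
    # Order entries by expiration time so expired ones are evicted first.
    # Building the list is O(n), heapify is O(n); n pops cost O(n log n).
    heap = [(expired_time, key) for key, (expired_time, _) in backend.items()]
    heapq.heapify(heap)
    return heap

def evict_expired(backend, heap, now=None):
    now = time.time() if now is None else now
    while heap and heap[0][0] <= now:
        _, key = heapq.heappop(heap)  # O(log n) per pop
        backend.pop(key, None)  # the entry may already be gone; ignore if so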

@@ -463,6 +463,7 @@ def dict(
         storage_class = fsync.PersistentDictStorage
     else:
         storage_class = fsync.ExpirableDictStorage
+    storage_class.maxsize = maxsize
@youknowone (Owner):
This code changes the maxsize value of the storage class itself. It will cause side effects for other calls that use a different value.

@amsukdu (Author):

I assumed
    if storage_class is None:
means we're in 'default' mode, so we're already in a somewhat private zone... what about _maxsize?

@youknowone (Owner):
The point is not the naming rule. The following code will cause a problem.

# maxsize of f is 512 here because fsync.PersistentDictStorage.maxsize is 512
@ring.dict(maxsize=512)
def f():
    ...

# maxsize of both f and g is 128 now because fsync.PersistentDictStorage.maxsize is 128
@ring.dict(maxsize=128)
def g():
    ...

We shouldn't set any kind of class variable for decorator parameters.
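
For illustration, a minimal sketch of keeping maxsize per instance instead of on the class. DictStorage and its constructor signature are hypothetical stand-ins, not the library's actual API:

class DictStorage:
    # Hypothetical stand-in for ExpirableDictStorage / PersistentDictStorage.
    def __init__(self, backend, maxsize=128):
        self.backend = backend
        self.maxsize = maxsize  # instance attribute: no cross-decorator leakage

f_storage = DictStorage({}, maxsize=512)
g_storage = DictStorage({}, maxsize=128)
assert f_storage.maxsize == 512  # f keeps its own setting, unlike the class-variable version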

@amsukdu (Author):

I totally missed that. What a shame. I'll figure out a better way.

def __init__(self, backend, target_size, expire_f=None):
    assert round(target_size) > 0, 'target_size has to be at least 1'
    assert expire_f is None or callable(expire_f), 'expire_f has to be function or None'
    assert isinstance(backend, type({})), 'backend has to be dict-like'
@youknowone (Owner):

@amsukdu (Author):

Got it!


self._mutex.acquire()
if len(self._backend) > self._target_size:
    WorkThread(self).start()
@youknowone (Owner):

Do we gain any advantage from using threads? I think GC for a dict is a CPU-bound job, and there's no benefit from threading. Tell me if I missed something.

@amsukdu (Author):

Yes, you're right. The cache itself is in-memory, so there won't be any heavy I/O. However, if the dict holds enough entries, say 1000000, set_value (or some of the other methods) can block for quite a while. At least we can context-switch if we go with multithreading.
This is just my opinion, and I have no issue with removing the threading.

@youknowone (Owner):

Let's remove it. Even if size matters, I think there are 2 reasons to regard threading as overkill here (see the sketch after this list):

  1. If the backend requires I/O, threading makes sense. But for a dict, even 1000000 iterations is a microsecond-scale problem. In most cases with reasonable functions, it will be far fewer than 1000000.
  2. If we add locking correctly, the GC blocks any kind of adding and removing from the dict, which means the caller still has to wait for the GC job.
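
A minimal sketch of that synchronous approach, with hypothetical names; this is an illustration of the idea, not the PR's actual code:

import threading

class Gc:
    def __init__(self, backend, target_size):
        self._backend = backend
        self._target_size = target_size
        self._mutex = threading.Lock()

    def maybe_trim(self):
        # Run inline under the lock: the caller blocks, which is acceptable
        # for an in-memory dict where a full scan is cheap.
        with self._mutex:
            if len(self._backend) <= self._target_size:
                return
            for key in list(self._backend.keys()):
                if len(self._backend) <= self._target_size:
                    break
                del self._backend[key]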

@amsukdu (Author):

Got it!


class ExpirableDictStorage(fbase.CommonMixinStorage, fbase.StorageMixin):

    maxsize = 128
@youknowone (Owner):

This value should not be bound to the class.

@amsukdu (Author):

Again, I can't 100% follow your thoughts on maxsize.
My guess is that you're worried this value can easily be accessed and modified; my understanding is that ExpirableDictStorage & PersistentDictStorage are our implementations for the default call path, so we can control this value.

@@ -229,6 +301,11 @@ def set_value(self, key, value, expire):
         expired_time = _now + expire
         self.backend[key] = expired_time, value

+        if self.maxsize < len(self.backend):
+            if self._gc == None:
@youknowone (Owner):

We can revisit this part after settling the GC's threading design.

@youknowone (Owner) commented Jun 16, 2019

Thanks for the contribution!
I left a few comments, and here are the responses to your questions.

  • I think the lock is a good idea, but I'm not sure about threading for the GC run.
  • DEBUG is OK for now. Maybe we need to add a log module.
  • This is not a common GC in CS, because GC is normally about dynamically allocated memory, not about caches. We probably need to rename this to something more fitting. The strategy itself is a variation of a common one (usually random removal + linear scanning is used). The requirements here are a bit tricky because we want to expose the raw dict to the user. This is not the best strategy, but it is statistically good enough. When users need a better cache strategy, they will use @ring.lru rather than dict, so transparency is more important for the dict implementation.

The CI failure is about formatting. You can check the result from https://travis-ci.org/youknowone/ring/builds/546264007?utm_source=github_status&utm_medium=notification

@amsukdu (Author) commented Jun 17, 2019

Thanks again for your kind & careful advice.
I totally agree with your response; it's more like a trimmer than a GC.
It would be great if we documented it accurately, something like:

expired entries are prioritized, but ANY dict entry can be removed once the dict reaches maxsize.

The CI failure was in one of my own tests. I'll correct it in the next push.
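
A minimal sketch of that documented behavior, with hypothetical names; it assumes the (expired_time, value) pairs used by the expirable dict storage, where expired_time may be None for entries that never expire:

import time

def trim(backend, target_size):
    now = time.time()
    # First pass: evict entries whose expiration time has already passed.
    expired = [k for k, (t, _) in backend.items() if t is not None and t <= now]
    for key in expired:
        if len(backend) <= target_size:
            return
        del backend[key]
    # Second pass: still over target, so ANY entry may be removed.
    for key in list(backend.keys()):
        if len(backend) <= target_size:
            return
        del backend[key]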

@youknowone (Owner)

Trimmer could be the name too. I don't know the proper name for this kind of job, so if you have a suggestion or find a correct name for it, please apply it in the code. "GC" was just an expression used in the issue to make sure I wouldn't confuse myself later.

@amsukdu (Author) commented Jun 23, 2019

@youknowone
I removed the threading and addressed all of the comments you raised. Please take a look!

@amsukdu (Author) commented Jun 25, 2019

I'm working on the test_redis_hash failure.

@@ -638,7 +639,7 @@ class RingRope(RopeCore):
     def __init__(self, *args, **kwargs):
         super(RingRope, self).__init__(*args, **kwargs)
         self.user_interface = self.user_interface_class(self)
-        self.storage = self.storage_class(self, storage_backend)
+        self.storage = self.storage_class(self, storage_backend, maxsize)
@youknowone (Owner):

Is the redis_hash failure related to this change?

@amsukdu (Author):

Yes, __init__ is overridden in RedisHashStorage. I'm looking at it right now, and it seems there are a few workarounds to make it work again, but I'm not sure which one would be best.
The problem is that BaseStorage defines an __init__ that ExpirableDictStorage & PersistentDictStorage adhere to, but RedisHashStorage overrides it.

Can you tell me your opinion?
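
For illustration, a self-contained sketch of that kind of mismatch. The class names mirror the discussion, but the signatures below are assumptions, not the library's actual code:

class BaseStorage:
    def __init__(self, rope, backend, maxsize=None):
        self.rope = rope
        self.backend = backend
        self.maxsize = maxsize

class ExpirableDictStorage(BaseStorage):
    pass  # adheres to the base __init__, so it accepts the new maxsize argument

class RedisHashStorage(BaseStorage):
    def __init__(self, rope, backend):  # overrides with the old arity
        super(RedisHashStorage, self).__init__(rope, backend)

# Passing the extra argument works for the dict storages...
ExpirableDictStorage(None, {}, 128)
# ...but fails for the override:
try:
    RedisHashStorage(None, {}, 128)
except TypeError as exc:
    print(exc)  # TypeError: too many positional arguments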

@amsukdu (Author) commented Jul 4, 2019

@youknowone
I fixed the test failure on my own. I think this approach is quite reasonable. Please take a look.

@youknowone (Owner)

Sorry, I was busy last week. I'll check your recent changes soon.
