
[BUG]: Expiration map grows without limit #329

Open
nemorales opened this issue Feb 3, 2023 · 7 comments
Labels
kind/bug Something is broken.

Comments

@nemorales

What version of Ristretto are you using?

v0.1.1

What version of Go are you using?

1.20

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, CPU, OS)?

16GB RAM, Apple M1 Pro, macOS Monterey 12.5

What steps will reproduce the bug?

We encountered memory usage problems in production, so I wrote a local application and stress-tested it. Ristretto configuration:

ristretto.Config{NumCounters: 1000000, MaxCost: int64(100 * bytesInMb), BufferItems: 64, Metrics: true}

We used a TTL of an hour (3600 * time.Second) and stressed the application to generate 1k items every 10ms; each item uses approximately 200 bytes.
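
A minimal sketch of that stress setup, assuming a bytesInMb constant of 1 << 20 and an arbitrary integer key scheme (neither is spelled out above):

```go
package main

import (
	"time"

	"github.com/dgraph-io/ristretto"
)

const bytesInMb = 1 << 20

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1000000,
		MaxCost:     int64(100 * bytesInMb),
		BufferItems: 64,
		Metrics:     true,
	})
	if err != nil {
		panic(err)
	}

	// Roughly 1k items every 10ms, each costing ~200 bytes, with a 1-hour TTL.
	// Run until interrupted; the report lets this run ~10 minutes past the
	// point where evictions start, then inspects in-use space with pprof.
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	key := 0
	for range ticker.C {
		for i := 0; i < 1000; i++ {
			cache.SetWithTTL(key, make([]byte, 200), 200, 3600*time.Second)
			key++
		}
	}
}
```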

After reaching the eviction point, we left the stresser running for 10 more minutes. Using pprof to analyze in-use space, we see this:

[screenshot: pprof in-use space profile]

Expected behavior and actual result.

I would expect the keys added for expiration in the expiration map to be removed once the same key gets evicted.

What seems to happen is that the key is not removed from the expiration map after eviction, so with a large enough TTL the memory usage of ristretto knows no limit.

Additional information

No response

@nemorales nemorales added the kind/bug Something is broken. label Feb 3, 2023
@Woutifier

Hey @nemorales, it seems we are encountering this as well. Did you run a garbage collection prior to running pprof? (I'm not 100% sure that it happens automatically.)

Looking at the code, there is a chance for the expiration bucket cleanup to be skipped under load. There is a single goroutine responsible for a number of things, among which is cleaning up the "just expired" bucket. The ticker it listens on has a channel buffer of one, so if the cleanup is not happening fast enough (e.g. due to some kind of lock contention, or a heavy OnEvict function), ticks get dropped. This can cause a bucket to never be cleaned up, and it will remain in place until the parent cache is GC'ed.
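
A standalone sketch (not Ristretto code) of why dropped ticks translate into skipped cleanups: a time.Ticker's channel has a buffer of one, so any tick that fires while the receiver is still busy is silently discarded rather than queued.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Ticks every 100ms for ~1 second, so roughly 10 ticks are generated.
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	deadline := time.After(1 * time.Second)

	received := 0
	for {
		select {
		case <-ticker.C:
			received++
			// Simulate a slow cleanup pass (e.g. a heavy OnEvict callback).
			time.Sleep(300 * time.Millisecond)
		case <-deadline:
			// Typically prints 3-4, not ~10: the missed ticks were dropped,
			// just like the skipped bucket cleanups described above.
			fmt.Printf("processed %d ticks\n", received)
			return
		}
	}
}
```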

Unrelated (in terms of cause), but roughly in the same code: clearing the cache also doesn't currently clear the expiryMap, which seems to be an oversight, as it makes no sense to keep it around if the items themselves have been cleared.

Reproducing this is fairly easy: just add a time.Sleep of some duration (I used 1 millisecond) as the OnEvict function, which makes the cleanup of buckets too slow to keep up under any kind of load, triggering the behavior.
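
A rough repro sketch along those lines, assuming the v0.1.1 API where OnEvict receives a *ristretto.Item; the item counts and sizes here are arbitrary and only chosen to force evictions quickly:

```go
package main

import (
	"fmt"
	"runtime"
	"time"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 100000,
		MaxCost:     1 << 20, // 1 MB, small so evictions start early
		BufferItems: 64,
		Metrics:     true,
		// Slow OnEvict: stands in for lock contention or heavy eviction
		// work, making the cleanup goroutine fall behind its ticker.
		OnEvict: func(item *ristretto.Item) {
			time.Sleep(1 * time.Millisecond)
		},
	})
	if err != nil {
		panic(err)
	}

	// Write far more than MaxCost with a long TTL, so expiration buckets
	// accumulate entries while evictions are happening.
	for i := 0; i < 100000; i++ {
		cache.SetWithTTL(i, make([]byte, 200), 200, time.Hour)
	}
	cache.Wait()

	// Rough signal only; the report above uses pprof's in-use space view
	// to see what is actually holding the memory.
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("heap in use after GC: %d KB\n", m.HeapInuse>>10)
}
```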

@gramidt

gramidt commented May 17, 2023

@nemorales @Woutifier - Have you continued to experience this behavior? I'm seeing something similar and was curious if either of you found a solution.

@Woutifier

Woutifier commented May 22, 2023

I'm thinking of these two changes: https://github.com/Woutifier/ristretto/commits/wouter/clear-expiry

Since then, I've found another issue: if you don't set a TTL at all (just Set, not SetWithTTL) and the item you're setting is already in the cache, it will use the "update" path. Update doesn't properly filter out items without expiration, and just creates a new "zero time" bucket and adds the item to it.

Edit: added a fix for the new issue to the branch above too.
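
For that second issue, the trigger is simply re-setting an already-present key without a TTL. A minimal sketch of the access pattern (the key and values here are arbitrary):

```go
package main

import (
	"fmt"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1000,
		MaxCost:     1 << 20,
		BufferItems: 64,
	})
	if err != nil {
		panic(err)
	}

	// First Set inserts the key with no expiration at all.
	cache.Set("config", []byte("v1"), 2)
	cache.Wait()

	// Second Set on the same key goes through the update path. Per the
	// description above, this puts the non-expiring key into a "zero time"
	// expiration bucket that then lingers in the expiration map.
	cache.Set("config", []byte("v2"), 2)
	cache.Wait()

	val, ok := cache.Get("config")
	fmt.Println(ok, string(val.([]byte)))
}
```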

@nemorales
Author

@Woutifier that's an interesting finding. Now I understand why ristretto shows different memory usage behavior depending on the load: in my use case we decided not to use TTLs, but there are still cases in which memory usage grows without limit.

I'm taking a look at your commits, thanks for sharing what you learned!

@mjoyce91

mjoyce91 commented Aug 5, 2023

Is there any new information on this issue?

@Woutifier

I've opened a PR related to this issue: #358

@makifbaysal

I have a similar issue with TTL. Memory usage is okay when I set the same key in the cache with a lower TTL (like 1 hour), but if I increase the TTL to 1 day, memory usage exceeds 1 GB.
