
Handling Object TTL #60

Open
martinsumner opened this issue Mar 15, 2019 · 1 comment

Comments

@martinsumner
Owner

The behaviour of objects with a time to live (TTL) is not currently tested.

The primary challenge is deciding when to recognise the expiry of an object in the cached tree. Expiry has no external event trigger, so the aae_controller won't see anything that prompts the cache to represent the changes resulting from expiry.

This will be OK at first, as the other caches being co-ordinated with will similarly continue to represent the state of expired objects. However, at some stage one of the controllers will go through a rebuild of the tree - and suddenly the trees will be divergent to the extent of the volume of objects expired between rebuilds.

This will gradually repair: every time a delta is recognised, fetch_clocks will be run, and fetch_clocks will reset the cache entry based on the expiry time of the object. So after many exchanges (assuming a lot of objects were expired), the cached trees will be back in sync.

However, if there are o(1m) expired objects that impact the tree, then AAE will be rendered useless by the weight of false positives for a considerable period (maybe many days). This is unsatisfactory.
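
The divergence can be illustrated with a minimal sketch (hypothetical names, not the aae_controller API): the long-lived cache still holds segments for every object it has seen change, whereas a rebuild folds over the store at rebuild time and skips anything already expired, so the two trees disagree on exactly those segments.

```erlang
%% Minimal sketch only - all module, function and tuple shapes here are
%% assumptions for illustration, not the actual aae implementation.
-module(ttl_divergence_sketch).
-export([cached_segments/2, rebuilt_segments/2]).

%% The cached tree was fed by change events, so expired objects are still
%% represented in their segments.
cached_segments(Objects, _Now) ->
    lists:usort([segment(Key) || {Key, _Clock, _ExpiryTS} <- Objects]).

%% A rebuild folds over the store at rebuild time and skips anything whose
%% expiry time has passed - so those segments now differ from the cache.
rebuilt_segments(Objects, Now) ->
    lists:usort([segment(Key)
                    || {Key, _Clock, ExpiryTS} <- Objects,
                        ExpiryTS > Now]).

segment(Key) ->
    erlang:phash2(Key, 1024 * 1024).
```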

There are three options:

  • Co-ordinate rebuilds across controllers;
  • Dynamically schedule rebuilds in response to upticks in exchange-prompted segment repairs;
  • Co-ordinate object expiry across the cluster (i.e. have expiry managed by a sweeper event).

Currently, the second approach appears to be the best option in terms of simplicity. Track how many replace_dirtysegments messages are received by the tree_cache, and prompt a rebuild on accumulating a threshold of such events. That threshold should decrease as the time since the last rebuild increases, to stop a second rebuild being driven by false positives (from exchanging with unbuilt trees) after a first rebuild - or this problem could be addressed by counting only segment replacements that were necessary (i.e. read before replace).
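
A minimal sketch of that trigger logic, assuming hypothetical names and an assumed age-based threshold formula (not the actual aae code), might look like this: count dirty-segment replacements as they are reported, and prompt a rebuild once the count reaches a threshold that shrinks as the tree ages.

```erlang
%% Sketch only - module/function names, the record shape and the threshold
%% formula are assumptions, not the aae_controller or tree_cache API.
-module(rebuild_trigger_sketch).
-export([new/1, note_dirty_segments/2, maybe_rebuild/2]).

-record(state, {dirty_count = 0 :: non_neg_integer(),
                last_rebuild :: integer(),       % erlang:monotonic_time(second)
                base_threshold :: pos_integer()}).

new(BaseThreshold) ->
    #state{last_rebuild = erlang:monotonic_time(second),
           base_threshold = BaseThreshold}.

%% Called whenever the tree cache replaces dirty segments during an exchange.
note_dirty_segments(N, State = #state{dirty_count = C}) ->
    State#state{dirty_count = C + N}.

%% The threshold decreases as time since the last rebuild increases, so an
%% old tree (which may be carrying many expired objects) is rebuilt after
%% fewer accumulated segment replacements.
maybe_rebuild(Now, State = #state{dirty_count = C,
                                  last_rebuild = LastRB,
                                  base_threshold = Base}) ->
    AgeHours = max(1, (Now - LastRB) div 3600),
    Threshold = max(1, Base div AgeHours),
    case C >= Threshold of
        true ->
            %% prompt the rebuild, then reset the counter and timestamp
            {rebuild, State#state{dirty_count = 0, last_rebuild = Now}};
        false ->
            {ok, State}
    end.
```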

@Konstantin74R

Is there any news?
