This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

do not allow indefinite write lock during deletes #1897

Closed
wants to merge 2 commits

Conversation

@robert-milan (Contributor) commented Sep 1, 2020

Changes the behavior of delete operations to prevent them from completely locking up the index for extended periods of time.

I initially tried getting all the leaves and deleting them one by one, but this proved to be far too slow. I have removed that code, but you can see it in the commits.

Will this be enough to alleviate the issue of locking the index for too long, though?

Fixes: #1885 (don't hold write lock for entire duration of index delete calls)

for _, node := range found {
	tl.Wait()
	lockStart := time.Now()
	bc = m.Lock()
@shanson7 (Collaborator) commented Sep 3, 2020

Maybe I'm misunderstanding something here, but it seems like the write lock is being acquired and released in each iteration. Perhaps it should only be released when it has used up its time slice?

Repeatedly acquiring write locks is very expensive in the memory index.
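
For illustration, a minimal sketch of that suggestion, assuming a plain sync.RWMutex and hypothetical Node, deleteNode and timeSlice placeholders rather than the actual index code: acquire the write lock once and release it only when the time slice has been used up.

package sketch

import (
	"sync"
	"time"
)

type Node struct{ Path string }

// deletePerTimeSlice deletes as many nodes as fit into one time slice under
// a single write lock, then releases the lock so readers can make progress.
func deletePerTimeSlice(mu *sync.RWMutex, found []*Node, timeSlice time.Duration, deleteNode func(*Node)) {
	i := 0
	for i < len(found) {
		mu.Lock()
		sliceStart := time.Now()
		for i < len(found) && time.Since(sliceStart) < timeSlice {
			deleteNode(found[i])
			i++
		}
		mu.Unlock() // readers get a turn between slices
	}
}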

Member

The intent here is to hold write locks for as short a period as possible. Additionally, the duration reported to the TimeLimiter for how long the lock has been held includes the time spent waiting for all existing RLocks to be released. In general, threads will be blocked for MAX(RLock duration) + Lock duration.
If all RLocks are held for a very short amount of time, then this current approach will work well. But as @shanson7 points out, if the RLocks are held for long periods, then trying to acquire lots of Locks, no matter how fast each one is, is going to result in low throughput, because threads spend all their time blocked.

I think we need both approaches here. We still need to rate limit how long we block reads for, but we should also perform deletes in batches to reduce the number of locks needed. We don't want a single write lock to be held for the full 200ms, but holding it for 5ms or less would be fine. 5ms is a really long time, and most delete operations will complete within it.
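
A hedged sketch of that combined approach, assuming a TimeLimiter that exposes Wait() and Add(duration) roughly like the one used in this PR (the real signatures may differ), with Node, deleteNode and the 5ms bound as illustrative placeholders:

package sketch

import (
	"sync"
	"time"
)

// TimeLimiter is an assumed interface, not the actual type from this repo.
type TimeLimiter interface {
	Wait()               // block until more lock time may be consumed
	Add(d time.Duration) // report how much lock time was just consumed
}

type Node struct{ Path string }

const maxLockHold = 5 * time.Millisecond

// deleteBatched rate limits via the TimeLimiter and deletes a batch of nodes
// per write lock, holding each lock for at most roughly maxLockHold.
func deleteBatched(mu *sync.RWMutex, tl TimeLimiter, found []*Node, deleteNode func(*Node)) {
	i := 0
	for i < len(found) {
		tl.Wait()
		lockStart := time.Now()
		mu.Lock()
		for i < len(found) && time.Since(lockStart) < maxLockHold {
			deleteNode(found[i])
			i++
		}
		mu.Unlock()
		tl.Add(time.Since(lockStart)) // includes the time spent waiting for existing RLocks
	}
}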

Contributor (Author)

I am still experimenting with deleting by leaf. I don't think this current approach solves the entire problem set. If someone tries to delete toplevel.* when they have a lot of child nodes, it will still lock up the index, since the current method recurses all the way down the tree.

Member

That is correct, deleting * would be even worse.

We definitely need the delete call to find all leaf nodes to be deleted while only holding read locks. Then delete these in batches with write locks.
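
A hypothetical sketch of that two-phase shape, with findLeaves, deleteLeaf and batchSize as illustrative placeholders rather than the real index API: collect the leaves under a read lock, then delete them in small batches under short write locks.

package sketch

import "sync"

type Node struct{ Path string }

const batchSize = 128

func deleteTwoPhase(mu *sync.RWMutex, pattern string, findLeaves func(string) []*Node, deleteLeaf func(*Node)) {
	// phase 1: walk the tree under a read lock only, so readers are not blocked
	mu.RLock()
	leaves := findLeaves(pattern)
	mu.RUnlock()

	// phase 2: delete in small batches, each under its own short write lock
	for start := 0; start < len(leaves); start += batchSize {
		end := start + batchSize
		if end > len(leaves) {
			end = len(leaves)
		}
		mu.Lock()
		for _, leaf := range leaves[start:end] {
			deleteLeaf(leaf)
		}
		mu.Unlock()
	}
}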

stale bot commented Jan 26, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 26, 2021
@stale stale bot closed this Feb 3, 2021