Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

dont hold write lock for entire duration of index delete calls #1885

Open
woodsaj opened this issue Aug 21, 2020 · 1 comment
Open

dont hold write lock for entire duration of index delete calls #1885

woodsaj opened this issue Aug 21, 2020 · 1 comment

Comments

@woodsaj
Copy link
Member

woodsaj commented Aug 21, 2020

When an index Delete is called, the entire Delete operation is performed while a write lock is held. While the lock is held, no metrics can be ingested and no other index searches can run.

To avoid problems , the Delete function https://github.com/grafana/metrictank/blob/master/idx/memory/memory.go#L1492 should instead

  1. use a read lock and perform a find() to get all matching nodes, then release the read lock.
  2. for each matching node, acquire a read lock, then traverse the tree to find all matching leaf nodes, then release the read lock.
  3. for each matching leaf node, acquire a write lock, then delete the leaf node, which will cascade down the tree and clean up any branches that have no children.

eg something like

	var deletedDefs []idx.Archive
	pre := time.Now()
	defer func() {
		if len(deletedDefs) == 0 {
			return
		}
		if m.findCache == nil {
			return
		}
		// asynchronously invalidate any findCache entries
		// that match any of the deleted series.
		if len(deletedDefs) > findCacheInvalidateQueueSize {
			m.findCache.Purge(orgId)
		} else {
			for _, d := range deletedDefs {
				m.findCache.InvalidateFor(orgId, d.NameWithTags())
			}
		}
	}()
	m.RLock()
	tree, ok := m.tree[orgId]
	if !ok {
                m.RUnlock()
		return nil, nil
	}

	found, err := find(tree, pattern)
        m.RUnlock()
	if err != nil {
		return nil, err
	}
        leafNodes := make([]*Node, 0)
	for _, node := range found {
                m.RLock()
                // GetLeaves would be some recursive function like https://github.com/grafana/metrictank/blob/master/idx/memory/memory.go#L1537-L1552
		leafNodes = append(leafNodes, node.GetLeaves()) 
                m.RUnlock()
	}
        for _, node := range leafNodes {
                m.Lock() // should probably use the same lock rate limiting that is used in Prune
                deleted := m.delete(orgId, node, true, false)
		deletedDefs = append(deletedDefs, deleted...)
                m.Unlock()
        }
	statMetricsActive.DecUint32(uint32(len(deletedDefs)))
	statDeleteDuration.Value(time.Since(pre))

	return deletedDefs, nil
}
@woodsaj woodsaj added the bug label Aug 21, 2020
@Dieterbe Dieterbe added this to the sprint-16 milestone Aug 24, 2020
@robert-milan robert-milan self-assigned this Sep 1, 2020
@fkaleo fkaleo modified the milestones: sprint-16, sprint-17 Sep 21, 2020
@Dieterbe Dieterbe modified the milestones: sprint-17, sprint-18 Oct 28, 2020
@fkaleo
Copy link
Contributor

fkaleo commented Oct 28, 2020

#1897 might fix it, if not @robert-milan has another branch that should and will do a PR for

@fkaleo fkaleo self-assigned this Oct 29, 2020
@fkaleo fkaleo removed this from the sprint-18 milestone Nov 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants