This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

some operations lock the index too long #608

Open
Dieterbe opened this issue Apr 17, 2017 · 4 comments

Comments

@Dieterbe
Contributor

Certain operations have no upper bound on how long they can hold the index lock, which may stall ingest and delay responses to HTTP requests:

  • pruning
  • getting a list of all metrics
  • expensive find/delete operations

See also #514.

#606 probably fixed the lowest-hanging fruit.
This may not be such a high priority anymore, but we should at least benchmark these operations to see if and when they become worth tackling.

@replay
Contributor

replay commented Apr 17, 2017

Would it make sense to add a metric that records the time right before and right after .RLock()/.Lock() calls? Then we could record how long the code waited for the lock. Or do you think this would add too much overhead, because those locks are acquired very often?
As a metric, it would certainly be very interesting to see how long we're waiting to acquire the index lock.

replay closed this as completed Apr 17, 2017
replay reopened this Apr 17, 2017
@Dieterbe
Contributor Author

We already measure how long index operations take. I think lock-hold times are too low-level to monitor continuously at runtime; that's something to look into on an as-needed basis, probably in a dev environment.
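For that kind of as-needed investigation, Go's standard library already offers contention profiling: runtime.SetMutexProfileFraction and runtime.SetBlockProfileRate enable the "mutex" and "block" profiles, which can be read back through runtime/pprof. The sketch below assumes a dev build where sampling every event is acceptable; the helper names enableContentionProfiling and dumpProfile are hypothetical. Exactly which of the two profiles captures a given sync.RWMutex wait may vary across Go versions, so it is worth checking both.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// enableContentionProfiling turns on Go's built-in contention profiles.
// Intended for a dev environment only: sampling every event adds overhead.
func enableContentionProfiling() {
	runtime.SetMutexProfileFraction(1) // report every contended mutex event
	runtime.SetBlockProfileRate(1)     // record every blocking event
}

// dumpProfile renders a named runtime/pprof profile (e.g. "mutex" or
// "block") in human-readable text form (debug=1).
func dumpProfile(name string) (string, error) {
	p := pprof.Lookup(name)
	if p == nil {
		return "", fmt.Errorf("unknown profile %q", name)
	}
	var buf bytes.Buffer
	if err := p.WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	enableContentionProfiling()

	// Simulate a long write-lock hold (standing in for a slow prune)
	// while a reader (standing in for a query) waits.
	var mu sync.RWMutex
	mu.Lock()
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		mu.RLock()
		mu.RUnlock()
	}()
	time.Sleep(20 * time.Millisecond)
	mu.Unlock()
	wg.Wait()

	out, err := dumpProfile("block")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```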

@Dieterbe
Contributor Author

Slow pruning is starting to affect some of our largest customers.
groupByNode(consolidateBy(metrictank.stats.$environment.$instance.idx.*.prune.latency.{p90,max}.gauge32, 'max'), 8, 'maxSeries') shows prune durations above 10s.
This blocks ingestion and queries, and we need to optimize it.
When I took some stack traces, the only active lines were lines 508 and 512 of idx/memory/memory.go
(version 41d6eaa), which are the two log.Debug statements below:

		if len(bNode.Children) > 1 {
			newChildren := make([]string, 0, len(bNode.Children)-1)
			for _, child := range bNode.Children {
				if child != nodes[i] {
					newChildren = append(newChildren, child)
				} else {
					log.Debug("memory-idx: %s removed from children list of branch %s", child, bNode.Path)
				}
			}
			bNode.Children = newChildren
			log.Debug("memory-idx: branch %s has other children. Leaving it in place", bNode.Path)
			// no need to delete any parents as they are needed by this node and its
			// remaining children
			break
		}

So I expect that optimizing those log.Debug statements alone should make this several times faster.
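One common way to make such calls cheap is to guard them behind an inexpensive level check (or drop them from the hot path), so that when debug logging is off no variadic call, argument conversion, or formatting happens at all. The sketch below mirrors the child-removal loop above but is not metrictank's code: debugEnabled, debugf and removeChild are hypothetical names, and debugf uses the standard library log package rather than the logger metrictank uses. The stack traces only show that the two log.Debug lines were active; which part of the call dominates would need profiling to confirm.

```go
package main

import (
	"fmt"
	"log"
)

// debugEnabled is a hypothetical package-level switch standing in for
// "is the logger at debug level?". It is not metrictank's logging API.
var debugEnabled = false

// debugf is a hypothetical debug logger; callers guard it with debugEnabled.
func debugf(format string, args ...interface{}) {
	log.Printf(format, args...)
}

// removeChild returns a copy of children without any entry equal to name.
// The debug call sits behind a plain boolean check, so when debug logging is
// off the loop makes no variadic call and does no formatting.
func removeChild(children []string, name, branchPath string) []string {
	out := make([]string, 0, len(children))
	for _, child := range children {
		if child != name {
			out = append(out, child)
			continue
		}
		if debugEnabled {
			debugf("memory-idx: %s removed from children list of branch %s", child, branchPath)
		}
	}
	return out
}

func main() {
	children := []string{"cpu", "mem", "disk"}
	fmt.Println(removeChild(children, "mem", "servers.host1")) // prints [cpu disk]
}
```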

@DanCech
Contributor

DanCech commented Dec 18, 2017

See #787
