storage: 24.1 microbenchmark regressions #123236

Closed · 4 tasks done
jbowens opened this issue Apr 29, 2024 · 7 comments
Labels: A-storage (Relating to our storage engine (Pebble) on-disk storage.) · C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) · T-storage (Storage Team)
Project: Storage

jbowens commented Apr 29, 2024

The following microbenchmarks regressed in 24.1:

  • MVCCBlindPut_Pebble/valueSize=10000
  • MVCCIncrementalIterator/ts=480
  • EncodeMVCCValue/header=empty/
  • MVCCPut_Pebble/batch=false/valueSize=1000/versions=10

https://docs.google.com/spreadsheets/d/1-Odc6JfiseZ_BHGU0_L3K-w1AI7coNukap2_wTUhF3A/edit#gid=5

Jira issue: CRDB-38253

@jbowens jbowens added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Apr 29, 2024
@jbowens jbowens added this to Incoming in Storage via automation Apr 29, 2024
jbowens commented Apr 29, 2024

Looking at allocations for MVCCBlindPut_Pebble/valueSize=10000:

Showing nodes accounting for 774048, 94.13% of 822361 total
Dropped 170 nodes (cum <= 4111)
Showing top 20 nodes out of 157
      flat  flat%   sum%        cum   cum%
    327687 39.85% 39.85%     327687 39.85%  github.com/cockroachdb/cockroach/pkg/storage.EncodeMVCCKey
    131074 15.94% 55.79%     132026 16.05%  github.com/cockroachdb/pebble.(*DB).commitWrite
     65536  7.97% 63.76%      65536  7.97%  path.Join
     32768  3.98% 67.74%      32768  3.98%  github.com/cockroachdb/redact/internal/redact.Safe
     32768  3.98% 71.72%      32768  3.98%  github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func22
     21905  2.66% 74.39%      21905  2.66%  github.com/cockroachdb/pebble/vfs.(*memFile).Write
     21845  2.66% 77.04%      21845  2.66%  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.init.1.func1
     16384  1.99% 79.04%      72754  8.85%  github.com/cockroachdb/pebble.(*tableCacheShard).findNodeInternal
     16384  1.99% 81.03%      16384  1.99%  github.com/cockroachdb/pebble/sstable.(*Properties).saveString

The EncodeMVCCKey allocations:

         .          .    186:func EncodeMVCCKey(key MVCCKey) []byte {
         .          .    187:	keyLen := encodedMVCCKeyLength(key)
    327687     327687    188:	buf := make([]byte, keyLen)
         .          .    189:	encodeMVCCKeyToBuf(buf, key, keyLen)
         .          .    190:	return buf
         .          .    191:}
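
For reference, listings like these are standard Go memory-profile output. Assuming the usual Go benchmarking workflow (the package path and regexes below are illustrative, not copied from the issue), something like the following reproduces them:

go test ./pkg/storage -run '^$' -bench 'MVCCBlindPut_Pebble/valueSize=10000' -memprofile mem.out
go tool pprof -sample_index=alloc_objects -top storage.test mem.out
go tool pprof -sample_index=alloc_objects -list 'EncodeMVCCKey$' storage.test mem.out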

The commitWrite allocations:

         .          .    952:
         .          .    953:	if b.flushable == nil {
    131074     131153    954:		size, err = d.mu.log.writer.WriteRecord(repr, wal.SyncOptions{Done: syncWG, Err: syncErr}, b.refData)
         .          .    955:		if err != nil {
         .          .    956:			panic(err)
         .          .    957:		}
         .          .    958:	}

Looks like the new allocations are the commitWrite ones. Some digging suggests it's b.refData that's escaping to the heap.

jbowens added a commit to jbowens/pebble that referenced this issue Apr 29, 2024
Passing in the Batch.refData func into (wal.Writer).WriteRecord forced an
allocation for every committed batch. This commit defines a new wal.RefCount
interface implemented by Batch to avoid the allocation.

Informs cockroachdb/cockroach#123236.
jbowens added a commit to cockroachdb/pebble that referenced this issue Apr 30, 2024
Passing in the Batch.refData func into (wal.Writer).WriteRecord forced an
allocation for every committed batch. This commit defines a new wal.RefCount
interface implemented by Batch to avoid the allocation.

Informs cockroachdb/cockroach#123236.
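
For illustration, a minimal sketch of the pattern those commits describe, using simplified stand-in types rather than Pebble's actual wal API: evaluating the method value b.refData constructs a closure capturing b, which in the real WriteRecord call path escaped to the heap on every committed batch; accepting an interface that *Batch already implements avoids the per-call allocation.

package main

type Batch struct{ refs int32 }

// refData is the method whose method-value form was being passed in.
func (b *Batch) refData() *int32 { return &b.refs }

// Before: a func-typed parameter. Each call site that passes b.refData
// materializes a fresh closure value capturing the receiver.
func writeRecordFunc(repr []byte, refData func() *int32) {
	if refData != nil {
		_ = refData()
	}
}

// After: an interface satisfied by *Batch itself. Passing b stores an
// already-existing pointer in an interface value; no closure is built.
type RefCount interface{ RefData() *int32 }

func (b *Batch) RefData() *int32 { return &b.refs }

func writeRecordIface(repr []byte, rc RefCount) {
	if rc != nil {
		_ = rc.RefData()
	}
}

func main() {
	b := new(Batch)
	writeRecordFunc(nil, b.refData) // closure per call; escaped in Pebble's WAL path
	writeRecordIface(nil, b)        // no per-call allocation
}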
@jbowens jbowens moved this from Incoming to In Progress (this milestone) in Storage Apr 30, 2024
@jbowens jbowens self-assigned this Apr 30, 2024
jbowens commented Apr 30, 2024

Looking at MVCCIncrementalIterator/ts=480 alloc_objects:

23.2:

      flat  flat%   sum%        cum   cum%
    120149  9.40%  9.40%     120149  9.40%  github.com/cockroachdb/pebble.(*mergingIter).init
     98304  7.69% 17.10%      98304  7.69%  github.com/cockroachdb/pebble/internal/manifest.versionEditDecoder.readUvarint
     87382  6.84% 23.93%      87382  6.84%  github.com/cockroachdb/cockroach/pkg/storage.(*pebbleIterator).setOptions
     65538  5.13% 29.06%      65538  5.13%  github.com/cockroachdb/pebble/sstable.(*blockIter).readEntry
     65537  5.13% 34.19%      65537  5.13%  github.com/cockroachdb/cockroach/pkg/storage.glob..func17
     65536  5.13% 39.32%      65536  5.13%  github.com/cockroachdb/cockroach/pkg/keys.LockTableSingleKey
     65536  5.13% 44.45%      65536  5.13%  github.com/cockroachdb/cockroach/pkg/storage.EngineKey.EncodeToBuf
     50737  3.97% 48.42%      50737  3.97%  github.com/cockroachdb/cockroach/pkg/roachpb.(*Value).ensureRawBytes
     49839  3.90% 52.32%      49839  3.90%  github.com/cockroachdb/pebble.glob..func4
     49152  3.85% 56.17%      49152  3.85%  github.com/cockroachdb/pebble.(*Iterator).findNextEntry

24.1:

      flat  flat%   sum%        cum   cum%
    163844 15.82% 15.82%     300534 29.01%  github.com/cockroachdb/pebble.(*tableCacheShard).newPointIter
    114713 11.07% 26.89%     114713 11.07%  github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).init
     65536  6.33% 33.22%      65536  6.33%  github.com/cockroachdb/pebble/internal/manifest.versionEditDecoder.readUvarint
     43691  4.22% 37.44%      76459  7.38%  github.com/cockroachdb/cockroach/pkg/storage.(*pebbleIterator).setOptions
     42980  4.15% 41.59%      42980  4.15%  github.com/cockroachdb/pebble/sstable.(*valueBlockWriter).addValue
     32768  3.16% 44.75%      32768  3.16%  github.com/cockroachdb/cockroach/pkg/storage.(*intentInterleavingIter).makeUpperLimitKey
     32768  3.16% 47.91%      32768  3.16%  github.com/cockroachdb/cockroach/pkg/storage.EncodeMVCCKeyToBuf
     32768  3.16% 51.08%      32768  3.16%  github.com/cockroachdb/cockroach/pkg/storage.encodeMVCCTimestamp
     32768  3.16% 54.24%     256997 24.81%  github.com/cockroachdb/pebble.(*Iterator).SeekGEWithLimit
     32768  3.16% 57.41%      32768  3.16%  github.com/cockroachdb/pebble/internal/compact.LimitFuncSplitter

Is this because this benchmark creates a lot of obsolete versions, and 24.1 now defaults to using value blocks where previously it did not?

jbowens commented Apr 30, 2024

ROUTINE ======================== github.com/cockroachdb/pebble.(*tableCacheShard).newPointIter in table_cache.go
    163844     300534 (flat, cum) 29.01% of Total
         .          .    552:func (c *tableCacheShard) newPointIter(
         .          .    553:	ctx context.Context,
         .          .    554:	v *tableCacheValue,
         .          .    555:	file *manifest.FileMetadata,
         .          .    556:	cr sstable.CommonReader,
         .          .    557:	opts *IterOptions,
         .          .    558:	internalOpts internalIterOpts,
         .          .    559:	dbOpts *tableCacheOpts,
         .          .    560:) (internalIterator, error) {
         .          .    561:	var (
         .          .    562:		hideObsoletePoints bool = false
         .          .    563:		pointKeyFilters    []BlockPropertyFilter
         .          .    564:		filterer           *sstable.BlockPropertiesFilterer
         .          .    565:	)
         .          .    566:	if opts != nil {
         .          .    567:		// This code is appending (at most one filter) in-place to
         .          .    568:		// opts.PointKeyFilters even though the slice is shared for iterators in
         .          .    569:		// the same iterator tree. This is acceptable since all the following
         .          .    570:		// properties are true:
         .          .    571:		// - The iterator tree is single threaded, so the shared backing for the
         .          .    572:		//   slice is being mutated in a single threaded manner.
         .          .    573:		// - Each shallow copy of the slice has its own notion of length.
         .          .    574:		// - The appended element is always the obsoleteKeyBlockPropertyFilter
         .          .    575:		//   struct, which is stateless, so overwriting that struct when creating
         .          .    576:		//   one sstable iterator is harmless to other sstable iterators that are
         .          .    577:		//   relying on that struct.
         .          .    578:		//
         .          .    579:		// An alternative would be to have different slices for different sstable
         .          .    580:		// iterators, but that requires more work to avoid allocations.
         .          .    581:		//
         .          .    582:		// TODO(bilal): for compaction reads of foreign sstables, we do hide
         .          .    583:		// obsolete points (see sstable.Reader.newCompactionIter) but we don't
         .          .    584:		// apply the obsolete block property filter. We could optimize this by
         .          .    585:		// applying the filter.
         .          .    586:		hideObsoletePoints, pointKeyFilters =
         .          .    587:			v.reader.TryAddBlockPropertyFilterForHideObsoletePoints(
         .          .    588:				opts.snapshotForHideObsoletePoints, file.LargestSeqNum, opts.PointKeyFilters)
         .          .    589:
         .          .    590:		var ok bool
         .          .    591:		var err error
         .      16385    592:		ok, filterer, err = c.checkAndIntersectFilters(v, opts.TableFilter,
         .          .    593:			pointKeyFilters, internalOpts.boundLimitedFilter, file.SyntheticSuffix)
         .          .    594:		if err != nil {
         .          .    595:			return nil, err
         .          .    596:		} else if !ok {
         .          .    597:			// Return an empty iterator. This iterator has no mutable state, so
         .          .    598:			// using a singleton is fine. We must return `filteredAll` instead
         .          .    599:			// of nil so that the returned iterator returns MaybeFilteredKeys()
         .          .    600:			// = true.
         .          .    601:			//
         .          .    602:			// TODO(jackson): This `filteredAll` subtlety can be removed after the
         .          .    603:			// planned #2863 refactor, when there will be no need to return a special
         .          .    604:			// empty iterator type to signify that point keys were filtered.
         .          .    605:			return filteredAll, err
         .          .    606:		}
         .          .    607:	}
         .          .    608:
         .          .    609:	var iter sstable.Iterator
         .          .    610:	useFilter := true
         .          .    611:	if opts != nil {
         .          .    612:		useFilter = manifest.LevelToInt(opts.level) != 6 || opts.UseL6Filters
         .          .    613:		ctx = objiotracing.WithLevel(ctx, manifest.LevelToInt(opts.level))
         .          .    614:	}
         .          .    615:	tableFormat, err := v.reader.TableFormat()
         .          .    616:	if err != nil {
         .          .    617:		return nil, err
         .          .    618:	}
         .          .    619:	var rp sstable.ReaderProvider
         .          .    620:	if tableFormat >= sstable.TableFormatPebblev3 && v.reader.Properties.NumValueBlocks > 0 {
    163844     163844    621:		rp = &tableCacheShardReaderProvider{c: c, file: file, dbOpts: dbOpts}
         .          .    622:	}
         .          .    623:
         .          .    624:	if v.isShared && file.SyntheticSeqNum() != 0 {
         .          .    625:		if tableFormat < sstable.TableFormatPebblev4 {
         .          .    626:			return nil, errors.New("pebble: shared ingested sstable has a lower table format than expected")
         .          .    627:		}
         .          .    628:		// The table is shared and ingested.
         .          .    629:		hideObsoletePoints = true
         .          .    630:	}
         .          .    631:	transforms := file.IterTransforms()
         .          .    632:	transforms.HideObsoletePoints = hideObsoletePoints
         .          .    633:	var categoryAndQoS sstable.CategoryAndQoS
         .          .    634:	if opts != nil {
         .          .    635:		categoryAndQoS = opts.CategoryAndQoS
         .          .    636:	}
         .          .    637:	if internalOpts.bytesIterated != nil {
         .       1820    638:		iter, err = cr.NewCompactionIter(
         .          .    639:			transforms, internalOpts.bytesIterated, categoryAndQoS, dbOpts.sstStatsCollector, rp,
         .          .    640:			internalOpts.bufferPool)
         .          .    641:	} else {
         .     118485    642:		iter, err = cr.NewIterWithBlockPropertyFiltersAndContextEtc(
         .          .    643:			ctx, transforms, opts.GetLowerBound(), opts.GetUpperBound(), filterer, useFilter,
         .          .    644:			internalOpts.stats, categoryAndQoS, dbOpts.sstStatsCollector, rp)
         .          .    645:	}
         .          .    646:	if err != nil {
         .          .    647:		return nil, err
         
ROUTINE ======================== github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).init in reader_iter_single_lvl.go
    114713     114713 (flat, cum) 11.07% of Total
         .          .    184:func (i *singleLevelIterator) init(
         .          .    185:	ctx context.Context,
         .          .    186:	r *Reader,
         .          .    187:	v *virtualState,
         .          .    188:	transforms IterTransforms,
         .          .    189:	lower, upper []byte,
         .          .    190:	filterer *BlockPropertiesFilterer,
         .          .    191:	useFilter bool,
         .          .    192:	stats *base.InternalIteratorStats,
         .          .    193:	categoryAndQoS CategoryAndQoS,
         .          .    194:	statsCollector *CategoryStatsCollector,
         .          .    195:	rp ReaderProvider,
         .          .    196:	bufferPool *BufferPool,
         .          .    197:) error {
         .          .    198:	if r.err != nil {
         .          .    199:		return r.err
         .          .    200:	}
         .          .    201:	i.iterStats.init(categoryAndQoS, statsCollector)
         .          .    202:	indexH, err := r.readIndex(ctx, stats, &i.iterStats)
         .          .    203:	if err != nil {
         .          .    204:		return err
         .          .    205:	}
         .          .    206:	if v != nil {
         .          .    207:		i.vState = v
         .          .    208:		i.endKeyInclusive, lower, upper = v.constrainBounds(lower, upper, false /* endInclusive */)
         .          .    209:	}
         .          .    210:
         .          .    211:	i.inPool = false
         .          .    212:	i.ctx = ctx
         .          .    213:	i.lower = lower
         .          .    214:	i.upper = upper
         .          .    215:	i.bpfs = filterer
         .          .    216:	i.useFilter = useFilter
         .          .    217:	i.reader = r
         .          .    218:	i.cmp = r.Compare
         .          .    219:	i.stats = stats
         .          .    220:	i.transforms = transforms
         .          .    221:	i.bufferPool = bufferPool
         .          .    222:	err = i.index.initHandle(i.cmp, r.Split, indexH, transforms)
         .          .    223:	if err != nil {
         .          .    224:		// blockIter.Close releases indexH and always returns a nil error
         .          .    225:		_ = i.index.Close()
         .          .    226:		return err
         .          .    227:	}
         .          .    228:	i.dataRH = objstorageprovider.UsePreallocatedReadHandle(ctx, r.readable, &i.dataRHPrealloc)
         .          .    229:	if r.tableFormat >= TableFormatPebblev3 {
         .          .    230:		if r.Properties.NumValueBlocks > 0 {
         .          .    231:			// NB: we cannot avoid this ~248 byte allocation, since valueBlockReader
         .          .    232:			// can outlive the singleLevelIterator due to be being embedded in a
         .          .    233:			// LazyValue. This consumes ~2% in microbenchmark CPU profiles, but we
         .          .    234:			// should only optimize this if it shows up as significant in end-to-end
         .          .    235:			// CockroachDB benchmarks, since it is tricky to do so. One possibility
         .          .    236:			// is that if many sstable iterators only get positioned at latest
         .          .    237:			// versions of keys, and therefore never expose a LazyValue that is
         .          .    238:			// separated to their callers, they can put this valueBlockReader into a
         .          .    239:			// sync.Pool.
    114713     114713    240:			i.vbReader = &valueBlockReader{
         .          .    241:				bpOpen: i,
         .          .    242:				rp:     rp,
         .          .    243:				vbih:   r.valueBIH,
         .          .    244:				stats:  stats,
         .          .    245:			}

jbowens commented Apr 30, 2024

         .          .    619:	var rp sstable.ReaderProvider
         .          .    620:	if tableFormat >= sstable.TableFormatPebblev3 && v.reader.Properties.NumValueBlocks > 0 {
    163844     163844    621:		rp = &tableCacheShardReaderProvider{c: c, file: file, dbOpts: dbOpts}
         .          .    622:	}

@sumeerbhola Is it true that we could refactor this so that either multiple iterators use the same tableCacheShardReaderProvider (with their own refs), or we embed the provider implementation directly on the tableCacheValue (again, with one ref per iterator using it as a reader provider)? It seems like we just need (see the sketch below):

  1. a reference on the table cache value's ref count as long as we might need to access the table's reader
  2. the ability to get the table cache value's reader
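
Concretely, a hypothetical shape for that refactor (illustrative names only, not actual Pebble code): embed the provider by value on the table cache value, so opening an iterator never allocates a provider, and each iterator takes its own ref.

package main

import "sync/atomic"

// sstReader stands in for *sstable.Reader.
type sstReader struct{}

type tableCacheValue struct {
	refs atomic.Int32
	r    *sstReader
	// provider is embedded by value, so handing &v.provider to any
	// number of iterators allocates nothing.
	provider readerProvider
}

type readerProvider struct{ v *tableCacheValue }

// GetReader pins the table cache value (requirement 1 above) and
// returns its reader (requirement 2). Each iterator holds its own ref.
func (p *readerProvider) GetReader() *sstReader {
	p.v.refs.Add(1)
	return p.v.r
}

func (p *readerProvider) Close() { p.v.refs.Add(-1) }

func newTableCacheValue(r *sstReader) *tableCacheValue {
	v := &tableCacheValue{r: r}
	v.provider.v = v
	return v
}

func main() {
	v := newTableCacheValue(&sstReader{})
	rp := &v.provider // shared by all iterators over this table
	_ = rp.GetReader()
	rp.Close()
}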

jbowens commented Apr 30, 2024

The EncodeMVCCValue regression is from e817d29 (cc @stevendanna):

goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                                                │ abee742f88c.txt │           e817d2905ff.txt           │           546218e68c1.txt           │ 3befd712cdf706afb53551c4f349d6042a1cda88.txt │
                                                │     sec/op      │   sec/op     vs base                │   sec/op     vs base                │        sec/op         vs base                │
EncodeMVCCValue/header=empty/value=tombstone-24       4.989n ± 0%   7.684n ± 0%  +54.00% (p=0.000 n=20)   7.686n ± 0%  +54.04% (p=0.000 n=20)            7.848n ± 0%  +57.29% (p=0.000 n=20)
EncodeMVCCValue/header=empty/value=short-24           4.986n ± 0%   7.679n ± 0%  +54.02% (p=0.000 n=20)   7.684n ± 0%  +54.11% (p=0.000 n=20)            7.698n ± 0%  +54.39% (p=0.000 n=20)
EncodeMVCCValue/header=empty/value=long-24            4.987n ± 0%   7.681n ± 0%  +54.01% (p=0.000 n=20)   7.823n ± 0%  +56.87% (p=0.000 n=20)            7.694n ± 0%  +54.29% (p=0.000 n=20)
geomean                                               4.987n        7.681n       +54.01%                  7.731n       +55.00%                           7.747n       +55.32%

I'm not sure we care. I don't think EncodeMVCCValue is ever prominent in CPU profiles.

sumeerbhola commented

> Is this because this benchmark creates a lot of obsolete versions, and 24.1 now defaults to using value blocks where previously it did not?

Why would this change for v24.1?

> @sumeerbhola Is it true that we could refactor this so that either multiple iterators use the same tableCacheShardReaderProvider (with their own refs)

Yes, but then we need a place to cache these tableCacheShardReaderProviders (which I think you anticipated below).

> or we embed the provider implementation directly on the tableCacheValue (again, with one ref per iterator using it as a reader provider)?

Currently the tableCacheShardReaderProvider has the lifetime of the valueBlockReader, while the *tableCacheValue has a shorter lifetime, sandwiched in the GetReader/Close pair that is invoked for each valueBlockReader.Fetch after the valueBlockReader has been closed (which happens when the corresponding sstable iterator is closed). I think this short-lifetime choice was potentially too aggressive, since the lifetime of a LazyValue pointing to the valueBlockReader is slightly longer, but also limited. Hmm, now I remember -- I think this was because there is nothing to notify us when the LazyValue is no longer reachable (the Go GC will simply collect it), so we won't know when to call ReaderProvider.Close.

Coming back to your idea, I think a similar problem occurs there. We can give valueBlockReader a *tableCacheValue but that means it is holding a ref, and there is nothing that will trigger the unref once there is no LazyValue pointing to that valueBlockReader.
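
To restate that lifetime problem as code (illustrative names, not the real Pebble types): the ref taken inside Fetch is scoped to the call, and a ref held for the reader's whole lifetime would have no release hook once the last LazyValue is garbage collected.

package main

type sstReader struct{}

// readerProvider pins and unpins the underlying table cache value.
type readerProvider interface {
	GetReader() *sstReader
	Close()
}

// A lazyValue keeps its valueBlockReader reachable even after the
// sstable iterator that created it has been closed.
type lazyValue struct{ vbr *valueBlockReader }

type valueBlockReader struct{ rp readerProvider }

// Today: the table cache value is pinned only inside each fetch, via
// the GetReader/Close pair.
func (r *valueBlockReader) fetch() []byte {
	reader := r.rp.GetReader()
	defer r.rp.Close() // unpinned again before fetch returns
	_ = reader
	return nil
}

// The alternative would be to pin for the valueBlockReader's whole
// lifetime, but the Go GC gives no callback when the last lazyValue
// pointing at it becomes unreachable, so there is no natural point at
// which to call Close.

func main() {
	_ = lazyValue{vbr: &valueBlockReader{}}
}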

jbowens commented May 1, 2024

> Why would this change for v24.1?

I think it was previously gated behind the value blocks cluster setting for older cluster versions, and required a current cluster version for the on-by-default behavior. I suspect the benchmark wasn't defaulting to the newest possible cluster version.

@jbowens jbowens closed this as completed May 2, 2024
Storage automation moved this from In Progress (this milestone) to Done May 2, 2024