Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: add metric for sstable compression types #3623

Merged
merged 2 commits into from
May 17, 2024

Conversation

itsbilal
Copy link
Member

Previously, we didn't have a metric tracking the compression type(s) of sstables. This change adds that, as a table stat that's collected through the LevelMetadata as an annotator field. This unfortunately had to be a table stat as the compression type isn't known until we open a reader and look at the properties block.

Pebble side of cockroachdb/cockroach#123952.

@itsbilal itsbilal requested a review from nicktrav May 15, 2024 18:57
@itsbilal itsbilal requested a review from a team as a code owner May 15, 2024 18:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add this to the FileMetadata and the manifest? The default can be "unknown" but we can start populating it for new tables.

Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on @nicktrav)

@itsbilal
Copy link
Member Author

@RaduBerinde We could, but then we'd also have to shop it around with external/shared files. I felt that it's an emergent property of the blocks in the same way as a lot of other properties etc., are, so it's best to leave it there for semantic separation from the stuff that's usually in FileMetadata. I'm open to doing that if we think that's the preferred approach, though

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, just a thought.

:lgtm:

Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @nicktrav)


metrics.go line 609 at r1 (raw file):

		humanize.Bytes.Uint64(m.VirtualSize()))
	w.Printf("Local tables size: %s\n", humanize.Bytes.Uint64(m.Table.Local.LiveSize))
	w.Printf("Compression types:  unknown: %d  snappy: %d  zstd: %d\n",

I think we should hide the 0 counts to make it more readable. I'd also format it as 1 snappy, 2 zstd, 3 unknown.


table_stats.go line 1164 at r1 (raw file):

		vptr.zstd++
	default:
		vptr.unknown++

Are we conflating NoCompression into unknown? I think we should either count it separately, or don't count it at all.


table_stats.go line 1185 at r1 (raw file):

		return
	}
	compression := v.Levels[level].Annotation(compressionTypeAnnotator{}).(*compressionTypes)

This annotator functionality is pretty cool!

@RaduBerinde
Copy link
Member

table_stats.go line 1164 at r1 (raw file):

Previously, RaduBerinde wrote…

Are we conflating NoCompression into unknown? I think we should either count it separately, or don't count it at all.

By the way, I think we should explicitly put unknown under case DefaultCompression: and explain that this is the zero value which means it wasn't filled in.

Previously, we didn't have a metric tracking the compression type(s)
of sstables. This change adds that, as a table stat that's collected
through the LevelMetadata as an annotator field. This unfortunately
had to be a table stat as the compression type isn't known until
we open a reader and look at the properties block.

Pebble side of cockroachdb/cockroach#123952.
@nicktrav
Copy link
Contributor

@RaduBerinde - re-pushed to address the formatting and the "no compression" case.

One thing I'm noticing is that if we restart a DB, all the files will have compression type unknown, even after we do the async stats loading. That is unfortunate, as we'll only update the compression types when a table is rewritten via a compaction. I'll look at fixing that.

@itsbilal
Copy link
Member Author

@nicktrav are we seeing that in practice? The cacheable return value from the annotator should be false until the stats are marked as valid, so we shouldn’t be caching the unknowns in the tree.

Sorry for not wrapping this up Wednesday!

@nicktrav
Copy link
Contributor

nicktrav commented May 17, 2024

are we seeing that in practice?

I think we are. For example, in this patch, look at testdata/metrics. On L918 we have snappy: 2 unknown: 1. We then re-open the DB on L931. And then re-compute the metrics. But the metrics show unknown: 3. Computing the metrics blocks on the table stats collection, but only for new files.

I think the issue is that in (*DB).loadTableStats, we use withCommonReader to read from the table. We are then using CommonProperties to populate the table stats. But this struct doesn't have the compression type (it's in the larger Properties struct).

A simple solution is to move the compression type up into that struct, but I'm having some issues getting the randomized tests to pass.

@nicktrav
Copy link
Contributor

Pushed a fix for getting the compression stats populated in an existing DB that is opened.

I think we need this, else our plans to monitor the transition from one compression type to another would only take into account new sstables (flushed, compacted, ingested, etc.).

Using CommonProperties is a new thing since I was last in here, but it seems reasonable to use it, as I would assume that a vSST would have the same compression type as it's physical table?

@RaduBerinde - mind taking another pass?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 15 files reviewed, 4 unresolved discussions (waiting on @itsbilal and @nicktrav)


sstable/properties.go line 133 at r3 (raw file):

	ComparerName string `prop:"rocksdb.comparator"`
	// The compression options used to compress blocks.
	CompressionOptions string `prop:"rocksdb.compression_options"`

CompressionName and CompressionOptions belong together. I think it's ok if both live in CommonProperties

Currently, the compression type for a table is known when the SST is new
\- i.e. flushed, compacted, ingested. However, when opening an existing
DB, the compression types will show as "unknown".

Allow the async, background stats loading job that runs on DB open to
infer the compression type of existing tables by lifting the compression
field from `sstable.Properties` up into `sstable.CommonProperties`. This
allows `loadTableStats` to use the field and populate the
`manifest.TableStats` struct.
Copy link
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 15 files reviewed, 2 unresolved discussions (waiting on @itsbilal and @RaduBerinde)


metrics.go line 609 at r1 (raw file):

Previously, RaduBerinde wrote…

I think we should hide the 0 counts to make it more readable. I'd also format it as 1 snappy, 2 zstd, 3 unknown.

Done.


sstable/properties.go line 133 at r3 (raw file):

Previously, RaduBerinde wrote…

CompressionName and CompressionOptions belong together. I think it's ok if both live in CommonProperties

Done.

@nicktrav
Copy link
Contributor

TFTR!

@nicktrav nicktrav merged commit 6195a2c into cockroachdb:master May 17, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants