walredo metrics: some are just for apply_batch_postgres, others include apply_batch_neon #7595
Labels: c/storage/pageserver (Component: storage: pageserver), t/bug (Issue Type: Bug), triaged (bugs that were already triaged)
Problem
`apply_batch_postgres` does `.observe()` on these metrics (neon/pageserver/src/walredo.rs, lines 258 to 260 in 2d5a846), whereas `apply_batch_neon` does `.observe()` only on this metric (neon/pageserver/src/walredo.rs, lines 350 to 352 in 2d5a846).
This makes it error-prone to reason about walredo metrics, as, e.g., `WAL_REDO_TIME` has a different `_count` than `WAL_REDO_RECORDS_HISTOGRAM` and `WAL_REDO_BYTES_HISTOGRAM`.
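The mismatch can be illustrated with a minimal sketch. It uses a hypothetical std-only `Histogram` stand-in rather than the real Prometheus types, and the function bodies only mimic which metrics each path observes. It assumes, as the `_count` discrepancy suggests, that the single metric observed by `apply_batch_neon` is `WAL_REDO_TIME`. None of this is the actual pageserver code.

```rust
/// Hypothetical stand-in for a Prometheus histogram; tracks only `_count`.
#[derive(Default)]
struct Histogram {
    count: u64,
}

impl Histogram {
    /// Records one observation; the value itself is ignored in this sketch.
    fn observe(&mut self, _value: f64) {
        self.count += 1;
    }
}

/// Stand-ins for the three walredo metrics named in this issue.
#[derive(Default)]
struct WalRedoMetrics {
    time: Histogram,    // WAL_REDO_TIME
    records: Histogram, // WAL_REDO_RECORDS_HISTOGRAM
    bytes: Histogram,   // WAL_REDO_BYTES_HISTOGRAM
}

/// Mimics the walredo-process path: observes all three metrics.
fn apply_batch_postgres(m: &mut WalRedoMetrics, secs: f64, nrecords: usize, nbytes: usize) {
    m.time.observe(secs);
    m.records.observe(nrecords as f64);
    m.bytes.observe(nbytes as f64);
}

/// Mimics the in-process path: observes only the time metric (assumption).
fn apply_batch_neon(m: &mut WalRedoMetrics, secs: f64) {
    m.time.observe(secs);
}

/// Runs two process-based batches and one in-process batch and
/// returns the `_count` of (time, records, bytes).
fn demo_counts() -> (u64, u64, u64) {
    let mut m = WalRedoMetrics::default();
    apply_batch_postgres(&mut m, 0.002, 10, 4096);
    apply_batch_postgres(&mut m, 0.003, 5, 2048);
    apply_batch_neon(&mut m, 0.001);
    (m.time.count, m.records.count, m.bytes.count)
}
```

In this sketch the time histogram counts every batch, while the records and bytes histograms count only the batches that went through the process-based path, so ratios such as records per batch cannot be derived by dividing one metric by the `_count` of another.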
I stumbled across this and wasted multiple hours when trying to qualify async walredo.
(Actually, Joonas added that FIXME in the code quoted above. This issue addresses the FIXME)
Solution
- Metrics that apply to both code paths (`WAL_REDO_TIME`, `WAL_REDO_RECORDS_HISTOGRAM`)
  - move them into `metrics::WalRedoManagerMetrics`
  - add a label `replayer=walredoproc|inprocess` to distinguish them.
- `WAL_REDO_BYTES_HISTOGRAM` is walredo process specific
  - move it into `metrics::WalRedoProcessMetrics` (maybe we already have something like that)
  - rename `pageserver_wal_redo_bytes_histogram` to `pageserver_wal_redo_proc_bytes_histogram`
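
A rough sketch of how the proposed split could look, again with std-only stand-ins instead of real Prometheus types. The `Replayer` enum, `LabelledCount`, `ManagerMetrics`, `ProcessMetrics`, and `record_batch` are hypothetical names chosen for illustration; only the label values `walredoproc` and `inprocess` come from the proposal above.

```rust
use std::collections::HashMap;

/// Hypothetical enum for the values of the proposed `replayer` label.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Replayer {
    WalRedoProc,
    InProcess,
}

impl Replayer {
    /// Label value as it would appear in the exported metric.
    fn as_label(self) -> &'static str {
        match self {
            Replayer::WalRedoProc => "walredoproc",
            Replayer::InProcess => "inprocess",
        }
    }
}

/// Stand-in for a labelled histogram: one `_count` per label value.
#[derive(Default)]
struct LabelledCount {
    counts: HashMap<&'static str, u64>,
}

impl LabelledCount {
    fn observe(&mut self, replayer: Replayer) {
        *self.counts.entry(replayer.as_label()).or_insert(0) += 1;
    }

    fn count(&self, replayer: Replayer) -> u64 {
        self.counts.get(replayer.as_label()).copied().unwrap_or(0)
    }
}

/// Metrics shared by both replay paths, distinguished by label.
#[derive(Default)]
struct ManagerMetrics {
    time: LabelledCount,
    records: LabelledCount,
}

/// Metric that is specific to the walredo process.
#[derive(Default)]
struct ProcessMetrics {
    bytes_count: u64,
}

/// Every batch observes the shared metrics under its own label;
/// only process-based batches touch the process-specific metric.
fn record_batch(mgr: &mut ManagerMetrics, process: &mut ProcessMetrics, replayer: Replayer) {
    mgr.time.observe(replayer);
    mgr.records.observe(replayer);
    if replayer == Replayer::WalRedoProc {
        process.bytes_count += 1;
    }
}

/// Two process-based batches and one in-process batch; returns
/// (time[proc], time[inproc], records[proc], records[inproc], bytes).
fn demo_labelled() -> (u64, u64, u64, u64, u64) {
    let mut mgr = ManagerMetrics::default();
    let mut process = ProcessMetrics::default();
    record_batch(&mut mgr, &mut process, Replayer::WalRedoProc);
    record_batch(&mut mgr, &mut process, Replayer::WalRedoProc);
    record_batch(&mut mgr, &mut process, Replayer::InProcess);
    (
        mgr.time.count(Replayer::WalRedoProc),
        mgr.time.count(Replayer::InProcess),
        mgr.records.count(Replayer::WalRedoProc),
        mgr.records.count(Replayer::InProcess),
        process.bytes_count,
    )
}
```

With this layout the shared metrics have matching `_count` values per label, and the process-specific metric lines up with the `walredoproc` series only.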