ui: fix replication lag metric for multinode clusters and cutover
The replication lag metric reported absurdly high lag for multinode
clusters because it averaged the replicated timestamps reported by each
node. Since some nodes may report 0, the average produced an extremely
low replicated time. Patched by taking the highest replicated time
across all nodes. Also stop reporting replication lag once ingestion
has stopped (e.g. cutover or job cancel/fail).

Informs cockroachdb#120652

Release note (ui change): fix replication lag metric reporting for multinode
clusters and cutover
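
To illustrate the reasoning in the commit message, here is a minimal sketch of how averaging per-node replicated times inflates the lag while taking the maximum does not. It assumes lag is computed as the current time minus the aggregated replicated time; the function names `avgReplicated`, `maxReplicated`, and `lagSeconds` are hypothetical and are not part of the CockroachDB codebase.

```go
package main

import "fmt"

// avgReplicated mimics an AVG aggregator over per-node replicated times
// (Unix seconds). Nodes that report 0 drag the result toward zero.
func avgReplicated(perNode []float64) float64 {
	if len(perNode) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range perNode {
		sum += v
	}
	return sum / float64(len(perNode))
}

// maxReplicated mimics a MAX aggregator: zero-reporting nodes are
// ignored as long as at least one node reports a real timestamp.
func maxReplicated(perNode []float64) float64 {
	best := 0.0
	for _, v := range perNode {
		if v > best {
			best = v
		}
	}
	return best
}

// lagSeconds is the assumed lag formula: now minus replicated time.
func lagSeconds(now, replicated float64) float64 {
	return now - replicated
}

func main() {
	now := 1714650000.0
	// One node reports a replicated time 10s behind; two nodes report 0.
	perNode := []float64{1714649990, 0, 0}

	fmt.Println("lag with AVG:", lagSeconds(now, avgReplicated(perNode)))
	fmt.Println("lag with MAX:", lagSeconds(now, maxReplicated(perNode)))
}
```

With one node 10 seconds behind and two nodes reporting 0, the MAX variant yields a 10 second lag, while the AVG variant yields a lag on the order of decades.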
kev-cao committed May 2, 2024
1 parent e88bd17 commit e5122e0
Showing 2 changed files with 3 additions and 2 deletions.
3 changes: 2 additions & 1 deletion pkg/ccl/streamingccl/streamingest/stream_ingestion_job.go
@@ -510,12 +510,13 @@ func maybeRevertToCutoverTimestamp(
if p.ExecCfg().StreamingTestingKnobs != nil && p.ExecCfg().StreamingTestingKnobs.OverrideRevertRangeBatchSize != 0 {
batchSize = p.ExecCfg().StreamingTestingKnobs.OverrideRevertRangeBatchSize
}
+p.ExecCfg().JobRegistry.MetricsStruct().StreamIngest.(*Metrics).ReplicatedTimeSeconds.Update(0)
if err := revertccl.RevertSpansFanout(ctx,
p.ExecCfg().DB,
p,
remainingSpansToRevert,
cutoverTimestamp,
-// TODO(ssd): It should be safe for us to ingore the
+// TODO(ssd): It should be safe for us to ignore the
// GC threshold. Why aren't we?
false, /* ignoreGCThreshold */
batchSize,
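
The hunk above resets the replicated-time gauge to 0 before the revert begins. As a rough sketch of why that stops lag from being reported, the example below assumes a consumer that treats a replicated time of 0 as "not ingesting" and skips the lag calculation. The `gauge` type and `replicationLag` function are hypothetical stand-ins, not the CockroachDB metric types or the dashboard's actual transform.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// gauge is a hypothetical stand-in for a metrics gauge holding the
// replicated time in Unix seconds.
type gauge struct{ v atomic.Int64 }

func (g *gauge) Update(x int64) { g.v.Store(x) }
func (g *gauge) Value() int64   { return g.v.Load() }

// replicationLag returns the lag in seconds and whether it should be
// reported. Assumption: a replicated time of 0 means ingestion has
// stopped, so no lag is reported.
func replicationLag(nowSec, replicatedSec int64) (int64, bool) {
	if replicatedSec == 0 {
		return 0, false
	}
	return nowSec - replicatedSec, true
}

func main() {
	var replicated gauge
	now := int64(1714650000)

	// While ingesting, the gauge tracks the replicated time.
	replicated.Update(now - 5)
	fmt.Println(replicationLag(now, replicated.Value()))

	// At cutover the gauge is reset, so no lag is reported.
	replicated.Update(0)
	fmt.Println(replicationLag(now, replicated.Value()))
}
```

Without the reset, the gauge would keep its last replicated time after ingestion stops, and the computed lag would grow indefinitely.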
@@ -58,7 +58,7 @@ export default function (props: GraphDashboardProps) {
<Axis units={AxisUnits.Duration} label="duration">
<Metric
downsampler={TimeSeriesQueryAggregator.MIN}
-aggregator={TimeSeriesQueryAggregator.AVG}
+aggregator={TimeSeriesQueryAggregator.MAX}
name="cr.node.physical_replication.replicated_time_seconds"
title="Replication Lag"
transform={datapoints =>