Add new consensus summary metric to track missed blocks, deprecate missed blocks gauge. #1310

Open
jevonearth opened this issue Apr 16, 2024 · 0 comments
Labels
ice-box, introspection, T:enhancement

jevonearth (Contributor) commented Apr 16, 2024

Feature Request

Summary

Introduce a new Prometheus counter metric named celestia_consensus_validator_missed_blocks_total and deprecate the existing gauge metric celestia_consensus_validator_missed_blocks to more accurately track changes over time.

Problem Definition

The current implementation uses a gauge for the celestia_consensus_validator_missed_blocks metric. Gauges suit values that can go up and down, such as temperatures or the amount of free memory, but they are a poor fit for counting events that only accumulate, such as missed blocks. Prometheus functions such as rate() and increase() are designed for counters: they treat any drop in value as a counter reset (for example, after a node restart) and compensate for it. They are not meant to be applied to gauges, so deriving how quickly a validator is missing blocks from the gauge requires workarounds such as delta(), which does not account for resets. This makes monitoring less convenient and can produce inaccurate alerts and misleading historical analysis.

Adding a counter metric for missed blocks would let operators apply rate() and increase() directly, so dashboards and alerts reflect the validator's operational health and performance trends more accurately. This change would also align with Prometheus best practices, including the convention that the name of an accumulating count ends in _total.
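The difference between counter and gauge semantics can be sketched in a few lines of Go. The `increase` function below approximates what PromQL's increase() does conceptually over raw counter samples, treating any drop as a reset; `delta` takes the plain last-minus-first difference that delta() uses for gauges. This is a simplified illustration under stated assumptions: real Prometheus also extrapolates to the edges of the range window, and the sample values are hypothetical.

```go
package main

import "fmt"

// increase approximates PromQL's increase() over raw counter samples. It sums
// the positive deltas and, whenever a sample is lower than the one before it,
// assumes the counter was reset (for example, by a node restart) and counts the
// new sample from zero. Simplification: no extrapolation to window boundaries.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		if samples[i] >= samples[i-1] {
			total += samples[i] - samples[i-1]
		} else {
			// Counter reset: the value restarted from zero.
			total += samples[i]
		}
	}
	return total
}

// delta mirrors the last-minus-first difference that PromQL's delta() takes
// for gauges (again without extrapolation). It has no notion of resets.
func delta(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	return samples[len(samples)-1] - samples[0]
}

func main() {
	// Hypothetical scrapes: the validator misses 3 blocks, the node restarts
	// (the in-process value drops back to 0), then it misses 2 more.
	samples := []float64{0, 1, 3, 0, 1, 2}

	fmt.Println("increase (counter semantics):", increase(samples))
	fmt.Println("delta (gauge semantics):", delta(samples))
}
```

With these hypothetical samples, the reset-aware calculation reports all 5 missed blocks, while the gauge-style difference reports only 2 because the restart hides the first 3.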

Proposal

  1. Introduce a New Counter Metric: Implement celestia_consensus_validator_missed_blocks_total as a counter metric that increments each time a validator misses a block.

  2. Deprecate the Existing Gauge Metric: Mark celestia_consensus_validator_missed_blocks as deprecated in the codebase and documentation, encouraging users to transition to the new counter metric.

  3. Update Metric Descriptions: Revise the metric descriptions to explain the use of the new counter and the deprecation path for the existing gauge, for example:

# HELP celestia_consensus_validator_missed_blocks (Deprecated) Total missed blocks for a validator. This metric is deprecated and will be removed in future versions. Please use celestia_consensus_validator_missed_blocks_total instead.
# TYPE celestia_consensus_validator_missed_blocks gauge
# HELP celestia_consensus_validator_missed_blocks_total Total number of blocks missed by the validator
# TYPE celestia_consensus_validator_missed_blocks_total counter
  4. Implementation Timeline: We (ECAD Labs) will submit a pull request with an implementation soon.
evan-forbes added the T:enhancement, introspection, and ice-box labels and removed the needs:triage label on May 6, 2024
Projects
None yet
Development

No branches or pull requests

2 participants