[Feature] feat: more TPS metrics #3147

Merged
merged 2 commits into from Mar 5, 2024
Changes from 1 commit
17 changes: 16 additions & 1 deletion node/consensus/src/lib.rs
@@ -212,6 +212,11 @@ impl<N: Network> Consensus<N> {
 impl<N: Network> Consensus<N> {
     /// Adds the given unconfirmed solution to the memory pool.
     pub async fn add_unconfirmed_solution(&self, solution: ProverSolution<N>) -> Result<()> {
+        #[cfg(feature = "metrics")]
Contributor

I think we should move this to after the unconfirmed solutions and transmissions have been added, since there are a few points where we fail or return early.

Contributor Author

Initially I did that, but @vicsn asked for it to be moved to the top.

Contributor

Yep, I saw that. If we want it at the top, we'd have to add metrics-flagged code to decrement the gauge on failure or early return, which probably isn't optimal.

Contributor Author

I think the point is to see the difference between how many come in and how many actually make it into a block.

Contributor (@miazn, Mar 4, 2024)

Fair, although in that case it would probably be better to track it from the delivering side, i.e. whatever we are sending. I think we may be emitting metrics one layer too deep in the code here, since what we really want is to emit metrics whenever these POST endpoints are called. If we just want to track how many times those endpoints are hit, it might be cleaner to move it there. @vicsn, what do you think?

Collaborator

The rationale for putting it in add_unconfirmed_{transaction, solution} is that we then also count transactions received via the router.

I think measuring it on the delivering side (i.e. tx-cannon) would be even better, but that requires more coordination and time to figure out.
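
To make the trade-off in this thread concrete, here is a minimal sketch of the two placements side by side. Everything in it is hypothetical: `Pool`, `PoolMetrics`, and the duplicate check are stand-ins built on `std` atomics, not the snarkOS `Consensus` type or its metrics crate. Counting on entry captures everything that arrives (including items that later fail), while counting at the end captures only what was queued.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-ins for the gauges discussed above (not the snarkOS metrics API).
#[derive(Default)]
struct PoolMetrics {
    received: AtomicU64, // bumped on entry, before any validation
    accepted: AtomicU64, // bumped only after the item has been queued
}

/// Toy memory pool; rejecting duplicate IDs stands in for the early-return paths.
#[derive(Default)]
struct Pool {
    seen: Mutex<HashSet<u64>>,
    metrics: PoolMetrics,
}

impl Pool {
    fn add_unconfirmed(&self, id: u64) -> Result<(), String> {
        // Counted at the top: includes items that fail below.
        self.metrics.received.fetch_add(1, Ordering::Relaxed);

        let mut seen = self.seen.lock().map_err(|_| "lock poisoned".to_string())?;
        if !seen.insert(id) {
            // Early return: `received` has already been bumped, `accepted` has not.
            return Err(format!("duplicate item {id}"));
        }

        // Counted at the end: only items that made it into the pool.
        self.metrics.accepted.fetch_add(1, Ordering::Relaxed);
        Ok(())
    }
}
```

With this shape no decrement is needed on the failure paths: the gap between `received` and `accepted` is itself the number of dropped items, at the cost of maintaining two series instead of one.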

+        {
+            metrics::increment_gauge(metrics::consensus::SOLUTIONS, 1f64);
Contributor

I know we had this conversation on the prior TPS PR, but now that we are adding so many TPS metrics, I wonder whether we should change the snarkVM metrics crate's increment_counter function to take the amount to increment by as a parameter rather than always using a constant, since this should technically be a counter.

Contributor Author

That's also possible, but then we'd have two linked PRs (snarkVM + snarkOS), which makes it more annoying to merge.

Contributor

Yeah, that's why I avoided it on the first one. However, if we do end up adding more and more metrics, it makes sense to change it eventually. Could that be an issue for a later date in snarkVM? @vicsn

Collaborator

Sure. I personally don't see the need, but make an issue for the future if you think the added function will make us more robust!
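
As a rough illustration of the suggestion in this thread, the sketch below shows what a counter that takes an increment amount could look like. It is an assumption-laden stand-in: `Counter`, `increment_by`, and `BlockCounters` are hypothetical names built on `std` atomics and do not reflect the actual snarkVM metrics crate or its signatures.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical monotonic counter (not the snarkVM metrics crate).
#[derive(Default)]
struct Counter(AtomicU64);

impl Counter {
    /// Adds `n` in a single call instead of a fixed `+1` step.
    fn increment_by(&self, n: u64) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }

    /// Returns the running total.
    fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical per-node totals mirroring the three block-level series in this PR.
#[derive(Default)]
struct BlockCounters {
    solutions: Counter,
    transactions: Counter,
    transmissions: Counter,
}

impl BlockCounters {
    /// Records the contents of one committed block.
    fn record_block(&self, num_sol: u64, num_tx: u64) {
        self.solutions.increment_by(num_sol);
        self.transactions.increment_by(num_tx);
        self.transmissions.increment_by(num_sol + num_tx);
    }
}
```

Because such a counter only ever grows, it matches the "this should technically be a counter" point above, whereas a gauge can also be set or decremented.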

+            metrics::increment_gauge(metrics::consensus::TRANSMISSIONS, 1f64);
+        }
         // Process the unconfirmed solution.
         {
             let solution_id = solution.commitment();
@@ -265,6 +270,11 @@ impl<N: Network> Consensus<N> {

     /// Adds the given unconfirmed transaction to the memory pool.
     pub async fn add_unconfirmed_transaction(&self, transaction: Transaction<N>) -> Result<()> {
+        #[cfg(feature = "metrics")]
+        {
+            metrics::increment_gauge(metrics::consensus::TRANSACTIONS, 1f64);
+            metrics::increment_gauge(metrics::consensus::TRANSMISSIONS, 1f64);
+        }
         // Process the unconfirmed transaction.
         {
             let transaction_id = transaction.id();
@@ -405,9 +415,14 @@ impl<N: Network> Consensus<N> {
             let elapsed = std::time::Duration::from_secs((snarkos_node_bft::helpers::now() - start) as u64);
             let next_block_timestamp = next_block.header().metadata().timestamp();
             let block_latency = next_block_timestamp - current_block_timestamp;
+            let num_sol = next_block.solutions().len();
+            let num_tx = next_block.transactions().len();
+            let num_transmissions = num_tx + num_sol;

             metrics::gauge(metrics::blocks::HEIGHT, next_block.height() as f64);
-            metrics::increment_gauge(metrics::blocks::TRANSACTIONS, next_block.transactions().len() as f64);
+            metrics::increment_gauge(metrics::blocks::SOLUTIONS, num_sol as f64);
+            metrics::increment_gauge(metrics::blocks::TRANSACTIONS, num_tx as f64);
+            metrics::increment_gauge(metrics::blocks::TRANSMISSIONS, num_transmissions as f64);
             metrics::gauge(metrics::consensus::LAST_COMMITTED_ROUND, next_block.round() as f64);
             metrics::gauge(metrics::consensus::COMMITTED_CERTIFICATES, num_committed_certificates as f64);
             metrics::histogram(metrics::consensus::CERTIFICATE_COMMIT_LATENCY, elapsed.as_secs_f64());
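
Since the block-level series above accumulate per-block counts, a throughput figure can be derived by differencing two readings over time. The sketch below shows that arithmetic only; `Sample` and `rate_per_second` are hypothetical helpers, and how the project's dashboards actually compute TPS is not stated in this PR.

```rust
use std::time::Duration;

/// Hypothetical reading of a cumulative total, e.g. transmissions included in blocks.
#[derive(Clone, Copy)]
struct Sample {
    total: u64,   // cumulative count at the time of the reading
    at: Duration, // time of the reading, measured from an arbitrary fixed origin
}

/// Returns the average rate between two readings, or `None` if the readings
/// are out of order, coincide in time, or the total went backwards.
fn rate_per_second(prev: Sample, curr: Sample) -> Option<f64> {
    let dt = curr.at.checked_sub(prev.at)?.as_secs_f64();
    if dt == 0.0 {
        return None;
    }
    let delta = curr.total.checked_sub(prev.total)?;
    Some(delta as f64 / dt)
}
```

For example, a total that rises from 100 to 160 over 10 seconds corresponds to an average of 6 transmissions per second.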