
Add ADR template and metrics in testing ADR. #152

Open · wants to merge 4 commits into main
Conversation

@linabutler (Member) commented Dec 18, 2022

Rendered view

This PR:

  1. Adds a template for our Architecture Decision Records, using the same format as Firefox Accounts. It's more structured than Nygard's status-context-decision-consequences template: it explicitly enumerates the alternatives considered and the pros and cons of each, which has been helpful in other projects (FxA, Application Services, UniFFI) that use this format. How do we feel about trying out this format?
  2. Adds our first ADR for DISCO-2089 and DISCO-2090!


### D. Subclass `aiodogstatsd.Client` to suppress sending metrics to StatsD in tests.

This option would change `metrics.Client` to be a subclass of `aiodogstatsd.Client`, instead of proxying calls to it. The subclass would override the `{gauge, increment, decrement, histogram, distribution, timing}()` methods to log metrics if the f
Contributor:

Oops, looks like the paragraph was truncated or unfinished.

Member (Author):

😳 Oops, thanks for catching that!
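
Purely for illustration, a rough sketch of what option D might look like (a hypothetical `LoggingClient` name; aiodogstatsd's exact keyword arguments are elided with `*args`/`**kwargs`, and only two of the six methods are shown):

```python
# Hypothetical sketch of option D; not merino-py's actual implementation.
import logging

import aiodogstatsd

logger = logging.getLogger("merino.metrics")


class LoggingClient(aiodogstatsd.Client):
    """Logs metric calls locally instead of sending them to StatsD."""

    def increment(self, name, *args, **kwargs):
        logger.debug("increment %s args=%r kwargs=%r", name, args, kwargs)

    def timing(self, name, *args, **kwargs):
        logger.debug("timing %s args=%r kwargs=%r", name, args, kwargs)

    # gauge(), decrement(), histogram(), and distribution() would be
    # overridden the same way.
```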


## Decision Outcome

TODO: Fill me in once we decide!
Contributor:

Have we identified any winner(s) or favorite(s) yet? Or will we fill that in at a later time?

Contributor:

Another related question: are we using the PR as a discussion where we come to a decision? Or is the intent that the discussion and decision happen elsewhere, and this PR is purely for recording the already-made decision? I ask because the way I'd go about reviewing such a PR differs based on whether I'm meant to discuss and sway a decision, versus reviewing its quality as technical documentation.

Commenter:

I second this. It's a little disorienting to move between the RFC list and an ADR PR. I'm fine either way, but an ADR in my mind is a decision to be recorded, whereas an RFC is an idea to be discussed. We should make whatever we decide consistent across App Svcs.

Member (Author):

That's a great question—I'm curious what y'all think! 😄 Reading our RFC process guidelines, I got the impression that it's geared toward significant changes (new features, cross-team projects, design reviews, substantial refactors) seeking feedback from multiple stakeholders, and I wasn't sure that this counted as either.

The line between the RFC list and ADR PRs feels a little fuzzy to me, too—for example, @mhammond (hi Mark! 👋🏼) just put up mozilla/application-services#5302, which is a draft ADR with a chosen solution sent to the RFC list for feedback.

For this one specifically, @hackebrot and I started a discussion on the Jira ticket to talk through some of these options, and we're meeting this week to chat more about them and make a decision. After that, we can update the PR with the outcome. Do we feel like that's a good approach for these kinds of smaller discussions, or would going through the RFC process be valuable—for example, if other folks want to chime in?

Commenter:

I agree this is a bit messy - it's a classic "if the only tool you have is a hammer, everything looks like a nail". I did an ADR because app-services already has ADRs, but doesn't really have RFCs, and I didn't really want to invent a separate RFC process just for this. The ADR template is also a little too prescriptive IMO and doesn't really suit all such discussions, so my ADR kinda ignored the template a little. It was sent to the RFC list because that was suggested by karloff as the appropriate place to reach the other stakeholders for remote-settings in particular.

So yeah, my ADR was probably closer to an RFC :) I don't really think the distinction is particularly useful - in practice, someone writing an RFC intends to make a decision, just like an ADR author does.

Contributor:

Thanks for all the feedback! What I'm understanding from what's been mentioned is that we have a way to solicit feedback on an idea (the RFC process), and that the ADR PR can link to that discussion or some other document. For this particular PR, the decision is being discussed in JIRA, so this ADR is not the place to debate it. I think this process is fine, and I don't see a strong reason to change it. ADR reviews tend to solicit another round of evaluating the decision, so knowing the decision-making process is important for not delaying the decision itself!

Comment on lines 13 to 17
## Decision Drivers <!-- optional -->

* [driver 1, e.g., a force, facing concern, …]
* [driver 2, e.g., a force, facing concern, …]
* … <!-- numbers of drivers can vary -->
Contributor:

Decision drivers can sometimes be ranked, since they aren't all equally important. I'm wondering if there's a way we can capture that here. 🤔 (Or maybe we can justify the ranking in Decision Outcome?)

Member (Author):

Oooh, yes—maybe we can represent this as a numbered list, and expand the blurb in the outcome section about the relative importance of the decision drivers?
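
For example, the template excerpt above could become something like:

## Decision Drivers <!-- optional; ordered from most to least important -->

1. [driver 1, e.g., a force, facing concern, …]
2. [driver 2, e.g., a force, facing concern, …]
3. … <!-- numbers of drivers can vary -->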

Contributor:

Ah, a simple numbered list sounds like a good start! I was originally thinking of maybe adding some sort of ranking system where you add a number (from 1 to 5, for instance) next to each point to indicate importance, but that might be more complicated than it's worth. We can change this later if we find that a numbered list doesn't quite give us the nuance we need for the ranking.

Comment on lines 46 to 49
* Good, because [argument a]
* Good, because [argument b]
* Bad, because [argument c]
* … <!-- numbers of pros and cons can vary -->
Contributor:

Personal preference nit: I would prefer that we format this as separate Pros/Cons lists.

#### Pros:

* [argument a]
* [argument b]

#### Cons:

* [argument c]

I think the titles and visual separation help with scanning the document (quicker to find what I'm looking for). But if there's a standard that we're trying to adhere to, then that can override my personal preference!

Member (Author):

Let's do it! I copy-pasted this template from FxA/UniFFI, but I think it's more important to have a structure that works for us, and I totally agree that separate lists make scanning a lot easier—and we can inspire changes the other way, too! 🚀


## Context and Problem Statement

merino-py emits counts, histograms, and timing metrics as it runs. In production, these metrics are sent to Datadog, using an extension of the [StatsD](https://github.com/statsd/statsd) protocol called [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd). For local development, merino-py provides the `metrics.dev_logger` config option, which logs outgoing metrics to the console instead of sending them to Datadog. merino-py also has integration tests that cover recording and emitting metrics. How can we avoid having the `metrics.dev_logger` setting and integration tests rely on implementation details of the `aiodogstatsd` client library to suppress reporting to Datadog?
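
As an illustration of the kind of coupling at issue, a test today might suppress outgoing metrics by monkeypatching a private aiodogstatsd hook. A hedged sketch (hypothetical fixture, assuming a private `_report`-style method exists; not merino-py's actual test code):

```python
import aiodogstatsd
import pytest


@pytest.fixture
def captured_metrics(monkeypatch):
    """Capture metric calls instead of letting them reach the UDP transport."""
    calls = []

    def fake_report(self, name, type_, value, tags, sample_rate):
        # Record the metric for assertions rather than sending it.
        calls.append((name, value, tags))

    # Depends on a private method whose name and signature can change in any
    # aiodogstatsd release -- exactly the fragility this ADR wants to avoid.
    monkeypatch.setattr(aiodogstatsd.Client, "_report", fake_report)
    return calls
```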
Contributor:

To keep the documentation more maintainable, I would suggest breaking down paragraphs with semantic linefeeds.

Member (Author):

Oooh, I like that a lot—thank you!

@mtbottle (Contributor):

Thanks for starting this! 🌟 The first ADR is definitely an interesting topic!

* Rank the decision drivers in order of importance.
* Separate sections for the pros and cons of each option.
* Conform to ADR template changes.
* Use semantic linefeeds for maintainability.
* Fix truncated description for option D.
@hackebrot hackebrot self-requested a review January 4, 2023 14:57
@hackebrot (Member) left a comment:

Hi @linabutler! 👋🏻

Thank you for creating this ADR! 🚀

I've added a comment about the data flow of metrics in merino-py.

## Context and Problem Statement

merino-py emits counts, histograms, and timing metrics as it runs.
In production, these metrics are sent to Datadog, using an extension of the
Member:

My current understanding is that we do not send merino-py metrics to Datadog.

The data flow is as follows:

  1. merino-py uses aiodogstatsd to record and submit StatsD metrics
  2. the aiodogstatsd client is configured to send metrics to a telegraf agent
  3. the telegraf agent uses a StatsD Input and InfluxDB Output
  4. the telegraf agent sends metrics to InfluxDB
  5. Grafana queries InfluxDB and visualizes metrics

Does that sound accurate to you, @dlactin?

Collaborator:

For context, we use the aiodogstatsd package because it supports the Datadog extensions to the StatsD protocol, which are enabled on telegraf.
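
To make steps 1 and 2 of that flow concrete, a minimal sketch of pointing aiodogstatsd at a telegraf agent (host, port, metric name, and tags are illustrative assumptions, not merino-py's actual configuration):

```python
import asyncio

import aiodogstatsd


async def main():
    # Send DogStatsD-formatted metrics to a local telegraf agent, whose
    # StatsD input (with datadog_extensions enabled) can parse the tags.
    client = aiodogstatsd.Client(host="localhost", port=8125)
    await client.connect()

    client.increment("suggestions.request", tags={"provider": "adm"})

    await client.close()


asyncio.run(main())
```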

In production, these metrics are sent to Datadog, using an extension of the
[StatsD](https://github.com/statsd/statsd) protocol called [DogStatsD](https://docs.datadoghq.com/developers/dogstatsd).
For local development, merino-py provides the `metrics.dev_logger` config option,
which logs outgoing metrics to the console instead of sending them to Datadog.
Member:

See above

which logs outgoing metrics to the console instead of sending them to Datadog.
merino-py also has integration tests that cover recording and emitting metrics.
How can we avoid relying on implementation details of the `aiodogstatsd` client library
to suppress reporting to Datadog?
Member:

Suggested change:
"to suppress reporting to Datadog?" → "to suppress reporting to InfluxDB?"
