Optional endpoints for beta deployments #401

Open
saswatamcode opened this issue Oct 26, 2022 · 3 comments

Comments

@saswatamcode
Member

Currently, to deploy beta instances of various Observatorium components like Thanos Querier, and make them available to end-user tenants (for dogfooding purposes), we would need to deploy new instances of the lightweight Observatorium API as well.

This poses problems, as we would need to provision new resources (e.g. DNS routes) for the duplicated API deployment and would end up with mixed beta/non-beta components configured on the beta API itself (the Querier is beta, but maybe Receive is not).

One idea could be to allow new beta endpoints within the same Observatorium instance by passing optional flags. For example, specifying --metrics.read.beta.endpoint would enable the api/metrics/beta/v1/<tenant>/api/v1/query endpoint on the API and forward requests to the Querier specified by this flag. This flag could even be repeatable.

Thus, one would not need to create a duplicated deployment of Observatorium API and provision resources just for dogfooding one particular component, as it can now be configured on the existing API and made available to tenants at a sub-path.

WDYT? Would love opinions on this, maybe there is something easier than this. 🙂

@periklis
Contributor

AFAIU the problem statement here is that we want to test beta versions of Observatorium components under live traffic to gain enough confidence in their performance, reliability, etc., without endangering the actual setup. This kind of testing is important in itself for the overall reliability of our service offering, and it also leaves room for experimentation and exploration.

Although I would like to see this kind of testing of beta components in the mid-term, imho the proposed solution of exposing extra paths has several unwanted traits:

  • Adding something like beta to the API makes it part of the API, without any control per se over who has access to it. What if a tenant likes the beta features and builds its stuff around them with the typical expectation "Hey, it's beta, it will be out soon"?
  • Applying the API version schema to the beta paths adds another multiplication factor for maintaining paths over paths over paths in the same code base. This will soon get very confusing, as we maintain many authn/authz/rate-limiting/label-enforcing middlewares that work based on attributes in the request context and/or the request/response payload. What if a beta API changes or removes a request parameter or payload field? This will lead to branching in the middlewares based on API maturity level (beta vs. non-beta) and version.
  • Another hidden pitfall is that the proposed approach silently ties the reliability of the whole API to the beta components, even if we filter the beta paths out of the SLIs. As long as traffic passes through the same veins, i.e. the same HTTP listener, it effectively shares the same memory. This means that if the API's handling of a beta component's requests/responses amplifies a bug in the API because of new behavior or payload changes, we have no simple means of saying "stop serving the beta API".

In general, I am pretty happy that someone suggests we need something to test beta components. However, I am not a fan of mixing this into the same API deployment. Production deployments should stay segregated from anything else. Nevertheless, this problem statement screams for a validation solution, not only for beta deployments but for any new deployment. From experience I can report that we should be looking at more holistic techniques such as traffic mirroring and/or canary deployments.

@matej-g
Contributor

matej-g commented Oct 26, 2022

I agree with @periklis; as we also discussed F2F, I have similar concerns to the ones Peri mentions above. What I had in mind was to allow specifying different environments (so we would not call it 'beta'; instead, users would be able to optionally specify and define multiple writers / readers. This could also be used in cases where, e.g., you have two different regions and want to let users choose which region to write to / read from).

Ultimately, it sounds to me like we're trying to work around our own operational issue (it's not easy to set up new routes etc.), and that should not influence the API design. I still think the API is rather lightweight, so even running an extra beta / canary instance does not seem like a big deal to me overall.

@squat
Member

squat commented Oct 26, 2022

I think that canary deployments of the Observatorium API with different configuration or blue/green deployments on the same cluster are the right solution for the stated purpose. This prevents any need for provisioning additional infrastructure resources and also avoids the issue of building the guarantees of the beta routes into the API contract of Observatorium.
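Canary traffic splitting is typically done by the ingress or load balancer in front of the stable and canary API deployments, not by the API itself. As one possible policy, sketched in Go under that assumption, the hypothetical routeToCanary below sends a fixed share of tenants to the canary by hashing the tenant name (FNV-1a), so each tenant consistently hits the same deployment instead of flipping between the two on every request.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routeToCanary decides, per tenant, whether requests go to the canary
// deployment. percent is the share of tenants (0-100) sent to the canary;
// hashing the tenant name keeps the decision stable across requests.
func routeToCanary(tenant string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(tenant))
	return h.Sum32()%100 < percent
}

func main() {
	for _, t := range []string{"tenant-a", "tenant-b", "tenant-c"} {
		fmt.Println(t, routeToCanary(t, 10))
	}
}
```

Sticky per-tenant assignment is only one design choice; a weighted per-request split is simpler but means a single tenant sees responses from both deployments.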
