Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables gRPC Health Checking Protocol on Open Telemetry Collector #8955

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

pantuza
Copy link

@pantuza pantuza commented Nov 18, 2023

Description: Adding a feature - Enabling gRPC Health Checking Protocol on Open Telemetry Collector.

Link to tracking Issue: Not Applicable

Testing:
Before this PR, if you use any gRPC tool [1,2,3] to probe otelcol gRPC port (4317) you would get as response:

$> grpc-health-probe -addr=localhost:4317
error: this server does not implement the grpc health protocol (grpc.health.v1.Health): unknown service grpc.health.v1.Health

Now, the gRPC server responds properly to gRPC Health Protocol:

$> grpc-health-probe -addr=localhost:4317
status: SERVING

Documentation:
I will be glad to add documentation somewhere in the project.
By now, I am considering these changes simply complies with the gRPC industry standard gRPC Health Checking Protocol.
Thus, I think it worths an entry on the Changelog but not necessarily a documentation entry.

Why these changes matter?
It is quite common for gRPC users to probe their gRPC endpoints using the tools described above.
It helps checking if your remote server is up and running before even knowing details of its remote routes/methods.

Therefore, this PR improves developer experience while using or getting started with Open Telemetry Collector.
Here follow some of arguments for this PR:

  1. It makes Collector more observable which complies with this section of the long-term vision
  2. It makes Collector more usable out of the box as described in this section of the long-term vision
  3. It adds a quite simple feature which complies with the contributing guideline
  4. It does not add ANY new dependency. Basically we simply enabled a feature that was sitting in the project but not getting used. The code uses google.golang.org/grpc library which was already imported in the Collector code.

Please let me know if there is anything missing here or anything I can improve in this contribution.

@pantuza pantuza requested a review from a team as a code owner November 18, 2023 03:04
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

github-actions bot commented Dec 3, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 3, 2023
@pantuza
Copy link
Author

pantuza commented Dec 6, 2023

Any feedback here, folks? 
Is there anything I can do to trigger the workflows so I can see any pending contributions I need to do?

@atoulme
Copy link
Contributor

atoulme commented Dec 6, 2023

I triggered them for you.

Copy link

codecov bot commented Dec 6, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.56%. Comparing base (1d52fb9) to head (ab3cc9e).
Report is 45 commits behind head on main.

Current head ab3cc9e differs from pull request most recent head ac73929

Please upload reports for the commit ac73929 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8955      +/-   ##
==========================================
- Coverage   91.63%   91.56%   -0.08%     
==========================================
  Files         356      360       +4     
  Lines       16849    16695     -154     
==========================================
- Hits        15439    15286     -153     
- Misses       1068     1073       +5     
+ Partials      342      336       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea to me. I think a changelog is appropriate.

@github-actions github-actions bot removed the Stale label Dec 7, 2023
@pantuza
Copy link
Author

pantuza commented Dec 8, 2023

Hi folks, I have added the changelog entry (file).
Can we trigger CI once again to check if everything is ok?

.chloggen/grpc-health-checking-protocol.yaml Outdated Show resolved Hide resolved
.chloggen/grpc-health-checking-protocol.yaml Outdated Show resolved Hide resolved
pantuza and others added 4 commits December 11, 2023 15:37
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
@pantuza
Copy link
Author

pantuza commented Dec 12, 2023

Thank you for the suggestions, @TylerHelmuth.
I have also updated this PR with latest changes that was on main branch.
Do you mind triggering the CI so we can check if there is anything missing?
Please, let me know if you see any other improvements.

@TylerHelmuth
Copy link
Member

@pantuza I don't have the right permissions in this repo. @open-telemetry/collector-approvers please take a look.

@github-actions github-actions bot added Stale and removed Stale labels Jan 3, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added Stale and removed Stale labels Jan 18, 2024
Copy link
Contributor

github-actions bot commented Feb 3, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 3, 2024
@github-actions github-actions bot removed the Stale label Feb 7, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 22, 2024
@github-actions github-actions bot removed the Stale label Feb 29, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added Stale and removed Stale labels Mar 21, 2024
@ksandrmatveyev
Copy link

Hello @pantuza,
Is there anything that might be changed from your side to pass checks?

FYI, @open-telemetry/collector-approvers, @dmitryax

@pantuza
Copy link
Author

pantuza commented Apr 4, 2024

Hello @pantuza, Is there anything that might be changed from your side to pass checks?

FYI, @open-telemetry/collector-approvers, @dmitryax

Hi, @ksandrmatveyev. Thank you for the reply.
Not actually. I don't have permissions to trigger the CI execution. Last time it was triggered, everything passed.

I see that by now this PR is outdated from Master/Main branch.
I will make sure to update it.

I would be really glad If someone can trigger the CI so we can check if everything is ok.

@TylerHelmuth
Copy link
Member

@pantuza I'm adding this PR to the collector's SIG agenda on Monday.

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense.

One question: is there any reason a user wouldn't want to have this enabled? It doesn't look like it is configurable.

@jpkrohling
Copy link
Member

is there any reason a user wouldn't want to have this enabled?

In general, it's good practice to have admin-related endpoints at a different port than the workload, but for this specific one, I think it's ok and desirable to have it the way it's implemented in this PR.

@ksandrmatveyev
Copy link

@pantuza
Copy link
Author

pantuza commented May 10, 2024

Seems it fails on test phase https://github.com/open-telemetry/opentelemetry-collector/actions/runs/8838230921/job/24299483330?pr=8955 FYI @pantuza

Hi, @ksandrmatveyev . Thank you for the reply.

Looking at the error, it is related to contrib project. That jaegerremotesampling extension is registering the Health server. Thus, gRPC fails saying there are two codes trying to register the same service (reference of the error).

Since we are enabling it directly on the Collector, no extension needs to enable it. It will become a builtin feature of the
Collector. What do you think?

As a follow up for this PR, I can send a Pull Request to contrib project disabling this on jaegerremotesampling code.
Any thoughts on my suggestions?

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants