Proposal - add message routing sink to metrics collector #6879

Open
vasdee opened this issue Feb 3, 2023 · 5 comments

Comments

@vasdee

vasdee commented Feb 3, 2023

Apologies in advance, this issue doesn't fit neatly into any of the suggested areas for GitHub tickets, so I'm listing it here.

We currently have metrics-collector deployed in a nested edge, corporate environment. It is generally working well, so kudos for the good work so far!

The issue we are facing is that we ideally want to use the AzureMonitor upload target from the parent device. We can't do this in a nested edge setup, because the child devices have to route via IoTMessage. As a result, only the parent's metrics end up in Log Analytics, while the child device metrics are routed to IoT Hub.

The information listed about metrics-collector states that we could add an Azure Function to forward messages from IoT Hub to Log Analytics, but that feels like an extra step we don't need. On top of that, going the IoTMessage route adds more messages to our IoT Hub quota.

Looking at the metrics collector code base, it seems possible that we could add the ability to route metrics messages from child devices into the parent-level metrics collector. From there, the parent could bundle up all the messages and send them via the AzureMonitor upload target. I have a colleague who is looking to do the work. My questions for the dev folks here are:

  1. Do you see any issues with this approach? We've already confirmed child device message routing works on a parent, which was our big concern. But are there any others that we won't discover until we are deep into dev?
  2. Is this something that you would accept as a PR? Assuming it is developed according to contributing guidelines, would you in theory accept this as a feature?
@vadim-kovalyov
Contributor

Hey @vasdee, thanks for the feedback! It is great to hear that nested edge works well for you. I'll add @veyalla and @varunpuranik to the discussion to answer your questions about improvements to the metrics collector.

@veyalla veyalla assigned micahl and unassigned veyalla Feb 3, 2023
@micahl
Contributor

micahl commented Feb 3, 2023

Hi @vasdee, glad to hear the metrics collector is working for you, and thank you for explaining your scenario! The ask makes sense. From a brief internal discussion, we thought an alternative approach you might explore would be to tweak the API proxy module deployed on each of your nested nodes to route the API calls the MetricsCollector makes to Log Analytics.

If that sounds viable then you could look at first creating a custom config to pass the API proxy. If that works and you're willing to contribute it back (e.g. by adding something like a LOG_ANALYTICS_ROUTE_ADDRESS option similar to the other config options on the API proxy) then we'd definitely appreciate a PR.

If a tweak to the API proxy isn't a viable option for some reason and it makes more sense to modify the metrics collector code base then we'd consider a PR for that as well.

Thoughts?

@vasdee
Author

vasdee commented Feb 5, 2023

Thanks @micahl, that's actually quite a good solution and potentially a lot simpler. At the moment I'm already looking into extending the api-proxy to make use of the file serving ability of nginx, so this might slot in quite nicely.

@vasdee
Author

vasdee commented Feb 14, 2023

@micahl I've managed to make some minimal changes to the metrics-collector and introduce a new UploadTarget called "ApiProxyServer". When the UploadTarget environment variable is set to this value, the module is assumed to be running in a nested edge environment. In this mode, the value of the environment variable IOTEDGE_PARENTHOSTNAME is used as the base address for requests, instead of the AzureMonitor defaults, <workspace id>.oms.opinsights.azure.com and <workspace id>.ods.opinsights.azure.com.
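
For illustration, the host selection described above could be sketched roughly as follows. This is Python rather than the collector's own language, and the function name `resolve_upload_hosts` and the shape of its return value are hypothetical; only the UploadTarget values, IOTEDGE_PARENTHOSTNAME, and the two default host patterns come from the description above.

```python
import os


def resolve_upload_hosts(upload_target, workspace_id, env=None):
    """Return the hosts to contact for a given UploadTarget.

    Hypothetical helper: the name and the returned dict are illustrative only.
    """
    env = os.environ if env is None else env

    if upload_target == "ApiProxyServer":
        # Nested edge: every request goes to the parent device's API proxy.
        parent = env.get("IOTEDGE_PARENTHOSTNAME")
        if not parent:
            raise ValueError("IOTEDGE_PARENTHOSTNAME must be set for ApiProxyServer")
        return {"oms": parent, "ods": parent}

    if upload_target == "AzureMonitor":
        # Direct upload: use the workspace-specific default hosts.
        return {
            "oms": f"{workspace_id}.oms.opinsights.azure.com",
            "ods": f"{workspace_id}.ods.opinsights.azure.com",
        }

    raise ValueError(f"unsupported UploadTarget: {upload_target}")
```

With this shape, the rest of the upload code would not need to know whether it is talking to the analytics endpoint directly or to a parent proxy.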

I've set up a custom version of the api-proxy that includes a single location to mimic the initial "registration" request that metrics collector performs to the analytics endpoint, namely /AgentService.svc/AgentTopologyRequest. In my custom nginx config, it looks like this:

location /AgentService.svc/AgentTopologyRequest {
    resolver 127.0.0.11;
    proxy_http_version 1.1;
    proxy_pass         https://<workspace id>.oms.opinsights.azure.com;
    proxy_set_header   X-Forwarded-Proto $scheme;
    proxy_set_header   x-ms-version "August, 2014";
    proxy_pass_request_headers on;
    client_max_body_size 1000G;
}

When I perform a curl from the child device to the customised parent api-proxy module, I receive content from what appears to be the analytics endpoint, complaining about a missing header (intentional in this case):

curl -X POST https://<iotedge parent device fqdn>/AgentService.svc/AgentTopologyRequest
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Length Required</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Length Required</h2>
<hr><p>HTTP Error 411. The request must be chunked or have a content length.</p>
</BODY></HTML>

This correlates with doing a similar request from the parent, directly to the analytics endpoint:

curl -X POST --http1.1 https://<workspace id>.oms.opinsights.azure.com/AgentService.svc/AgentTopologyRequest
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Length Required</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Length Required</h2>
<hr><p>HTTP Error 411. The request must be chunked or have a content length.</p>
</BODY></HTML>

This tells me that the proxy is at least passing the request through to the endpoint successfully. However, if I address the missing content length by adding -H 'Content-Length: 0' to the curl command, both commands return only a 403.
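
As a rough aid for comparing the two responses, the same empty POST can be issued from a small script that also captures the response headers. This is only a sketch using Python's standard library; `build_probe` and `probe` are hypothetical names, and the host passed in is whatever parent FQDN or workspace host is being tested.

```python
import urllib.error
import urllib.request

TOPOLOGY_PATH = "/AgentService.svc/AgentTopologyRequest"


def build_probe(host, path=TOPOLOGY_PATH):
    """Build an empty POST with an explicit Content-Length of 0."""
    req = urllib.request.Request(f"https://{host}{path}", data=b"", method="POST")
    req.add_header("Content-Length", "0")
    return req


def probe(host):
    """Send the probe and return (status code, response headers)."""
    try:
        with urllib.request.urlopen(build_probe(host), timeout=10) as resp:
            return resp.status, dict(resp.headers)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the headers are still available.
        return err.code, dict(err.headers)
```

Running `probe` once against the parent proxy and once against the analytics endpoint directly would show whether the headers on the two 403 responses differ.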

This is where I'm a bit stumped. My modified child metrics collector gets the same 403, but without any body content I can't figure out what the next steps are. On top of that, the metrics collector appears to use an undocumented API that isn't intended for direct customer interaction, so there is very little information about what the API requires in the request for this proxied scenario.

I'm hoping someone can shed some light on what I might need here so I can proceed with this PoC and then move on to a proper solution. At the moment, it doesn't look like there is a way forward with this approach.

@micahl micahl assigned veyalla and unassigned micahl Mar 10, 2023
@github-actions

This issue is being marked as stale because it has been open for 30 days with no activity.


4 participants