Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Add monitoring metrics for open and harvested files in fileconsumer #31544

Merged
merged 2 commits into from
May 22, 2024

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark commented Mar 4, 2024

Blocked on #31618

Description:

This PR adds support for filelog receiver to emit observable metrics about its current state: how many files are opened, and harvested.

Link to tracking Issue: #31256

Testing:

How to test this manually

  1. Use the following collector config:
receivers:
  filelog:
    start_at: end
    include:
    - /var/log/busybox/monitoring/*.log
exporters:
  debug:
    verbosity: detailed
service:
  telemetry:
    metrics:
      level: detailed
      address: ":8888"
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []
  1. Build and run the collector: make otelcontribcol && ./bin/otelcontribcol_linux_amd64 --config ~/otelcol/monitoring_telemetry/config.yaml
  2. Produce some logs:
echo 'some line' >> /var/log/busybox/monitoring/1.log
while true; do echo -e "This is a log line" >> /var/log/busybox/monitoring/2.log; done
  1. Verify that metrics are produced:
curl 0.0.0.0:8888/metrics | grep _files
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4002    0  4002    0     0  1954k      0 --:--:-- --:--:-- --:--:-- 1954k
# HELP otelcol_fileconsumer_open_files Number of open files
# TYPE otelcol_fileconsumer_open_files gauge
otelcol_fileconsumer_open_files{service_instance_id="72b4899d-6ce3-41de-a25b-8f0370e22ec1",service_name="otelcontribcol",service_version="0.99.0-dev"} 2
# HELP otelcol_fileconsumer_reading_files Number of open files that are being read
# TYPE otelcol_fileconsumer_reading_files gauge
otelcol_fileconsumer_reading_files{service_instance_id="72b4899d-6ce3-41de-a25b-8f0370e22ec1",service_name="otelcontribcol",service_version="0.99.0-dev"} 1

Documentation:
Added a respective section in Filelog receiver's docs.

@ChrsMark ChrsMark force-pushed the add_filelog_metrics branch 2 times, most recently from 90e25c4 to 013941a Compare March 4, 2024 13:13
@github-actions github-actions bot added the processor/logstransform Logs Transform processor label Mar 4, 2024
@github-actions github-actions bot requested a review from dehaansa March 4, 2024 13:14
@ChrsMark ChrsMark force-pushed the add_filelog_metrics branch 6 times, most recently from a51c88b to bb8d0f1 Compare March 5, 2024 12:59
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind splitting this into two PRs - one to pass in component.TelemetrySettings, the other to add the metrics?

pkg/stanza/operator/input/tcp/tcp_test.go Outdated Show resolved Hide resolved
pkg/stanza/fileconsumer/file.go Show resolved Hide resolved
pkg/stanza/fileconsumer/config.go Outdated Show resolved Hide resolved
pkg/stanza/fileconsumer/file.go Outdated Show resolved Hide resolved
@ChrsMark ChrsMark changed the title [receiver/filelog ] Add monitoring metrics about open and harvested files [receiver/filelog] Add monitoring metrics about open and harvested files Mar 6, 2024
@ChrsMark ChrsMark force-pushed the add_filelog_metrics branch 3 times, most recently from df7ecbd to 00e930a Compare March 6, 2024 11:57
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Mar 21, 2024
@ChrsMark
Copy link
Member Author

Still WiP.

@github-actions github-actions bot removed the Stale label Mar 22, 2024
Copy link
Contributor

github-actions bot commented Apr 6, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Apr 6, 2024
@ChrsMark
Copy link
Member Author

ChrsMark commented Apr 6, 2024

Still WiP

@github-actions github-actions bot removed the Stale label Apr 7, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Apr 22, 2024
@ChrsMark ChrsMark changed the title [receiver/filelog] Add monitoring metrics about open and harvested files [pkg/stanza] Add monitoring metrics for open and harvested files in fileconsumer May 17, 2024
@ChrsMark ChrsMark requested a review from djaglowski May 17, 2024 11:31
meter := set.MeterProvider.Meter("otelcol/fileconsumer")

openedFiles, err := meter.Int64UpDownCounter(
"fileconsumer"+"/"+"open_files",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this broken up because there was an intention to use constants?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, but good point. I will make those constants!

@@ -105,6 +110,7 @@ func (m *Manager) startPoller(ctx context.Context) {

// poll checks all the watched paths for new entries
func (m *Manager) poll(ctx context.Context) {
m.reportMetrics(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to increment whenever we open a file and decrement whenever we close one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that would doable inside the Tracker. Specifically inside the Add(reader *reader.Reader) and ClosePreviousFiles() methods. Is that what you had in mind?

In that case, the metric should be called otelcol_tracker_open_files right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name can stay the same. IMO users don't need to be exposed to the internal package structure. Sticking with fileconsumer seems specific enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to increment whenever we open a file and decrement whenever we close one?

So, files are opened by the Manager but are closed by the Manager and the Tracker as well from what I can see.
Also inside the Manager's operations there are multiple places where files are re-opened and then their readers are closed again like at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.100.0/pkg/stanza/fileconsumer/file.go#L203 or https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.100.0/pkg/stanza/fileconsumer/file.go#L222.

Based on this I thought that it makes sense to simplify and only increment the openFiles counter only when we create a fresh reader at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.100.0/pkg/stanza/fileconsumer/file.go#L225-L232.

Incrementing in previous steps and then possibly decrementing after a couple of instrunctions didn't make a lot of sense to me but let me know what you think :).

@ChrsMark ChrsMark force-pushed the add_filelog_metrics branch 6 times, most recently from 8010c71 to cdff3b2 Compare May 20, 2024 14:42
@newly12
Copy link
Contributor

newly12 commented May 21, 2024

Hi @ChrsMark, one quick question, are the new metrics shown at detailed metrics level or normal level?

@ChrsMark
Copy link
Member Author

Hi @ChrsMark, one quick question, are the new metrics shown at detailed metrics level or normal level?

They are shown on the normal level as well.

@@ -106,6 +131,7 @@ func (t *fileTracker) ClosePreviousFiles() {

for r, _ := t.previousPollFiles.Pop(); r != nil; r, _ = t.previousPollFiles.Pop() {
t.knownFiles[0].Add(r.Close())
t.openFiles.Add(context.TODO(), -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really the only opening or closing of files we do in the tracker, so it does feel quite awkward to have the tracker manage the meter. Instead, what if we just return the number of files closed from this method? We're closing them one after another, so decrementing after each one or all at once seems equivalent in this case.

Copy link
Member Author

@ChrsMark ChrsMark May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we would need to either have the EndConsume() to return the number of files closed as well. or to move the ClosePreviousFiles() outside of it. I'm leaning towards the latter.

It seems that the calling order of the ClosePreviousFiles() is different on Windows so we cannot decouple it from the EndConsume()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter won't work because the order of operations is different depending on OS.

I think we could get rid of OS dependent behavior for closing files if we implement #32037.

Otherwise, I'm fine with returning the count from EndConsume.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had missed that EndConsume of noStateTracker closes files directly so we need to have it returning the number of files closed anyway.

@ChrsMark ChrsMark force-pushed the add_filelog_metrics branch 3 times, most recently from 32f8e54 to 52aeef2 Compare May 21, 2024 14:29
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
@ChrsMark ChrsMark force-pushed the add_filelog_metrics branch 3 times, most recently from 35d922f to a94b94f Compare May 22, 2024 09:11
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
@djaglowski djaglowski merged commit 4ba2c7c into open-telemetry:main May 22, 2024
162 checks passed
@github-actions github-actions bot added this to the next release milestone May 22, 2024
@@ -73,7 +77,7 @@ func (m *Manager) Stop() error {
m.cancel = nil
}
m.wg.Wait()
m.tracker.ClosePreviousFiles()
m.openFiles.Add(context.TODO(), int64(0-m.tracker.ClosePreviousFiles()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you willing to look into what it takes to pass the context through?

Copy link
Member Author

@ChrsMark ChrsMark May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken we could pass down the ctx from the Shutdown of the stanza receiver when we stop the internal pipeline at

.

Then if we pass the ctx on every single operator's Stop() we can make it available to the Stop() of the Manager/fileconsumer at

when we stop the file_input operator.

This would be doable I think but would be a breaking change for the Pipeline interface and the Operator interface (plus the Manager).
Hence, I'm not sure if that's worth doing it unless we find value in doing so anyways and not just for the meter's need?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants